Architecture Guidelines

The codebase should be organized into four layers with clearly separated responsibilities.

1. Core domain logic

Pure Python logic for:

  • identifying papers
  • computing paths
  • importing PDFs
  • updating metadata
  • converting PDFs to Markdown
  • rendering summaries
  • rebuilding the index

This layer should be testable without the CLI.
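Here is a minimal sketch of what "testable without the CLI" can look like for computing paths. The `papers/<id>/` directory layout, the `paper.pdf` and `metadata.json` file names, and the helpers `normalize_paper_id`, `compute_paths` and `PaperPaths` are hypothetical. Only `paper.md`, `summary.json` and `summary.md` come from these guidelines. Because the functions do no I/O, a plain unit test can call them directly.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PaperPaths:
    """Every file location for one paper (hypothetical directory layout)."""

    root: Path
    pdf: Path
    metadata: Path
    markdown: Path
    summary_json: Path
    summary_md: Path


def normalize_paper_id(raw: str) -> str:
    """Map an identifier such as an arXiv ID or DOI to a filesystem-safe slug."""
    slug = re.sub(r"[^a-z0-9._-]+", "_", raw.strip().lower()).strip("_")
    if slug in {"", ".", ".."}:
        # Refuse names that would be empty or escape the papers/ directory.
        raise ValueError(f"cannot derive a directory name from {raw!r}")
    return slug


def compute_paths(library_root: Path, paper_id: str) -> PaperPaths:
    """Pure function: no filesystem access, so tests need neither the CLI nor real files."""
    root = library_root / "papers" / normalize_paper_id(paper_id)
    return PaperPaths(
        root=root,
        pdf=root / "paper.pdf",
        metadata=root / "metadata.json",
        markdown=root / "paper.md",
        summary_json=root / "summary.json",
        summary_md=root / "summary.md",
    )
```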

2. CLI layer

Thin wrappers around the core domain logic.

The CLI should:

  • parse arguments
  • call core functions
  • format output
  • handle exit codes

The CLI should not contain deep business logic.
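One way the CLI could stay thin is sketched below with the standard-library `argparse`. The `papers` program name, the `show-md` subcommand, the `--root` option, and the `locate_markdown` / `PaperNotFound` stand-ins for the core layer are all hypothetical. The wrapper only parses arguments, delegates to the core, prints, and maps the outcome to an exit code.

```python
from __future__ import annotations

import argparse
import sys
from pathlib import Path
from typing import Optional, Sequence


# Core layer stand-in (hypothetical); in practice this is imported from the core package.
class PaperNotFound(Exception):
    """Raised by the core when a paper has no converted Markdown."""


def locate_markdown(library_root: Path, paper_id: str) -> Path:
    path = library_root / "papers" / paper_id / "paper.md"
    if not path.is_file():
        raise PaperNotFound(paper_id)
    return path


# CLI layer: parse arguments, call the core, format output, choose an exit code.
def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="papers")
    parser.add_argument("--root", type=Path, default=Path("."), help="library root")
    sub = parser.add_subparsers(dest="command", required=True)
    show = sub.add_parser("show-md", help="print the path of a paper's Markdown")
    show.add_argument("paper_id")
    args = parser.parse_args(argv)

    try:
        path = locate_markdown(args.root, args.paper_id)
    except PaperNotFound as exc:
        print(f"error: no paper.md for {exc}", file=sys.stderr)
        return 1
    print(path)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Taking `argv` as a parameter and returning an int keeps `main` itself easy to test. `argparse` already exits with status 2 on usage errors.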

3. Optional integrations

Code that talks to external systems should live in separate integration modules, for example:

  • MinerU wrapper
  • filesystem watch integration
  • ripgrep integration
  • LLM provider integration

Keep these adapters isolated, so that each external dependency can be replaced or left out without changing the core layer.
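Adapter isolation can be sketched with `typing.Protocol`, assuming the core only needs "PDF in, Markdown text out". `PdfToMarkdownConverter`, `FakeConverter` and `convert_paper` are hypothetical names. A real MinerU adapter would implement `convert` by invoking MinerU; its actual interface is not shown or assumed here.

```python
from __future__ import annotations

from pathlib import Path
from typing import List, Protocol


class PdfToMarkdownConverter(Protocol):
    """Interface the core depends on; a MinerU adapter would be one implementation."""

    def convert(self, pdf_path: Path) -> str:
        """Return Markdown text for the given PDF."""
        ...


class FakeConverter:
    """Test double: runs no external process and returns fixed output."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.calls: List[Path] = []

    def convert(self, pdf_path: Path) -> str:
        self.calls.append(pdf_path)
        return self.text


def convert_paper(paper_dir: Path, converter: PdfToMarkdownConverter) -> Path:
    """Core logic: writes paper.md without knowing which converter produced it."""
    markdown = converter.convert(paper_dir / "paper.pdf")
    out = paper_dir / "paper.md"
    out.write_text(markdown, encoding="utf-8")
    return out
```

Because the core sees only the protocol, tests use `FakeConverter` and never need the external tool installed.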

4. Optional AI layer

The AI summarization layer should be behind a stable abstraction.

For example, a summarization run could:

  • load prompt template
  • load paper markdown
  • load optional profile / vocabulary
  • call provider
  • validate structured output
  • write summary.json
  • render summary.md

Avoid leaking provider-specific behavior into unrelated modules.
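The steps above can be sketched behind a single provider interface. `SummaryProvider`, `validate_summary`, `summarize`, the `$paper` / `$vocabulary` template placeholders and the two-field schema in `REQUIRED_KEYS` are hypothetical; a real schema would likely be richer. Rendering `summary.md` is left out because it needs only `summary.json`.

```python
from __future__ import annotations

import json
from pathlib import Path
from string import Template
from typing import Optional, Protocol

# Hypothetical minimal schema for summary.json: field name -> required Python type.
REQUIRED_KEYS = {"title": str, "key_points": list}


class SummaryProvider(Protocol):
    """The only place provider-specific behaviour lives; returns raw model text."""

    def complete(self, prompt: str) -> str:
        ...


class InvalidSummary(ValueError):
    pass


def validate_summary(raw: str) -> dict:
    """Parse the provider's reply and check it against the expected structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidSummary(f"provider did not return JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidSummary("summary must be a JSON object")
    for key, expected in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected):
            raise InvalidSummary(f"missing or mistyped field: {key}")
    return data


def summarize(
    paper_dir: Path,
    template_path: Path,
    provider: SummaryProvider,
    vocabulary_path: Optional[Path] = None,
) -> Path:
    # Load the prompt template, the paper Markdown and the optional vocabulary.
    template = Template(template_path.read_text(encoding="utf-8"))
    paper_md = (paper_dir / "paper.md").read_text(encoding="utf-8")
    vocabulary = vocabulary_path.read_text(encoding="utf-8") if vocabulary_path else ""
    prompt = template.safe_substitute(paper=paper_md, vocabulary=vocabulary)

    # Call the provider, validate, and only then write summary.json.
    summary = validate_summary(provider.complete(prompt))
    out = paper_dir / "summary.json"
    out.write_text(json.dumps(summary, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Validating before writing means an invalid reply never overwrites a previously good `summary.json`.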

Component boundaries

Avoid hidden coupling:

  • search should not depend on LLM code
  • import should not require summarization
  • reindex should not assume a specific converter
  • render-summary should not require calling the AI provider again
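For the render-summary boundary, here is a sketch of rendering as a pure transformation of `summary.json`, assuming hypothetical `title` and `key_points` fields. Nothing here imports provider code, so re-rendering never triggers another model call.

```python
from __future__ import annotations

import json
from pathlib import Path


def render_summary_markdown(summary: dict) -> str:
    """Pure rendering: the same summary.json always yields the same Markdown."""
    lines = [f"# {summary['title']}", ""]
    lines += [f"- {point}" for point in summary.get("key_points", [])]
    return "\n".join(lines) + "\n"


def render_summary(paper_dir: Path) -> Path:
    """Read summary.json from disk and write summary.md next to it."""
    summary = json.loads((paper_dir / "summary.json").read_text(encoding="utf-8"))
    out = paper_dir / "summary.md"
    out.write_text(render_summary_markdown(summary), encoding="utf-8")
    return out
```

Changing the Markdown layout then only requires re-running the renderer over existing `summary.json` files.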

Prefer explicit data flow:

  • import creates or updates metadata
  • convert creates paper.md
  • summarize creates summary.json
  • render-summary creates summary.md
  • reindex rebuilds SQLite from files
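As an illustration of the last step, reindex can be sketched as a full rebuild from the files using the standard-library `sqlite3` module. The `papers` table, its columns, and the `papers/<id>/metadata.json` layout are assumptions. Dropping, recreating and filling the table inside one explicit transaction means a failed rebuild leaves the previous index in place, and running it twice gives the same result.

```python
from __future__ import annotations

import json
import sqlite3
from pathlib import Path


def reindex(library_root: Path, db_path: Path) -> int:
    """Rebuild the SQLite index from files on disk; the files remain the source of truth."""
    # isolation_level=None disables implicit transactions so BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(db_path, isolation_level=None)
    count = 0
    try:
        conn.execute("BEGIN")
        conn.execute("DROP TABLE IF EXISTS papers")
        conn.execute(
            "CREATE TABLE papers ("
            "id TEXT PRIMARY KEY, title TEXT, has_markdown INTEGER, has_summary INTEGER)"
        )
        for meta_path in sorted(library_root.glob("papers/*/metadata.json")):
            paper_dir = meta_path.parent
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            # Record which derived files exist, without caring how they were produced.
            conn.execute(
                "INSERT INTO papers VALUES (?, ?, ?, ?)",
                (
                    paper_dir.name,
                    meta.get("title"),
                    int((paper_dir / "paper.md").exists()),
                    int((paper_dir / "summary.json").exists()),
                ),
            )
            count += 1
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # keep the previous index if anything fails
        raise
    finally:
        conn.close()
    return count
```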