# Architecture Guidelines

The codebase should be organized around a few clear layers.

## 1. Core domain logic

Pure Python logic for:

- identifying papers
- computing paths
- importing PDFs
- updating metadata
- converting PDFs to Markdown
- rendering summaries
- rebuilding the index

This layer should be testable without the CLI.

## 2. CLI layer

Thin wrappers around the core domain logic.

The CLI should:

- parse arguments
- call core functions
- format output
- handle exit codes

The CLI should not contain deep business logic.

## 3. Optional integrations

External systems should live in integration modules, for example:

- MinerU wrapper
- filesystem watch integration
- ripgrep integration
- LLM provider integration

Keep these adapters isolated.

## 4. Optional AI layer

The AI summarization layer should be behind a stable abstraction.

For example:

- load prompt template
- load paper markdown
- load optional profile / vocabulary
- call provider
- validate structured output
- write `summary.json`
- render `summary.md`

Avoid leaking provider-specific behavior into unrelated modules.

## Component boundaries

Avoid hidden coupling:

- `search` should not depend on LLM code
- `import` should not require summarization
- `reindex` should not assume a specific converter
- `render-summary` should not require calling AI again

Prefer explicit data flow:

- `import` creates or updates metadata
- `convert` creates `paper.md`
- `summarize` creates `summary.json`
- `render-summary` creates `summary.md`
- `reindex` rebuilds SQLite from files
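
As a minimal sketch of the `summarize` and `render-summary` steps in the data flow above, the following shows one way to keep the provider behind a small interface so that rendering reads only `summary.json`. The names `SummaryProvider`, `FakeProvider`, `REQUIRED_KEYS`, and the `title` / `key_points` schema are assumptions made for illustration; they are not taken from the codebase.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Protocol


class SummaryProvider(Protocol):
    """Hypothetical provider interface; each LLM adapter would implement it."""

    def summarize(self, prompt: str, paper_markdown: str) -> dict: ...


# Assumed minimal schema for summary.json, for illustration only.
REQUIRED_KEYS = {"title", "key_points"}


def summarize(paper_dir: Path, provider: SummaryProvider, prompt: str) -> Path:
    """Read paper.md, call the provider, validate the result, write summary.json."""
    markdown = (paper_dir / "paper.md").read_text(encoding="utf-8")
    data = provider.summarize(prompt, markdown)
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"summary is missing keys: {sorted(missing)}")
    out = paper_dir / "summary.json"
    out.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return out


def render_summary(paper_dir: Path) -> Path:
    """Render summary.md from summary.json only; no provider is involved."""
    data = json.loads((paper_dir / "summary.json").read_text(encoding="utf-8"))
    lines = [f"# {data['title']}", ""]
    lines += [f"- {point}" for point in data["key_points"]]
    out = paper_dir / "summary.md"
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


class FakeProvider:
    """Stand-in for tests; returns a canned structured summary."""

    def summarize(self, prompt: str, paper_markdown: str) -> dict:
        return {"title": "Example", "key_points": ["first", "second"]}
```

Because `render_summary` takes only a directory, it can be re-run after a template change without another AI call, and the core functions can be tested with `FakeProvider` without going through the CLI.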