# Architecture Guidelines
The codebase should be organized around four clear layers: core domain logic, a thin CLI, optional integrations, and an optional AI layer.
## 1. Core domain logic
Pure Python logic for:
- identifying papers
- computing paths
- importing PDFs
- updating metadata
- converting PDFs to Markdown
- rendering summaries
- rebuilding the index

This layer should be testable without the CLI.
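
As a minimal sketch of what "testable without the CLI" can look like, the functions below compute a paper's directory from plain arguments. The names `paper_slug` and `paper_dir`, the `papers/` subdirectory, and the slug scheme are all assumptions made for illustration, not the project's actual API.

```python
import re
from pathlib import Path


def paper_slug(title: str, year: int) -> str:
    """Build a filesystem-safe identifier (hypothetical scheme: year plus up to six title words)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return f"{year}-" + "-".join(words[:6])


def paper_dir(library_root: Path, title: str, year: int) -> Path:
    """Compute where a paper would live; pure path arithmetic, no filesystem access."""
    # The "papers" subdirectory is an assumed layout, not taken from the guidelines.
    return library_root / "papers" / paper_slug(title, year)
```

Because neither function parses arguments, prints, or touches the disk, a unit test can call them directly, for example `paper_slug("Attention Is All You Need", 2017)` returns `"2017-attention-is-all-you-need"`.
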
## 2. CLI layer
Thin wrappers around the core domain logic.
The CLI should:
- parse arguments
- call core functions
- format output
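- handle exit codes

The CLI should not contain business logic beyond argument parsing, output formatting, and exit-code handling; everything else belongs in the core layer.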
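
The four CLI responsibilities above could be sketched with the standard library's `argparse` as follows. The `show-id` subcommand and the `normalize_paper_id` function are hypothetical stand-ins; in a real layout the core function would be imported from the core layer rather than defined next to the CLI.

```python
import argparse
import sys
from typing import Optional, Sequence


def normalize_paper_id(raw: str) -> str:
    """Stand-in for a core function (hypothetical): normalize a user-supplied paper ID."""
    cleaned = raw.strip().lower()
    if not cleaned:
        raise ValueError("paper ID must not be empty")
    return cleaned


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line surface; no domain logic lives here."""
    parser = argparse.ArgumentParser(prog="paperlib")
    sub = parser.add_subparsers(dest="command", required=True)
    show = sub.add_parser("show-id", help="print the normalized paper ID")
    show.add_argument("paper_id")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments, call the core function, format output, return an exit code."""
    args = build_parser().parse_args(argv)
    try:
        result = normalize_paper_id(args.paper_id)
    except ValueError as exc:
        # Translate a domain error into a message on stderr and a non-zero exit code.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(result)
    return 0
```

An entry point would typically end with `sys.exit(main())`. Accepting `argv` as a parameter keeps the wrapper itself easy to test without spawning a subprocess.
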
## 3. Optional integrations
External systems should live in integration modules, for example:
- MinerU wrapper
- filesystem watch integration
- ripgrep integration
- LLM provider integration

Keep each adapter isolated in its own module so that it can be replaced or left out without changes elsewhere in the codebase.
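
One way to keep an adapter isolated is to have the calling code depend on a small interface rather than on the external tool. The sketch below uses `typing.Protocol`; `PdfConverter`, `FakeConverter`, and `convert_paper` are illustrative names, and a real MinerU wrapper would be a separate class satisfying the same interface (its details are not shown because they are not specified here).

```python
from pathlib import Path
from typing import Protocol


class PdfConverter(Protocol):
    """Hypothetical interface that any PDF-to-Markdown adapter would satisfy."""

    def convert(self, pdf_path: Path) -> str:
        """Return the Markdown text for the given PDF."""
        ...


class FakeConverter:
    """Test double: produces a heading from the file name without reading the PDF."""

    def convert(self, pdf_path: Path) -> str:
        return f"# {pdf_path.stem}\n"


def convert_paper(pdf_path: Path, out_dir: Path, converter: PdfConverter) -> Path:
    """Write `paper.md` into out_dir using whichever converter was passed in."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "paper.md"
    target.write_text(converter.convert(pdf_path), encoding="utf-8")
    return target
```

With this shape, `convert_paper` never imports the external tool, so swapping converters or running tests without one installed only changes which object is passed in.
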
## 4. Optional AI layer
The AI summarization layer should be behind a stable abstraction.
For example:
- load prompt template
- load paper markdown
- load optional profile / vocabulary
- call provider
- validate structured output
- write `summary.json`
- render `summary.md`

Avoid leaking provider-specific behavior into unrelated modules.
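
The steps listed above can be sketched as a small pipeline behind a provider interface. Everything named here is an assumption for illustration: the `SummaryProvider` protocol, the `CannedProvider` test double, the `{paper}` placeholder in the template, and the two required keys in the summary schema. The optional profile / vocabulary step is left out to keep the sketch short.

```python
import json
from pathlib import Path
from typing import Protocol

REQUIRED_KEYS = {"title", "key_points"}  # hypothetical schema for summary.json


class SummaryProvider(Protocol):
    """Hypothetical abstraction over any LLM provider."""

    def complete(self, prompt: str) -> str:
        ...


class CannedProvider:
    """Test double that returns fixed JSON and never touches the network."""

    def complete(self, prompt: str) -> str:
        return json.dumps({"title": "Example", "key_points": ["first", "second"]})


def validate_summary(raw: str) -> dict:
    """Parse provider output and reject anything that does not match the assumed schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"summary is missing keys: {sorted(missing)}")
    return data


def summarize(paper_dir: Path, template: str, provider: SummaryProvider) -> dict:
    """Load paper.md, fill the prompt template, call the provider, write summary.json."""
    paper_md = (paper_dir / "paper.md").read_text(encoding="utf-8")
    prompt = template.replace("{paper}", paper_md)
    data = validate_summary(provider.complete(prompt))
    (paper_dir / "summary.json").write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data


def render_summary_file(paper_dir: Path) -> Path:
    """Render summary.md from summary.json alone; no provider is involved."""
    data = validate_summary((paper_dir / "summary.json").read_text(encoding="utf-8"))
    lines = [f"# {data['title']}", ""] + [f"- {point}" for point in data["key_points"]]
    target = paper_dir / "summary.md"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Only `summarize` receives a provider, and it sees nothing but a prompt string and a response string, so provider-specific details stay inside whichever class implements `complete`.
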
## Component boundaries
Avoid hidden coupling:
- `search` should not depend on LLM code
- `import` should not require summarization
- `reindex` should not assume a specific converter
- `render-summary` should not require calling AI again

Prefer explicit data flow:
- `import` creates or updates metadata
- `convert` creates `paper.md`
- `summarize` creates `summary.json`
- `render-summary` creates `summary.md`
- `reindex` rebuilds SQLite from files
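
As a rough illustration of the last step in that flow, the sketch below rebuilds a SQLite index purely from files on disk, using the standard library's `sqlite3`. The one-directory-per-paper layout, the `metadata.json` file name, and the `papers` table schema are assumptions for the example; the guidelines do not specify them.

```python
import json
import sqlite3
from pathlib import Path


def reindex(library_root: Path, db_path: Path) -> int:
    """Rebuild the index from scratch and return the number of papers indexed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers ("
                "paper_id TEXT PRIMARY KEY, title TEXT, has_summary INTEGER)"
            )
            count = 0
            # Assumed layout: <library_root>/<paper_id>/metadata.json
            for meta_file in sorted(library_root.glob("*/metadata.json")):
                meta = json.loads(meta_file.read_text(encoding="utf-8"))
                paper_dir = meta_file.parent
                has_summary = int((paper_dir / "summary.json").exists())
                conn.execute(
                    "INSERT INTO papers VALUES (?, ?, ?)",
                    (paper_dir.name, meta.get("title", ""), has_summary),
                )
                count += 1
        return count
    finally:
        conn.close()
```

The function only checks whether `summary.json` exists and never cares how `paper.md` was produced, so it depends on neither a converter nor the AI layer. Dropping and recreating the table makes repeated runs give the same result.
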