# Architecture Guidelines

The codebase should be organized around a few clear layers.

## 1. Core domain logic

Pure Python logic for:

- identifying papers
- computing paths
- importing PDFs
- updating metadata
- converting PDFs to Markdown
- rendering summaries
- rebuilding the index

This layer should be testable without the CLI, so unit tests can call its functions directly.
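As a minimal sketch of what "testable without the CLI" can look like, paper identification and path computation could be pure functions. `PaperRef`, `paper_slug`, `paper_dir`, and the `papers/<slug>` directory layout are hypothetical names chosen for illustration, not part of the project.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PaperRef:
    """Hypothetical minimal identity of a paper (illustration only)."""
    year: int
    first_author: str
    title: str


def paper_slug(ref: PaperRef, max_title_words: int = 4) -> str:
    """Build a filesystem-safe slug from year, author surname, and leading title words."""
    # Drop anything that is not a lowercase letter or digit (e.g. apostrophes).
    author = re.sub(r"[^a-z0-9]+", "", ref.first_author.lower())
    # Keep only the first few title words so directory names stay short.
    words = re.findall(r"[a-z0-9]+", ref.title.lower())[:max_title_words]
    return "-".join([str(ref.year), author, *words])


def paper_dir(library_root: Path, ref: PaperRef) -> Path:
    """Compute where a paper's files live; pure, so it never touches the disk."""
    # Hypothetical layout: <library_root>/papers/<slug>/
    return library_root / "papers" / paper_slug(ref)
```

Because nothing here parses arguments, prints, or reads files, a unit test can call `paper_dir` directly and compare `Path` objects.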
## 2. CLI layer

Thin wrappers around the core domain logic.

The CLI should:

- parse arguments
- call core functions
- format output
- handle exit codes

The CLI should not contain deep business logic.
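A minimal `argparse` sketch of such a wrapper, assuming a hypothetical `papers` program with a hypothetical `locate` subcommand and a stand-in core function `paper_dir_for`; none of these names come from the documented CLI.

```python
import argparse
import sys
from pathlib import Path
from typing import Optional, Sequence


def paper_dir_for(library_root: Path, slug: str) -> Path:
    """Hypothetical stand-in for a core-layer function; real logic lives in the core package."""
    if not slug:
        raise ValueError("slug must not be empty")
    return library_root / "papers" / slug


def main(argv: Optional[Sequence[str]] = None) -> int:
    # Parse arguments: the only place argparse is used.
    parser = argparse.ArgumentParser(prog="papers")
    parser.add_argument("--root", type=Path, default=Path("."), help="library root directory")
    commands = parser.add_subparsers(dest="command", required=True)
    locate = commands.add_parser("locate", help="print the directory for a paper slug")
    locate.add_argument("slug")
    args = parser.parse_args(argv)

    # Call the core function and translate domain errors into an exit code.
    try:
        path = paper_dir_for(args.root, args.slug)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1

    # Format output; no business logic happens here.
    print(path)
    return 0
```

Returning an `int` from `main` instead of exiting inside it keeps exit-code handling testable: a test calls `main([...])` and checks the return value, while a console-script entry point would call `sys.exit(main())`.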
## 3. Optional integrations

External systems should live in integration modules, for example:

- MinerU wrapper
- filesystem watch integration
- ripgrep integration
- LLM provider integration

Keep these adapters isolated so that core logic does not depend on any one external tool.
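One way to keep adapters isolated, sketched under assumptions: the core depends on a small `typing.Protocol`, and each integration implements it. `Converter`, `to_markdown`, `FakeConverter`, and `convert_paper` are hypothetical names; no MinerU API is shown or implied.

```python
from pathlib import Path
from typing import Protocol


class Converter(Protocol):
    """What the core layer needs from any PDF-to-Markdown tool (hypothetical interface)."""

    def to_markdown(self, pdf_path: Path) -> str: ...


class FakeConverter:
    """Test double standing in for an external-tool adapter such as a MinerU wrapper."""

    def to_markdown(self, pdf_path: Path) -> str:
        return f"# {pdf_path.stem}\n\n(placeholder text)\n"


def convert_paper(converter: Converter, pdf_path: Path, paper_dir: Path) -> Path:
    """Core step: decides where paper.md goes; the adapter only produces the text."""
    markdown = converter.to_markdown(pdf_path)
    if not markdown.strip():
        raise ValueError(f"converter returned no text for {pdf_path}")
    paper_dir.mkdir(parents=True, exist_ok=True)
    md_path = paper_dir / "paper.md"
    md_path.write_text(markdown, encoding="utf-8")
    return md_path
```

A real adapter would implement `to_markdown` by invoking the external tool; swapping tools then requires no change to `convert_paper`, and tests can use the fake without that tool installed.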
## 4. Optional AI layer

The AI summarization layer should be behind a stable abstraction.

For example, the abstraction can cover these steps:

- load prompt template
- load paper Markdown
- load optional profile / vocabulary
- call provider
- validate structured output
- write `summary.json`
- render `summary.md`

Avoid leaking provider-specific behavior into unrelated modules.
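A minimal sketch of such an abstraction, following the steps above in order. The `SummaryProvider` protocol, the `summarize`, `validate_summary`, and `render_summary_md` functions, the `$paper` / `$vocabulary` placeholders (via `string.Template`), and the three-field schema are all assumptions for illustration.

```python
import json
from pathlib import Path
from string import Template
from typing import Optional, Protocol

# Hypothetical schema for summary.json; the real fields are up to the project.
REQUIRED_KEYS = ("title", "contribution", "limitations")


class SummaryProvider(Protocol):
    """The only thing the summarizer knows about an LLM; concrete clients live in integration modules."""

    def complete(self, prompt: str) -> str: ...


def validate_summary(raw: str) -> dict:
    """Parse provider output and reject anything that does not match the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"summary is missing keys: {missing}")
    return data


def render_summary_md(summary: dict) -> str:
    """Pure rendering step: needs only the parsed summary, never the provider."""
    return (
        f"# {summary['title']}\n\n"
        f"## Contribution\n\n{summary['contribution']}\n\n"
        f"## Limitations\n\n{summary['limitations']}\n"
    )


def summarize(
    provider: SummaryProvider,
    paper_dir: Path,
    template_path: Path,
    vocabulary_path: Optional[Path] = None,
) -> dict:
    # Load prompt template, paper Markdown, and the optional profile / vocabulary.
    template = Template(template_path.read_text(encoding="utf-8"))
    paper_md = (paper_dir / "paper.md").read_text(encoding="utf-8")
    vocabulary = vocabulary_path.read_text(encoding="utf-8") if vocabulary_path is not None else ""
    prompt = template.substitute(paper=paper_md, vocabulary=vocabulary)

    # Call the provider and validate its structured output.
    summary = validate_summary(provider.complete(prompt))

    # Write summary.json, then render summary.md from the validated data.
    (paper_dir / "summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    (paper_dir / "summary.md").write_text(render_summary_md(summary), encoding="utf-8")
    return summary
```

Only the object passed as `provider` knows anything provider-specific; changing LLM vendors means writing a new class with a `complete` method. Because `render_summary_md` takes a plain dict, `render-summary` could reuse it on an existing `summary.json` without calling the provider again.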
## Component boundaries

Avoid hidden coupling:

- `search` should not depend on LLM code
- `import` should not require summarization
- `reindex` should not assume a specific converter
- `render-summary` should not require calling AI again

Prefer explicit data flow:

- `import` creates or updates metadata
- `convert` creates `paper.md`
- `summarize` creates `summary.json`
- `render-summary` creates `summary.md`
- `reindex` rebuilds SQLite from files
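Because every step above leaves its result in a file, `reindex` can rebuild the database from scratch. A minimal sketch, assuming a hypothetical `papers/<slug>/metadata.json` layout and a hypothetical `papers` table schema; the project's actual metadata format and table layout may differ.

```python
import json
import sqlite3
from pathlib import Path


def reindex(library_root: Path, db_path: Path) -> int:
    """Rebuild the SQLite index from files on disk; the files stay the source of truth."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back uncommitted inserts on error
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers (slug TEXT PRIMARY KEY, title TEXT, "
                "has_markdown INTEGER, has_summary INTEGER)"
            )
            count = 0
            # Hypothetical layout: one metadata.json per paper directory.
            for meta_path in sorted(library_root.glob("papers/*/metadata.json")):
                paper_dir = meta_path.parent
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
                conn.execute(
                    "INSERT INTO papers VALUES (?, ?, ?, ?)",
                    (
                        paper_dir.name,
                        meta.get("title", ""),
                        int((paper_dir / "paper.md").is_file()),
                        int((paper_dir / "summary.json").is_file()),
                    ),
                )
                count += 1
        return count
    finally:
        conn.close()
```

Running it twice yields the same index, which lets the SQLite database be treated as disposable: it reads only files and does not care which converter produced `paper.md`.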