Architecture Guidelines
The codebase should be organized around a few clear layers.
1. Core domain logic
Pure Python logic for:
- identifying papers
- computing paths
- importing PDFs
- updating metadata
- converting PDFs to Markdown
- rendering summaries
- rebuilding the index
This layer should be testable without the CLI.
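As a minimal sketch of what "testable without the CLI" can look like, the path computation below is a pure function: it takes values, returns values, and never touches the filesystem or prints. The names `PaperPaths`, `slugify`, and `compute_paths`, and the file names `paper.pdf` and `metadata.json`, are assumptions made for illustration; only `paper.md` comes from these guidelines.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PaperPaths:
    """Locations of the files belonging to one paper (hypothetical layout)."""
    root: Path
    pdf: Path
    markdown: Path
    metadata: Path


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated identifier."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def compute_paths(library: Path, title: str) -> PaperPaths:
    """Pure path computation: no filesystem access, no argument parsing, no output."""
    root = library / slugify(title)
    return PaperPaths(
        root=root,
        pdf=root / "paper.pdf",            # assumed file name
        markdown=root / "paper.md",
        metadata=root / "metadata.json",   # assumed file name
    )
```

Because nothing here depends on `argparse` or on real files, a unit test can call `compute_paths(Path("lib"), "Some Title")` directly and compare the result.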
2. CLI layer
Thin wrappers around the core domain logic.
The CLI should:
- parse arguments
- call core functions
- format output
- handle exit codes
The CLI should not contain deep business logic.
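The four CLI responsibilities above could be sketched as follows. This is an illustration under assumptions, not the project's actual command set: the `identify` subcommand, the `papers` program name, and the stand-in core function `identify_paper` are all hypothetical. In a real layout the core function would be imported from the core layer rather than defined next to the parser.

```python
import argparse
import sys
from typing import Optional, Sequence


def identify_paper(pdf_name: str) -> str:
    """Stand-in for a core function (hypothetical): derive an ID from a file name."""
    stem = pdf_name.rsplit(".", 1)[0].strip().lower()
    if not stem:
        raise ValueError("cannot derive an identifier from an empty name")
    return stem


def build_parser() -> argparse.ArgumentParser:
    """Argument parsing only; no business logic lives here."""
    parser = argparse.ArgumentParser(prog="papers")  # program name is illustrative
    commands = parser.add_subparsers(dest="command", required=True)
    identify = commands.add_parser("identify", help="print the identifier for a PDF")
    identify.add_argument("pdf_name")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments, call the core function, format output, return an exit code."""
    args = build_parser().parse_args(argv)
    try:
        paper_id = identify_paper(args.pdf_name)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(paper_id)
    return 0
```

An entry point would typically call `sys.exit(main())`. Returning the exit code from `main` instead of exiting inside it keeps the wrapper itself easy to test.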
3. Optional integrations
External systems should live in integration modules, for example:
- MinerU wrapper
- filesystem watch integration
- ripgrep integration
- LLM provider integration
Keep these adapters isolated.
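One way to keep an adapter isolated is to have core code depend on a small interface rather than on the external tool. The sketch below assumes a hypothetical `Converter` protocol and a `FakeConverter` used in tests; a MinerU wrapper would implement the same method in its own integration module and is deliberately not shown, since its API is not described here.

```python
from pathlib import Path
from typing import Protocol


class Converter(Protocol):
    """Hypothetical interface that any PDF-to-Markdown adapter would satisfy."""

    def convert(self, pdf_path: Path) -> str:
        ...


class FakeConverter:
    """Test double: returns a predictable document without reading the PDF."""

    def convert(self, pdf_path: Path) -> str:
        return f"# {pdf_path.stem}\n"


def convert_paper(converter: Converter, pdf_path: Path) -> str:
    """Core-side function: knows the interface, not the tool behind it."""
    markdown = converter.convert(pdf_path)
    if not markdown.strip():
        raise ValueError(f"converter produced no output for {pdf_path}")
    return markdown
```

With this shape, swapping the converter means passing a different object to `convert_paper`; no core module needs to import the external tool.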
4. Optional AI layer
The AI summarization layer should sit behind a stable abstraction, so that providers can be swapped without touching other modules.
For example:
- load prompt template
- load paper markdown
- load optional profile / vocabulary
- call provider
- validate structured output
- write summary.json
- render summary.md
Avoid leaking provider-specific behavior into unrelated modules.
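The summarization steps listed above can be sketched as plain functions around a narrow provider interface. Everything named here is an assumption for illustration: the `Provider` protocol, the `REQUIRED_KEYS` schema, the template placeholders, and the `CannedProvider` test double. File loading and writing are left out so the flow stays visible.

```python
import json
from typing import Optional, Protocol, Sequence

REQUIRED_KEYS = ("title", "key_points")  # hypothetical summary schema


class Provider(Protocol):
    """Hypothetical interface; provider-specific details stay behind it."""

    def complete(self, prompt: str) -> str:
        ...


def build_prompt(template: str, paper_markdown: str,
                 vocabulary: Optional[Sequence[str]] = None) -> str:
    """Fill the prompt template with the paper text and optional vocabulary."""
    return template.format(paper=paper_markdown,
                           vocabulary=", ".join(vocabulary or []))


def validate_summary(raw: str) -> dict:
    """Parse the provider reply and check the structure before anything is written."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"summary is missing keys: {missing}")
    return data


def summarize(provider: Provider, template: str, paper_markdown: str,
              vocabulary: Optional[Sequence[str]] = None) -> dict:
    """Build the prompt, call the provider, return validated data for summary.json."""
    prompt = build_prompt(template, paper_markdown, vocabulary)
    return validate_summary(provider.complete(prompt))


def render_summary(summary: dict) -> str:
    """Render summary.md content from stored data; never calls the provider."""
    lines = [f"# {summary['title']}", ""]
    lines += [f"- {point}" for point in summary["key_points"]]
    return "\n".join(lines) + "\n"


class CannedProvider:
    """Test double that returns a fixed, valid reply."""

    def complete(self, prompt: str) -> str:
        return json.dumps({"title": "Example", "key_points": ["first", "second"]})
```

Keeping `render_summary` a function of the stored data alone means summary.md can be regenerated without calling the provider again.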
Component boundaries
Avoid hidden coupling:
- search should not depend on LLM code
- import should not require summarization
- reindex should not assume a specific converter
- render-summary should not require calling AI again
Prefer explicit data flow:
- import creates or updates metadata
- convert creates paper.md
- summarize creates summary.json
- render-summary creates summary.md
- reindex rebuilds SQLite from files
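As a rough sketch of the last step in that flow, a reindex can treat the files on disk as the source of truth and rebuild the SQLite index from scratch each time. The directory layout (one folder per paper), the `metadata.json` file name, and the `papers` table schema are assumptions for illustration.

```python
import json
import sqlite3
from pathlib import Path


def reindex(library: Path, db_path: str) -> int:
    """Drop and rebuild the index using only files found under `library`."""
    rows = []
    for meta_file in sorted(library.glob("*/metadata.json")):  # assumed layout
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
        paper_dir = meta_file.parent
        rows.append((
            paper_dir.name,
            meta.get("title", ""),
            int((paper_dir / "paper.md").exists()),      # written by convert
            int((paper_dir / "summary.json").exists()),  # written by summarize
        ))

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers ("
                "id TEXT PRIMARY KEY, title TEXT, "
                "has_markdown INTEGER, has_summary INTEGER)"
            )
            conn.executemany("INSERT INTO papers VALUES (?, ?, ?, ?)", rows)
    finally:
        conn.close()
    return len(rows)
```

The function only checks whether paper.md and summary.json exist, so it makes no assumption about which converter or provider produced them, and running it twice yields the same index.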