# Architecture Guidelines

The codebase should be organized into four layers with clear boundaries between them.

## 1. Core domain logic

Pure Python logic for:

- identifying papers
- computing paths
- importing PDFs
- updating metadata
- converting PDFs to Markdown
- rendering summaries
- rebuilding the index

This layer should be testable without the CLI.
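
As an illustration of the kind of pure function this layer holds, here is a minimal sketch of identifying papers and computing paths. The ID rules, the `papers/<paper_id>/` layout, and the names `normalize_paper_id`, `PaperPaths`, `paper_paths`, `paper.pdf`, and `metadata.json` are hypothetical. Only `paper.md`, `summary.json`, and `summary.md` come from this document.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical ID rules: arXiv IDs drop their version suffix, DOIs are
# lower-cased and made safe to use as a directory name.
_ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$", re.IGNORECASE)


def normalize_paper_id(raw: str) -> str:
    """Map a user-supplied arXiv ID or DOI to a stable paper ID."""
    text = raw.strip()
    match = _ARXIV_RE.match(text)
    if match:
        return f"arxiv-{match.group(1)}"
    if text.lower().startswith("10."):
        # DOIs are case-insensitive; replace "/" and other unsafe characters.
        return "doi-" + re.sub(r"[^a-z0-9._-]+", "_", text.lower())
    raise ValueError(f"unrecognized paper identifier: {raw!r}")


@dataclass(frozen=True)
class PaperPaths:
    """All per-paper file locations, computed without touching the disk."""

    directory: Path

    @property
    def pdf(self) -> Path:
        return self.directory / "paper.pdf"  # hypothetical file name

    @property
    def metadata(self) -> Path:
        return self.directory / "metadata.json"  # hypothetical file name

    @property
    def markdown(self) -> Path:
        return self.directory / "paper.md"

    @property
    def summary_json(self) -> Path:
        return self.directory / "summary.json"

    @property
    def summary_md(self) -> Path:
        return self.directory / "summary.md"


def paper_paths(library_root: Path, paper_id: str) -> PaperPaths:
    """Hypothetical layout: one directory per paper under <library>/papers/."""
    return PaperPaths(library_root / "papers" / paper_id)
```

Because nothing here reads or writes files, a unit test can check these rules directly, with no CLI and no temporary directories.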

## 2. CLI layer

Thin wrappers around the core domain logic.

The CLI should:

- parse arguments
- call core functions
- format output
- handle exit codes

The CLI should not contain deep business logic.
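
A minimal sketch of a thin CLI command built on the standard library's `argparse`, showing the four responsibilities above: parse arguments, call a core function, format output, and return an exit code. The `show-md` subcommand, the `--library` option, `find_markdown`, and `PaperNotFound` are hypothetical; the stand-in core function is defined inline only so the example runs on its own.

```python
from __future__ import annotations

import argparse
import sys
from pathlib import Path


# --- core layer (stand-in; the real function would live in the core package) ---
class PaperNotFound(Exception):
    """Raised by core code; the CLI decides how to report it."""


def find_markdown(library_root: Path, paper_id: str) -> Path:
    """Core logic: locate paper.md for a paper, or raise PaperNotFound."""
    path = library_root / "papers" / paper_id / "paper.md"
    if not path.is_file():
        raise PaperNotFound(paper_id)
    return path


# --- CLI layer: parse arguments, call core, format output, map to exit code ---
def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="paperlib")
    parser.add_argument("--library", type=Path, default=Path("."))
    sub = parser.add_subparsers(dest="command", required=True)
    show = sub.add_parser("show-md", help="print the path to a paper's Markdown")
    show.add_argument("paper_id")
    args = parser.parse_args(argv)

    try:
        path = find_markdown(args.library, args.paper_id)
    except PaperNotFound as exc:
        print(f"error: no paper.md for {exc}", file=sys.stderr)
        return 1
    print(path)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Taking `argv` as a parameter lets tests drive `main` directly and check its return value instead of spawning a subprocess.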

## 3. Optional integrations

Code that talks to external systems should live in dedicated integration modules, for example:

- MinerU wrapper
- filesystem watch integration
- ripgrep integration
- LLM provider integration

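Keep these adapters isolated: the core layer should depend on small interfaces rather than on any specific external tool, so each integration stays optional.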
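
One way to keep adapters isolated is to have the core depend on a small interface and let each integration module implement it. The sketch below uses `typing.Protocol`; `Converter`, `convert_paper`, and `FakeConverter` are hypothetical names, and no actual MinerU API is shown.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol


class Converter(Protocol):
    """What the core needs from any PDF-to-Markdown backend."""

    name: str

    def convert(self, pdf: Path) -> str:
        """Return Markdown text for the given PDF."""
        ...


def convert_paper(pdf: Path, out_md: Path, converter: Converter) -> Path:
    """Core logic: writes paper.md and knows nothing about MinerU or any other backend."""
    markdown = converter.convert(pdf)
    out_md.write_text(markdown, encoding="utf-8")
    return out_md


class FakeConverter:
    """Test double. A real adapter (e.g. a MinerU wrapper) would live in an
    integrations module and be the only place that imports that tool."""

    name = "fake"

    def convert(self, pdf: Path) -> str:
        return f"# Converted from {pdf.name}\n"
```

Because `convert_paper` accepts any object with a matching `convert` method, the core and its tests never need the external converter installed.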

## 4. Optional AI layer

The AI summarization layer should sit behind a stable abstraction.

For example, a summarization run might:

- load prompt template
- load paper markdown
- load optional profile / vocabulary
- call provider
- validate structured output
- write `summary.json`
- render `summary.md`

Avoid leaking provider-specific behavior into unrelated modules.
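
A minimal sketch of that pipeline, assuming the provider returns JSON text. `SummaryProvider`, `summarize`, `validate_summary`, `render_summary_md`, the `{paper}` and `{vocabulary}` template placeholders, and the `title`/`key_points` schema are all hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Protocol

# Hypothetical schema; the real summary.json fields are not defined here.
REQUIRED_FIELDS = {"title": str, "key_points": list}


class SummaryProvider(Protocol):
    """The only surface the pipeline sees; each LLM vendor gets its own adapter."""

    def complete(self, prompt: str) -> str:
        """Return the model's raw response, expected to be a JSON object."""
        ...


def validate_summary(raw: str) -> dict:
    """Parse and check the structured output before anything is written."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} missing or not a {expected.__name__}")
    return data


def render_summary_md(summary: dict) -> str:
    """Pure rendering from summary.json data; needs no provider."""
    lines = [f"# {summary['title']}", ""]
    lines += [f"- {point}" for point in summary["key_points"]]
    return "\n".join(lines) + "\n"


def summarize(paper_dir: Path, template_path: Path, provider: SummaryProvider,
              vocabulary_path: Path | None = None) -> dict:
    template = template_path.read_text(encoding="utf-8")            # load prompt template
    markdown = (paper_dir / "paper.md").read_text(encoding="utf-8")  # load paper markdown
    vocabulary = vocabulary_path.read_text(encoding="utf-8") if vocabulary_path else ""
    prompt = template.format(paper=markdown, vocabulary=vocabulary)
    summary = validate_summary(provider.complete(prompt))            # call + validate
    (paper_dir / "summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
    (paper_dir / "summary.md").write_text(render_summary_md(summary), encoding="utf-8")
    return summary
```

Keeping `render_summary_md` separate from `summarize` is what allows `summary.md` to be regenerated from an existing `summary.json` without calling the provider.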

## Component boundaries

Avoid hidden coupling:

- `search` should not depend on LLM code
- `import` should not require summarization
- `reindex` should not assume a specific converter
- `render-summary` should not require calling AI again
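
These rules can be partly enforced mechanically. A minimal sketch using the standard `ast` module to flag forbidden imports; the module names in `BOUNDARIES` (including `paperlib.importer`, since `import` is a Python keyword) are hypothetical and would need to match the real package layout. A static import check catches only import-level coupling, not coupling introduced at runtime.

```python
from __future__ import annotations

import ast

# Hypothetical module -> forbidden-prefix map mirroring the rules above.
BOUNDARIES = {
    "paperlib.search": ("paperlib.llm", "paperlib.summarize"),
    "paperlib.importer": ("paperlib.summarize",),
    "paperlib.reindex": ("paperlib.integrations.mineru",),
    "paperlib.render": ("paperlib.llm",),
}


def imported_modules(source: str) -> set[str]:
    """Collect absolute import targets, including `from pkg import name` as pkg.name."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def boundary_violations(module: str, source: str) -> list[str]:
    """Return the imports in `source` that `module` is not allowed to use."""
    banned = BOUNDARIES.get(module, ())
    return sorted(
        name for name in imported_modules(source)
        if any(name == prefix or name.startswith(prefix + ".") for prefix in banned)
    )
```

A test could read each listed module's source file and assert that `boundary_violations` returns an empty list.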

Prefer explicit data flow:

- `import` creates or updates metadata
- `convert` creates `paper.md`
- `summarize` creates `summary.json`
- `render-summary` creates `summary.md`
- `reindex` rebuilds SQLite from files
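
A minimal sketch of `reindex` using the standard `sqlite3` module. It treats the files as the source of truth and rebuilds the table from scratch inside a single explicit transaction. The `papers/<id>/metadata.json` layout, the `title` field, and the table schema are hypothetical.

```python
from __future__ import annotations

import json
import sqlite3
from pathlib import Path


def reindex(library_root: Path, db_path: Path) -> int:
    """Rebuild the SQLite index from per-paper files; return the number indexed."""
    # isolation_level=None: we issue BEGIN/COMMIT ourselves so the DROP, CREATE,
    # and INSERTs commit together or not at all.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        conn.execute("DROP TABLE IF EXISTS papers")
        conn.execute(
            "CREATE TABLE papers ("
            " id TEXT PRIMARY KEY, title TEXT,"
            " has_markdown INTEGER NOT NULL, has_summary INTEGER NOT NULL)"
        )
        count = 0
        for meta_file in sorted(library_root.glob("papers/*/metadata.json")):
            paper_dir = meta_file.parent
            meta = json.loads(meta_file.read_text(encoding="utf-8"))
            conn.execute(
                "INSERT INTO papers VALUES (?, ?, ?, ?)",
                (
                    paper_dir.name,
                    meta.get("title"),
                    int((paper_dir / "paper.md").exists()),
                    int((paper_dir / "summary.json").exists()),
                ),
            )
            count += 1
        conn.execute("COMMIT")
        return count
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```

Since the rebuild only reads files, running `reindex` repeatedly yields the same table, and a failed run leaves the previous index untouched.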