# AGENTS.md

## Project overview

`paperlib` is a local-first paper library engine with a CLI.

**Key point**: `paperlib` is **not** primarily an AI app. AI summarization is optional enrichment. The project must remain useful without any LLM configuration.

## Critical design principles

1. **Local-first**: User data lives locally. Prefer plain files + SQLite over opaque state.
2. **CLI-first**: The CLI is the primary interface. The Python API is secondary.
3. **JSON files are source of truth**: Per-paper JSON files are the durable truth. SQLite is a rebuildable index/cache.
4. **AI is optional**: Core workflows (import/convert/index/list/show/search) work without AI.
5. **Machine-readable**: Commands support `--json` output for automation.

## Development commands

- **Testing**: `uv run pytest` (single file: `uv run pytest tests/test_models.py`)
- **Linting**: `uv run ruff check src/`
- **Formatting**: `uv run ruff format`
- **CLI testing**: `uv run paperlib --help` or `uv run paperlib init .tmp/test-lib`

**Always use `uv run` for Python commands. Use `./.tmp` for test libraries (it's tmpfs).**

## Current CLI commands

**Implemented**:

- `init` - Initialize library
- `status` - Show library config
- `list` - List papers
- `show` - Show paper details
- `search` - Search papers
- `import` - Import papers (PDF/arXiv)
- `convert` - Convert PDFs to Markdown (MinerU)
- `reindex` - Rebuild search index

**Planned**: `import-dir`, `watch`, `doctor`, `open`, `print-path`, `summarize`, `render-summary`, `export`

## Critical constraints

### What paperlib IS

- PDF import and local storage
- PDF → Markdown conversion
- Metadata files and search indexing
- CLI for all operations
- Optional AI summarization

### What paperlib is NOT

- Web UI or daemon
- Multi-user service
- Cloud-first design
- Vector database requirement
- Autonomous research assistant

### File format stability

Changes to the `meta.json` or `summary.json` schemas are breaking changes. Any such change must bump the schema version and consider a migration path.
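As a minimal sketch of what "bump the schema version and consider a migration path" could look like, the Python below upgrades a loaded `meta.json` dict one version at a time. The `schema_version` field, the version numbers, and the `author` to `authors` change are hypothetical illustrations, not paperlib's actual schema.

```python
import json
from pathlib import Path

# Hypothetical: "schema_version", the version numbers, and the v1 -> v2 change
# below are illustrative assumptions, not paperlib's real meta.json schema.
CURRENT_SCHEMA_VERSION = 2


def _migrate_v1_to_v2(meta: dict) -> dict:
    """Hypothetical v1 -> v2 step: a single "author" string becomes an "authors" list."""
    migrated = dict(meta)  # copy so the caller's dict is not mutated
    author = migrated.pop("author", None)
    migrated["authors"] = [author] if author else []
    migrated["schema_version"] = 2
    return migrated


# Maps a source version to the function that upgrades it by exactly one step.
MIGRATIONS = {1: _migrate_v1_to_v2}


def upgrade_meta(meta: dict) -> dict:
    """Apply one-step migrations until meta reaches CURRENT_SCHEMA_VERSION."""
    # Assumption: files written before versioning existed count as version 1.
    version = meta.get("schema_version", 1)
    if version > CURRENT_SCHEMA_VERSION:
        raise ValueError(
            f"meta.json schema v{version} is newer than supported "
            f"v{CURRENT_SCHEMA_VERSION}; refusing to guess"
        )
    while version < CURRENT_SCHEMA_VERSION:
        meta = MIGRATIONS[version](meta)
        version = meta["schema_version"]
    return meta


def load_meta(path: Path) -> dict:
    """Read a meta.json file and return it upgraded to the current schema."""
    with path.open(encoding="utf-8") as f:
        return upgrade_meta(json.load(f))
```

Chaining single-step migrations keeps each one small and testable. Refusing newer versions avoids silently dropping fields that a future release added.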
### Module boundaries

- `search` should not depend on LLM code
- `import` should not require summarization
- `reindex` should work from files alone
- Keep AI behind clean interfaces

## Git commits

Format: `<scope>: <description>`, where scope is `feat|fix|docs|style|refactor|test|perf|update`.

First line ≤88 chars, second line empty.

## When you need details

- **Architecture**: See `dev-docs/architecture.md`
- **Data model**: See `dev-docs/data-model.md`
- **AI integration**: See `dev-docs/ai-guidelines.md`
- **Code style**: See `dev-docs/coding-guidelines.md`

## Decision heuristics

When uncertain, prefer the option that is:

- more local-first
- more inspectable
- easier to test
- less coupled to AI
- more stable for scripts
- less magical
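To illustrate the "JSON files are source of truth" principle and the rule that `reindex` should work from files alone, here is a minimal Python sketch that rebuilds a SQLite table purely from per-paper `meta.json` files and then searches it with no LLM code involved. The `papers/<paper-id>/meta.json` layout, the table schema, the `title` field, and the function names are assumptions for illustration, not paperlib's actual implementation.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical layout: <library>/papers/<paper-id>/meta.json. Directory names,
# the table schema, and the "title" field are illustrative assumptions.


def reindex(library: Path, db_path: Path) -> int:
    """Rebuild the SQLite index from per-paper JSON files; return the paper count."""
    conn = sqlite3.connect(db_path)
    try:
        # `with conn` commits on success and rolls back the open transaction on error.
        with conn:
            # Derived state only: drop and recreate, never patch in place.
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers (id TEXT PRIMARY KEY, title TEXT, meta TEXT NOT NULL)"
            )
            count = 0
            for meta_path in sorted(library.glob("papers/*/meta.json")):
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
                conn.execute(
                    "INSERT INTO papers (id, title, meta) VALUES (?, ?, ?)",
                    (meta_path.parent.name, meta.get("title"), json.dumps(meta)),
                )
                count += 1
        return count
    finally:
        conn.close()


def search_titles(db_path: Path, term: str) -> list[str]:
    """Substring search over titles (LIKE is case-insensitive for ASCII in SQLite)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id FROM papers WHERE title LIKE ? ORDER BY id",
            (f"%{term}%",),
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Because the table is dropped and recreated from the files on every run, deleting the database file loses nothing: running the rebuild again restores it.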