# AGENTS.md

## Project overview

`paperlib` is a local-first engine for managing a library of research papers, driven by a CLI.

**Key point**: `paperlib` is **not** primarily an AI app. AI summarization is optional enrichment. The project must remain useful without LLM configuration.

## Critical design principles

1. **Local-first**: User data lives locally. Prefer plain files + SQLite over opaque state.
2. **CLI-first**: The CLI is the primary interface; the Python API is secondary.
3. **JSON files are source of truth**: Per-paper JSON files are the durable record. SQLite is a rebuildable index/cache.
4. **AI is optional**: Core workflows (import/convert/index/list/show/search) work without AI.
5. **Machine-readable**: Commands support `--json` output for automation.
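To illustrate principle 3, here is a minimal sketch of rebuilding the SQLite index purely from per-paper JSON files. The `papers/<paper-id>/meta.json` layout, the `id`/`title` fields, and `rebuild_index` itself are hypothetical; see `dev-docs/data-model.md` for the actual layout.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(library: Path, db_path: Path) -> int:
    """Rebuild the SQLite index from per-paper JSON files; returns papers indexed.

    Hypothetical layout: <library>/papers/<paper-id>/meta.json with "id" and "title".
    """
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves, so the DDL below
    # is part of the same transaction as the inserts (SQLite supports transactional DDL).
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        # The index is disposable: drop it and recreate it from the JSON files.
        conn.execute("DROP TABLE IF EXISTS papers")
        conn.execute("CREATE TABLE papers (id TEXT PRIMARY KEY, title TEXT)")
        count = 0
        for meta_file in sorted(library.glob("papers/*/meta.json")):
            meta = json.loads(meta_file.read_text(encoding="utf-8"))
            conn.execute(
                "INSERT INTO papers (id, title) VALUES (?, ?)",
                (meta["id"], meta.get("title", "")),
            )
            count += 1
        conn.execute("COMMIT")
        return count
    except Exception:
        # A bad file leaves the previous index untouched instead of half-rebuilt.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```

Because the table is dropped and recreated inside one transaction, running the rebuild twice gives the same result, and deleting the database file loses nothing that the JSON files do not already hold.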

## Development commands

- **Testing**: `uv run pytest` (specific: `uv run pytest tests/test_models.py`)
- **Linting**: `uv run ruff check src/`
- **Formatting**: `uv run ruff format`
- **CLI testing**: `uv run paperlib --help` or `uv run paperlib init .tmp/test-lib`

**Always use `uv run` for Python commands. Use `./.tmp` for test libraries (it's tmpfs).**

## Current CLI commands

**Implemented**:
- `init` - Initialize library
- `status` - Show library config
- `list` - List papers
- `show` - Show paper details
- `search` - Search papers
- `import` - Import papers (PDF/arXiv)
- `convert` - Convert PDFs to Markdown (MinerU)
- `reindex` - Rebuild search index
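As an illustration of `--json` support, `list` output might be rendered along these lines. This is a sketch only: `PaperRow`, its fields, and `render_list` are hypothetical, and no particular CLI framework is assumed.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PaperRow:
    # Hypothetical fields for illustration; not the real `list` output schema.
    id: str
    title: str
    year: Optional[int] = None


def render_list(papers: List[PaperRow], as_json: bool) -> str:
    """Render `list` output: a JSON array for scripts, plain text for humans."""
    if as_json:
        # sort_keys keeps key order stable, so scripts and diffs see consistent output
        return json.dumps([asdict(p) for p in papers], sort_keys=True, ensure_ascii=False)
    return "\n".join(f"{p.id}  {p.title}" for p in papers)
```

A command handler would then print `render_list(rows, as_json=...)` with the flag taken from the parsed `--json` option.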

**Planned**: `import-dir`, `watch`, `doctor`, `open`, `print-path`, `summarize`, `render-summary`, `export`

## Critical constraints

### What paperlib IS
- PDF import and local storage
- PDF → Markdown conversion
- Metadata files and search indexing
- CLI for all operations
- Optional AI summarization

### What paperlib is NOT
- Web UI or daemon
- Multi-user service
- Cloud-first design
- Vector database requirement
- Autonomous research assistant

### File format stability
Any change to the `meta.json` or `summary.json` schema is a breaking change: bump the schema version and consider whether existing files need a migration.
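A minimal migrate-on-read sketch, assuming a `schema_version` integer field in `meta.json` and an illustrative v1→v2 change (`author` string to `authors` list). Neither the field name nor the change comes from the actual schema; `load_meta` and `MIGRATIONS` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable

CURRENT_SCHEMA_VERSION = 2  # hypothetical version number for illustration


def _v1_to_v2(meta: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical change: v1 stored one "author" string, v2 stores an "authors" list.
    migrated = dict(meta)
    author = migrated.pop("author", None)
    migrated["authors"] = [author] if author else []
    migrated["schema_version"] = 2
    return migrated


# One step per version; each step upgrades exactly one version.
MIGRATIONS: dict[int, Callable[[dict[str, Any]], dict[str, Any]]] = {1: _v1_to_v2}


def load_meta(path: Path) -> dict[str, Any]:
    meta = json.loads(path.read_text(encoding="utf-8"))
    version = meta.get("schema_version", 1)  # assume unversioned files are v1
    if version > CURRENT_SCHEMA_VERSION:
        raise ValueError(
            f"{path}: schema_version {version} is newer than supported {CURRENT_SCHEMA_VERSION}"
        )
    while version < CURRENT_SCHEMA_VERSION:
        meta = MIGRATIONS[version](meta)
        version = meta["schema_version"]
    return meta
```

Refusing newer versions instead of guessing keeps an older `paperlib` from silently mangling files written by a newer one.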

### Module boundaries
- `search` should not depend on LLM code
- `import` should not require summarization
- `reindex` should work from files alone
- Keep AI behind clean interfaces
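One way to keep AI behind a clean interface is a structural `Protocol` that core code depends on, with summarization injected only when configured. `Summarizer`, `ImportResult`, and `import_paper` are hypothetical names for illustration, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class Summarizer(Protocol):
    """Hypothetical boundary: the only thing core code knows about AI."""

    def summarize(self, markdown: str) -> str: ...


@dataclass
class ImportResult:
    paper_id: str
    summary: Optional[str] = None


def import_paper(
    paper_id: str, markdown: str, summarizer: Optional[Summarizer] = None
) -> ImportResult:
    # Core work (storing files, indexing) would happen unconditionally; omitted here.
    result = ImportResult(paper_id=paper_id)
    if summarizer is not None:
        # Enrichment is opt-in: with no summarizer configured, no LLM code runs.
        result.summary = summarizer.summarize(markdown)
    return result
```

Tests can pass a fake summarizer, and modules such as `search` and `reindex` never need to import anything AI-related.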

## Git commits
Format: `"<scope>: <subject>"` where scope is `feat|fix|docs|style|refactor|test|perf|update`
First line ≤88 chars, second line empty.
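A sketch of a checker for this convention, e.g. to call from a `commit-msg` hook (Git passes that hook the path of the file holding the message). `check_commit_message` is hypothetical, not an existing project script.

```python
import re

SCOPES = ("feat", "fix", "docs", "style", "refactor", "test", "perf", "update")
# "<scope>: <subject>" with a non-space character starting the subject
SUBJECT_RE = re.compile(rf"^({'|'.join(SCOPES)}): \S.*$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message follows the convention."""
    lines = message.splitlines()
    problems: list[str] = []
    first = lines[0] if lines else ""
    if not SUBJECT_RE.match(first):
        problems.append("first line must be '<scope>: <subject>' with a known scope")
    if len(first) > 88:
        problems.append(f"first line is {len(first)} chars (max 88)")
    if len(lines) > 1 and lines[1]:
        problems.append("second line must be empty")
    return problems
```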

## When you need details

- **Architecture**: See `dev-docs/architecture.md`
- **Data model**: See `dev-docs/data-model.md`
- **AI integration**: See `dev-docs/ai-guidelines.md`
- **Code style**: See `dev-docs/coding-guidelines.md`

## Decision heuristics

When uncertain, prefer the option that is:
- more local-first
- more inspectable
- easier to test
- less coupled to AI
- more stable for scripts
- less magical