AGENTS.md

Project overview

paperlib is a local-first paper library engine with a CLI.

Key point: paperlib is not primarily an AI app. AI summarization is optional enrichment. The project must remain useful without LLM configuration.

Critical design principles

  1. Local-first: User data lives locally. Prefer plain files + SQLite over opaque state.
  2. CLI-first: The CLI is the primary interface. Python API is secondary.
  3. JSON files are the source of truth: Per-paper JSON files are the durable truth. SQLite is a rebuildable index/cache.
  4. AI is optional: Core workflows (import/convert/index/list/show/search) work without AI.
  5. Machine-readable: Commands support --json output for automation.
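To make principle 3 concrete, here is a minimal sketch of an index that can be thrown away and rebuilt from the JSON files alone. The directory layout (one folder per paper containing meta.json), the "id" and "title" fields, and the function name rebuild_index are assumptions made for illustration, not paperlib's actual layout or API.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(library_root: Path, db_path: Path) -> int:
    """Recreate the SQLite index from per-paper JSON files; return rows written."""
    # The index is disposable: delete any existing copy and start from the files.
    if db_path.exists():
        db_path.unlink()
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE papers (id TEXT PRIMARY KEY, title TEXT)")
        count = 0
        # Hypothetical layout: <library_root>/<paper-id>/meta.json
        for meta_file in sorted(library_root.glob("*/meta.json")):
            meta = json.loads(meta_file.read_text(encoding="utf-8"))
            conn.execute(
                "INSERT INTO papers (id, title) VALUES (?, ?)",
                (meta["id"], meta.get("title", "")),
            )
            count += 1
        conn.commit()
        return count
    finally:
        conn.close()
```

Because nothing in the database is authoritative, deleting it loses no user data; running the rebuild again should produce the same rows.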

Development commands

  • Testing: uv run pytest (specific: uv run pytest tests/test_models.py)
  • Linting: uv run ruff check src/
  • Formatting: uv run ruff format
  • CLI testing: uv run paperlib --help or uv run paperlib init .tmp/test-lib

Always use uv run for Python commands. Use ./.tmp for test libraries (it's tmpfs).

Current CLI commands

Implemented:

  • init - Initialize library
  • status - Show library config
  • list - List papers
  • show - Show paper details
  • search - Search papers
  • import - Import papers (PDF/arXiv)
  • convert - Convert PDFs to Markdown (MinerU)
  • reindex - Rebuild search index

Planned: import-dir, watch, doctor, open, print-path, summarize, render-summary, export
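Since commands are expected to support --json, the following sketch shows one way a listing command might share a single code path for human and machine output. It uses argparse from the standard library; the program name, the record fields, and the in-memory data are hypothetical and do not reflect paperlib's real CLI framework or output schema.

```python
import argparse
import json
from typing import Optional


def render(papers: list[dict], as_json: bool) -> str:
    """Format a paper listing for humans or for scripts."""
    if as_json:
        # Sorted keys keep the output stable, which helps scripts and diffs.
        return json.dumps(papers, sort_keys=True)
    return "\n".join(f"{p['id']}  {p['title']}" for p in papers)


def main(argv: Optional[list[str]] = None) -> str:
    parser = argparse.ArgumentParser(prog="list-sketch")
    parser.add_argument("--json", action="store_true", dest="as_json")
    args = parser.parse_args(argv)
    # Hypothetical in-memory data standing in for a real library lookup.
    papers = [{"id": "p1", "title": "Example paper"}]
    return render(papers, args.as_json)
```

Keeping the formatting decision in one small function makes it easy to test both output modes without touching the library on disk.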

Critical constraints

What paperlib IS

  • PDF import and local storage
  • PDF → Markdown conversion
  • Metadata files and search indexing
  • CLI for all operations
  • Optional AI summarization

What paperlib is NOT

  • Web UI or daemon
  • Multi-user service
  • Cloud-first design
  • Vector database requirement
  • Autonomous research assistant

File format stability

Changes to the meta.json or summary.json schemas are breaking changes. Any such change must update the schema version and should consider a migration path for existing files.
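One possible shape for versioned loading is sketched below. The "schema_version" field name, the version numbers, and the author-to-authors migration are all invented for illustration; the real schemas are described in dev-docs/data-model.md.

```python
CURRENT_SCHEMA_VERSION = 2  # hypothetical value for this sketch


def migrate_v1_to_v2(meta: dict) -> dict:
    """Hypothetical migration: v1 stored a single 'author' string."""
    upgraded = dict(meta)  # never mutate the caller's record
    upgraded["authors"] = [upgraded.pop("author")] if "author" in upgraded else []
    upgraded["schema_version"] = 2
    return upgraded


# Maps a version to the function that upgrades it by one step.
MIGRATIONS = {1: migrate_v1_to_v2}


def load_meta(meta: dict) -> dict:
    """Upgrade older records step by step; reject records newer than we know."""
    version = meta.get("schema_version", 1)
    if version > CURRENT_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version {version}")
    while version < CURRENT_SCHEMA_VERSION:
        meta = MIGRATIONS[version](meta)
        version = meta["schema_version"]
    return meta
```

Chaining single-step migrations means each schema change adds one small function rather than rewriting older upgrade paths.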

Module boundaries

  • search should not depend on LLM code
  • import should not require summarization
  • reindex should work from files alone
  • Keep AI behind clean interfaces
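These boundaries can be sketched with a small interface that the core code accepts but never requires. The names Summarizer, import_paper, and FirstSentenceSummarizer are hypothetical, chosen only to illustrate the pattern; they are not taken from the paperlib codebase.

```python
from typing import Optional, Protocol


class Summarizer(Protocol):
    """Anything with a summarize() method satisfies this interface."""

    def summarize(self, text: str) -> str: ...


def import_paper(text: str, summarizer: Optional[Summarizer] = None) -> dict:
    """Build a paper record; a summary is added only if a summarizer is given."""
    record = {"text": text, "summary": None}
    if summarizer is not None:
        record["summary"] = summarizer.summarize(text)
    return record


class FirstSentenceSummarizer:
    """Stand-in for an LLM-backed implementation; needs no network or model."""

    def summarize(self, text: str) -> str:
        return text.split(".")[0].strip() + "."
```

With this shape, the import path has no import-time dependency on any LLM package, and tests can pass a trivial stand-in instead of a real model.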

Git commits

Format: "<scope>: <subject>", where scope is one of feat|fix|docs|style|refactor|test|perf|update. First line ≤88 chars, second line empty.
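The commit rules are mechanical enough to check in code. The sketch below is one reading of them; the function name check_commit_message and the wording of the problem messages are assumptions, and the project may enforce the rules differently or not at all.

```python
import re

SCOPES = ("feat", "fix", "docs", "style", "refactor", "test", "perf", "update")
# "<scope>: <subject>" with a non-empty subject after the colon and space.
FIRST_LINE = re.compile(r"^(?:%s): \S.*$" % "|".join(SCOPES))


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.split("\n")
    if not FIRST_LINE.match(lines[0]):
        problems.append("first line must look like '<scope>: <subject>'")
    if len(lines[0]) > 88:
        problems.append("first line exceeds 88 characters")
    # The second line only matters when the message has a body.
    if len(lines) > 1 and lines[1] != "":
        problems.append("second line must be empty")
    return problems
```

A function like this could be called from a commit-msg hook, though wiring it up is left out here.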

When you need details

  • Architecture: See dev-docs/architecture.md
  • Data model: See dev-docs/data-model.md
  • AI integration: See dev-docs/ai-guidelines.md
  • Code style: See dev-docs/coding-guidelines.md

Decision heuristics

When uncertain, prefer the option that is:

  • more local-first
  • more inspectable
  • easier to test
  • less coupled to AI
  • more stable for scripts
  • less magical