paperlib
A local-first paper library engine with a CLI for managing academic papers.
paperlib is designed to import PDF papers into a structured local library, convert PDFs into Markdown using external converters, maintain stable per-paper metadata files, and provide a searchable index database. It offers optional AI-based structured summaries while remaining useful even without AI features.
Key Features
- Local-first: All data lives locally in the paper library directory
- CLI-first: All important workflows accessible from the command line
- JSON source of truth: Per-paper metadata files with rebuildable SQLite index
- AI-optional: Core workflows work without LLM configuration
- Machine-readable: --json output for automation and integration
- Stable interfaces: Designed for scripts and higher-level tools
Installation
# Install with uv (recommended)
uv add paperlib
# Or with pip
pip install paperlib
Quick Start
# Initialize a paper library
paperlib init
# Import a local PDF
paperlib import --pdf paper.pdf --title "My Research Paper"
# Import from arXiv
paperlib import --arxiv 2212.06340
# List all papers
paperlib list
# Show paper details
paperlib show <paper-id>
# Convert PDFs to Markdown (requires MinerU)
paperlib convert
# Search papers
paperlib search "machine learning"
# Rebuild search index
paperlib reindex
Core Commands
Library Management
- paperlib init [path]: Initialize a paper library directory
- paperlib status: Show library configuration and layout
- paperlib reindex: Rebuild search index from stored papers
Paper Import
- paperlib import --pdf <path>: Import a local PDF file
- paperlib import --arxiv <id>: Import a paper from arXiv
- Options: --title, --notes, --tags, --library
Paper Management
- paperlib list: List all imported papers with status
- paperlib show <paper-id>: Show detailed paper information
- paperlib convert: Convert pending papers to Markdown using MinerU
Search (Future)
- paperlib search <query>: Search papers by content and metadata
Library Structure
A paperlib library is organized as follows:
library_root/
├── config/
│   ├── config.toml
│   └── prompts/
├── papers/
│   ├── arxiv/
│   │   └── 2026/
│   │       └── arxiv-2212_06340/
│   │           ├── meta.json       # Paper metadata
│   │           ├── source.pdf      # Original PDF
│   │           ├── paper.md        # Converted markdown
│   │           ├── summary.json    # AI summary (optional)
│   │           ├── summary.md      # Rendered summary
│   │           ├── assets/         # Images, figures
│   │           └── logs/           # Conversion logs
│   └── local/
│       └── <hash>/
│           └── ...
├── db/
│   └── paperlib.sqlite3            # Search index (rebuildable)
├── inbox/                          # Temporary imports
└── cache/                          # Processing cache
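Because the per-paper meta.json files are the source of truth, the SQLite file under db/ can in principle be thrown away and regenerated. The following is a minimal sketch of that idea, not paperlib's actual reindex code: the function name rebuild_index and the three-column papers table are assumptions made for illustration, and the real schema may differ.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(library_root: Path) -> int:
    """Recreate a small index table from every meta.json under papers/.

    Hypothetical helper: the table name and columns are illustrative only.
    Returns the number of papers indexed.
    """
    db_path = library_root / "db" / "paperlib.sqlite3"
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    count = 0
    try:
        with conn:  # commits on success, rolls back on error
            # Drop and recreate so the index always mirrors the JSON files.
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers ("
                "paper_id TEXT PRIMARY KEY, title TEXT, meta_path TEXT)"
            )
            for meta_path in sorted((library_root / "papers").rglob("meta.json")):
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
                conn.execute(
                    "INSERT INTO papers VALUES (?, ?, ?)",
                    (
                        meta["paper_id"],
                        meta.get("title", ""),
                        meta_path.relative_to(library_root).as_posix(),
                    ),
                )
                count += 1
    finally:
        conn.close()
    return count
```

Calling rebuild_index(Path("library_root")) twice in a row should produce the same table, which is the property that makes the index safe to delete.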
Data Model
Paper Metadata (meta.json)
Each paper has a meta.json file containing:
- Core identifiers: paper_id, source_type, source_id
- Bibliographic info: title, authors, published_date, categories
- File paths: pdf_path, paper_md_path, summary_json_path
- Processing status: conversion_status, summary_status
- User data: tags, notes
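The field list above could be modeled roughly as follows. This is a sketch under stated assumptions rather than paperlib's real model: the field names come from the list, but the class name PaperMeta, the helper names, the types, and the defaults (including the "pending" status value) are guesses for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class PaperMeta:
    """Hypothetical in-memory view of one meta.json file."""

    # Core identifiers
    paper_id: str
    source_type: str
    source_id: str
    # Bibliographic info
    title: str = ""
    authors: list[str] = field(default_factory=list)
    published_date: Optional[str] = None
    categories: list[str] = field(default_factory=list)
    # File paths (assumed to be stored as strings)
    pdf_path: Optional[str] = None
    paper_md_path: Optional[str] = None
    summary_json_path: Optional[str] = None
    # Processing status ("pending" is an assumed default value)
    conversion_status: str = "pending"
    summary_status: str = "pending"
    # User data
    tags: list[str] = field(default_factory=list)
    notes: str = ""


def save_meta(meta: PaperMeta, path: Path) -> None:
    """Write metadata as sorted, indented JSON so diffs stay stable."""
    text = json.dumps(asdict(meta), indent=2, sort_keys=True)
    path.write_text(text + "\n", encoding="utf-8")


def load_meta(path: Path) -> PaperMeta:
    """Read a meta.json file back into a PaperMeta instance."""
    return PaperMeta(**json.loads(path.read_text(encoding="utf-8")))
```

Sorting keys on write is one simple way to keep per-paper files stable across rewrites, which matters when the JSON files are treated as the source of truth.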
Summary Data (summary.json)
Optional AI-generated summaries with:
- Structured fields: problem statement, method overview, results
- Categorization: problem tags, technique tags
- Relevance scoring and recommended sections
PDF Conversion
paperlib integrates with MinerU for high-quality PDF to Markdown conversion:
# Install MinerU (optional)
pip install "mineru[core]"
# Convert all pending papers
paperlib convert
# Convert specific paper
paperlib convert --paper-id <paper-id>
Machine-Readable Output
Most commands support --json output for automation:
paperlib list --json
paperlib show <paper-id> --json
paperlib status --json
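A script can consume this output by running the CLI and parsing stdout. The wrapper below is a minimal sketch: paperlib_json is a hypothetical helper name, and it assumes only that the command prints a single JSON document to stdout and exits with status 0 on success. The shape of the returned data is not specified here, so inspect it before relying on particular keys.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def paperlib_json(
    args: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Any:
    """Run `paperlib <args> --json` and return the parsed JSON.

    `run` is injectable so the wrapper can be tested without the CLI installed.
    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = run(
        ["paperlib", *args, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

For example, paperlib_json(["list"]) corresponds to paperlib list --json, and paperlib_json(["show", paper_id]) to paperlib show <paper-id> --json.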
Development
paperlib is designed for extensibility and integration with higher-level tools.
Running Tests
# Run all tests
uv run pytest
# Run specific test module
uv run pytest tests/test_models.py
# Run with coverage
uv run pytest --cov=paperlib
Code Quality
# Format code
uv run ruff format
# Check linting
uv run ruff check
# Type checking
uv run mypy src/
Architecture
paperlib follows clean architecture principles:
- Models: Data structures for papers and summaries
- Storage: File-based metadata and PDF management
- Index: SQLite search and retrieval layer
- Importers: PDF and arXiv import workflows
- Converters: PDF to Markdown transformation
- CLI: Command-line interface and argument parsing
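One way to keep the Converters layer swappable is a small interface that the rest of the code depends on. The sketch below is purely illustrative: Converter, PlaceholderConverter, and convert_all are hypothetical names, not paperlib's actual classes, and the placeholder does not call MinerU.

```python
from pathlib import Path
from typing import Protocol


class Converter(Protocol):
    """Hypothetical interface: turn one PDF into a Markdown file."""

    def convert(self, pdf_path: Path, out_dir: Path) -> Path:
        """Write Markdown into out_dir and return the path to it."""
        ...


class PlaceholderConverter:
    """Stand-in converter that writes a stub paper.md (no real PDF parsing)."""

    def convert(self, pdf_path: Path, out_dir: Path) -> Path:
        out_dir.mkdir(parents=True, exist_ok=True)
        md_path = out_dir / "paper.md"
        md_path.write_text(f"# {pdf_path.stem}\n", encoding="utf-8")
        return md_path


def convert_all(pdfs: list[Path], converter: Converter) -> list[Path]:
    """Convert each PDF next to its source file using the given converter."""
    return [converter.convert(pdf, pdf.parent) for pdf in pdfs]
```

Because convert_all only depends on the Converter protocol, a MinerU-backed implementation could be substituted without touching the calling code.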
Roadmap
- Core paper import (local PDF, arXiv)
- PDF to Markdown conversion (MinerU integration)
- Metadata management and search indexing
- CLI with all basic commands
- Comprehensive test suite
- Search command implementation
- AI summarization with provider abstraction
- JSON output for all commands
- Configuration file support
- Advanced arXiv workflows
Non-Goals
paperlib is intentionally focused and does NOT include:
- Web UI or GUI applications
- Multi-user or cloud-first features
- Mandatory daemon or background services
- Vector database requirements
- Fully autonomous research assistant behavior
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! Please read the development guidelines in AGENTS.md and ensure all tests pass before submitting PRs.