# CLI Reference
This document describes all available commands in the paperlib CLI.
## Global Options
All commands support these global options:
- `--help`, `-h`: Show help message
- `--version`: Show version information
Many commands also support:
- `--library`, `-L`: Specify library root directory (default: current directory)
- `--json`: Output machine-readable JSON instead of human-readable format
## Commands
### `paperlib init [PATH]`
Initialize a paper library directory structure.
**Arguments:**
- `PATH`: Directory to initialize (default: current directory)
**Examples:**
```bash
# Initialize library in current directory
paperlib init
# Initialize library in specific directory
paperlib init /path/to/my/papers
# Initialize and create parent directories
paperlib init ~/Documents/research/papers
```
**Behavior:**
- Creates standard directory structure (config/, papers/, db/, etc.)
- Safe to run multiple times (idempotent)
- Creates parent directories if they don't exist
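
The idempotent behavior above can be sketched in Python. This is an illustration, not paperlib's implementation: the function name `init_library` is hypothetical, and the directory list is an assumption assembled from the directories this reference mentions (`config/`, `papers/`, `db/`, `inbox/`, `cache/`, `logs/`).

```python
from pathlib import Path

# Assumed layout: names come from this reference, not from paperlib's source code.
LIBRARY_DIRS = ("config", "papers", "db", "inbox", "cache", "logs")


def init_library(root: Path) -> list[Path]:
    """Create the library skeleton under root; return only the newly created directories."""
    created = []
    for name in LIBRARY_DIRS:
        directory = root / name
        if not directory.exists():
            created.append(directory)
        # parents=True creates missing parent directories; exist_ok=True makes reruns harmless.
        directory.mkdir(parents=True, exist_ok=True)
    return created
```

Calling the function a second time on the same root returns an empty list and changes nothing, which is what "safe to run multiple times" means in practice.
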
---
### `paperlib import`
Import papers into the library from various sources.
**Required (one of):**
- `--pdf PATH`: Import a local PDF file
- `--arxiv ID`: Import paper from arXiv by ID or URL
**Options:**
- `--title TEXT`: Override paper title (for local PDFs)
- `--notes TEXT`: Add notes about the paper
- `--tags TAG1 TAG2`: Add tags to the paper
- `--library PATH`: Specify library directory
- `--json`: Output import results in JSON format for automation
**Examples:**
```bash
# Import local PDF
paperlib import --pdf paper.pdf --title "My Research" --tags ml ai
# Import from arXiv
paperlib import --arxiv 2212.06340
# Import with arXiv URL
paperlib import --arxiv https://arxiv.org/abs/2212.06340
# Import to specific library
paperlib import --pdf paper.pdf --library ~/research
# Import with JSON output for automation
paperlib import --arxiv 2212.06340 --json
```
**Behavior:**
- Generates a stable paper ID from the file content (local PDFs) or from the arXiv ID
- Copies PDF to structured storage location
- Creates meta.json with paper metadata
- Prevents duplicate imports (same content/ID)
- Indexes paper in search database
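
One way stable IDs of the shapes shown in this reference (`arxiv-2212_06340`, `local-a1b2c3d4e5f6`) could be derived is sketched below. The function names are hypothetical, and the choice of SHA-256, the 12-character prefix, and the stripping of arXiv version suffixes are all assumptions; the reference does not say how paperlib actually computes IDs.

```python
import hashlib
import re

# New-style arXiv identifier at the end of a string, e.g. 2212.06340 or 2212.06340v2.
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?$")


def arxiv_paper_id(id_or_url: str) -> str:
    """Turn an arXiv ID or abs URL into an ID shaped like 'arxiv-2212_06340'."""
    match = ARXIV_ID.search(id_or_url.strip().rstrip("/"))
    if match is None:
        raise ValueError(f"not a recognized arXiv identifier: {id_or_url!r}")
    # Assumption: the version suffix is dropped so every version maps to one ID.
    return "arxiv-" + match.group(1).replace(".", "_")


def local_paper_id(pdf_bytes: bytes, length: int = 12) -> str:
    """Derive a content-based ID; the hash function and prefix length are assumptions."""
    return "local-" + hashlib.sha256(pdf_bytes).hexdigest()[:length]
```

Because the local ID depends only on the file's bytes, importing the same PDF twice yields the same ID, which is what makes duplicate detection possible.
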
---
### `paperlib list`
List all papers in the library with their current status.
**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format
**Examples:**
```bash
# List all papers
paperlib list
# List papers in specific library
paperlib list --library ~/research
# Get machine-readable output
paperlib list --json
```
**Output Format:**
```
Found 2 papers:
📄 arxiv-2212_06340
The new discontinuous Galerkin methods based numerical relativity program Nmesh
By: Wolfgang Tichy, Liwei Ji, Ananya Adhikari (+2 more)
Categories: gr-qc
⏳ local-a1b2c3d4e5f6
Machine Learning Applications in Physics
Categories: cs.AI, physics.comp-ph
```
**Status Indicators:**
- ⏳ Paper imported, conversion pending
- 📄 PDF converted to Markdown
- 📝 AI summary generated
- ❌ Conversion or processing failed
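
For scripts that want to reproduce these indicators from the `conversion_status` and `summary_status` fields in the JSON output, a minimal sketch follows. The function name and the precedence of the checks are assumptions; the reference does not specify how paperlib chooses an indicator when the two fields disagree.

```python
def status_indicator(conversion_status: str, summary_status: str = "pending") -> str:
    """Pick a list indicator from the two status fields (assumed precedence)."""
    if "failed" in (conversion_status, summary_status):
        return "❌"  # any failure wins
    if summary_status == "success":
        return "📝"  # summary generated
    if conversion_status == "success":
        return "📄"  # converted to Markdown, no summary yet
    return "⏳"      # imported, conversion pending
```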
---
### `paperlib show PAPER_ID`
Show detailed information about a specific paper.
**Arguments:**
- `PAPER_ID`: The unique paper identifier
**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format
**Examples:**
```bash
# Show paper details
paperlib show arxiv-2212_06340
# Show with JSON output
paperlib show local-a1b2c3d4 --json
```
**Output includes:**
- All metadata fields
- Processing status
- File locations and existence
- Import timestamp
- Tags and notes
---
### `paperlib convert`
Convert papers from PDF to Markdown using MinerU.
**Options:**
- `--library PATH`: Specify library directory
- `--paper-id ID`: Convert specific paper only
- `--retry-failed`: Retry papers with failed conversion status
- `--force`: Force reconvert all papers (including successful ones)
- `--no-ui`: Disable rich UI display (useful for scripting)
- `--json`: Output conversion results in JSON format (automatically disables UI)
**Examples:**
```bash
# Convert all pending papers (with rich UI)
paperlib convert
# Retry failed conversions
paperlib convert --retry-failed
# Force reconvert all papers
paperlib convert --force
# Convert specific paper
paperlib convert --paper-id arxiv-2212_06340
# Convert without UI (for scripts)
paperlib convert --no-ui
# Convert in specific library
paperlib convert --library ~/research
# Get JSON output for automation (disables UI automatically)
paperlib convert --json
paperlib convert --paper-id arxiv-2212_06340 --json
```
**Behavior:**
- Processes papers with `conversion_status: pending` (or `failed` when `--retry-failed` is given)
- Uses MinerU for PDF-to-Markdown conversion with the CPU pipeline backend
- Shows rich UI with progress bar and live MinerU output (unless `--no-ui`)
- Updates metadata with conversion status
- Creates conversion logs in `logs/` directory
- Post-processes Markdown to fix image references (`images/` → `assets/`)
- Handles conversion failures gracefully
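
The image-reference post-processing step can be illustrated with a short Python sketch. It assumes the converted Markdown uses standard `![alt](images/...)` image syntax; the function name is hypothetical, and paperlib's real post-processor may handle additional cases, such as HTML `<img>` tags.

```python
import re

# Standard Markdown image syntax: ![alt](images/file.png).
# Only the leading directory inside the link target is rewritten.
IMAGE_REF = re.compile(r"(!\[[^\]]*\]\()images/")


def fix_image_references(markdown: str) -> str:
    """Point Markdown image links at assets/ instead of images/."""
    return IMAGE_REF.sub(r"\g<1>assets/", markdown)
```

Anchoring the pattern on the `![...](` prefix keeps ordinary prose that happens to mention `images/` untouched.
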
**Rich UI Features:**
- Progress bar showing papers converted
- Live streaming of MinerU output
- Current paper being processed
- Color-coded output (errors in red, progress in blue, etc.)
---
### `paperlib reindex`
Rebuild the search index from stored paper metadata.
**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output reindexing results and statistics in JSON format
**Examples:**
```bash
# Rebuild index
paperlib reindex
# Rebuild index for specific library
paperlib reindex --library ~/research
# Get JSON output with statistics
paperlib reindex --json
```
**Behavior:**
- Clears existing SQLite database
- Scans all meta.json files in papers/ directory
- Rebuilds full-text search index
- Reports statistics on completion
- Safe to run anytime (repairs corrupted index)
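
The rebuild loop described above can be sketched as follows. This is a simplified stand-in, not paperlib's code: the function name and the table schema are hypothetical, and a plain table takes the place of the full-text index. The `paper_id` and `title` keys are taken from the JSON examples in this reference.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(papers_dir: Path, db: sqlite3.Connection) -> dict:
    """Drop and repopulate a papers table from every meta.json under papers_dir."""
    db.execute("DROP TABLE IF EXISTS papers")
    db.execute("CREATE TABLE papers (paper_id TEXT PRIMARY KEY, title TEXT)")
    indexed = errors = 0
    for meta_path in sorted(papers_dir.rglob("meta.json")):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            db.execute(
                "INSERT INTO papers (paper_id, title) VALUES (?, ?)",
                (meta["paper_id"], meta.get("title", "")),
            )
            indexed += 1
        except (json.JSONDecodeError, KeyError, sqlite3.Error):
            # A bad file is counted rather than fatal, so one corrupt entry
            # cannot block the rest of the rebuild.
            errors += 1
    db.commit()
    return {"papers_indexed": indexed, "errors": errors}
```

Because the table is dropped and rebuilt from the metadata files every time, running the function twice gives the same result, which is why a rebuild can repair a damaged index.
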
---
### `paperlib status`
Show library configuration and layout information.
**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format
**Examples:**
```bash
# Show current library status
paperlib status
# Show specific library status
paperlib status --library ~/research
# Get JSON output for automation
paperlib status --json
```
**Output:**
```
root: /home/user/papers
config: /home/user/papers/config/config.toml
database: /home/user/papers/db/paperlib.sqlite3
papers: /home/user/papers/papers
inbox: /home/user/papers/inbox
cache: /home/user/papers/cache
```
---
## Future Commands
These commands are planned but not yet implemented:
### `paperlib search QUERY`
Search papers by content and metadata.
### `paperlib summarize [PAPER_ID]`
Generate AI summaries for papers.
### `paperlib export FORMAT`
Export papers in various formats.
### `paperlib doctor`
Diagnose and repair library issues.
---
## Exit Codes
paperlib commands return standard exit codes:
- `0`: Success
- `1`: General error (file not found, invalid arguments, etc.)
- `2`: Command line argument error
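
A script that drives the CLI can branch on these codes. The sketch below uses Python's standard `subprocess` module; the helper name `run_and_classify` is hypothetical, and the meanings are copied from the list above.

```python
import subprocess

# Meanings copied from the exit-code list in this reference.
EXIT_MEANINGS = {0: "success", 1: "general error", 2: "command line argument error"}


def run_and_classify(argv: list[str]) -> tuple[int, str]:
    """Run a command and return its exit code with a human-readable meaning."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    meaning = EXIT_MEANINGS.get(completed.returncode, "undocumented exit code")
    return completed.returncode, meaning
```

For example, `run_and_classify(["paperlib", "status", "--json"])` would run the `status` command, assuming `paperlib` is on `PATH`.
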
## Configuration
paperlib looks for configuration in these locations (in order):
1. `$LIBRARY_ROOT/config/config.toml`
2. `~/.config/paperlib/config.toml`
3. Built-in defaults
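
The lookup order can be expressed as a small Python function. This sketch assumes the first existing file wins outright; whether paperlib instead merges values from several files is not stated here. The function name `find_config` and the `home` parameter (useful for testing) are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_config(library_root: Path, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file, or None to signal built-in defaults."""
    home = home if home is not None else Path.home()
    candidates = [
        library_root / "config" / "config.toml",          # 1. library-local config
        home / ".config" / "paperlib" / "config.toml",    # 2. per-user config
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # 3. fall back to built-in defaults
```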
## JSON Output Format
When using `--json`, commands output structured data suitable for programmatic consumption. All JSON responses follow a consistent envelope format with standard fields:
### Standard Response Envelope
**Success Response:**
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  // Command-specific data fields below
}
```
**Error Response:**
```json
{
"success": false,
"timestamp": "2024-01-15T10:30:00.000Z",
"error": "Error message here",
"error_code": 1
}
```
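
A consumer can handle the envelope once and reuse that logic for every command. The following Python sketch relies only on the `success`, `error`, and `error_code` fields shown above; the names `PaperlibError` and `parse_envelope` are hypothetical.

```python
import json


class PaperlibError(Exception):
    """Raised when a response envelope reports "success": false."""

    def __init__(self, message: str, error_code: int):
        super().__init__(message)
        self.error_code = error_code


def parse_envelope(raw: str) -> dict:
    """Decode a --json response and return it, raising on an error envelope."""
    data = json.loads(raw)
    if not data.get("success", False):
        raise PaperlibError(data.get("error", "unknown error"), data.get("error_code", 1))
    return data
```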
### Command-Specific JSON Formats
#### `paperlib status --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "library_root": "/home/user/papers",
  "config_path": "/home/user/papers/config/config.toml",
  "database_path": "/home/user/papers/db/paperlib.sqlite3",
  "papers_dir": "/home/user/papers/papers",
  "inbox_dir": "/home/user/papers/inbox",
  "cache_dir": "/home/user/papers/cache"
}
```
#### `paperlib list --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "papers": [
    {
      "paper_id": "arxiv-2212_06340",
      "source_type": "arxiv",
      "source_id": "2212.06340",
      "title": "Example Paper",
      "authors": ["Alice Smith", "Bob Jones"],
      "published_date": "2022-12-06T00:00:00.000Z",
      "categories": ["cs.AI"],
      "conversion_status": "success",
      "summary_status": "pending",
      "imported_at": "2024-01-15T10:30:00.000Z",
      "tags": [],
      "notes": ""
    }
  ],
  "total": 1
}
```
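
As an example of consuming this response, the Python sketch below groups paper IDs by their `conversion_status`, which is useful for deciding what to pass to `paperlib convert`. The function name is hypothetical; the field names come from the example above.

```python
import json
from collections import defaultdict


def ids_by_conversion_status(list_response: str) -> dict[str, list[str]]:
    """Group paper IDs from a `paperlib list --json` response by conversion_status."""
    grouped = defaultdict(list)
    for paper in json.loads(list_response)["papers"]:
        grouped[paper["conversion_status"]].append(paper["paper_id"])
    return dict(grouped)
```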
#### `paperlib show <paper_id> --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper": {
    "paper_id": "arxiv-2212_06340",
    "source_type": "arxiv",
    "source_id": "2212.06340",
    "title": "Example Paper",
    "authors": ["Alice Smith", "Bob Jones"],
    "conversion_status": "success",
    "summary_status": "pending",
    "pdf_path": "papers/arxiv/2022/arxiv-2212_06340.pdf",
    "paper_md_path": "papers/arxiv/2022/arxiv-2212_06340.md",
    "files_status": {
      "pdf_exists": true,
      "markdown_exists": true,
      "summary_exists": false
    }
  }
}
```
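
Two common follow-ups on this response are checking which files are missing and turning the relative paths into absolute ones. A minimal Python sketch, with hypothetical helper names and the field names taken from the example above:

```python
import json
from pathlib import Path


def missing_files(show_response: str) -> list[str]:
    """List the files_status keys that are false, i.e. files not yet on disk."""
    status = json.loads(show_response)["paper"]["files_status"]
    return sorted(key for key, exists in status.items() if not exists)


def resolve_pdf(show_response: str, library_root: Path) -> Path:
    """Join the relative pdf_path onto the library root."""
    return library_root / json.loads(show_response)["paper"]["pdf_path"]
```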
#### `paperlib import --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "title": "Example Paper Title",
  "source_type": "arxiv",
  "source_id": "2212.06340",
  "authors": ["Alice Smith", "Bob Jones"],
  "message": "Successfully imported arXiv paper",
  "paper": {
    // Full paper metadata object
  }
}
```
#### `paperlib convert --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "action": "convert_pending",
  "success_count": 5,
  "failure_count": 1,
  "total_attempted": 6
}
```
For single paper conversion (`--paper-id`):
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "conversion_success": true,
  "conversion_status": "success",
  "message": "Successfully converted paper"
}
```
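
In the batch example above, `success` is `true` even though `failure_count` is 1, so a script that needs every paper converted should not rely on the envelope flag alone. The Python sketch below handles both response shapes; the function name is hypothetical.

```python
import json


def conversion_ok(convert_response: str) -> bool:
    """True when nothing failed, for either the batch or the single-paper shape."""
    data = json.loads(convert_response)
    if "conversion_success" in data:
        # Single-paper shape (--paper-id).
        return bool(data["conversion_success"])
    # Batch shape: any failed paper counts as not OK.
    return data["failure_count"] == 0
```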
#### `paperlib reindex --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "reindex_complete": true,
  "papers_indexed": 42,
  "errors": 1,
  "statistics": {
    "total_papers": 42,
    "by_source_type": {
      "arxiv": 38,
      "local": 4
    }
  }
}
```
### JSON Data Types
- **Timestamps**: Always in ISO 8601 format (`YYYY-MM-DDTHH:mm:ss.sssZ`)
- **Paper IDs**: String identifiers (e.g., `"arxiv-2212_06340"`, `"local-a1b2c3d4"`)
- **Status Fields**: String enums (`"pending"`, `"success"`, `"failed"`)
- **Authors**: Array of strings
- **Categories/Tags**: Array of strings
- **File Paths**: Relative to library root
This JSON format is stable across paperlib versions for reliable automation and scripting.
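
Timestamps in the format listed above can be parsed with Python's standard library. The sketch uses `strptime` with an explicit pattern because `datetime.fromisoformat` only accepts the trailing `Z` from Python 3.11 onward; the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a YYYY-MM-DDTHH:mm:ss.sssZ timestamp into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    # The trailing Z means UTC, so attach that explicitly.
    return parsed.replace(tzinfo=timezone.utc)
```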