# CLI Reference

This document describes all available commands in the paperlib CLI.

## Global Options

All commands support these global options:

- `--help`, `-h`: Show help message
- `--version`: Show version information

Many commands also support:

- `--library`, `-L`: Specify library root directory (default: current directory)
- `--json`: Output machine-readable JSON instead of human-readable format

## Commands

### `paperlib init [PATH]`

Initialize a paper library directory structure.

**Arguments:**
- `PATH`: Directory to initialize (default: current directory)

**Examples:**
```bash
# Initialize library in current directory
paperlib init

# Initialize library in specific directory
paperlib init /path/to/my/papers

# Initialize and create parent directories
paperlib init ~/Documents/research/papers
```

**Behavior:**
- Creates standard directory structure (config/, papers/, db/, etc.)
- Safe to run multiple times (idempotent)
- Creates parent directories if they don't exist

---

### `paperlib import`

Import papers into the library from various sources.

**Required (one of):**
- `--pdf PATH`: Import a local PDF file
- `--arxiv ID`: Import paper from arXiv by ID or URL

**Options:**
- `--title TEXT`: Override paper title (for local PDFs)
- `--notes TEXT`: Add notes about the paper
- `--tags TAG1 TAG2`: Add tags to the paper
- `--library PATH`: Specify library directory
- `--json`: Output import results in JSON format for automation

**Examples:**
```bash
# Import local PDF
paperlib import --pdf paper.pdf --title "My Research" --tags ml ai

# Import from arXiv
paperlib import --arxiv 2212.06340

# Import with arXiv URL
paperlib import --arxiv https://arxiv.org/abs/2212.06340

# Import to specific library
paperlib import --pdf paper.pdf --library ~/research

# Import with JSON output for automation
paperlib import --arxiv 2212.06340 --json
```

**Behavior:**
- Generates stable paper ID based on content (local) or arXiv ID
- Copies PDF to structured storage location
- Creates meta.json with paper metadata
- Prevents duplicate imports (same content/ID)
- Indexes paper in search database
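
The ID scheme in the first Behavior item can be pictured with a short Python sketch. This is an illustration, not paperlib's implementation: both function names are hypothetical, the `.` to `_` mapping is inferred from the example `2212.06340` / `arxiv-2212_06340`, dropping a version suffix is an assumption, and the hash (SHA-256 truncated to 12 hex characters) is an assumption chosen only to show how identical content yields the same ID.

```python
import hashlib
import re

# Matches new-style arXiv IDs such as 2212.06340, optionally followed by a version (v2).
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?$")

def arxiv_paper_id(id_or_url: str) -> str:
    """Hypothetical helper: map an arXiv ID or abs URL to an 'arxiv-...' paper ID."""
    match = _ARXIV_ID.search(id_or_url.strip().rstrip("/"))
    if match is None:
        raise ValueError(f"not a recognizable arXiv ID: {id_or_url!r}")
    # Assumption: the version suffix is dropped and '.' becomes '_'.
    return "arxiv-" + match.group(1).replace(".", "_")

def local_paper_id(pdf_bytes: bytes, length: int = 12) -> str:
    """Hypothetical helper: derive a content-based ID (hash choice is an assumption)."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return "local-" + digest[:length]

print(arxiv_paper_id("2212.06340"))
print(arxiv_paper_id("https://arxiv.org/abs/2212.06340"))
# Identical bytes always produce the same ID, which is what makes duplicate detection possible.
print(local_paper_id(b"%PDF-1.7 example") == local_paper_id(b"%PDF-1.7 example"))
```

Because both IDs are pure functions of their input, importing the same arXiv paper or the same PDF bytes twice produces the same ID, which is one plausible way the duplicate check could work.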

---

### `paperlib list`

List all papers in the library with their current status.

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**
```bash
# List all papers
paperlib list

# List papers in specific library
paperlib list --library ~/research

# Get machine-readable output
paperlib list --json
```

**Output Format:**
```
Found 2 papers:

📄 arxiv-2212_06340
The new discontinuous Galerkin methods based numerical relativity program Nmesh
By: Wolfgang Tichy, Liwei Ji, Ananya Adhikari (+2 more)
Categories: gr-qc

⏳ local-a1b2c3d4e5f6
Machine Learning Applications in Physics
Categories: cs.AI, physics.comp-ph
```

**Status Indicators:**
- ⏳ Paper imported, conversion pending
- 📄 PDF converted to Markdown
- 📝 AI summary generated
- ❌ Conversion or processing failed

---

### `paperlib show PAPER_ID`

Show detailed information about a specific paper.

**Arguments:**
- `PAPER_ID`: The unique paper identifier

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**
```bash
# Show paper details
paperlib show arxiv-2212_06340

# Show with JSON output
paperlib show local-a1b2c3d4 --json
```

**Output includes:**
- All metadata fields
- Processing status
- File locations and existence
- Import timestamp
- Tags and notes

---

### `paperlib convert`

Convert papers from PDF to Markdown using MinerU.

**Options:**
- `--library PATH`: Specify library directory
- `--paper-id ID`: Convert specific paper only
- `--retry-failed`: Retry papers with failed conversion status
- `--force`: Force reconvert all papers (including successful ones)
- `--no-ui`: Disable rich UI display (useful for scripting)
- `--json`: Output conversion results in JSON format (automatically disables UI)

**Examples:**
```bash
# Convert all pending papers (with rich UI)
paperlib convert

# Retry failed conversions
paperlib convert --retry-failed

# Force reconvert all papers
paperlib convert --force

# Convert specific paper
paperlib convert --paper-id arxiv-2212_06340

# Convert without UI (for scripts)
paperlib convert --no-ui

# Convert in specific library
paperlib convert --library ~/research

# Get JSON output for automation (disables UI automatically)
paperlib convert --json
paperlib convert --paper-id arxiv-2212_06340 --json
```

**Behavior:**
- Processes papers with `conversion_status: pending` (or failed with `--retry-failed`)
- Uses MinerU for PDF to Markdown conversion with CPU pipeline backend
- Shows rich UI with progress bar and live MinerU output (unless `--no-ui`)
- Updates metadata with conversion status
- Creates conversion logs in `logs/` directory
- Post-processes markdown to fix image references (`images/` → `assets/`)
- Handles conversion failures gracefully

**Rich UI Features:**
- Progress bar showing papers converted
- Live streaming of MinerU output
- Current paper being processed
- Color-coded output (errors in red, progress in blue, etc.)
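
The image-reference step listed under Behavior (`images/` → `assets/`) can be pictured with a small Python sketch. The function name and the regular expression are hypothetical; the sketch assumes the references are standard Markdown image links of the form `![alt](images/...)`, which the source does not confirm.

```python
import re

# Matches the opening of a Markdown image link whose target starts with "images/".
_IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()images/")

def fix_image_references(markdown: str) -> str:
    """Hypothetical helper: point image links at assets/ instead of images/."""
    return _IMAGE_LINK.sub(r"\1assets/", markdown)

sample = "See ![Figure 1](images/fig1.png) and the images/ folder."
print(fix_image_references(sample))
```

Anchoring the pattern on the `![...](` prefix leaves ordinary prose that happens to mention `images/` untouched.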

---

### `paperlib reindex`

Rebuild the search index from stored paper metadata.

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output reindexing results and statistics in JSON format

**Examples:**
```bash
# Rebuild index
paperlib reindex

# Rebuild index for specific library
paperlib reindex --library ~/research

# Get JSON output with statistics
paperlib reindex --json
```

**Behavior:**
- Clears existing SQLite database
- Scans all meta.json files in papers/ directory
- Rebuilds full-text search index
- Reports statistics on completion
- Safe to run anytime (repairs corrupted index)

---

### `paperlib status`

Show library configuration and layout information.

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**
```bash
# Show current library status
paperlib status

# Show specific library status
paperlib status --library ~/research

# Get JSON output for automation
paperlib status --json
```

**Output:**
```
root: /home/user/papers
config: /home/user/papers/config/config.toml
database: /home/user/papers/db/paperlib.sqlite3
papers: /home/user/papers/papers
inbox: /home/user/papers/inbox
cache: /home/user/papers/cache
```

---

## Future Commands

These commands are planned but not yet implemented:

### `paperlib search QUERY`
Search papers by content and metadata.

### `paperlib summarize [PAPER_ID]`
Generate AI summaries for papers.

### `paperlib export FORMAT`
Export papers in various formats.

### `paperlib doctor`
Diagnose and repair library issues.

---

## Exit Codes

paperlib commands return standard exit codes:

- `0`: Success
- `1`: General error (file not found, invalid arguments, etc.)
- `2`: Command line argument error
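
A script that wraps the CLI can branch on these codes. The Python sketch below is one way to do that; the helper names are hypothetical, and a stand-in command is used so the example runs without paperlib installed.

```python
import subprocess
import sys

# Meanings copied from the list above.
EXIT_CODE_MEANINGS = {
    0: "success",
    1: "general error",
    2: "command line argument error",
}

def describe_exit_code(code: int) -> str:
    """Translate an exit code into the meaning documented above."""
    return EXIT_CODE_MEANINGS.get(code, f"unexpected exit code {code}")

def run_and_describe(argv: list[str]) -> str:
    """Run a command and describe its exit code."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return describe_exit_code(completed.returncode)

# Stand-in command that exits with code 2; in practice argv would be
# something like ["paperlib", "status"].
print(run_and_describe([sys.executable, "-c", "raise SystemExit(2)"]))
```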

## Configuration

paperlib looks for configuration in these locations (in order):
1. `$LIBRARY_ROOT/config/config.toml`
2. `~/.config/paperlib/config.toml`
3. Built-in defaults
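
The lookup order above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the first existing file wins outright; whether paperlib instead merges values from several locations is not stated here.

```python
from pathlib import Path
from typing import Optional

def find_config(library_root: Path, home: Path) -> Optional[Path]:
    """Return the first existing config file, or None to signal built-in defaults."""
    candidates = [
        library_root / "config" / "config.toml",         # 1. library-local config
        home / ".config" / "paperlib" / "config.toml",   # 2. per-user config
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # 3. fall back to built-in defaults

print(find_config(Path.cwd(), Path.home()))
```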

## JSON Output Format

When using `--json`, commands output structured data suitable for programmatic consumption. All JSON responses follow a consistent envelope format with standard fields:

### Standard Response Envelope

**Success Response:**
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  // Command-specific data fields below
}
```

**Error Response:**
```json
{
  "success": false,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "error": "Error message here",
  "error_code": 1
}
```
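
A consumer can check the envelope once before touching any command-specific fields. The Python sketch below shows one way to do that; `PaperlibError` and `parse_envelope` are hypothetical names, not part of paperlib.

```python
import json

class PaperlibError(Exception):
    """Hypothetical exception carrying the envelope's error fields."""

    def __init__(self, message: str, error_code: int):
        super().__init__(message)
        self.error_code = error_code

def parse_envelope(text: str) -> dict:
    """Decode a --json response; raise PaperlibError if success is false."""
    data = json.loads(text)
    if not data.get("success", False):
        raise PaperlibError(data.get("error", "unknown error"), data.get("error_code", 1))
    return data

ok = parse_envelope('{"success": true, "timestamp": "2024-01-15T10:30:00.000Z"}')
print(ok["timestamp"])
```

Raising on `"success": false` keeps the calling code on the success path and puts `error` and `error_code` in one place.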

### Command-Specific JSON Formats

#### `paperlib status --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "library_root": "/home/user/papers",
  "config_path": "/home/user/papers/config/config.toml",
  "database_path": "/home/user/papers/db/paperlib.sqlite3",
  "papers_dir": "/home/user/papers/papers",
  "inbox_dir": "/home/user/papers/inbox",
  "cache_dir": "/home/user/papers/cache"
}
```

#### `paperlib list --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "papers": [
    {
      "paper_id": "arxiv-2212_06340",
      "source_type": "arxiv",
      "source_id": "2212.06340",
      "title": "Example Paper",
      "authors": ["Alice Smith", "Bob Jones"],
      "published_date": "2022-12-06T00:00:00.000Z",
      "categories": ["cs.AI"],
      "conversion_status": "success",
      "summary_status": "pending",
      "imported_at": "2024-01-15T10:30:00.000Z",
      "tags": [],
      "notes": ""
    }
  ],
  "total": 1
}
```
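
As an illustration of consuming this response, the Python sketch below selects papers by `conversion_status`. The function name is hypothetical, and the sample is a trimmed copy of the example response rather than real CLI output.

```python
import json

def papers_with_status(response_text: str, conversion_status: str) -> list[str]:
    """Return the paper_id of every listed paper with the given conversion_status."""
    response = json.loads(response_text)
    return [
        paper["paper_id"]
        for paper in response["papers"]
        if paper["conversion_status"] == conversion_status
    ]

# Trimmed copy of the example response.
sample = """
{"success": true, "timestamp": "2024-01-15T10:30:00.000Z",
 "papers": [{"paper_id": "arxiv-2212_06340", "conversion_status": "success",
             "summary_status": "pending"}],
 "total": 1}
"""
print(papers_with_status(sample, "success"))
```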

#### `paperlib show <paper_id> --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper": {
    "paper_id": "arxiv-2212_06340",
    "source_type": "arxiv",
    "source_id": "2212.06340",
    "title": "Example Paper",
    "authors": ["Alice Smith", "Bob Jones"],
    "conversion_status": "success",
    "summary_status": "pending",
    "pdf_path": "papers/arxiv/2022/arxiv-2212_06340.pdf",
    "paper_md_path": "papers/arxiv/2022/arxiv-2212_06340.md",
    "files_status": {
      "pdf_exists": true,
      "markdown_exists": true,
      "summary_exists": false
    }
  }
}
```
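
Since `pdf_path` and `paper_md_path` in this example are relative, a consumer has to join them with the library root. The Python sketch below does that for the files that `files_status` reports as present; the function name is hypothetical and the sample is a trimmed copy of the example response.

```python
import json
from pathlib import Path

def existing_files(response_text: str, library_root: Path) -> dict[str, Path]:
    """Map each file the response reports as existing to a path under library_root."""
    paper = json.loads(response_text)["paper"]
    status = paper["files_status"]
    found = {}
    if status.get("pdf_exists"):
        found["pdf"] = library_root / paper["pdf_path"]
    if status.get("markdown_exists"):
        found["markdown"] = library_root / paper["paper_md_path"]
    return found

# Trimmed copy of the example response.
sample = """
{"success": true, "timestamp": "2024-01-15T10:30:00.000Z",
 "paper": {"paper_id": "arxiv-2212_06340",
           "pdf_path": "papers/arxiv/2022/arxiv-2212_06340.pdf",
           "paper_md_path": "papers/arxiv/2022/arxiv-2212_06340.md",
           "files_status": {"pdf_exists": true, "markdown_exists": true,
                            "summary_exists": false}}}
"""
print(existing_files(sample, Path("/home/user/papers")))
```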

#### `paperlib import --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "title": "Example Paper Title",
  "source_type": "arxiv",
  "source_id": "2212.06340",
  "authors": ["Alice Smith", "Bob Jones"],
  "message": "Successfully imported arXiv paper",
  "paper": {
    // Full paper metadata object
  }
}
```

#### `paperlib convert --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "action": "convert_pending",
  "success_count": 5,
  "failure_count": 1,
  "total_attempted": 6
}
```

For single paper conversion (`--paper-id`):
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "conversion_success": true,
  "conversion_status": "success",
  "message": "Successfully converted paper"
}
```
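
Because `convert --json` has two shapes, a consumer needs to tell them apart. The Python sketch below does so by checking for `paper_id`, which appears only in the single-paper example; the function name is hypothetical and the samples are trimmed copies of the two example responses.

```python
import json

def summarize_conversion(response_text: str) -> str:
    """Return a one-line summary for either shape of `convert --json` output."""
    data = json.loads(response_text)
    if "paper_id" in data:
        # Single-paper shape (--paper-id).
        return f"{data['paper_id']}: {data['conversion_status']}"
    # Batch shape with counters.
    return (
        f"{data['success_count']} of {data['total_attempted']} converted, "
        f"{data['failure_count']} failed"
    )

batch = (
    '{"success": true, "action": "convert_pending", '
    '"success_count": 5, "failure_count": 1, "total_attempted": 6}'
)
single = (
    '{"success": true, "paper_id": "arxiv-2212_06340", '
    '"conversion_success": true, "conversion_status": "success"}'
)
print(summarize_conversion(batch))
print(summarize_conversion(single))
```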

#### `paperlib reindex --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "reindex_complete": true,
  "papers_indexed": 42,
  "errors": 1,
  "statistics": {
    "total_papers": 42,
    "by_source_type": {
      "arxiv": 38,
      "local": 4
    }
  }
}
```
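
As a small illustration of reading the nested `statistics` object, the Python sketch below turns the response into a one-line report. The function name is hypothetical and the sample is a trimmed copy of the example response.

```python
import json

def reindex_report(response_text: str) -> str:
    """Summarize a `reindex --json` response, including the per-source breakdown."""
    data = json.loads(response_text)
    by_source = data["statistics"]["by_source_type"]
    breakdown = ", ".join(f"{name}={count}" for name, count in sorted(by_source.items()))
    return f"indexed {data['papers_indexed']} papers ({breakdown}), errors: {data['errors']}"

# Trimmed copy of the example response.
sample = """
{"success": true, "reindex_complete": true, "papers_indexed": 42, "errors": 1,
 "statistics": {"total_papers": 42, "by_source_type": {"arxiv": 38, "local": 4}}}
"""
print(reindex_report(sample))
```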

### JSON Data Types

- **Timestamps**: Always in ISO 8601 format (`YYYY-MM-DDTHH:mm:ss.sssZ`)
- **Paper IDs**: String identifiers (e.g., `"arxiv-2212_06340"`, `"local-a1b2c3d4"`)
- **Status Fields**: String enums (`"pending"`, `"success"`, `"failed"`)
- **Authors**: Array of strings
- **Categories/Tags**: Array of strings
- **File Paths**: Paper file paths (e.g., `pdf_path`) are relative to the library root; the directory paths in the `status` example are absolute
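
Timestamps in the format listed above can be parsed with the Python standard library. The sketch below is a minimal example; the function name is hypothetical, and it treats the trailing `Z` as UTC, which is what that suffix denotes in ISO 8601.

```python
from datetime import datetime, timezone

def parse_timestamp(value: str) -> datetime:
    """Parse a `YYYY-MM-DDTHH:mm:ss.sssZ` timestamp into an aware UTC datetime."""
    # strptime treats the literal "Z" as plain text, so the UTC zone is attached explicitly.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)

stamp = parse_timestamp("2024-01-15T10:30:00.000Z")
print(stamp.isoformat())
```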

This JSON format is stable across paperlib versions for reliable automation and scripting.