# CLI Reference

This document describes all available commands in the paperlib CLI.

## Global Options

All commands support these global options:

- `--help`, `-h`: Show help message
- `--version`: Show version information

Many commands also support:

- `--library`, `-L`: Specify library root directory (default: current directory)
- `--json`: Output machine-readable JSON instead of human-readable format

## Commands

### `paperlib init [PATH]`

Initialize a paper library directory structure.

**Arguments:**

- `PATH`: Directory to initialize (default: current directory)

**Examples:**

```bash
# Initialize library in current directory
paperlib init

# Initialize library in specific directory
paperlib init /path/to/my/papers

# Initialize and create parent directories
paperlib init ~/Documents/research/papers
```

**Behavior:**

- Creates standard directory structure (config/, papers/, db/, etc.)
- Safe to run multiple times (idempotent)
- Creates parent directories if they don't exist

---

### `paperlib import`

Import papers into the library from various sources.

**Required (one of):**

- `--pdf PATH`: Import a local PDF file
- `--arxiv ID`: Import paper from arXiv by ID or URL

**Options:**

- `--title TEXT`: Override paper title (for local PDFs)
- `--notes TEXT`: Add notes about the paper
- `--tags TAG1 TAG2`: Add tags to the paper
- `--library PATH`: Specify library directory
- `--json`: Output import results in JSON format for automation

**Examples:**

```bash
# Import local PDF
paperlib import --pdf paper.pdf --title "My Research" --tags ml ai

# Import from arXiv
paperlib import --arxiv 2212.06340

# Import with arXiv URL
paperlib import --arxiv https://arxiv.org/abs/2212.06340

# Import to specific library
paperlib import --pdf paper.pdf --library ~/research

# Import with JSON output for automation
paperlib import --arxiv 2212.06340 --json
```

**Behavior:**

- Generates stable paper ID based on content (local) or arXiv ID
- Copies PDF to structured storage location
- Creates meta.json with paper metadata
- Prevents duplicate imports (same content/ID)
- Indexes paper in search database

---

### `paperlib list`

List all papers in the library with their current status.

**Options:**

- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**

```bash
# List all papers
paperlib list

# List papers in specific library
paperlib list --library ~/research

# Get machine-readable output
paperlib list --json
```

**Output Format:**

```
Found 3 papers:

📄 arxiv-2212_06340
   The new discontinuous Galerkin methods based numerical relativity program Nmesh
   By: Wolfgang Tichy, Liwei Ji, Ananya Adhikari (+2 more)
   Categories: gr-qc

⏳ local-a1b2c3d4e5f6
   Machine Learning Applications in Physics
   Categories: cs.AI, physics.comp-ph
```

**Status Indicators:**

- ⏳ Paper imported, conversion pending
- 📄 PDF converted to Markdown
- 📝 AI summary generated
- ❌ Conversion or processing failed

---

### `paperlib show PAPER_ID`

Show detailed information about a specific paper.

**Arguments:**

- `PAPER_ID`: The unique paper identifier

**Options:**

- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**

```bash
# Show paper details
paperlib show arxiv-2212_06340

# Show with JSON output
paperlib show local-a1b2c3d4 --json
```

**Output includes:**

- All metadata fields
- Processing status
- File locations and existence
- Import timestamp
- Tags and notes

---

### `paperlib convert`

Convert papers from PDF to Markdown using MinerU.

**Options:**

- `--library PATH`: Specify library directory
- `--paper-id ID`: Convert specific paper only
- `--retry-failed`: Retry papers with failed conversion status
- `--force`: Force reconvert all papers (including successful ones)
- `--no-ui`: Disable rich UI display (useful for scripting)
- `--json`: Output conversion results in JSON format (automatically disables UI)

**Examples:**

```bash
# Convert all pending papers (with rich UI)
paperlib convert

# Retry failed conversions
paperlib convert --retry-failed

# Force reconvert all papers
paperlib convert --force

# Convert specific paper
paperlib convert --paper-id arxiv-2212_06340

# Convert without UI (for scripts)
paperlib convert --no-ui

# Convert in specific library
paperlib convert --library ~/research

# Get JSON output for automation (disables UI automatically)
paperlib convert --json
paperlib convert --paper-id arxiv-2212_06340 --json
```

**Behavior:**

- Processes papers with `conversion_status: pending` (or failed with `--retry-failed`)
- Uses MinerU for PDF to Markdown conversion with CPU pipeline backend
- Shows rich UI with progress bar and live MinerU output (unless `--no-ui`)
- Updates metadata with conversion status
- Creates conversion logs in `logs/` directory
- Post-processes markdown to fix image references (`images/` → `assets/`)
- Handles conversion failures gracefully

**Rich UI Features:**

- Progress bar showing papers converted
- Live streaming of MinerU output
- Current paper being processed
- Color-coded output (errors in red, progress in blue, etc.)

---

### `paperlib reindex`

Rebuild the search index from stored paper metadata.

**Options:**

- `--library PATH`: Specify library directory
- `--json`: Output reindexing results and statistics in JSON format

**Examples:**

```bash
# Rebuild index
paperlib reindex

# Rebuild index for specific library
paperlib reindex --library ~/research

# Get JSON output with statistics
paperlib reindex --json
```

**Behavior:**

- Clears existing SQLite database
- Scans all meta.json files in papers/ directory
- Rebuilds full-text search index
- Reports statistics on completion
- Safe to run anytime (repairs corrupted index)

---

### `paperlib status`

Show library configuration and layout information.

**Options:**

- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**

```bash
# Show current library status
paperlib status

# Show specific library status
paperlib status --library ~/research

# Get JSON output for automation
paperlib status --json
```

**Output:**

```
root: /home/user/papers
config: /home/user/papers/config/config.toml
database: /home/user/papers/db/paperlib.sqlite3
papers: /home/user/papers/papers
inbox: /home/user/papers/inbox
cache: /home/user/papers/cache
```

---

## Future Commands

These commands are planned but not yet implemented:

### `paperlib search QUERY`

Search papers by content and metadata.

### `paperlib summarize [PAPER_ID]`

Generate AI summaries for papers.

### `paperlib export FORMAT`

Export papers in various formats.

### `paperlib doctor`

Diagnose and repair library issues.

---

## Exit Codes

paperlib commands return standard exit codes:

- `0`: Success
- `1`: General error (file not found, invalid arguments, etc.)
- `2`: Command line argument error

## Configuration

paperlib looks for configuration in these locations (in order):

1. `$LIBRARY_ROOT/config/config.toml`
2. `~/.config/paperlib/config.toml`
3. Built-in defaults

## JSON Output Format

When using `--json`, commands output structured data suitable for programmatic consumption. All JSON responses follow a consistent envelope format with standard fields:

### Standard Response Envelope

**Success Response:**

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  // Command-specific data fields below
}
```

**Error Response:**

```json
{
  "success": false,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "error": "Error message here",
  "error_code": 1
}
```

### Command-Specific JSON Formats

#### `paperlib status --json`

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "library_root": "/home/user/papers",
  "config_path": "/home/user/papers/config/config.toml",
  "database_path": "/home/user/papers/db/paperlib.sqlite3",
  "papers_dir": "/home/user/papers/papers",
  "inbox_dir": "/home/user/papers/inbox",
  "cache_dir": "/home/user/papers/cache"
}
```

#### `paperlib list --json`

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "papers": [
    {
      "paper_id": "arxiv-2212_06340",
      "source_type": "arxiv",
      "source_id": "2212.06340",
      "title": "Example Paper",
      "authors": ["Alice Smith", "Bob Jones"],
      "published_date": "2022-12-06T00:00:00.000Z",
      "categories": ["cs.AI"],
      "conversion_status": "success",
      "summary_status": "pending",
      "imported_at": "2024-01-15T10:30:00.000Z",
      "tags": [],
      "notes": ""
    }
  ],
  "total": 1
}
```

#### `paperlib show --json`

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper": {
    "paper_id": "arxiv-2212_06340",
    "source_type": "arxiv",
    "source_id": "2212.06340",
    "title": "Example Paper",
    "authors": ["Alice Smith", "Bob Jones"],
    "conversion_status": "success",
    "summary_status": "pending",
    "pdf_path": "papers/arxiv/2022/arxiv-2212_06340.pdf",
    "paper_md_path": "papers/arxiv/2022/arxiv-2212_06340.md",
    "files_status": {
      "pdf_exists": true,
      "markdown_exists": true,
      "summary_exists": false
    }
  }
}
```

#### `paperlib import --json`

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "title": "Example Paper Title",
  "source_type": "arxiv",
  "source_id": "2212.06340",
  "authors": ["Alice Smith", "Bob Jones"],
  "message": "Successfully imported arXiv paper",
  "paper": {
    // Full paper metadata object
  }
}
```

#### `paperlib convert --json`

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "action": "convert_pending",
  "success_count": 5,
  "failure_count": 1,
  "total_attempted": 6
}
```

For single paper conversion (`--paper-id`):

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "conversion_success": true,
  "conversion_status": "success",
  "message": "Successfully converted paper"
}
```

#### `paperlib reindex --json`

```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "reindex_complete": true,
  "papers_indexed": 42,
  "errors": 1,
  "statistics": {
    "total_papers": 42,
    "by_source_type": {
      "arxiv": 38,
      "local": 4
    }
  }
}
```

### JSON Data Types

- **Timestamps**: Always in ISO 8601 format (`YYYY-MM-DDTHH:mm:ss.sssZ`)
- **Paper IDs**: String identifiers (e.g., `"arxiv-2212_06340"`, `"local-a1b2c3d4"`)
- **Status Fields**: String enums (`"pending"`, `"success"`, `"failed"`)
- **Authors**: Array of strings
- **Categories/Tags**: Array of strings
- **File Paths**: Relative to library root

This JSON format is stable across paperlib versions for reliable automation and scripting.
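
### Consuming the Envelope from a Script

The envelope described above can be handled with a few lines of Python. This is a minimal sketch, not part of paperlib: the names `PaperlibError`, `parse_envelope`, and `run_paperlib` are hypothetical helpers, and the sketch assumes the error envelope is written to standard output even when the command exits with a non-zero code.

```python
import json
import subprocess
from typing import Any, Dict, Optional


class PaperlibError(RuntimeError):
    """Hypothetical exception for envelopes that report "success": false."""

    def __init__(self, message: str, error_code: Optional[int] = None) -> None:
        super().__init__(message)
        self.error_code = error_code


def parse_envelope(raw: str) -> Dict[str, Any]:
    """Parse `--json` output and return only the command-specific fields."""
    envelope = json.loads(raw)
    if not envelope.get("success", False):
        # Error envelopes carry "error" and "error_code" fields.
        raise PaperlibError(
            envelope.get("error", "unknown error"),
            envelope.get("error_code"),
        )
    # Drop the two standard envelope fields; the rest is command-specific.
    return {k: v for k, v in envelope.items() if k not in ("success", "timestamp")}


def run_paperlib(*args: str) -> Dict[str, Any]:
    """Run `paperlib <args> --json` and parse the result.

    Assumes `paperlib` is on PATH and that JSON is printed to stdout
    on both success and failure (an assumption, not confirmed here).
    """
    proc = subprocess.run(
        ["paperlib", *args, "--json"],
        capture_output=True,
        text=True,
        check=False,  # the envelope, not the exit code, decides success
    )
    return parse_envelope(proc.stdout)
```

With these helpers, `run_paperlib("list")["papers"]` would yield the list of paper objects shown in the `paperlib list --json` format, and a failed command would raise `PaperlibError` with the envelope's `error` message and `error_code`.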