# CLI Reference

This document describes all available commands in the paperlib CLI.

## Global Options

All commands support these global options:

- `--help`, `-h`: Show help message
- `--version`: Show version information

Many commands also support:
- `--library`, `-L`: Specify library root directory (default: current directory)
- `--json`: Output machine-readable JSON instead of human-readable format

## Commands

### `paperlib init [PATH]`

Initialize a paper library directory structure.

**Arguments:**
- `PATH`: Directory to initialize (default: current directory)

**Examples:**
```bash
# Initialize library in current directory
paperlib init

# Initialize library in specific directory
paperlib init /path/to/my/papers

# Initialize and create parent directories
paperlib init ~/Documents/research/papers
```

**Behavior:**
- Creates standard directory structure (config/, papers/, db/, etc.)
- Safe to run multiple times (idempotent)
- Creates parent directories if they don't exist
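
The idempotent behavior above can be sketched in Python as follows. This is only an illustration: `init_library` and `STANDARD_DIRS` are hypothetical names, and the directory list is limited to names that appear in this document, so the real layout may differ.

```python
from pathlib import Path

# Directory names mentioned in this document; the real layout may include more.
STANDARD_DIRS = ("config", "papers", "db", "inbox", "cache")

def init_library(root: str) -> list[Path]:
    """Create the library layout under root; safe to call repeatedly."""
    base = Path(root).expanduser()
    created = []
    for name in STANDARD_DIRS:
        path = base / name
        # parents=True creates missing parent directories,
        # exist_ok=True turns a repeated run into a no-op.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Running the function twice on the same path leaves the directories untouched, which is what makes repeated `paperlib init` calls safe.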

---

### `paperlib import`

Import papers into the library from various sources.

**Required (one of):**
- `--pdf PATH`: Import a local PDF file
- `--arxiv ID`: Import paper from arXiv by ID or URL

**Options:**
- `--title TEXT`: Override paper title (for local PDFs)
- `--notes TEXT`: Add notes about the paper
- `--tags TAG1 TAG2`: Add tags to the paper
- `--library PATH`: Specify library directory
- `--json`: Output import results in JSON format for automation

**Examples:**
```bash
# Import local PDF
paperlib import --pdf paper.pdf --title "My Research" --tags ml ai

# Import from arXiv
paperlib import --arxiv 2212.06340

# Import with arXiv URL
paperlib import --arxiv https://arxiv.org/abs/2212.06340

# Import to specific library
paperlib import --pdf paper.pdf --library ~/research

# Import with JSON output for automation
paperlib import --arxiv 2212.06340 --json
```

**Behavior:**
- Generates stable paper ID based on content (local) or arXiv ID
- Copies PDF to structured storage location
- Creates meta.json with paper metadata
- Prevents duplicate imports (same content/ID)
- Indexes paper in search database
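
One way stable IDs of this shape could be derived is sketched below. The function names, the choice of SHA-256, and the 12-character hash length are assumptions made for illustration; this document does not specify paperlib's actual scheme.

```python
import hashlib
import re

def arxiv_paper_id(id_or_url: str) -> str:
    """Turn a new-style arXiv ID or abs URL into an ID like 'arxiv-2212_06340'."""
    # Assumption: a trailing version suffix (e.g. 'v2') is ignored.
    match = re.search(r"(\d{4}\.\d{4,5})(v\d+)?", id_or_url)
    if match is None:
        raise ValueError(f"not a recognizable arXiv identifier: {id_or_url!r}")
    return "arxiv-" + match.group(1).replace(".", "_")

def local_paper_id(pdf_bytes: bytes, length: int = 12) -> str:
    """Derive an ID from file content, so the same PDF always maps to the same ID."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return "local-" + digest[:length]
```

Because both IDs are pure functions of their input, importing the same arXiv paper or the same PDF twice yields the same ID, which is what makes duplicate detection possible.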

---

### `paperlib list`

List all papers in the library with their current status.

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**
```bash
# List all papers
paperlib list

# List papers in specific library
paperlib list --library ~/research

# Get machine-readable output
paperlib list --json
```

**Output Format:**
```
Found 2 papers:

📄 arxiv-2212_06340
   The new discontinuous Galerkin methods based numerical relativity program Nmesh
   By: Wolfgang Tichy, Liwei Ji, Ananya Adhikari (+2 more)
   Categories: gr-qc

⏳ local-a1b2c3d4e5f6
   Machine Learning Applications in Physics
   Categories: cs.AI, physics.comp-ph
```

**Status Indicators:**
- ⏳ Paper imported, conversion pending
- 📄 PDF converted to Markdown
- 📝 AI summary generated
- ❌ Conversion or processing failed
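
A script consuming `--json` output could reproduce these indicators from the `conversion_status` and `summary_status` fields. The sketch below is hypothetical: the function name and the precedence between the two fields are assumptions, not paperlib's documented logic.

```python
def status_icon(conversion_status: str, summary_status: str = "pending") -> str:
    """Map the two status fields to one of the indicators listed above."""
    # Assumption: a failure in either stage takes precedence.
    if conversion_status == "failed" or summary_status == "failed":
        return "❌"
    # Assumption: a generated summary outranks a plain conversion.
    if summary_status == "success":
        return "📝"
    if conversion_status == "success":
        return "📄"
    return "⏳"
```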

---

### `paperlib show PAPER_ID`

Show detailed information about a specific paper.

**Arguments:**
- `PAPER_ID`: The unique paper identifier

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**
```bash
# Show paper details
paperlib show arxiv-2212_06340

# Show with JSON output
paperlib show local-a1b2c3d4 --json
```

**Output includes:**
- All metadata fields
- Processing status
- File locations and existence
- Import timestamp
- Tags and notes

---

### `paperlib convert`

Convert papers from PDF to Markdown using MinerU.

**Options:**
- `--library PATH`: Specify library directory
- `--paper-id ID`: Convert specific paper only
- `--retry-failed`: Retry papers with failed conversion status
- `--force`: Reconvert all papers, including those already converted successfully
- `--no-ui`: Disable rich UI display (useful for scripting)
- `--json`: Output conversion results in JSON format (automatically disables UI)

**Examples:**
```bash
# Convert all pending papers (with rich UI)
paperlib convert

# Retry failed conversions
paperlib convert --retry-failed

# Force reconvert all papers
paperlib convert --force

# Convert specific paper
paperlib convert --paper-id arxiv-2212_06340

# Convert without UI (for scripts)
paperlib convert --no-ui

# Convert in specific library
paperlib convert --library ~/research

# Get JSON output for automation (disables UI automatically)
paperlib convert --json
paperlib convert --paper-id arxiv-2212_06340 --json
```

**Behavior:**
- Processes papers with `conversion_status: pending` (or failed with `--retry-failed`)
- Uses MinerU with the CPU pipeline backend for PDF-to-Markdown conversion
- Shows rich UI with progress bar and live MinerU output (unless `--no-ui`)
- Updates metadata with conversion status
- Creates conversion logs in `logs/` directory
- Post-processes Markdown to fix image references (`images/` → `assets/`)
- Handles conversion failures gracefully
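
The image-reference post-processing step might look roughly like this. It is a minimal sketch assuming the references are standard Markdown image links; `fix_image_references` is a hypothetical name, not paperlib's API.

```python
import re

# Matches the opening of a Markdown image link whose target starts with "images/".
_IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()images/")

def fix_image_references(markdown: str) -> str:
    """Rewrite ![alt](images/...) links to point at assets/ instead."""
    return _IMAGE_LINK.sub(r"\g<1>assets/", markdown)
```

Anchoring the pattern to the image-link syntax leaves ordinary prose that mentions `images/` unchanged.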

**Rich UI Features:**
- Progress bar showing papers converted
- Live streaming of MinerU output
- Current paper being processed
- Color-coded output (errors in red, progress in blue, etc.)

---

### `paperlib reindex`

Rebuild the search index from stored paper metadata.

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output reindexing results and statistics in JSON format

**Examples:**
```bash
# Rebuild index
paperlib reindex

# Rebuild index for specific library
paperlib reindex --library ~/research

# Get JSON output with statistics
paperlib reindex --json
```

**Behavior:**
- Clears existing SQLite database
- Scans all meta.json files in papers/ directory
- Rebuilds full-text search index
- Reports statistics on completion
- Safe to run anytime (repairs corrupted index)
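
The rebuild loop can be illustrated with the sketch below. It assumes a simplified single-table schema and omits the full-text index; `reindex` and the table layout are hypothetical, and only the `paper_id`, `source_type`, and `title` fields shown elsewhere in this document are used.

```python
import json
import sqlite3
from pathlib import Path

def reindex(papers_dir: str, conn: sqlite3.Connection) -> dict:
    """Rebuild a minimal index table from every meta.json under papers_dir."""
    # Start from a clean slate, as the real command clears the database.
    conn.execute("DROP TABLE IF EXISTS papers")
    conn.execute(
        "CREATE TABLE papers (paper_id TEXT PRIMARY KEY, source_type TEXT, title TEXT)"
    )
    indexed, errors = 0, 0
    for meta_path in sorted(Path(papers_dir).rglob("meta.json")):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            conn.execute(
                "INSERT INTO papers VALUES (?, ?, ?)",
                (meta["paper_id"], meta.get("source_type"), meta.get("title")),
            )
            indexed += 1
        except (json.JSONDecodeError, KeyError, TypeError, sqlite3.IntegrityError):
            # Unreadable or incomplete metadata is counted, not fatal.
            errors += 1
    conn.commit()
    by_source = dict(
        conn.execute("SELECT source_type, COUNT(*) FROM papers GROUP BY source_type")
    )
    return {"papers_indexed": indexed, "errors": errors, "by_source_type": by_source}
```

Since the table is dropped and rebuilt from the files on disk, running the function again produces the same result, which mirrors why reindexing is safe at any time.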

---

### `paperlib status`

Show library configuration and layout information.

**Options:**
- `--library PATH`: Specify library directory
- `--json`: Output in JSON format

**Examples:**
```bash
# Show current library status
paperlib status

# Show specific library status
paperlib status --library ~/research

# Get JSON output for automation
paperlib status --json
```

**Output:**
```
root: /home/user/papers
config: /home/user/papers/config/config.toml
database: /home/user/papers/db/paperlib.sqlite3
papers: /home/user/papers/papers
inbox: /home/user/papers/inbox
cache: /home/user/papers/cache
```

---

## Future Commands

These commands are planned but not yet implemented:

### `paperlib search QUERY`
Search papers by content and metadata.

### `paperlib summarize [PAPER_ID]`
Generate AI summaries for papers.

### `paperlib export FORMAT`
Export papers in various formats.

### `paperlib doctor`
Diagnose and repair library issues.

---

## Exit Codes

paperlib commands return standard exit codes:

- `0`: Success
- `1`: General error (file not found, invalid argument values, etc.)
- `2`: Command-line usage error (e.g., unknown option or malformed invocation)

## Configuration

paperlib looks for configuration in these locations (in order):
1. `$LIBRARY_ROOT/config/config.toml`
2. `~/.config/paperlib/config.toml`
3. Built-in defaults
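
The lookup order can be sketched as below. This assumes the first existing file wins outright (whether paperlib merges settings across files is not stated here); `find_config` and its `home` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional

def find_config(library_root: str, home: Optional[str] = None) -> Optional[Path]:
    """Return the first existing config file, or None to signal built-in defaults."""
    home_dir = Path(home) if home is not None else Path.home()
    candidates = [
        Path(library_root) / "config" / "config.toml",
        home_dir / ".config" / "paperlib" / "config.toml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # caller falls back to built-in defaults
```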

## JSON Output Format

When using `--json`, commands output structured data suitable for programmatic consumption. All JSON responses follow a consistent envelope format with standard fields:

### Standard Response Envelope

**Success Response** (command-specific data fields appear alongside these standard fields):
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```

**Error Response:**
```json
{
  "success": false,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "error": "Error message here",
  "error_code": 1
}
```
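
A script can handle both envelope shapes with a small helper like the one below. `PaperlibError` and `parse_response` are hypothetical names introduced for this sketch; only the `success`, `error`, and `error_code` fields come from the envelope described above.

```python
import json

class PaperlibError(Exception):
    """Raised when a response envelope reports success: false."""

    def __init__(self, message: str, error_code: int):
        super().__init__(message)
        self.error_code = error_code

def parse_response(raw: str) -> dict:
    """Parse a --json response and raise if it is an error envelope."""
    payload = json.loads(raw)
    if not payload.get("success", False):
        raise PaperlibError(
            payload.get("error", "unknown error"),
            payload.get("error_code", 1),
        )
    return payload
```

The `raw` argument would typically be the captured standard output of a command such as `paperlib status --json`.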

### Command-Specific JSON Formats

#### `paperlib status --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "library_root": "/home/user/papers",
  "config_path": "/home/user/papers/config/config.toml",
  "database_path": "/home/user/papers/db/paperlib.sqlite3",
  "papers_dir": "/home/user/papers/papers",
  "inbox_dir": "/home/user/papers/inbox",
  "cache_dir": "/home/user/papers/cache"
}
```

#### `paperlib list --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "papers": [
    {
      "paper_id": "arxiv-2212_06340",
      "source_type": "arxiv",
      "source_id": "2212.06340",
      "title": "Example Paper",
      "authors": ["Alice Smith", "Bob Jones"],
      "published_date": "2022-12-06T00:00:00.000Z",
      "categories": ["cs.AI"],
      "conversion_status": "success",
      "summary_status": "pending",
      "imported_at": "2024-01-15T10:30:00.000Z",
      "tags": [],
      "notes": ""
    }
  ],
  "total": 1
}
```
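
As an example of consuming this output, the sketch below picks out paper IDs by conversion status, for instance to find papers that still need `paperlib convert`. The helper name is hypothetical; the field names are those shown in the sample above.

```python
import json

def papers_with_status(list_output: str, conversion_status: str) -> list[str]:
    """Return the IDs of papers whose conversion_status matches."""
    payload = json.loads(list_output)
    return [
        paper["paper_id"]
        for paper in payload["papers"]
        if paper.get("conversion_status") == conversion_status
    ]
```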

#### `paperlib show <paper_id> --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper": {
    "paper_id": "arxiv-2212_06340",
    "source_type": "arxiv",
    "source_id": "2212.06340",
    "title": "Example Paper",
    "authors": ["Alice Smith", "Bob Jones"],
    "conversion_status": "success",
    "summary_status": "pending",
    "pdf_path": "papers/arxiv/2022/arxiv-2212_06340.pdf",
    "paper_md_path": "papers/arxiv/2022/arxiv-2212_06340.md",
    "files_status": {
      "pdf_exists": true,
      "markdown_exists": true,
      "summary_exists": false
    }
  }
}
```
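
Because file paths in the response are relative to the library root, a consumer usually needs to join them with that root before opening files. A minimal sketch, with a hypothetical helper name and the library root supplied by the caller:

```python
import json
from pathlib import Path

def resolve_paper_files(show_output: str, library_root: str) -> dict:
    """Map each path field of a `show --json` response to an absolute-style Path."""
    paper = json.loads(show_output)["paper"]
    root = Path(library_root)
    return {
        key: root / paper[key]
        for key in ("pdf_path", "paper_md_path")
        if paper.get(key)  # skip fields that are missing or empty
    }
```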

#### `paperlib import --json`

The `paper` field contains the full paper metadata object; it is shown empty here for brevity.
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "title": "Example Paper Title",
  "source_type": "arxiv",
  "source_id": "2212.06340",
  "authors": ["Alice Smith", "Bob Jones"],
  "message": "Successfully imported arXiv paper",
  "paper": {}
}
```

#### `paperlib convert --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "action": "convert_pending",
  "success_count": 5,
  "failure_count": 1,
  "total_attempted": 6
}
```

For single-paper conversion (`--paper-id`):
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "conversion_success": true,
  "conversion_status": "success",
  "message": "Successfully converted paper"
}
```

#### `paperlib reindex --json`
```json
{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "reindex_complete": true,
  "papers_indexed": 42,
  "errors": 1,
  "statistics": {
    "total_papers": 42,
    "by_source_type": {
      "arxiv": 38,
      "local": 4
    }
  }
}
```

### JSON Data Types

- **Timestamps**: Always in ISO 8601 format (`YYYY-MM-DDTHH:mm:ss.sssZ`)
- **Paper IDs**: String identifiers (e.g., `"arxiv-2212_06340"`, `"local-a1b2c3d4"`)
- **Status Fields**: String enums (`"pending"`, `"success"`, `"failed"`)
- **Authors**: Array of strings
- **Categories/Tags**: Array of strings
- **File Paths**: Relative to library root

The JSON format is kept stable across paperlib versions, so automation and scripts can rely on these field names and types.
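
Timestamps in the format listed under JSON Data Types can be parsed with the Python standard library alone. The helper name below is hypothetical; the format string matches the `YYYY-MM-DDTHH:mm:ss.sssZ` pattern.

```python
from datetime import datetime, timezone

def parse_timestamp(value: str) -> datetime:
    """Parse a paperlib-style ISO 8601 timestamp into a timezone-aware datetime."""
    # %f accepts the millisecond digits and %z accepts the trailing "Z" as UTC.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
```

The result is timezone-aware, so it can be compared directly with `datetime.now(timezone.utc)`.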