wyj/paperlib

wyj 76580fc4a2 doc: doc the --json option

2026-04-17 20:04:32 -04:00

CLI Reference

This document describes all available commands in the paperlib CLI.

Global Options

All commands support these global options:

--help, -h: Show help message
--version: Show version information

All commands documented below except init also support:

--library, -L: Specify library root directory (default: current directory)
--json: Output machine-readable JSON instead of human-readable format

Commands

`paperlib init [PATH]`

Initialize a paper library directory structure.

Arguments:

PATH: Directory to initialize (default: current directory)

Examples:

# Initialize library in current directory
paperlib init

# Initialize library in specific directory
paperlib init /path/to/my/papers

# Initialize and create parent directories
paperlib init ~/Documents/research/papers

Behavior:

Creates standard directory structure (config/, papers/, db/, etc.)
Safe to run multiple times (idempotent)
Creates parent directories if they don't exist
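
The idempotent behavior described above can be sketched in Python as follows. This is only an illustration, not paperlib's actual implementation: the function name init_library is hypothetical, and the directory list is assumed from the names that appear elsewhere in this document (config/, papers/, db/, inbox/, cache/, logs/).

```python
from pathlib import Path

# Directory names taken from this document; the real layout may include more.
ASSUMED_SUBDIRS = ("config", "papers", "db", "inbox", "cache", "logs")


def init_library(root: str) -> Path:
    """Create the library skeleton under root; safe to call repeatedly."""
    root_path = Path(root).expanduser()
    for name in ASSUMED_SUBDIRS:
        # parents=True creates missing parent directories;
        # exist_ok=True makes a second run a no-op instead of an error.
        (root_path / name).mkdir(parents=True, exist_ok=True)
    return root_path
```

Calling init_library twice on the same path leaves the directory tree unchanged, which is what makes the operation safe to repeat.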

`paperlib import`

Import papers into the library from various sources.

Required (one of):

--pdf PATH: Import a local PDF file
--arxiv ID: Import paper from arXiv by ID or URL

Options:

--title TEXT: Override paper title (for local PDFs)
--notes TEXT: Add notes about the paper
--tags TAG1 TAG2: Add tags to the paper
--library PATH: Specify library directory
--json: Output import results in JSON format for automation

Examples:

# Import local PDF
paperlib import --pdf paper.pdf --title "My Research" --tags ml ai

# Import from arXiv
paperlib import --arxiv 2212.06340

# Import with arXiv URL
paperlib import --arxiv https://arxiv.org/abs/2212.06340

# Import to specific library
paperlib import --pdf paper.pdf --library ~/research

# Import with JSON output for automation
paperlib import --arxiv 2212.06340 --json

Behavior:

Generates stable paper ID based on content (local) or arXiv ID
Copies PDF to structured storage location
Creates meta.json with paper metadata
Prevents duplicate imports (same content/ID)
Indexes paper in search database
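
One way such stable IDs could be derived is sketched below. The ID shapes (arxiv-2212_06340, local-a1b2c3d4e5f6) come from this document, but the function names, the use of SHA-256, the 12-character prefix, and the choice to ignore arXiv version suffixes are all assumptions made for illustration.

```python
import hashlib
import re

# Matches new-style arXiv identifiers such as 2212.06340, with an optional version suffix.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def arxiv_paper_id(id_or_url: str) -> str:
    """Derive a paper ID like 'arxiv-2212_06340' from an arXiv ID or URL."""
    match = _ARXIV_ID.search(id_or_url)
    if match is None:
        raise ValueError(f"not a recognizable arXiv identifier: {id_or_url!r}")
    # Dots become underscores, as in the IDs shown in this document.
    return "arxiv-" + match.group(1).replace(".", "_")


def local_paper_id(pdf_bytes: bytes, length: int = 12) -> str:
    """Derive a content-based ID; the hash and prefix length are assumptions."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return "local-" + digest[:length]
```

Because the local ID depends only on the file's bytes, importing the same PDF twice yields the same ID, which is what lets duplicate imports be detected.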

`paperlib list`

List all papers in the library with their current status.

Options:

--library PATH: Specify library directory
--json: Output in JSON format

Examples:

# List all papers
paperlib list

# List papers in specific library
paperlib list --library ~/research

# Get machine-readable output
paperlib list --json

Output Format:

Found 2 papers:

📄 arxiv-2212_06340
   The new discontinuous Galerkin methods based numerical relativity program Nmesh
   By: Wolfgang Tichy, Liwei Ji, Ananya Adhikari (+2 more)
   Categories: gr-qc

⏳ local-a1b2c3d4e5f6
   Machine Learning Applications in Physics
   Categories: cs.AI, physics.comp-ph

Status Indicators:

⏳ Paper imported, conversion pending
📄 PDF converted to Markdown
📝 AI summary generated
❌ Conversion or processing failed
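
As a rough sketch, the indicator could be chosen from the conversion_status and summary_status fields documented in the JSON section. The function name and the precedence of the checks (failure first, then summary, then conversion) are assumptions, not confirmed paperlib behavior.

```python
def status_icon(conversion_status: str, summary_status: str = "pending") -> str:
    """Pick a list icon from a paper's status fields (precedence is assumed)."""
    if "failed" in (conversion_status, summary_status):
        return "❌"
    if summary_status == "success":
        return "📝"
    if conversion_status == "success":
        return "📄"
    # Anything else is treated as still waiting for conversion.
    return "⏳"
```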

`paperlib show PAPER_ID`

Show detailed information about a specific paper.

Arguments:

PAPER_ID: The unique paper identifier

Options:

--library PATH: Specify library directory
--json: Output in JSON format

Examples:

# Show paper details
paperlib show arxiv-2212_06340

# Show with JSON output
paperlib show local-a1b2c3d4 --json

Output includes:

All metadata fields
Processing status
File locations and existence
Import timestamp
Tags and notes

`paperlib convert`

Convert papers from PDF to Markdown using MinerU.

Options:

--library PATH: Specify library directory
--paper-id ID: Convert specific paper only
--retry-failed: Retry papers with failed conversion status
--force: Force reconvert all papers (including successful ones)
--no-ui: Disable rich UI display (useful for scripting)
--json: Output conversion results in JSON format (automatically disables UI)

Examples:

# Convert all pending papers (with rich UI)
paperlib convert

# Retry failed conversions
paperlib convert --retry-failed

# Force reconvert all papers
paperlib convert --force

# Convert specific paper
paperlib convert --paper-id arxiv-2212_06340

# Convert without UI (for scripts)
paperlib convert --no-ui

# Convert in specific library
paperlib convert --library ~/research

# Get JSON output for automation (disables UI automatically)
paperlib convert --json
paperlib convert --paper-id arxiv-2212_06340 --json

Behavior:

Processes papers with conversion_status: pending (or failed with --retry-failed)
Uses MinerU for PDF to Markdown conversion with CPU pipeline backend
Shows rich UI with progress bar and live MinerU output (unless --no-ui)
Updates metadata with conversion status
Creates conversion logs in logs/ directory
Post-processes the generated Markdown to fix image references (images/ → assets/)
Handles conversion failures gracefully
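
The image-reference fix in the post-processing step might look roughly like this. The regular expression and the function name are illustrative assumptions; the actual rewrite rules paperlib applies are not specified here.

```python
import re

# Matches the opening of a Markdown image whose target starts with images/,
# e.g. ![caption](images/fig1.png); group 1 keeps the "![caption](" prefix.
_IMAGE_REF = re.compile(r"(!\[[^\]]*\]\()images/")


def fix_image_refs(markdown: str) -> str:
    """Point Markdown image links at assets/ instead of images/."""
    return _IMAGE_REF.sub(r"\1assets/", markdown)
```

Restricting the match to image syntax leaves ordinary prose that happens to mention "images/" untouched.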

Rich UI Features:

Progress bar showing papers converted
Live streaming of MinerU output
Current paper being processed
Color-coded output (errors in red, progress in blue, etc.)

`paperlib reindex`

Rebuild the search index from stored paper metadata.

Options:

--library PATH: Specify library directory
--json: Output reindexing results and statistics in JSON format

Examples:

# Rebuild index
paperlib reindex

# Rebuild index for specific library
paperlib reindex --library ~/research

# Get JSON output with statistics
paperlib reindex --json

Behavior:

Clears existing SQLite database
Scans all meta.json files in papers/ directory
Rebuilds full-text search index
Reports statistics on completion
Safe to run anytime (repairs corrupted index)
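
The clear-and-rescan approach can be illustrated with the standard sqlite3 module. This sketch uses a plain table rather than a full-text index, and the table schema, function name, and error handling are assumptions; only the papers/ directory, the meta.json file name, and the paper_id and title fields come from this document.

```python
import json
import sqlite3
from pathlib import Path


def reindex(library_root: str, db_path: str) -> dict:
    """Rebuild a simple papers table from every meta.json under papers/."""
    indexed, errors = 0, 0
    conn = sqlite3.connect(db_path)
    try:
        # Dropping and recreating the table clears whatever was indexed before.
        conn.execute("DROP TABLE IF EXISTS papers")
        conn.execute("CREATE TABLE papers (paper_id TEXT PRIMARY KEY, title TEXT)")
        for meta_file in sorted(Path(library_root, "papers").rglob("meta.json")):
            try:
                meta = json.loads(meta_file.read_text(encoding="utf-8"))
                conn.execute(
                    "INSERT INTO papers (paper_id, title) VALUES (?, ?)",
                    (meta["paper_id"], meta.get("title", "")),
                )
                indexed += 1
            except (json.JSONDecodeError, KeyError, sqlite3.Error):
                # A malformed metadata file is counted, not fatal.
                errors += 1
        conn.commit()
    finally:
        conn.close()
    return {"papers_indexed": indexed, "errors": errors}
```

Since the table is rebuilt from the metadata files each time, running the function again produces the same result regardless of the index's previous state.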

`paperlib status`

Show library configuration and layout information.

Options:

--library PATH: Specify library directory
--json: Output in JSON format

Examples:

# Show current library status
paperlib status

# Show specific library status
paperlib status --library ~/research

# Get JSON output for automation
paperlib status --json

Output:

root: /home/user/papers
config: /home/user/papers/config/config.toml
database: /home/user/papers/db/paperlib.sqlite3
papers: /home/user/papers/papers
inbox: /home/user/papers/inbox
cache: /home/user/papers/cache

Future Commands

These commands are planned but not yet implemented:

`paperlib search QUERY`

Search papers by content and metadata.

`paperlib summarize [PAPER_ID]`

Generate AI summaries for papers.

`paperlib export FORMAT`

Export papers in various formats.

`paperlib doctor`

Diagnose and repair library issues.

Exit Codes

paperlib commands return standard exit codes:

0: Success
1: General error (file not found, invalid arguments, etc.)
2: Command-line argument error

Configuration

paperlib looks for configuration in these locations (in order):

$LIBRARY_ROOT/config/config.toml
~/.config/paperlib/config.toml
Built-in defaults
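
The lookup order above can be expressed as a short search over candidate paths. This sketch assumes the first existing file wins outright (no merging between files), which this document does not confirm; the function name find_config and the home parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_config(library_root: str, home: Optional[str] = None) -> Optional[Path]:
    """Return the first existing config file, or None to signal built-in defaults."""
    home_dir = Path(home) if home is not None else Path.home()
    candidates = [
        Path(library_root) / "config" / "config.toml",
        home_dir / ".config" / "paperlib" / "config.toml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```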

JSON Output Format

When using --json, commands output structured data suitable for programmatic consumption. All JSON responses follow a consistent envelope format with standard fields:

Standard Response Envelope

Success Response:

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  // Command-specific data fields below
}

Error Response:

{
  "success": false,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "error": "Error message here",
  "error_code": 1
}
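
A script consuming this envelope might check the success field before touching any command-specific data. The sketch below is one possible shape; the PaperlibError class and parse_envelope function are hypothetical names, and only the success, error, and error_code fields are taken from the envelope above.

```python
import json


class PaperlibError(Exception):
    """Raised when a response envelope reports success: false (hypothetical name)."""

    def __init__(self, message: str, error_code: int):
        super().__init__(message)
        self.error_code = error_code


def parse_envelope(text: str) -> dict:
    """Parse --json output and return the payload, raising on an error envelope."""
    data = json.loads(text)
    if not data.get("success", False):
        raise PaperlibError(data.get("error", "unknown error"), data.get("error_code", 1))
    return data
```

The text argument would typically be the captured standard output of a paperlib command run with --json, for example via subprocess.run with capture_output=True and text=True.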

Command-Specific JSON Formats

`paperlib status --json`

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "library_root": "/home/user/papers",
  "config_path": "/home/user/papers/config/config.toml",
  "database_path": "/home/user/papers/db/paperlib.sqlite3",
  "papers_dir": "/home/user/papers/papers",
  "inbox_dir": "/home/user/papers/inbox",
  "cache_dir": "/home/user/papers/cache"
}

`paperlib list --json`

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "papers": [
    {
      "paper_id": "arxiv-2212_06340",
      "source_type": "arxiv",
      "source_id": "2212.06340",
      "title": "Example Paper",
      "authors": ["Alice Smith", "Bob Jones"],
      "published_date": "2022-12-06T00:00:00.000Z",
      "categories": ["cs.AI"],
      "conversion_status": "success",
      "summary_status": "pending",
      "imported_at": "2024-01-15T10:30:00.000Z",
      "tags": [],
      "notes": ""
    }
  ],
  "total": 1
}
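
As an example of consuming this payload, the following sketch selects paper IDs by conversion status. The function name is hypothetical; the papers, paper_id, and conversion_status fields are those shown above.

```python
import json
from typing import List


def paper_ids_with_status(list_output: str, conversion_status: str) -> List[str]:
    """Return the IDs of papers whose conversion_status matches."""
    payload = json.loads(list_output)
    return [
        paper["paper_id"]
        for paper in payload["papers"]
        if paper.get("conversion_status") == conversion_status
    ]
```

Passing "failed" as the status, for instance, gives a list of candidates for paperlib convert --retry-failed.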

`paperlib show PAPER_ID --json`

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper": {
    "paper_id": "arxiv-2212_06340",
    "source_type": "arxiv",
    "source_id": "2212.06340",
    "title": "Example Paper",
    "authors": ["Alice Smith", "Bob Jones"],
    "conversion_status": "success",
    "summary_status": "pending",
    "pdf_path": "papers/arxiv/2022/arxiv-2212_06340.pdf",
    "paper_md_path": "papers/arxiv/2022/arxiv-2212_06340.md",
    "files_status": {
      "pdf_exists": true,
      "markdown_exists": true,
      "summary_exists": false
    }
  }
}
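
Because the file paths in this payload are relative to the library root, a consumer has to join them onto the root before opening anything. A minimal sketch, with a hypothetical function name and the library root supplied by the caller:

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def check_paper_files(show_output: str, library_root: str) -> Tuple[Dict[str, Path], List[str]]:
    """Resolve a paper's file paths and list the files reported as missing."""
    paper = json.loads(show_output)["paper"]
    root = Path(library_root)
    # Paths in the JSON are relative to the library root, so join them onto it.
    resolved = {
        key: root / paper[key]
        for key in ("pdf_path", "paper_md_path")
        if key in paper
    }
    # files_status maps names such as "pdf_exists" to booleans.
    missing = [name for name, exists in paper.get("files_status", {}).items() if not exists]
    return resolved, missing
```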

`paperlib import --json`

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "title": "Example Paper Title",
  "source_type": "arxiv",
  "source_id": "2212.06340",
  "authors": ["Alice Smith", "Bob Jones"],
  "message": "Successfully imported arXiv paper",
  "paper": {
    // Full paper metadata object
  }
}

`paperlib convert --json`

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "action": "convert_pending",
  "success_count": 5,
  "failure_count": 1,
  "total_attempted": 6
}

For single paper conversion (--paper-id):

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "paper_id": "arxiv-2212_06340",
  "conversion_success": true,
  "conversion_status": "success",
  "message": "Successfully converted paper"
}
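
Since the batch and single-paper responses have different fields, a script may need to tell them apart. The sketch below does so by checking for a paper_id key, which is an assumption based only on the two examples above; the function name is hypothetical.

```python
def summarize_convert(result: dict) -> str:
    """Produce a one-line summary for either shape of convert --json output."""
    if "paper_id" in result:
        # Single-paper shape (--paper-id).
        outcome = "ok" if result.get("conversion_success") else "failed"
        return f"{result['paper_id']}: {outcome}"
    # Batch shape with aggregate counters.
    return (
        f"{result['success_count']}/{result['total_attempted']} converted, "
        f"{result['failure_count']} failed"
    )
```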

`paperlib reindex --json`

{
  "success": true,
  "timestamp": "2024-01-15T10:30:00.000Z",
  "reindex_complete": true,
  "papers_indexed": 42,
  "errors": 1,
  "statistics": {
    "total_papers": 42,
    "by_source_type": {
      "arxiv": 38,
      "local": 4
    }
  }
}

JSON Data Types

Timestamps: Always in ISO 8601 format (YYYY-MM-DDTHH:mm:ss.sssZ)
Paper IDs: String identifiers (e.g., "arxiv-2212_06340", "local-a1b2c3d4")
Status Fields: String enums ("pending", "success", "failed")
Authors: Array of strings
Categories/Tags: Array of strings
File Paths: Relative to library root
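
Timestamps in the format listed above can be parsed with the Python standard library. This sketch assumes the fractional-seconds part and the trailing Z are always present, as in the examples in this document; the function name is hypothetical.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2024-01-15T10:30:00.000Z into an aware datetime."""
    # %f reads the fractional seconds; %z accepts a literal "Z" as UTC on Python 3.7+.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
```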

This JSON format is stable across paperlib versions for reliable automation and scripting.
