paperlib/dev-docs/data-model.md

# Data Model

## Library data layout

The paper library on disk should be human-browsable.

A typical layout looks like:

```text
library_root/
  config/
    config.toml
    vocab.yaml
    prompts/
      summarize_paper.md

  inbox/
  papers/
    arxiv/
      2026/
        2604.12345/
          meta.json
          source.pdf
          paper.md
          summary.json
          summary.md
          ref.bib
          assets/
          logs/
            mineru.log
    local/
      sha256-.../
        meta.json
        source.pdf
        paper.md
        summary.json
        summary.md

  db/
    paperlib.sqlite3

  cache/
```
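
The layout above can be resolved programmatically. The following is a minimal sketch, not the project's actual code: `paper_dir` is a hypothetical helper, and it assumes the year folder for arXiv papers is derived from the `YY` prefix of a new-style arXiv identifier (the real importer might use the publication date instead). For local papers, the directory name is passed in unchanged rather than computed here.

```python
from pathlib import Path


def paper_dir(library_root: Path, source_type: str, source_id: str) -> Path:
    """Return the directory holding one paper's files (hypothetical helper)."""
    papers = library_root / "papers"
    if source_type == "arxiv":
        # Assumption: the year folder comes from the YY prefix of a
        # new-style arXiv identifier such as "2604.12345".
        year = 2000 + int(source_id[:2])
        return papers / "arxiv" / str(year) / source_id
    if source_type == "local":
        # The caller supplies the directory name; it is not derived here.
        return papers / "local" / source_id
    raise ValueError(f"unknown source_type: {source_type!r}")


if __name__ == "__main__":
    print(paper_dir(Path("library_root"), "arxiv", "2604.12345").as_posix())
```

Keeping path construction in one function means the on-disk layout can change without touching callers.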

## Data boundaries

### `meta.json`

`meta.json` should contain deterministic or near-deterministic information, mostly from:

- import process
- file system state
- external paper metadata sources

Typical fields include:

- `paper_id`, `source_type`, `source_id`
- `title`, `authors`, `published_date`, `updated_date`, `categories`
- `pdf_path`, `paper_md_path`, `summary_json_path`, `summary_md_path`
- `imported_at`, `conversion_status`, `summary_status`

Avoid putting speculative AI content into `meta.json`.
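
As an illustration of what "deterministic" means here, the sketch below builds a `meta.json` record only from import-time facts and writes it through a temporary file. The helper names (`build_meta`, `write_meta`), the `paper_id` format, the relative path values, and the `"pending"` status string are all assumptions made for the example, not a defined schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_meta(source_type: str, source_id: str, title: str, authors: list[str]) -> dict:
    """Assemble a meta.json record from import-time facts only."""
    return {
        "paper_id": f"{source_type}:{source_id}",  # assumed ID format
        "source_type": source_type,
        "source_id": source_id,
        "title": title,
        "authors": authors,
        "pdf_path": "source.pdf",  # assumed: relative to the paper directory
        "paper_md_path": "paper.md",
        "imported_at": datetime.now(timezone.utc).isoformat(),
        "conversion_status": "pending",  # assumed status vocabulary
        "summary_status": "pending",
    }


def write_meta(paper_dir: Path, meta: dict) -> Path:
    """Write meta.json via a temporary file so readers never see a partial file."""
    paper_dir.mkdir(parents=True, exist_ok=True)
    target = paper_dir / "meta.json"
    tmp = paper_dir / "meta.json.tmp"
    text = json.dumps(meta, indent=2, sort_keys=True, ensure_ascii=False)
    tmp.write_text(text + "\n", encoding="utf-8")
    tmp.replace(target)  # replaces the old file in one step
    return target
```

Sorted keys and fixed indentation keep the file stable across rewrites, which helps when the library is browsed or diffed by hand.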

### `summary.json`

`summary.json` is optional enrichment and may be regenerated.

It should contain structured fields such as:

- one-sentence summary, problem statement, method overview
- main results, claimed contributions, assumptions, limitations
- problem tags, technique tags, entities
- relevance-to-user fields, recommended sections

`summary.json` must include a schema version.
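
A schema version lets a reader decide whether an existing summary can be used or should be regenerated. The sketch below shows one possible check. The key name `schema_version`, the supported set `{1}`, and both function names are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import Optional

# Assumption: versions this reader understands.
SUPPORTED_SCHEMA_VERSIONS = {1}


def is_summary_current(summary: dict) -> bool:
    """True if the summary declares a schema version this reader supports."""
    return summary.get("schema_version") in SUPPORTED_SCHEMA_VERSIONS


def load_summary(path: Path) -> Optional[dict]:
    """Return the summary, or None if it is missing or uses an unsupported schema.

    Returning None signals the caller that the summary may need regenerating.
    """
    if not path.exists():
        return None  # summary.json is optional enrichment
    summary = json.loads(path.read_text(encoding="utf-8"))
    return summary if is_summary_current(summary) else None
```

Because `summary.json` is optional, a missing file and an outdated file are treated the same way in this sketch: both result in `None`.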

### SQLite

SQLite stores the searchable, indexed view of the library, plus status that is not tied to any single job.

It should help with:

- listing papers
- filtering and search
- path lookup
- tag lookup
- status overview

However, it must never be treated as the only durable source of paper metadata. The per-paper files remain authoritative.

## Key conventions

- `meta.json` contains stable metadata and processing status
- `summary.json` contains structured AI-generated enrichment
- `summary.md` is rendered from `summary.json`
- `paper.md` is generated from the PDF by an external converter such as MinerU
- the database is rebuildable from the files above
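
To make the last convention concrete, here is a minimal sketch of rebuilding an index from the `meta.json` files. The table name `papers`, its columns, and the function `rebuild_index` are hypothetical; the real schema is not specified in this document.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(library_root: Path, db_path: Path) -> int:
    """Recreate the papers table from every meta.json under papers/.

    Returns the number of papers indexed.
    """
    conn = sqlite3.connect(db_path)
    count = 0
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers ("
                "paper_id TEXT PRIMARY KEY, title TEXT, "
                "source_type TEXT, dir_path TEXT NOT NULL)"
            )
            for meta_path in sorted((library_root / "papers").rglob("meta.json")):
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
                # Store the directory relative to the library root.
                rel_dir = meta_path.parent.relative_to(library_root).as_posix()
                conn.execute(
                    "INSERT OR REPLACE INTO papers VALUES (?, ?, ?, ?)",
                    (meta["paper_id"], meta.get("title"), meta.get("source_type"), rel_dir),
                )
                count += 1
    finally:
        conn.close()
    return count
```

Since the table is dropped and recreated from the files, deleting `db/paperlib.sqlite3` loses nothing that cannot be recovered by running a rebuild like this one.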