# Data Model

## Library data layout

The paper library on disk should be human-browsable. A typical layout looks like:

```text
library_root/
  config/
    config.toml
    vocab.yaml
    prompts/
      summarize_paper.md
  inbox/
  papers/
    arxiv/
      2026/
        2604.12345/
          meta.json
          source.pdf
          paper.md
          summary.json
          summary.md
          ref.bib
          assets/
          logs/
            mineru.log
    local/
      sha256-.../
        meta.json
        source.pdf
        paper.md
        summary.json
        summary.md
  db/
    paperlib.sqlite3
  cache/
```

## Data boundaries

### `meta.json`

`meta.json` should contain deterministic or near-deterministic information, mostly from:

- import process
- file system state
- external paper metadata sources

Typical fields include:

- `paper_id`, `source_type`, `source_id`
- `title`, `authors`, `published_date`, `updated_date`, `categories`
- `pdf_path`, `paper_md_path`, `summary_json_path`, `summary_md_path`
- `imported_at`, `conversion_status`, `summary_status`

Avoid putting speculative AI content into `meta.json`.

### `summary.json`

`summary.json` is optional enrichment and may be regenerated. It should contain structured fields such as:

- one-sentence summary, problem statement, method overview
- main results, claimed contributions, assumptions, limitations
- problem tags, technique tags, entities
- relevance-to-user fields, recommended sections

`summary.json` must include a schema version.

### SQLite

SQLite stores searchable/indexed state and job-independent status. It should help with:

- listing papers, filtering and search, path lookup, tag lookup, status overview

But it should never be treated as the only durable source of paper metadata.

## Key conventions

- `meta.json` contains stable metadata and processing status
- `summary.json` contains structured AI-generated enrichment
- `summary.md` is rendered from `summary.json`
- `paper.md` is generated from the PDF by an external converter such as MinerU
- the database is rebuildable from the files above
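
The last convention, that the database is rebuildable from the files, can be illustrated with a minimal sketch. It assumes the layout shown earlier (`papers/` holding one `meta.json` per paper, the database at `db/paperlib.sqlite3`). The function name `rebuild_index`, the `papers` table, and its column set are hypothetical, picked from the typical `meta.json` fields listed above; the real schema may differ.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(library_root: Path) -> int:
    """Recreate the index table from every meta.json under papers/.

    Hypothetical helper: the table name and columns are assumptions
    based on the typical meta.json fields, not a confirmed schema.
    Returns the number of papers indexed.
    """
    db_path = library_root / "db" / "paperlib.sqlite3"
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        count = 0
        with conn:  # single transaction: commit on success, roll back on error
            # Drop and recreate, since the files are the durable source.
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                """
                CREATE TABLE papers (
                    paper_id          TEXT PRIMARY KEY,
                    source_type       TEXT,
                    source_id         TEXT,
                    title             TEXT,
                    conversion_status TEXT,
                    summary_status    TEXT,
                    meta_path         TEXT NOT NULL
                )
                """
            )
            papers_dir = library_root / "papers"
            for meta_path in sorted(papers_dir.rglob("meta.json")):
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
                conn.execute(
                    "INSERT INTO papers VALUES (?, ?, ?, ?, ?, ?, ?)",
                    (
                        meta["paper_id"],
                        meta.get("source_type"),
                        meta.get("source_id"),
                        meta.get("title"),
                        meta.get("conversion_status"),
                        meta.get("summary_status"),
                        # Store the path relative to the library root so the
                        # library directory can be moved without re-indexing.
                        meta_path.relative_to(library_root).as_posix(),
                    ),
                )
                count += 1
        return count
    finally:
        conn.close()
```

Calling `rebuild_index(Path("library_root"))` drops and recreates the table each time, so running it repeatedly gives the same result. That matches the intent that SQLite is an index over the files and not the only durable copy of the metadata.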