Data Model
Library data layout
The paper library on disk should be human-browsable.
A typical layout looks like:
library_root/
    config/
        config.toml
        vocab.yaml
        prompts/
            summarize_paper.md
    inbox/
    papers/
        arxiv/
            2026/
                2604.12345/
                    meta.json
                    source.pdf
                    paper.md
                    summary.json
                    summary.md
                    ref.bib
                    assets/
                    logs/
                        mineru.log
        local/
            sha256-.../
                meta.json
                source.pdf
                paper.md
                summary.json
                summary.md
    db/
        paperlib.sqlite3
    cache/
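As a rough illustration of how code might navigate this layout, the sketch below builds the directory for an arXiv paper and reports which per-paper files are still missing. The function names `arxiv_paper_dir` and `missing_files` are hypothetical, and passing the year explicitly is an assumption; the layout itself does not say how the year is derived.

```python
from pathlib import Path

# File names are taken from the layout above; everything else is illustrative.
PAPER_FILES = ("meta.json", "source.pdf", "paper.md", "summary.json", "summary.md")


def arxiv_paper_dir(library_root: Path, year: int, arxiv_id: str) -> Path:
    """Return the directory for an arXiv paper, e.g. papers/arxiv/2026/2604.12345."""
    return library_root / "papers" / "arxiv" / str(year) / arxiv_id


def missing_files(paper_dir: Path) -> list[str]:
    """List the expected per-paper files that do not exist yet."""
    return [name for name in PAPER_FILES if not (paper_dir / name).is_file()]
```

A freshly imported paper would typically report `summary.json` and `summary.md` as missing until the optional summarization step has run.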
Data boundaries
meta.json
meta.json should contain deterministic or near-deterministic information, mostly from:
- import process
- file system state
- external paper metadata sources
Typical fields include:
- paper_id, source_type, source_id
- title, authors, published_date, updated_date, categories
- pdf_path, paper_md_path, summary_json_path, summary_md_path
- imported_at, conversion_status, summary_status
Avoid putting speculative AI content into meta.json.
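One possible way to enforce this boundary is to write meta.json only through a helper that rejects fields outside the list above. This is a minimal sketch: the field names come from this document, but the helper `write_meta`, the strict rejection of unknown keys, and filling absent fields with null are assumptions rather than decided behavior.

```python
import json
from pathlib import Path

# Field names come from the list above; value types are left open on purpose.
META_FIELDS = (
    "paper_id", "source_type", "source_id",
    "title", "authors", "published_date", "updated_date", "categories",
    "pdf_path", "paper_md_path", "summary_json_path", "summary_md_path",
    "imported_at", "conversion_status", "summary_status",
)


def write_meta(paper_dir: Path, meta: dict) -> Path:
    """Write meta.json with a fixed key order, refusing unknown fields."""
    unknown = set(meta) - set(META_FIELDS)
    if unknown:
        # Keeps AI-generated or ad hoc fields out of meta.json.
        raise ValueError(f"unexpected fields in meta.json: {sorted(unknown)}")
    # Fields not supplied are written as null so every file has the same shape.
    ordered = {key: meta.get(key) for key in META_FIELDS}
    path = paper_dir / "meta.json"
    path.write_text(
        json.dumps(ordered, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return path
```

A fixed key order keeps diffs between imports small, which helps when the library is browsed or versioned by hand.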
summary.json
summary.json is optional enrichment and may be regenerated.
It should contain structured fields such as:
- one-sentence summary, problem statement, method overview
- main results, claimed contributions, assumptions, limitations
- problem tags, technique tags, entities
- relevance-to-user fields, recommended sections
summary.json must include a schema version.
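Because summary.json is optional and may be regenerated, a reader can treat a missing or outdated file the same way. The sketch below assumes the version is stored under a top-level key named `schema_version` and that the current version is `1`; both are placeholders, since this document does not fix the key name or the numbering.

```python
import json
from pathlib import Path
from typing import Optional

# Both values are assumptions for illustration only.
SCHEMA_VERSION_KEY = "schema_version"
SUPPORTED_SCHEMA_VERSION = 1


def load_summary(path: Path) -> Optional[dict]:
    """Return the summary, or None if it is absent or should be regenerated."""
    if not path.is_file():
        return None  # summary.json is optional enrichment
    summary = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(summary, dict):
        return None  # unexpected top-level shape
    if summary.get(SCHEMA_VERSION_KEY) != SUPPORTED_SCHEMA_VERSION:
        return None  # unversioned or stale: treat as missing
    return summary
```

Returning None in every failure case lets the caller decide whether to regenerate the summary or simply show the paper without it.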
SQLite
SQLite stores searchable/indexed state and job-independent status.
It should help with:
- listing papers, filtering and search, path lookup, tag lookup, status overview
But it should never be treated as the only durable source of paper metadata.
Key conventions
- meta.json contains stable metadata and processing status
- summary.json contains structured AI-generated enrichment
- summary.md is rendered from summary.json
- paper.md is generated from the PDF by an external converter such as MinerU
- the database is rebuildable from the files above
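The last convention can be illustrated with a small rebuild routine that discards the index and repopulates it from the meta.json files on disk. This is only a sketch: the `papers` table, its columns, and the function name `rebuild_index` are hypothetical, and a real index would likely also cover tags and summary fields.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(library_root: Path, db_path: Path) -> int:
    """Recreate a minimal papers table from every meta.json under papers/."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers ("
                "paper_id TEXT PRIMARY KEY, title TEXT, "
                "conversion_status TEXT, summary_status TEXT, dir TEXT)"
            )
            count = 0
            for meta_path in sorted((library_root / "papers").rglob("meta.json")):
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
                conn.execute(
                    "INSERT INTO papers VALUES (?, ?, ?, ?, ?)",
                    (
                        meta["paper_id"],
                        meta.get("title"),
                        meta.get("conversion_status"),
                        meta.get("summary_status"),
                        str(meta_path.parent),
                    ),
                )
                count += 1
        return count
    finally:
        conn.close()
```

Since the routine drops and recreates the table, running it twice yields the same index, so deleting db/paperlib.sqlite3 would lose no paper metadata.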