update: update the dev-docs for AI agent
@@ -0,0 +1,92 @@
# Data Model

## Library data layout

The paper library on disk should be human-browsable.

A typical layout looks like:

```text
library_root/
  config/
    config.toml
    vocab.yaml
    prompts/
      summarize_paper.md

  inbox/
  papers/
    arxiv/
      2026/
        2604.12345/
          meta.json
          source.pdf
          paper.md
          summary.json
          summary.md
          ref.bib
          assets/
          logs/
            mineru.log
    local/
      sha256-.../
        meta.json
        source.pdf
        paper.md
        summary.json
        summary.md

  db/
    paperlib.sqlite3

  cache/
```
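
As a rough illustration of how an arXiv paper could map onto this layout, the sketch below derives the per-paper directory from an arXiv identifier. The function name `arxiv_paper_dir` is hypothetical, and deriving the year directory from the `YY` prefix of the identifier is an assumption based on the `2026/2604.12345/` example. It does not cover `local/` papers, because the naming after `sha256-` is elided in the layout.

```python
import re
from pathlib import Path

# Modern arXiv identifiers have the form YYMM.NNNN or YYMM.NNNNN.
_ARXIV_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})$")


def arxiv_paper_dir(library_root: Path, arxiv_id: str) -> Path:
    """Return library_root/papers/arxiv/<year>/<arxiv_id>.

    Assumption: the year directory comes from the YY prefix of the ID.
    Version suffixes such as "v2" are not handled in this sketch.
    """
    match = _ARXIV_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a modern arXiv identifier: {arxiv_id!r}")
    month = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in arXiv identifier: {arxiv_id!r}")
    year = 2000 + int(match.group(1))
    return library_root / "papers" / "arxiv" / str(year) / arxiv_id
```

For example, `arxiv_paper_dir(Path("library_root"), "2604.12345")` resolves to `library_root/papers/arxiv/2026/2604.12345`.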

## Data boundaries

### `meta.json`

`meta.json` should contain deterministic or near-deterministic information, mostly from:

- import process
- file system state
- external paper metadata sources

Typical fields include:

- `paper_id`, `source_type`, `source_id`
- `title`, `authors`, `published_date`, `updated_date`, `categories`
- `pdf_path`, `paper_md_path`, `summary_json_path`, `summary_md_path`
- `imported_at`, `conversion_status`, `summary_status`

Avoid putting speculative AI content into `meta.json`.
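
A minimal sketch of assembling such a record from import-time and file-system facts only. The helper names `build_meta` and `write_meta`, the `source_type:source_id` form of `paper_id`, and the `"done"` / `"pending"` status values are assumptions for illustration. Fields that come from external metadata sources (`published_date`, `updated_date`, `categories`) are left out for brevity.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_meta(paper_dir: Path, source_type: str, source_id: str,
               title: str, authors: list[str]) -> dict:
    """Collect import-time and file-system facts for meta.json."""

    def path_if_present(name: str):
        # Record a path only when the file actually exists on disk.
        return name if (paper_dir / name).is_file() else None

    paper_md = path_if_present("paper.md")
    summary_json = path_if_present("summary.json")
    return {
        "paper_id": f"{source_type}:{source_id}",  # assumed ID format
        "source_type": source_type,
        "source_id": source_id,
        "title": title,
        "authors": authors,
        "pdf_path": path_if_present("source.pdf"),
        "paper_md_path": paper_md,
        "summary_json_path": summary_json,
        "summary_md_path": path_if_present("summary.md"),
        "imported_at": datetime.now(timezone.utc).isoformat(),
        # "done" / "pending" are assumed status values.
        "conversion_status": "done" if paper_md else "pending",
        "summary_status": "done" if summary_json else "pending",
    }


def write_meta(paper_dir: Path, meta: dict) -> Path:
    """Write meta.json with a stable key order so diffs stay readable."""
    target = paper_dir / "meta.json"
    text = json.dumps(meta, indent=2, sort_keys=True, ensure_ascii=False)
    target.write_text(text + "\n", encoding="utf-8")
    return target
```

Nothing in this record depends on model output, which keeps `meta.json` reproducible from the import inputs and the files on disk.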

### `summary.json`

`summary.json` is optional enrichment and may be regenerated.

It should contain structured fields such as:

- one-sentence summary, problem statement, method overview
- main results, claimed contributions, assumptions, limitations
- problem tags, technique tags, entities
- relevance-to-user fields, recommended sections

`summary.json` must include a schema version.
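
One way the schema-version requirement could be enforced when reading a summary is sketched below. The key name `schema_version`, the integer versioning, and the `SUPPORTED_SCHEMA_VERSIONS` set are assumptions; the document only states that a version must be present.

```python
import json

# Hypothetical: versions this reader knows how to interpret.
SUPPORTED_SCHEMA_VERSIONS = {1}


def load_summary(text: str) -> dict:
    """Parse summary.json content and reject missing or unknown versions."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("summary.json must contain a JSON object")
    version = data.get("schema_version")  # assumed key name
    if version is None:
        raise ValueError("summary.json is missing a schema version")
    is_int = isinstance(version, int) and not isinstance(version, bool)
    if not is_int or version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported summary schema version: {version!r}")
    return data
```

Because `summary.json` may be regenerated, a reader that hits an unsupported version can treat the file as stale and schedule a new summary instead of guessing at the field layout.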

### SQLite

SQLite stores searchable/indexed state and job-independent status.

It should help with:
- listing papers, filtering and search, path lookup, tag lookup, status overview

But it should never be treated as the only durable source of paper metadata.
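
The sketch below shows one way to keep SQLite a derived index: the table is dropped and repopulated from the `meta.json` files under `papers/`. The function name `rebuild_index`, the `papers` table, and its columns are assumptions, not the project's actual schema.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_index(library_root: Path, db_path: Path) -> int:
    """Recreate the papers table from every meta.json under papers/.

    Returns the number of indexed papers.
    """
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DROP TABLE IF EXISTS papers")
            conn.execute(
                "CREATE TABLE papers ("
                " paper_id TEXT PRIMARY KEY,"
                " source_type TEXT,"
                " title TEXT,"
                " paper_dir TEXT,"
                " conversion_status TEXT,"
                " summary_status TEXT)"
            )
            count = 0
            papers_root = library_root / "papers"
            for meta_path in sorted(papers_root.rglob("meta.json")):
                meta = json.loads(meta_path.read_text(encoding="utf-8"))
                rel_dir = meta_path.parent.relative_to(library_root)
                conn.execute(
                    "INSERT INTO papers VALUES (?, ?, ?, ?, ?, ?)",
                    (
                        meta["paper_id"],
                        meta.get("source_type"),
                        meta.get("title"),
                        rel_dir.as_posix(),
                        meta.get("conversion_status"),
                        meta.get("summary_status"),
                    ),
                )
                count += 1
        return count
    finally:
        conn.close()
```

With this shape, deleting `db/paperlib.sqlite3` loses no paper metadata; a single call to `rebuild_index` restores the listing and lookup tables from the files.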

## Key conventions

- `meta.json` contains stable metadata and processing status
- `summary.json` contains structured AI-generated enrichment
- `summary.md` is rendered from `summary.json`
- `paper.md` is generated from the PDF by an external converter such as MinerU
- the database is rebuildable from the files above
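
To make the "`summary.md` is rendered from `summary.json`" convention concrete, here is a minimal rendering sketch. The function name `render_summary_md`, the JSON key names, and the section headings are all assumptions; only the kinds of fields come from the `summary.json` description above.

```python
# Assumed key names in summary.json, paired with the heading to render.
SECTIONS = [
    ("one_sentence_summary", "Summary"),
    ("problem_statement", "Problem"),
    ("method_overview", "Method"),
    ("main_results", "Main results"),
    ("limitations", "Limitations"),
]


def render_summary_md(title: str, summary: dict) -> str:
    """Render a summary dict as Markdown, skipping absent or empty fields."""
    lines = [f"# {title}", ""]
    for key, heading in SECTIONS:
        value = summary.get(key)
        if not value:
            continue
        lines.append(f"## {heading}")
        lines.append("")
        if isinstance(value, list):
            # List-valued fields become bullet lists.
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(str(value))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Keeping the renderer a pure function of `summary.json` means `summary.md` never needs manual edits and can be regenerated whenever the summary is.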