paperlib/dev-docs/ai-guidelines.md

# AI Integration Guidelines

## Search design

Search should support at least two useful modes:

### 1. Field-aware structured search
Queries target specific metadata fields, for example tags, authors, categories, titles, and summary fields.
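
As a minimal sketch of what a field-aware query could look like, the example below filters in-memory records by field. The `Paper` record and the `search_by_fields` helper are hypothetical names used only for illustration; the project may store and query these fields differently (for example in a database).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Paper:
    # Hypothetical record shape; the field names are illustrative only.
    title: str
    authors: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    category: Optional[str] = None


def search_by_fields(papers: list[Paper], **criteria: str) -> list[Paper]:
    """Return the papers that match every given field criterion."""

    def matches(paper: Paper, name: str, wanted: str) -> bool:
        value = getattr(paper, name)
        if value is None:
            return False
        wanted = wanted.lower()
        if isinstance(value, list):
            # List fields (tags, authors) match on an exact, case-insensitive item.
            return any(wanted == item.lower() for item in value)
        # Scalar text fields (title, category) match on a substring.
        return wanted in value.lower()

    return [p for p in papers if all(matches(p, k, v) for k, v in criteria.items())]
```

A call such as `search_by_fields(papers, tags="retrieval", title="survey")` would then return only the records that satisfy both conditions.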

### 2. Full-text-friendly search
Support grep-like workflows and integration with tools such as `ripgrep`.

Do not require semantic/vector search as a baseline feature.

If semantic search is added later, it should be optional and must not displace simple grep or database search.
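
The full-text mode needs nothing beyond plain text files on disk. The sketch below is one possible in-process equivalent of a grep pass, assuming summaries are stored as UTF-8 text files under a single directory; `grep_summaries` and its parameters are hypothetical.

```python
import re
from pathlib import Path


def grep_summaries(root: Path, pattern: str, glob: str = "*.md") -> list[tuple[Path, int, str]]:
    """Return (path, line number, line) for every line matching the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob(glob)):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, lineno, line))
    return hits
```

Because the files are plain text, the same lookup works from the shell with `ripgrep` or `grep`, without any project code involved.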

## Summarization design

Summarization should produce reusable structured outputs.

### Summarization goals

A summary should be useful for:
- later human review
- grep-style reverse lookup
- building daily/weekly reports
- indexing by problem/method/result
- personal research triage

### Summarization output

Prefer generating:
- `summary.json` as the canonical structured output
- `summary.md` rendered from `summary.json`

Do not make free-form Markdown the only output.
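
One way to keep JSON canonical is to write `summary.json` first and derive `summary.md` from the same dictionary. The sketch below assumes a flat summary with `title`, `problem`, `method`, `result`, and `tags` keys; that schema and the function names are assumptions, not a defined project format.

```python
import json
from pathlib import Path


def render_markdown(summary: dict) -> str:
    """Render a summary dict as Markdown, one field per line for easy grepping."""
    lines = [f"# {summary.get('title') or 'Untitled'}", ""]
    for key in ("problem", "method", "result"):
        value = summary.get(key)
        lines.append(f"- {key}: {value if value is not None else 'n/a'}")
    tags = summary.get("tags") or []
    lines.append(f"- tags: {', '.join(tags) if tags else 'n/a'}")
    return "\n".join(lines) + "\n"


def write_summary(out_dir: Path, summary: dict) -> None:
    """Write the canonical JSON, then the Markdown view derived from it."""
    out_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(summary, indent=2, sort_keys=True, ensure_ascii=False)
    (out_dir / "summary.json").write_text(payload + "\n", encoding="utf-8")
    (out_dir / "summary.md").write_text(render_markdown(summary), encoding="utf-8")
```

Since the Markdown is a pure function of the JSON, it can be regenerated at any time and never needs to be edited by hand.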

### Prompting guidelines

Prompts should instruct the model to:
- extract factual information
- avoid unsupported claims
- use concise and stable language
- prefer controlled vocabulary when available
- return structured JSON only
- use `null` or empty lists for unclear fields rather than hallucinating
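
The guidelines above could be encoded roughly as follows. This is a sketch under assumptions: the field names in `EXPECTED_FIELDS`, the prompt wording, and both helper functions are hypothetical. The parser rejects anything that is not pure JSON and normalizes missing or empty values to `None` or `[]` rather than guessing.

```python
import json

# Hypothetical output schema: text fields default to None, list fields to [].
EXPECTED_FIELDS = {"problem": None, "method": None, "result": None, "tags": []}


def build_prompt(paper_text: str, vocabulary: list[str]) -> str:
    """Assemble a prompt that states the extraction rules explicitly."""
    rules = [
        "Extract only facts stated in the paper text below.",
        "Do not add claims that the text does not support.",
        "Use concise, consistent wording.",
    ]
    if vocabulary:
        rules.append("Choose tags only from this vocabulary: " + ", ".join(vocabulary) + ".")
    rules.append("Return a single JSON object with the keys: " + ", ".join(EXPECTED_FIELDS) + ".")
    rules.append("Use null for unclear text fields and [] for unclear list fields.")
    return "\n".join(rules) + "\n\n---\n" + paper_text


def parse_response(raw: str) -> dict:
    """Parse a model reply; raises ValueError if it is not a JSON object."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result = {}
    for key, default in EXPECTED_FIELDS.items():
        value = data.get(key)
        if isinstance(default, list):
            result[key] = [str(v) for v in value] if isinstance(value, list) else []
        else:
            result[key] = value if isinstance(value, str) and value.strip() else None
    return result
```

Unknown keys in the reply are dropped, so the stored summary keeps a stable shape even when the model adds extra fields.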

### Provider abstraction

The summarizer should not be tightly coupled to a single LLM provider.

Use a provider abstraction so the project can support:
- OpenAI-compatible APIs
- local models later if desired
- different prompt templates and vocabularies
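
A provider abstraction can be as small as a single-method interface. The sketch below uses `typing.Protocol`; `SummaryProvider`, `FakeProvider`, `Summarizer`, and the `{paper_text}` placeholder are hypothetical names chosen for illustration. A real provider for an OpenAI-compatible API or a local model would implement the same `complete` method.

```python
import json
from typing import Protocol


class SummaryProvider(Protocol):
    """Anything that turns a prompt into the model's raw text reply."""

    def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Stand-in provider for tests: records prompts, returns a canned reply."""

    def __init__(self, canned: str) -> None:
        self.canned = canned
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.canned


class Summarizer:
    """Depends only on the provider interface and a swappable prompt template."""

    def __init__(self, provider: SummaryProvider, prompt_template: str) -> None:
        self.provider = provider
        self.prompt_template = prompt_template

    def summarize(self, paper_text: str) -> dict:
        # str.replace avoids clashes between str.format and literal JSON braces.
        prompt = self.prompt_template.replace("{paper_text}", paper_text)
        return json.loads(self.provider.complete(prompt))
```

Switching providers or prompt templates then means passing different constructor arguments, with no change to the summarizer itself.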