What Is QMD?
QMD (Query Markup Documents) is a search engine that runs entirely on your computer. Unlike Google, which searches the internet, QMD searches your local markdown files: your notes, documentation, meeting transcripts, and knowledge bases.
What makes QMD special is that it combines three different ways of searching, each with its own strengths:
Part 1: The Three Search Methods
1.1 Keyword Search (BM25)
What it is: Traditional search that looks for the exact words you type.
How it works: Imagine you're looking for the word "authentication" in all your files. The search goes through every file, finds the ones containing "authentication", and ranks them by:
- How many times the word appears
- How rare the word is (rare words = more relevant)
- How long the document is (shorter docs with the word = more relevant)
The algorithm is called BM25 (Best Match 25), first published in 1994. It's the same ranking function used today by full-text search engines such as Lucene and Elasticsearch.
Example:
Query: "authentication"
File A: "User authentication is the first step..." ✓ Contains word
File B: "Login requires authentication credentials..." ✓ Contains word
File C: "The meeting was about budgets..." ✗ No match
Limitation: If you search "login", BM25 won't find documents that only say "authentication"—even though they mean similar things.
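The three ranking factors above can be sketched as a single-term BM25 scorer. This is purely illustrative: QMD gets BM25 from SQLite's FTS5, and the parameters k1 = 1.2 and b = 0.75 are just the common defaults, with made-up corpus statistics.

```typescript
// Simplified BM25 score for a single term in a single document.
function bm25Term(
  tf: number,           // how often the term appears in the document
  docLen: number,       // length of this document (in terms)
  avgDocLen: number,    // average document length in the corpus
  totalDocs: number,    // number of documents in the corpus
  docsWithTerm: number, // number of documents containing the term
  k1 = 1.2,
  b = 0.75,
): number {
  // Rare terms get a higher inverse-document-frequency weight.
  const idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // Term frequency is saturated (k1) and normalized by document length (b).
  const norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (docLen / avgDocLen)));
  return idf * norm;
}

// A rare term in a short document outscores a common term in a long one.
const rareShort = bm25Term(3, 100, 500, 1000, 5);
const commonLong = bm25Term(3, 900, 500, 1000, 800);
```

Running this, `rareShort` comes out far higher than `commonLong`, which is exactly the "rare words = more relevant, shorter docs = more relevant" behavior described above.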
1.2 Semantic Search (Vector Embeddings)
What it is: Search that understands meaning, not just words.
The magic concept: Embeddings
An "embedding" converts text into a list of numbers (called a "vector"). Here's the key insight:
Texts with similar meanings get similar numbers.
"login" → [0.23, 0.87, 0.12, 0.45, ...] (768 numbers)
"authentication" → [0.21, 0.89, 0.10, 0.48, ...] (very similar!)
"pizza" → [0.91, 0.02, 0.78, 0.15, ...] (very different)
How it works:
- Your query "how do I log in?" becomes a vector → [0.24, 0.85, ...]
- Every document chunk was already converted to vectors when indexed
- Find the documents whose vectors are "closest" to your query vector
- Close vectors = similar meaning = relevant results
What creates these embeddings? A neural network trained on billions of text examples. QMD uses embeddinggemma (300MB model).
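"Closeness" between vectors is usually measured with cosine similarity. Here is a minimal sketch using the toy 4-dimensional vectors from the example above (real embeddings have 768 dimensions):

```typescript
// Cosine similarity: close to 1.0 means same direction (similar meaning),
// close to 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors reusing the numbers from the example above.
const login = [0.23, 0.87, 0.12, 0.45];
const auth = [0.21, 0.89, 0.10, 0.48];
const pizza = [0.91, 0.02, 0.78, 0.15];

const similar = cosineSimilarity(login, auth);   // very close to 1.0
const unrelated = cosineSimilarity(login, pizza); // much lower
```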
1.3 LLM Re-ranking
What it is: An AI model that reads your query AND each candidate document, then judges relevance.
Why it's powerful: While embedding search compares vectors (fast but approximate), the re-ranker actually reads the text and judges whether it answers your question.
Query: "How do I configure environment variables?"
Candidate 1: "Set DATABASE_URL in your .env file..."
Re-ranker thinks: "This discusses environment variables config" → Score: 0.95
Candidate 2: "The environment was pleasant for the team..."
Re-ranker thinks: "Wrong meaning of 'environment'" → Score: 0.15
QMD uses qwen3-reranker (640MB model) for this.
Part 2: How QMD Combines All Three
Here's the complete flow when you run qmd query "your question":
Step 1: QUERY EXPANSION
Your query: "auth config"
LLM expands it to:
hyde: "Authentication can be configured by setting AUTH_SECRET..."
lex: authentication configuration
lex: auth settings
vec: how to configure authentication settings
vec: authentication configuration options
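The expansion output above follows a simple line-prefixed format, so it can be split into its three channels with a small parser. This is a hypothetical sketch of that parsing step, not QMD's actual code:

```typescript
// Split expansion output into its three channels by line prefix.
type Expansion = { hyde: string[]; lex: string[]; vec: string[] };

function parseExpansion(output: string): Expansion {
  const result: Expansion = { hyde: [], lex: [], vec: [] };
  for (const line of output.split("\n")) {
    const match = line.match(/^(hyde|lex|vec):\s*(.+)$/);
    if (match) result[match[1] as keyof Expansion].push(match[2]);
  }
  return result;
}

const parsed = parseExpansion([
  "hyde: Authentication can be configured by setting AUTH_SECRET...",
  "lex: authentication configuration",
  "lex: auth settings",
  "vec: how to configure authentication settings",
].join("\n"));
// parsed.lex has 2 entries; parsed.hyde and parsed.vec have 1 each
```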
Step 2: PARALLEL SEARCH
- Original query → BM25 Search → Results List A (weighted 2x)
- Original query → Vector Search → Results List B
- Expanded "lex" → BM25 Search → Results List C
- Expanded "vec" → Vector Search → Results List D
Step 3: FUSION (Combining Results)
Reciprocal Rank Fusion (RRF):
For each document, score = sum of 1/(60 + rank) over every list where it appears
Example: Doc X ranked #1 in List A, #5 in List B, not in C or D
Score = 1/(60+1) + 1/(60+5) = 0.0164 + 0.0154 = 0.0318
Bonus: Documents ranked #1 in any list get +0.05 boost
Take top 30 candidates →
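The RRF arithmetic above takes only a few lines. This sketch reproduces the Doc X example (it implements the formula as described here, not QMD's actual code):

```typescript
// Reciprocal Rank Fusion with k = 60, plus the optional top-rank bonus.
// ranks: the document's 1-based rank in each list where it appears.
function rrfScore(ranks: number[], k = 60, topBonus = 0.05): number {
  let score = 0;
  for (const rank of ranks) {
    score += 1 / (k + rank);
    if (rank === 1) score += topBonus; // boost for ranking #1 in any list
  }
  return score;
}

// Doc X: #1 in List A, #5 in List B, absent from C and D.
const base = rrfScore([1, 5], 60, 0); // 1/61 + 1/65 ≈ 0.0318, as above
const withBonus = rrfScore([1, 5]);   // ≈ 0.0818 after the +0.05 #1 boost
```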
Step 4: RE-RANKING
LLM Re-ranker reads each candidate with your query:
"Is this relevant?" → Gives 0.0 to 1.0 score
Final blend (position-aware):
- Top 3 RRF results: 75% RRF score + 25% reranker score
- Rank 4-10: 60% RRF score + 40% reranker score
- Rank 11+: 40% RRF score + 60% reranker score
Why? Top keyword matches are usually good—don't let reranker ruin them
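The position-aware blend reduces to a small weighting function. A minimal sketch, assuming both scores are on comparable 0-to-1 scales:

```typescript
// Blend fused (RRF) and reranker scores with position-dependent weights.
// rrfRank is the document's 1-based rank after fusion.
function blend(rrfRank: number, fused: number, reranked: number): number {
  // Weight on the RRF score; the reranker gets the remainder.
  const rrfWeight = rrfRank <= 3 ? 0.75 : rrfRank <= 10 ? 0.6 : 0.4;
  return rrfWeight * fused + (1 - rrfWeight) * reranked;
}

// A top-3 result keeps most of its RRF score even if the reranker disagrees...
const top = blend(1, 0.9, 0.1);  // 0.75 * 0.9 + 0.25 * 0.1 = 0.70
// ...while a rank-15 result is mostly at the reranker's mercy.
const tail = blend(15, 0.9, 0.1); // 0.40 * 0.9 + 0.60 * 0.1 = 0.42
```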
Step 5: FINAL RESULTS
#1 docs/authentication.md (score: 0.92)
#2 config/env-setup.md (score: 0.87)
#3 guides/security.md (score: 0.73)
Part 3: Key Technologies Explained
3.1 SQLite - The Database
What it is: A database that stores all your indexed content in a single file.
Location: ~/.cache/qmd/index.sqlite
| Table | What's in it |
|---|---|
| collections | Which folders you're indexing |
| documents | Each markdown file's content and metadata |
| documents_fts | Full-text search index (FTS5) |
| content_vectors | Embedding vectors for each text chunk |
| llm_cache | Cached LLM responses (to avoid re-computing) |
3.2 FTS5 - Full-Text Search
What it is: SQLite's built-in search engine for text.
How QMD uses it:
```sql
-- Behind the scenes, QMD runs queries like:
SELECT * FROM documents_fts
WHERE documents_fts MATCH 'authentication OR auth'
ORDER BY bm25(documents_fts);
```
FTS5 automatically:
- Tokenizes text (splits into words)
- Builds an inverted index (word → document mapping)
- Calculates BM25 scores
3.3 sqlite-vec - Vector Storage
What it is: An extension that adds vector similarity search to SQLite.
```sql
-- Store a vector (serialized as a JSON string)
INSERT INTO vectors_vec (id, embedding)
VALUES ('doc1_chunk0', '[0.23, 0.87, 0.12, ...]');

-- Find the 10 most similar vectors (smallest cosine distance)
SELECT id, distance FROM vectors_vec
WHERE embedding MATCH '[0.24, 0.85, 0.11, ...]' -- query vector
ORDER BY distance
LIMIT 10;
```
3.4 node-llama-cpp - Local AI Models
What it is: A library that runs AI models directly on your computer (no cloud needed).
| Model | Size | Job |
|---|---|---|
| embeddinggemma-300M | 300 MB | Converts text → vectors |
| qwen3-reranker-0.6b | 640 MB | Judges document relevance |
| qmd-query-expansion-1.7B | 1.1 GB | Expands your queries |
Format: GGUF (a compact format for running models on CPUs)
3.5 Bun - The Runtime
What it is: A fast JavaScript/TypeScript runtime (alternative to Node.js).
Why QMD uses it:
- Faster startup than Node.js
- Built-in SQLite support (bun:sqlite)
- Built-in TypeScript support (no build step)
Part 4: Code Structure Explained
4.1 Main Files
src/
├── qmd.ts # The CLI - parses commands, orchestrates everything
├── store.ts # Database operations - read/write data
├── llm.ts # AI model operations - embeddings, reranking
├── mcp.ts # MCP server - for AI agents like Claude
├── formatter.ts # Output formatting - JSON, CSV, Markdown, etc.
└── collections.ts # Config file management (~/.config/qmd/index.yml)
Part 5: The Fine-Tuning System
QMD includes a complete system to train the query expansion model.
5.1 Why Fine-Tune?
The base Qwen3 model is smart but doesn't know our specific output format:
hyde:
lex:
vec:
Fine-tuning teaches it this format.
5.2 Two Training Stages
Stage 1: SFT (Supervised Fine-Tuning)
- "Here are 2,290 examples of correct input→output"
- Model learns the format by imitation
- Uses LoRA (efficient training that only modifies ~1% of weights)
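The "~1% of weights" figure can be checked with a quick parameter count: a full fine-tune of a d×k weight matrix trains all d·k entries, while a rank-r LoRA update trains only two thin matrices (d×r and r×k). The layer size and rank below are illustrative, not QMD's actual training config:

```typescript
// Trainable parameter count: full fine-tune vs. rank-r LoRA update.
function loraParams(d: number, k: number, r: number) {
  return {
    full: d * k,          // every entry of the d×k matrix
    lora: d * r + r * k,  // the two low-rank factors B (d×r) and A (r×k)
  };
}

// A 2048×2048 layer with rank 16: 65,536 vs 4,194,304 parameters,
// i.e. LoRA trains about 1.6% of the weights (64x fewer).
const { full, lora } = loraParams(2048, 2048, 16);
```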
Stage 2: GRPO (Reinforcement Learning)
- Model generates multiple outputs
- A reward function scores each output
- Model learns to produce higher-scoring outputs
5.3 The Reward Function
| Criterion | Points | Example |
|---|---|---|
| Has lex: lines | +10 | lex: authentication |
| Has vec: lines | +10 | vec: how to log in |
| Has hyde: line | +20 | hyde: To configure auth... |
| Diverse content | +30 | Different words in each line |
| Named entities preserved | +20 | "Kubernetes" stays "Kubernetes" |
| No <think> blocks | +20 | Clean output |
Hard failures (0 points):
- Chat template leakage (<|im_start|>)
- Lines without lex:/vec:/hyde: prefix
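A stripped-down version of this reward function shows the shape of the logic. The point values follow the table above, but this sketch omits the diversity and named-entity checks and is not the actual training code:

```typescript
// Simplified reward: hard failures score 0, then points accumulate per check.
function reward(output: string): number {
  const lines = output.trim().split("\n");
  // Hard failure: chat template leakage.
  if (output.includes("<|im_start|>")) return 0;
  // Hard failure: any line without a lex:/vec:/hyde: prefix.
  if (lines.some((l) => !/^(hyde|lex|vec):/.test(l))) return 0;
  let score = 0;
  if (lines.some((l) => l.startsWith("lex:"))) score += 10;
  if (lines.some((l) => l.startsWith("vec:"))) score += 10;
  if (lines.some((l) => l.startsWith("hyde:"))) score += 20;
  if (!output.includes("<think>")) score += 20; // clean output bonus
  return score;
}

const good = reward("hyde: To log in...\nlex: login\nvec: how to log in"); // 60
const bad = reward("Sure! Here is the expansion:\nlex: login");            // 0
```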
Part 6: MCP Server
What is MCP?
Model Context Protocol (MCP) is a standard way for AI assistants (like Claude) to use external tools.
How QMD Uses It
When you run qmd mcp, it starts a server that exposes these tools:
| Tool | What it does |
|---|---|
| qmd_search | Keyword search |
| qmd_vsearch | Semantic search |
| qmd_query | Full hybrid search |
| qmd_get | Retrieve a document |
| qmd_multi_get | Retrieve multiple documents |
| qmd_status | Check index health |
An AI agent can then call these tools to search your documents.
Part 7: Key Concepts Summary
| Concept | One-Line Explanation |
|---|---|
| BM25 | "Word counting" search algorithm from 1994 |
| Embedding | Convert text → numbers (similar meaning = similar numbers) |
| Vector Search | Find similar embeddings using distance math |
| GGUF | Compact format for running AI models on CPU |
| FTS5 | SQLite's built-in full-text search |
| sqlite-vec | SQLite extension for vector similarity |
| RRF | Combine multiple ranked lists fairly |
| Re-ranking | AI reads query + doc and judges relevance |
| LoRA | Low-Rank Adaptation - efficient way to fine-tune models |
| MCP | Standard for AI assistants to use tools |
Part 8: Try It Yourself
Step 1: Index Some Documents
```sh
# Create a collection from your notes folder
qmd collection add ~/Documents/notes --name notes

# Check what was indexed
qmd status
```
Step 2: Generate Embeddings
```sh
# This downloads the embedding model and creates vectors
qmd embed

# Status should now show 0 documents needing embedding
qmd status
```
Step 3: Search
```sh
# Keyword search only (fastest)
qmd search "meeting notes"

# Vector search only (understands meaning)
qmd vsearch "what did we discuss yesterday"

# Full hybrid search (best quality, slowest)
qmd query "project deadlines"
```
Step 4: Get a Document
```sh
# By path
qmd get "notes/2025/january.md"

# By docid (shown in search results as #abc123)
qmd get "#abc123"
```
Glossary
| Term | Definition |
|---|---|
| Chunk | A piece of a document (800 tokens). Long docs are split into chunks. |
| Collection | A folder of documents to index (e.g., ~/notes) |
| Context | Description you add to help search understand what's in a folder |
| Docid | 6-character hash to uniquely identify a document (#abc123) |
| Embedding | Vector representation of text meaning |
| FTS | Full-Text Search |
| HyDE | Hypothetical Document Embedding - generate a fake answer to find similar real ones |
| Index | Data structure that makes searching fast |
| LoRA | Low-Rank Adaptation - efficient way to fine-tune models |
| RRF | Reciprocal Rank Fusion - algorithm to combine multiple ranked lists |
| Token | Word piece (roughly 4 characters in English) |
| Vector | List of numbers representing meaning |
Questions?
If you want to dive deeper into any section, let me know and I can explain:
- The math behind BM25 scoring
- How neural networks create embeddings
- The details of LoRA fine-tuning
- How RRF fusion actually works
- The MCP protocol specification
Generated by Opus 4.5 · Published February 3, 2026 · For Orphis's understanding of the QMD architecture