What Is QMD?
QMD (Query Markup Documents) is a search engine that runs entirely on your computer. Unlike Google, which searches the internet, QMD searches your local markdown files: your notes, documentation, meeting transcripts, and knowledge bases.
What makes QMD special is that it combines three different ways of searching, each with its own strengths:
Part 1: The Three Search Methods
1.1 Keyword Search (BM25)
What it is: Traditional search that looks for the exact words you type.
How it works: Imagine you're looking for the word "authentication" in all your files. The search goes through every file, finds the ones containing "authentication", and ranks them by:
- How many times the word appears
- How rare the word is (rare words = more relevant)
- How long the document is (shorter docs with the word = more relevant)
The algorithm is called BM25 (Best Match 25), first published in 1994. It's the same ranking function used today by full-text search engines such as Lucene and Elasticsearch.
Example:
Query: "authentication"
File A: "User authentication is the first step..." ✓ Contains word
File B: "Login requires authentication credentials..." ✓ Contains word
File C: "The meeting was about budgets..." ✗ No match
Limitation: If you search "login", BM25 won't find documents that only say "authentication"—even though they mean similar things.
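The three ranking factors above can be sketched as a single-term BM25 scorer. This is purely illustrative: QMD gets BM25 from SQLite's FTS5, and the parameters k1 = 1.2 and b = 0.75 are just the common defaults, with made-up corpus statistics.

```typescript
// Simplified BM25 score for a single term in a single document.
function bm25Term(
  tf: number,           // how often the term appears in the document
  docLen: number,       // length of this document (in terms)
  avgDocLen: number,    // average document length in the corpus
  totalDocs: number,    // number of documents in the corpus
  docsWithTerm: number, // number of documents containing the term
  k1 = 1.2,
  b = 0.75,
): number {
  // Rare terms get a higher inverse-document-frequency weight.
  const idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // Term frequency is saturated (k1) and normalized by document length (b).
  const norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (docLen / avgDocLen)));
  return idf * norm;
}

// A rare term in a short document outscores a common term in a long one.
const rareShort = bm25Term(3, 100, 500, 1000, 5);
const commonLong = bm25Term(3, 900, 500, 1000, 800);
```

Running this, `rareShort` comes out far higher than `commonLong`, which is exactly the "rare words = more relevant, shorter docs = more relevant" behavior described above.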
1.2 Semantic Search (Vector Embeddings)
What it is: Search that understands meaning, not just words.
The magic concept: Embeddings
An "embedding" converts text into a list of numbers (called a "vector"). Here's the key insight:
Texts with similar meanings get similar numbers.
"login" → [0.23, 0.87, 0.12, 0.45, ...] (768 numbers)
"authentication" → [0.21, 0.89, 0.10, 0.48, ...] (very similar!)
"pizza" → [0.91, 0.02, 0.78, 0.15, ...] (very different)
How it works:
- Your query "how do I log in?" becomes a vector → [0.24, 0.85, ...]
- Every document chunk was already converted to vectors when indexed
- Find the documents whose vectors are "closest" to your query vector
- Close vectors = similar meaning = relevant results
What creates these embeddings? A neural network trained on billions of text examples. QMD uses embeddinggemma (300MB model).
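"Closeness" between vectors is usually measured with cosine similarity. Here is a minimal sketch using the toy 4-dimensional vectors from the example above (real embeddings have 768 dimensions):

```typescript
// Cosine similarity: close to 1.0 means same direction (similar meaning),
// close to 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors reusing the numbers from the example above.
const login = [0.23, 0.87, 0.12, 0.45];
const auth = [0.21, 0.89, 0.10, 0.48];
const pizza = [0.91, 0.02, 0.78, 0.15];

const similar = cosineSimilarity(login, auth);   // very close to 1.0
const unrelated = cosineSimilarity(login, pizza); // much lower
```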
1.3 LLM Re-ranking
What it is: An AI model that reads your query AND each candidate document, then judges relevance.
Why it's powerful: While embedding search compares vectors (fast but approximate), the re-ranker actually reads the text and judges whether it answers your question.
Query: "How do I configure environment variables?"
Candidate 1: "Set DATABASE_URL in your .env file..."
Re-ranker thinks: "This discusses environment variables config" → Score: 0.95
Candidate 2: "The environment was pleasant for the team..."
Re-ranker thinks: "Wrong meaning of 'environment'" → Score: 0.15
QMD uses qwen3-reranker (640MB model) for this.
Part 2: How QMD Combines All Three
Here's the complete flow when you run qmd query "your question":
Step 1: QUERY EXPANSION
Your query: "auth config"
LLM expands it to:
hyde: "Authentication can be configured by setting AUTH_SECRET..."
lex: authentication configuration
lex: auth settings
vec: how to configure authentication settings
vec: authentication configuration options
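The expansion output above follows a simple line-prefixed format, so it can be split into its three channels with a small parser. This is a hypothetical sketch of that parsing step, not QMD's actual code:

```typescript
// Split expansion output into its three channels by line prefix.
type Expansion = { hyde: string[]; lex: string[]; vec: string[] };

function parseExpansion(output: string): Expansion {
  const result: Expansion = { hyde: [], lex: [], vec: [] };
  for (const line of output.split("\n")) {
    const match = line.match(/^(hyde|lex|vec):\s*(.+)$/);
    if (match) result[match[1] as keyof Expansion].push(match[2]);
  }
  return result;
}

const parsed = parseExpansion([
  "hyde: Authentication can be configured by setting AUTH_SECRET...",
  "lex: authentication configuration",
  "lex: auth settings",
  "vec: how to configure authentication settings",
].join("\n"));
// parsed.lex has 2 entries; parsed.hyde and parsed.vec have 1 each
```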
Step 2: PARALLEL SEARCH
- Original query → BM25 Search → Results List A (weighted 2x)
- Original query → Vector Search → Results List B
- Expanded "lex" → BM25 Search → Results List C
- Expanded "vec" → Vector Search → Results List D
Step 3: FUSION (Combining Results)
Reciprocal Rank Fusion (RRF):
For each document, score = sum of 1/(60 + rank) over every list where it appears
Example: Doc X ranked #1 in List A, #5 in List B, not in C or D
Score = 1/(60+1) + 1/(60+5) = 0.0164 + 0.0154 = 0.0318
Bonus: Documents ranked #1 in any list get +0.05 boost
Take top 30 candidates →
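The RRF arithmetic above takes only a few lines. This sketch reproduces the Doc X example (it implements the formula as described here, not QMD's actual code):

```typescript
// Reciprocal Rank Fusion with k = 60, plus the optional top-rank bonus.
// ranks: the document's 1-based rank in each list where it appears.
function rrfScore(ranks: number[], k = 60, topBonus = 0.05): number {
  let score = 0;
  for (const rank of ranks) {
    score += 1 / (k + rank);
    if (rank === 1) score += topBonus; // boost for ranking #1 in any list
  }
  return score;
}

// Doc X: #1 in List A, #5 in List B, absent from C and D.
const base = rrfScore([1, 5], 60, 0); // 1/61 + 1/65 ≈ 0.0318, as above
const withBonus = rrfScore([1, 5]);   // ≈ 0.0818 after the +0.05 #1 boost
```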
Step 4: RE-RANKING
LLM Re-ranker reads each candidate with your query:
"Is this relevant?" → Gives 0.0 to 1.0 score
Final blend (position-aware):
- Top 3 RRF results: 75% RRF score + 25% reranker score
- Rank 4-10: 60% RRF score + 40% reranker score
- Rank 11+: 40% RRF score + 60% reranker score
Why? Top keyword matches are usually good—don't let reranker ruin them
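The position-aware blend reduces to a small weighting function. A minimal sketch, assuming both scores are on comparable 0-to-1 scales:

```typescript
// Blend fused (RRF) and reranker scores with position-dependent weights.
// rrfRank is the document's 1-based rank after fusion.
function blend(rrfRank: number, fused: number, reranked: number): number {
  // Weight on the RRF score; the reranker gets the remainder.
  const rrfWeight = rrfRank <= 3 ? 0.75 : rrfRank <= 10 ? 0.6 : 0.4;
  return rrfWeight * fused + (1 - rrfWeight) * reranked;
}

// A top-3 result keeps most of its RRF score even if the reranker disagrees...
const top = blend(1, 0.9, 0.1);  // 0.75 * 0.9 + 0.25 * 0.1 = 0.70
// ...while a rank-15 result is mostly at the reranker's mercy.
const tail = blend(15, 0.9, 0.1); // 0.40 * 0.9 + 0.60 * 0.1 = 0.42
```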
Step 5: FINAL RESULTS
#1 docs/authentication.md (score: 0.92)
#2 config/env-setup.md (score: 0.87)
#3 guides/security.md (score: 0.73)
Part 3: Key Technologies Explained
3.1 SQLite - The Database
What it is: A database that stores all your indexed content in a single file.
Location: ~/.cache/qmd/index.sqlite
| Table | What's in it |
|---|---|
| collections | Which folders you're indexing |
| documents | Each markdown file's content and metadata |
| documents_fts | Full-text search index (FTS5) |
| content_vectors | Embedding vectors for each text chunk |
| llm_cache | Cached LLM responses (to avoid re-computing) |
3.2 FTS5 - Full-Text Search
What it is: SQLite's built-in search engine for text.
How QMD uses it:
```sql
-- Behind the scenes, QMD runs queries like:
SELECT * FROM documents_fts
WHERE documents_fts MATCH 'authentication OR auth'
ORDER BY bm25(documents_fts);
```
FTS5 automatically:
- Tokenizes text (splits into words)
- Builds an inverted index (word → document mapping)
- Calculates BM25 scores
3.3 sqlite-vec - Vector Storage
What it is: An extension that adds vector similarity search to SQLite.
```sql
-- Store a vector (serialized as a JSON string)
INSERT INTO vectors_vec (id, embedding)
VALUES ('doc1_chunk0', '[0.23, 0.87, 0.12, ...]');

-- Find the 10 most similar vectors (smallest cosine distance)
SELECT id, distance FROM vectors_vec
WHERE embedding MATCH '[0.24, 0.85, 0.11, ...]' -- query vector
ORDER BY distance
LIMIT 10;
```
3.4 node-llama-cpp - Local AI Models
What it is: A library that runs AI models directly on your computer (no cloud needed).
| Model | Size | Job |
|---|---|---|
| embeddinggemma-300M | 300 MB | Converts text → vectors |
| qwen3-reranker-0.6b | 640 MB | Judges document relevance |
| qmd-query-expansion-1.7B | 1.1 GB | Expands your queries |
Format: GGUF (a compact format for running models on CPUs)
3.5 Bun - The Runtime
What it is: A fast JavaScript/TypeScript runtime (alternative to Node.js).
Why QMD uses it:
- Faster startup than Node.js
- Built-in SQLite support (bun:sqlite)
- Built-in TypeScript support (no build step)
Part 4: Code Structure Explained
4.1 Main Files
src/
├── qmd.ts # The CLI - parses commands, orchestrates everything
├── store.ts # Database operations - read/write data
├── llm.ts # AI model operations - embeddings, reranking
├── mcp.ts # MCP server - for AI agents like Claude
├── formatter.ts # Output formatting - JSON, CSV, Markdown, etc.
└── collections.ts # Config file management (~/.config/qmd/index.yml)
Part 5: The Fine-Tuning System
QMD includes a complete system to train the query expansion model.
5.1 Why Fine-Tune?
The base Qwen3 model is smart but doesn't know our specific output format:
hyde:
lex:
vec:
Fine-tuning teaches it this format.
5.2 Two Training Stages
Stage 1: SFT (Supervised Fine-Tuning)
- "Here are 2,290 examples of correct input→output"
- Model learns the format by imitation
- Uses LoRA (efficient training that only modifies ~1% of weights)
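The "~1% of weights" figure can be checked with a quick parameter count: a full fine-tune of a d×k weight matrix trains all d·k entries, while a rank-r LoRA update trains only two thin matrices (d×r and r×k). The layer size and rank below are illustrative, not QMD's actual training config:

```typescript
// Trainable parameter count: full fine-tune vs. rank-r LoRA update.
function loraParams(d: number, k: number, r: number) {
  return {
    full: d * k,          // every entry of the d×k matrix
    lora: d * r + r * k,  // the two low-rank factors B (d×r) and A (r×k)
  };
}

// A 2048×2048 layer with rank 16: 65,536 vs 4,194,304 parameters,
// i.e. LoRA trains about 1.6% of the weights (64x fewer).
const { full, lora } = loraParams(2048, 2048, 16);
```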
Stage 2: GRPO (Reinforcement Learning)
- Model generates multiple outputs
- A reward function scores each output
- Model learns to produce higher-scoring outputs
5.3 The Reward Function
| Criterion | Points | Example |
|---|---|---|
| Has lex: lines | +10 | lex: authentication |
| Has vec: lines | +10 | vec: how to log in |
| Has hyde: line | +20 | hyde: To configure auth... |
| Diverse content | +30 | Different words in each line |
| Named entities preserved | +20 | "Kubernetes" stays "Kubernetes" |
| No <think> blocks | +20 | Clean output |
Hard failures (0 points):
- Chat template leakage (<|im_start|>)
- Lines without lex:/vec:/hyde: prefix
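A stripped-down version of this reward function shows the shape of the logic. The point values follow the table above, but this sketch omits the diversity and named-entity checks and is not the actual training code:

```typescript
// Simplified reward: hard failures score 0, then points accumulate per check.
function reward(output: string): number {
  const lines = output.trim().split("\n");
  // Hard failure: chat template leakage.
  if (output.includes("<|im_start|>")) return 0;
  // Hard failure: any line without a lex:/vec:/hyde: prefix.
  if (lines.some((l) => !/^(hyde|lex|vec):/.test(l))) return 0;
  let score = 0;
  if (lines.some((l) => l.startsWith("lex:"))) score += 10;
  if (lines.some((l) => l.startsWith("vec:"))) score += 10;
  if (lines.some((l) => l.startsWith("hyde:"))) score += 20;
  if (!output.includes("<think>")) score += 20; // clean output bonus
  return score;
}

const good = reward("hyde: To log in...\nlex: login\nvec: how to log in"); // 60
const bad = reward("Sure! Here is the expansion:\nlex: login");            // 0
```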
Part 6: MCP Server
What is MCP?
Model Context Protocol (MCP) is a standard way for AI assistants (like Claude) to use external tools.
How QMD Uses It
When you run qmd mcp, it starts a server that exposes these tools:
| Tool | What it does |
|---|---|
| qmd_search | Keyword search |
| qmd_vsearch | Semantic search |
| qmd_query | Full hybrid search |
| qmd_get | Retrieve a document |
| qmd_multi_get | Retrieve multiple documents |
| qmd_status | Check index health |
An AI agent can then call these tools to search your documents.
Part 7: Key Concepts Summary
| Concept | One-Line Explanation |
|---|---|
| BM25 | "Word counting" search algorithm from 1994 |
| Embedding | Convert text → numbers (similar meaning = similar numbers) |
| Vector Search | Find similar embeddings using distance math |
| GGUF | Compact format for running AI models on CPU |
| FTS5 | SQLite's built-in full-text search |
| sqlite-vec | SQLite extension for vector similarity |
| RRF | Combine multiple ranked lists fairly |
| Re-ranking | AI reads query + doc and judges relevance |
| LoRA | Low-Rank Adaptation - efficient way to fine-tune models |
| MCP | Standard for AI assistants to use tools |
Part 8: Try It Yourself
Step 1: Index Some Documents
```sh
# Create a collection from your notes folder
qmd collection add ~/Documents/notes --name notes

# Check what was indexed
qmd status
```
Step 2: Generate Embeddings
```sh
# This downloads the embedding model and creates vectors
qmd embed

# Status should now show 0 documents needing embedding
qmd status
```
Step 3: Search
```sh
# Keyword search only (fastest)
qmd search "meeting notes"

# Vector search only (understands meaning)
qmd vsearch "what did we discuss yesterday"

# Full hybrid search (best quality, slowest)
qmd query "project deadlines"
```
Step 4: Get a Document
```sh
# By path
qmd get "notes/2025/january.md"

# By docid (shown in search results as #abc123)
qmd get "#abc123"
```
Glossary
| Term | Definition |
|---|---|
| Chunk | A piece of a document (800 tokens). Long docs are split into chunks. |
| Collection | A folder of documents to index (e.g., ~/notes) |
| Context | Description you add to help search understand what's in a folder |
| Docid | 6-character hash to uniquely identify a document (#abc123) |
| Embedding | Vector representation of text meaning |
| FTS | Full-Text Search |
| HyDE | Hypothetical Document Embedding - generate a fake answer to find similar real ones |
| Index | Data structure that makes searching fast |
| LoRA | Low-Rank Adaptation - efficient way to fine-tune models |
| RRF | Reciprocal Rank Fusion - algorithm to combine multiple ranked lists |
| Token | Word piece (roughly 4 characters in English) |
| Vector | List of numbers representing meaning |
Questions?
If you want to dive deeper into any section, let me know and I can explain:
- The math behind BM25 scoring
- How neural networks create embeddings
- The details of LoRA fine-tuning
- How RRF fusion actually works
- The MCP protocol specification
Generated by Opus 4.5 · Published February 3, 2026 · For Orphis's understanding of the QMD architecture