
QMD Explained: A Complete Beginner's Guide

What Is QMD?

QMD (Query Markup Documents) is a search engine that runs entirely on your computer. Unlike Google, which searches the internet, QMD searches your local markdown files: your notes, documentation, meeting transcripts, and knowledge bases.

What makes QMD special is that it combines three different ways of searching, each with its own strengths:

┌─────────────────────────────────────────────────────────────┐
│ How QMD Searches                                            │
├─────────────────────────────────────────────────────────────┤
│ 1. Keyword Search  → Find exact words (like Ctrl+F)         │
│ 2. Semantic Search → Find similar meanings                  │
│ 3. AI Re-ranking   → Pick the best results                  │
└─────────────────────────────────────────────────────────────┘

Part 1: The Three Search Methods

1.1 Keyword Search (BM25)

What it is: Traditional search that looks for the exact words you type.

How it works: Imagine you're looking for the word "authentication" in all your files. The search goes through every file, finds the ones containing "authentication", and ranks them by:

  • How many times the word appears
  • How rare the word is (rare words = more relevant)
  • How long the document is (shorter docs with the word = more relevant)

The algorithm is called BM25 (Best Match 25), introduced in 1994. It is still the default ranking function in mainstream search engines such as Lucene and Elasticsearch.

Example:
Query: "authentication"
File A: "User authentication is the first step..." ✓ Contains word
File B: "Login requires authentication credentials..." ✓ Contains word
File C: "The meeting was about budgets..." ✗ No match

Limitation: If you search "login", BM25 won't find documents that only say "authentication"—even though they mean similar things.
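The ranking described above can be sketched in a few lines of TypeScript. This is a simplified, illustrative BM25 (the constants K1 and B and the helper bm25Score are my names, not QMD's); QMD itself delegates real BM25 scoring to SQLite's FTS5.

```typescript
// Simplified BM25 scorer over a tiny in-memory corpus.
const K1 = 1.2;  // term-frequency saturation
const B = 0.75;  // document-length normalization

function bm25Score(term: string, docs: string[][], docIndex: number): number {
  const n = docs.length;
  const df = docs.filter((d) => d.includes(term)).length; // docs containing the term
  if (df === 0) return 0;
  const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));  // rarity weight
  const doc = docs[docIndex];
  const tf = doc.filter((w) => w === term).length;        // occurrences in this doc
  const avgLen = docs.reduce((s, d) => s + d.length, 0) / n;
  const norm = tf + K1 * (1 - B + (B * doc.length) / avgLen); // length penalty
  return idf * ((tf * (K1 + 1)) / norm);
}

const corpus = [
  "user authentication is the first step".split(" "),      // File A
  "login requires authentication credentials".split(" "),  // File B
  "the meeting was about budgets".split(" "),              // File C
];
bm25Score("authentication", corpus, 2); // → 0 (File C never mentions the word)
```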

1.2 Semantic Search (Vector Embeddings)

What it is: Search that understands meaning, not just words.

The magic concept: Embeddings

An "embedding" converts text into a list of numbers (called a "vector"). Here's the key insight:

Texts with similar meanings get similar numbers.
"login" → [0.23, 0.87, 0.12, 0.45, ...] (768 numbers)
"authentication" → [0.21, 0.89, 0.10, 0.48, ...] (very similar!)
"pizza" → [0.91, 0.02, 0.78, 0.15, ...] (very different)

How it works:

  1. Your query "how do I log in?" becomes a vector → [0.24, 0.85, ...]
  2. Every document chunk was already converted to vectors when indexed
  3. Find the documents whose vectors are "closest" to your query vector
  4. Close vectors = similar meaning = relevant results

What creates these embeddings? A neural network trained on billions of text examples. QMD uses embeddinggemma (300MB model).
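The "closest vectors" comparison is typically cosine similarity. A toy sketch with 4-dimensional vectors (real embeddings have 768 dimensions; the numbers reuse the illustration above):

```typescript
// Cosine similarity: 1.0 = same direction, near 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const login = [0.23, 0.87, 0.12, 0.45];
const auth  = [0.21, 0.89, 0.10, 0.48];
const pizza = [0.91, 0.02, 0.78, 0.15];

cosineSimilarity(login, auth);  // close to 1.0 — similar meaning
cosineSimilarity(login, pizza); // much lower — different meaning
```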

1.3 LLM Re-ranking

What it is: An AI model that reads your query AND each candidate document, then judges relevance.

Why it's powerful: While embedding search compares vectors (fast but somewhat rough), the re-ranker actually reads the text and decides whether it answers your question.

Query: "How do I configure environment variables?"

Candidate 1: "Set DATABASE_URL in your .env file..."
Re-ranker thinks: "This discusses environment variables config" → Score: 0.95

Candidate 2: "The environment was pleasant for the team..."
Re-ranker thinks: "Wrong meaning of 'environment'" → Score: 0.15

QMD uses qwen3-reranker (640MB model) for this.

Part 2: How QMD Combines All Three

Here's the complete flow when you run qmd query "your question":

Step 1: QUERY EXPANSION

Your query: "auth config"

LLM expands it to:
  hyde: "Authentication can be configured by setting AUTH_SECRET..."
  lex: authentication configuration
  lex: auth settings
  vec: how to configure authentication settings
  vec: authentication configuration options
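A sketch of how that expanded output might be parsed. The hyde:/lex:/vec: prefixes are QMD's format, but parseExpansion and the Expansion shape are hypothetical names for illustration:

```typescript
// Split the expansion model's output into its three query types.
interface Expansion { hyde: string[]; lex: string[]; vec: string[] }

function parseExpansion(output: string): Expansion {
  const result: Expansion = { hyde: [], lex: [], vec: [] };
  for (const line of output.split("\n")) {
    const match = line.match(/^(hyde|lex|vec):\s*(.+)$/);
    if (match) result[match[1] as keyof Expansion].push(match[2]);
  }
  return result;
}

const parsed = parseExpansion(
  "hyde: Authentication can be configured by setting AUTH_SECRET...\n" +
  "lex: authentication configuration\n" +
  "vec: how to configure authentication settings"
);
// parsed.lex → ["authentication configuration"]
```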

Step 2: PARALLEL SEARCH

  • Original query → BM25 Search → Results List A (weighted 2x)
  • Original query → Vector Search → Results List B
  • Expanded "lex" → BM25 Search → Results List C
  • Expanded "vec" → Vector Search → Results List D

Step 3: FUSION (Combining Results)

Reciprocal Rank Fusion (RRF):

For each document, score = sum of: 1/(60 + rank) for each list

Example: Doc X ranked #1 in List A, #5 in List B, not in C or D
Score = 1/(60+1) + 1/(60+5) = 0.0164 + 0.0154 = 0.0318

Bonus: documents ranked #1 in any list get a +0.05 boost.

The top 30 candidates by fused score move on to re-ranking.
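The fusion rule above, as a small sketch (the rrfScore helper is my name; the constant 60 and the +0.05 boost come from the text):

```typescript
// Reciprocal Rank Fusion: sum 1/(60 + rank) over every list that
// contains the document, plus a +0.05 boost for any #1 placement.
function rrfScore(ranks: (number | null)[]): number {
  let score = 0;
  for (const rank of ranks) {
    if (rank === null) continue;  // document absent from this list
    score += 1 / (60 + rank);     // reciprocal-rank contribution
  }
  if (ranks.some((r) => r === 1)) score += 0.05; // ranked #1 somewhere
  return score;
}

// Doc X: #1 in List A, #5 in List B, absent from C and D.
// Base: 1/61 + 1/65 ≈ 0.0318, plus the 0.05 boost → ≈ 0.0818
rrfScore([1, 5, null, null]);
```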

Step 4: RE-RANKING

LLM Re-ranker reads each candidate with your query:

"Is this relevant?" → Gives 0.0 to 1.0 score

Final blend (position-aware):

  • Top 3 RRF results: 75% RRF score + 25% reranker score
  • Rank 4-10: 60% RRF score + 40% reranker score
  • Rank 11+: 40% RRF score + 60% reranker score

Why? The top keyword matches are usually good, so the reranker is not allowed to displace them.
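The blend schedule can be sketched as follows (blendScore is a hypothetical name, and this assumes both scores have been normalized to comparable 0–1 ranges):

```typescript
// Position-aware blend of RRF and re-ranker scores, using the
// weights from the text above.
function blendScore(rrfRank: number, rrf: number, rerank: number): number {
  if (rrfRank <= 3)  return 0.75 * rrf + 0.25 * rerank; // trust top RRF hits
  if (rrfRank <= 10) return 0.60 * rrf + 0.40 * rerank;
  return 0.40 * rrf + 0.60 * rerank;                    // deep hits: trust reranker
}
```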

Step 5: FINAL RESULTS

#1 docs/authentication.md (score: 0.92)
#2 config/env-setup.md (score: 0.87)
#3 guides/security.md (score: 0.73)

Part 3: Key Technologies Explained

3.1 SQLite - The Database

What it is: A database that stores all your indexed content in a single file.

Location: ~/.cache/qmd/index.sqlite

Table            What's in it
collections      Which folders you're indexing
documents        Each markdown file's content and metadata
documents_fts    Full-text search index (FTS5)
content_vectors  Embedding vectors for each text chunk
llm_cache        Cached LLM responses (to avoid re-computing)

3.2 FTS5 - Full-Text Search

What it is: SQLite's built-in search engine for text.

How QMD uses it:

-- Behind the scenes, QMD runs queries like:
SELECT * FROM documents_fts 
WHERE documents_fts MATCH 'authentication OR auth' 
ORDER BY bm25(documents_fts);

FTS5 automatically:

  • Tokenizes text (splits into words)
  • Builds an inverted index (word → document mapping)
  • Calculates BM25 scores

3.3 sqlite-vec - Vector Storage

What it is: An extension that adds vector similarity search to SQLite.

-- Store a vector (sqlite-vec accepts vectors as JSON text or blobs)
INSERT INTO vectors_vec (id, embedding) 
VALUES ('doc1_chunk0', '[0.23, 0.87, 0.12, ...]');

-- Find similar vectors (nearest neighbors by distance)
SELECT id, distance FROM vectors_vec 
WHERE embedding MATCH '[0.24, 0.85, 0.11, ...]' -- query vector
ORDER BY distance 
LIMIT 10;

3.4 node-llama-cpp - Local AI Models

What it is: A library that runs AI models directly on your computer (no cloud needed).

Model                     Size    Job
embeddinggemma-300M       300 MB  Converts text → vectors
qwen3-reranker-0.6b       640 MB  Judges document relevance
qmd-query-expansion-1.7B  1.1 GB  Expands your queries

Format: GGUF (a compact format for running models on CPUs)

3.5 Bun - The Runtime

What it is: A fast JavaScript/TypeScript runtime (alternative to Node.js).

Why QMD uses it:

  • Faster startup than Node.js
  • Built-in SQLite support (bun:sqlite)
  • Built-in TypeScript support (no build step)

Part 4: Code Structure Explained

4.1 Main Files

src/
├── qmd.ts        # The CLI - parses commands, orchestrates everything
├── store.ts      # Database operations - read/write data
├── llm.ts        # AI model operations - embeddings, reranking
├── mcp.ts        # MCP server - for AI agents like Claude
├── formatter.ts  # Output formatting - JSON, CSV, Markdown, etc.
└── collections.ts # Config file management (~/.config/qmd/index.yml)

Part 5: The Fine-Tuning System

QMD includes a complete system to train the query expansion model.

5.1 Why Fine-Tune?

The base Qwen3 model is smart but doesn't know our specific output format:

hyde: 
lex: 
vec: 

Fine-tuning teaches it this format.

5.2 Two Training Stages

Stage 1: SFT (Supervised Fine-Tuning)

  • "Here are 2,290 examples of correct input→output"
  • Model learns the format by imitation
  • Uses LoRA (efficient training that only modifies ~1% of weights)

Stage 2: GRPO (Reinforcement Learning)

  • Model generates multiple outputs
  • A reward function scores each output
  • Model learns to produce higher-scoring outputs

5.3 The Reward Function

Criterion                 Points  Example
Has lex: lines            +10     lex: authentication
Has vec: lines            +10     vec: how to log in
Has hyde: line            +20     hyde: To configure auth...
Diverse content           +30     Different words in each line
Named entities preserved  +20     "Kubernetes" stays "Kubernetes"
No <think> blocks         +20     Clean output

Hard failures (0 points):

  • Chat template leakage (<|im_start|>)
  • Lines without lex:/vec:/hyde: prefix
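The reward criteria above can be sketched as a scoring function. Point values come from the table; scoreExpansion is a hypothetical name, and the fuzzier diversity and entity-preservation checks are omitted:

```typescript
// Simplified GRPO reward: hard failures return 0, otherwise points accumulate.
function scoreExpansion(output: string): number {
  // Hard failure: chat template leakage
  if (output.includes("<|im_start|>")) return 0;
  const lines = output.trim().split("\n");
  // Hard failure: any line without a lex:/vec:/hyde: prefix
  if (lines.some((l) => !/^(hyde|lex|vec):/.test(l))) return 0;

  let score = 0;
  if (lines.some((l) => l.startsWith("lex:")))  score += 10;
  if (lines.some((l) => l.startsWith("vec:")))  score += 10;
  if (lines.some((l) => l.startsWith("hyde:"))) score += 20;
  if (!output.includes("<think>"))              score += 20;
  // Diversity (+30) and named-entity preservation (+20) need the query
  // text and fuzzier matching, so they are omitted from this sketch.
  return score;
}

scoreExpansion("hyde: To configure auth...\nlex: authentication\nvec: how to log in"); // → 60
```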

Part 6: MCP Server

What is MCP?

Model Context Protocol (MCP) is a standard way for AI assistants (like Claude) to use external tools.

How QMD Uses It

When you run qmd mcp, it starts a server that exposes these tools:

Tool           What it does
qmd_search     Keyword search
qmd_vsearch    Semantic search
qmd_query      Full hybrid search
qmd_get        Retrieve a document
qmd_multi_get  Retrieve multiple documents
qmd_status     Check index health

An AI agent can then call these tools to search your documents.
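MCP messages are JSON-RPC 2.0 sent over stdio. A sketch of what a tools/call request for qmd_query might look like on the wire; the argument shape is an assumption, not taken from QMD's actual tool schema:

```typescript
// A JSON-RPC 2.0 request invoking an MCP tool by name.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "qmd_query",
    arguments: { query: "project deadlines" }, // argument shape is an assumption
  },
};

// The agent writes this as one line of JSON to the server's stdin.
const wire = JSON.stringify(request);
```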

Part 7: Key Concepts Summary

Concept        One-Line Explanation
BM25           "Word counting" search algorithm from 1994
Embedding      Convert text → numbers (similar meaning = similar numbers)
Vector Search  Find similar embeddings using distance math
GGUF           Compact format for running AI models on CPU
FTS5           SQLite's built-in full-text search
sqlite-vec     SQLite extension for vector similarity
RRF            Combine multiple ranked lists fairly
Re-ranking     AI reads query + doc and judges relevance
LoRA           Low-Rank Adaptation - efficient way to fine-tune models
MCP            Standard for AI assistants to use tools

Part 8: Try It Yourself

Step 1: Index Some Documents

# Create a collection from your notes folder
qmd collection add ~/Documents/notes --name notes

# Check what was indexed
qmd status

Step 2: Generate Embeddings

# This downloads the embedding model and creates vectors
qmd embed

# Status should now show 0 documents needing embedding
qmd status

Step 3: Search

# Keyword search only (fastest)
qmd search "meeting notes"

# Vector search only (understands meaning)
qmd vsearch "what did we discuss yesterday"

# Full hybrid search (best quality, slowest)
qmd query "project deadlines"

Step 4: Get a Document

# By path
qmd get "notes/2025/january.md"

# By docid (shown in search results as #abc123)
qmd get "#abc123"

Glossary

Term        Definition
Chunk       A piece of a document (800 tokens). Long docs are split into chunks.
Collection  A folder of documents to index (e.g., ~/notes)
Context     Description you add to help search understand what's in a folder
Docid       6-character hash to uniquely identify a document (#abc123)
Embedding   Vector representation of text meaning
FTS         Full-Text Search
HyDE        Hypothetical Document Embedding - generate a fake answer to find similar real ones
Index       Data structure that makes searching fast
LoRA        Low-Rank Adaptation - efficient way to fine-tune models
RRF         Reciprocal Rank Fusion - algorithm to combine multiple ranked lists
Token       Word piece (roughly 4 characters in English)
Vector      List of numbers representing meaning

Questions?

If you want to dive deeper into any section, let me know and I can explain:

  • The math behind BM25 scoring
  • How neural networks create embeddings
  • The details of LoRA fine-tuning
  • How RRF fusion actually works
  • The MCP protocol specification

Generated by Opus 4.5 · Published February 3, 2026 · For Orphis's understanding of the QMD architecture