Twitter discovered it first: YC-backed, with 10,000+ developers and 70+ YC companies using it. Supermemory.ai positions itself as "the Memory API for the AI era" — but what's actually under the hood? I spent the evening digging into their architecture, pricing, and claims.
What It Is
Supermemory is a managed memory service for AI applications. Instead of building your own RAG pipeline, vector database, and knowledge graph, you send them content and query for context. They handle extraction, embedding, storage, and retrieval.
The pitch: "Your AI isn't intelligent until it remembers."
The Architecture: Three Layers
1. Ingestion & Extraction
Supermemory accepts multiple content types:
- Text and URLs
- PDFs, images, documents
- Conversation history
- Videos (transcribed)
- Connectors: Notion, Google Drive, OneDrive
Documents are queued for processing, then extracted into "memories": semantic chunks that preserve meaning rather than arbitrary slices of raw text. They claim contextual chunking that understands document structure.
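Supermemory hasn't published its pipeline internals, but the idea behind structure-aware "contextual chunking" can be illustrated with a toy splitter that breaks a markdown document at heading boundaries instead of fixed character windows. Everything here (function name, chunk shape) is my own sketch, not their API:

```python
import re

def contextual_chunks(markdown: str) -> list[dict]:
    """Split a markdown document at heading boundaries so each chunk
    keeps its section context, instead of using fixed-size windows."""
    chunks = []
    current = {"heading": "", "text": ""}
    for line in markdown.splitlines():
        if re.match(r"^#{1,6} ", line):          # a new section starts here
            if current["text"].strip():
                chunks.append(current)
            current = {"heading": line.lstrip("# ").strip(), "text": ""}
        else:
            current["text"] += line + "\n"
    if current["text"].strip():
        chunks.append(current)
    return chunks

doc = "# Setup\nInstall the CLI.\n# Usage\nRun the tool.\n"
print(contextual_chunks(doc))
```

A real implementation would also attach surrounding context (document title, parent headings) to each chunk before embedding, which is presumably what "understands document structure" means in practice.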
2. The Graph Memory Layer
This is where it gets interesting. Unlike simple vector search, Supermemory builds a living knowledge graph with three relationship types:
| Relationship | When It Happens | Example |
|---|---|---|
| UPDATES | New info contradicts old | "Alex left Google" updates "Alex works at Google" |
| EXTENDS | New info adds detail | "Alex leads a team of 5" extends "Alex is a PM" |
| DERIVES | System infers from patterns | "Alex likely works on payments" from context |
Each memory carries an `isLatest` flag. When information is updated, old memories persist but are marked stale. This preserves history while surfacing only current facts.
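Here is a minimal sketch of how the UPDATES/EXTENDS relationships and the `isLatest` flag could fit together, using the table's own example. The data model is my assumption for illustration (DERIVES, which requires inference, is omitted); Supermemory's internals may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    id: int
    fact: str
    is_latest: bool = True
    relations: list[tuple[str, int]] = field(default_factory=list)  # (type, target id)

class MemoryGraph:
    def __init__(self):
        self.memories: dict[int, Memory] = {}
        self._next_id = 0

    def add(self, fact: str) -> Memory:
        self._next_id += 1
        m = Memory(self._next_id, fact)
        self.memories[m.id] = m
        return m

    def update(self, old: Memory, fact: str) -> Memory:
        """UPDATES: the new fact supersedes the old one, which is
        kept for history but marked stale."""
        new = self.add(fact)
        new.relations.append(("UPDATES", old.id))
        old.is_latest = False
        return new

    def extend(self, base: Memory, fact: str) -> Memory:
        """EXTENDS: the new fact adds detail without invalidating the base."""
        new = self.add(fact)
        new.relations.append(("EXTENDS", base.id))
        return new

    def current_facts(self) -> list[str]:
        return [m.fact for m in self.memories.values() if m.is_latest]

g = MemoryGraph()
job = g.add("Alex works at Google")
g.extend(job, "Alex is a PM")
g.update(job, "Alex left Google")
print(g.current_facts())   # the superseded fact is filtered out
```

The key design choice is that `update` never deletes: the old node stays in the graph for temporal queries ("where did Alex work before?"), while `current_facts` reads only the latest view.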
3. Automatic Forgetting
Supermemory implements time-based forgetting. Temporary facts like "I have an exam tomorrow" or "Meeting at 3pm today" automatically expire. This is genuinely clever — most memory systems accumulate forever, becoming noise machines.
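Time-based forgetting is simple to sketch: temporary facts carry a TTL and are filtered out of recall once they expire. Again, this is my own illustration of the mechanism, not Supermemory's implementation:

```python
import time

class ExpiringMemory:
    """Facts optionally carry an expiry timestamp; recall() only
    returns facts that are permanent or not yet expired."""
    def __init__(self):
        self.facts = []  # list of (fact, expires_at or None)

    def add(self, fact, ttl_seconds=None):
        expires = time.time() + ttl_seconds if ttl_seconds is not None else None
        self.facts.append((fact, expires))

    def recall(self):
        now = time.time()
        return [f for f, exp in self.facts if exp is None or exp > now]

mem = ExpiringMemory()
mem.add("Alex prefers dark mode")               # permanent preference
mem.add("Meeting at 3pm today", ttl_seconds=0)  # expires immediately
print(mem.recall())
```

The hard part in a real system isn't the expiry check — it's deciding the TTL, which requires the extraction step to classify a fact as temporary ("exam tomorrow") versus durable ("prefers dark mode").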
The API Interface
Three ways to add context to your LLMs:
Memory API — Learned User Context
Extracted facts about users that evolve over time. It handles knowledge updates and temporal changes, and builds user profiles. This becomes the default context provider for your LLM.
User Profiles
Combines static facts (always know this) with dynamic facts (recent context, episodic memory). You configure what counts as static vs. dynamic for your use case.
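The static/dynamic split boils down to assembling one context string from two pools: facts the LLM should always see, plus a bounded window of recent episodic facts. A hedged sketch (field names and format are my own):

```python
def build_profile(static_facts, dynamic_facts, max_dynamic=3):
    """Combine always-relevant static facts with the most recent
    dynamic (episodic) facts into a single context block."""
    lines = ["# User profile"]
    lines += [f"- {f}" for f in static_facts]
    lines.append("# Recent context")
    lines += [f"- {f}" for f in dynamic_facts[-max_dynamic:]]  # keep it bounded
    return "\n".join(lines)

ctx = build_profile(
    static_facts=["Name: Alex", "Role: PM"],
    dynamic_facts=["Asked about OKRs", "Planning Q3 roadmap"],
)
print(ctx)
```

Bounding the dynamic window is the point: static facts are cheap and stable, while episodic facts would otherwise grow without limit and crowd out the prompt.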
Super RAG — Advanced Semantic Search
Traditional RAG with extras:
- Metadata filtering
- Contextual chunking
- Integration with the memory engine
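The "metadata filtering" extra is worth unpacking: filters are applied before similarity ranking, so a filter never dilutes the top-k results. A self-contained sketch with a toy cosine similarity (document shapes and field names are mine):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def filtered_search(query_vec, docs, metadata_filter, top_k=2):
    """Discard non-matching docs first, then rank the survivors
    by vector similarity and return the top-k ids."""
    candidates = [d for d in docs
                  if all(d["meta"].get(k) == v for k, v in metadata_filter.items())]
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:top_k]]

docs = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"source": "notion"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"source": "drive"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"source": "notion"}},
]
print(filtered_search([1.0, 0.0], docs, {"source": "notion"}))
```

Doc "b" is the second-closest vector but is excluded by the `source` filter, which is exactly the behavior pre-filtering guarantees.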
The Research Claims
Supermemory claims state-of-the-art results on LongMemEval — a benchmark that tests retrieval across 115k+ token histories with temporal reasoning and knowledge conflicts. They report beating other approaches on:
- Information extraction
- Single-session recall (user & assistant)
- Preference learning (implicit signals)
- Multi-session reasoning
- Temporal reasoning (what happened when)
- Knowledge updates (handling contradictions)
The benchmark is specifically designed for human-assistant interactions (not human-human), making it more representative of real AI assistant usage.
Pricing: The Reality Check
| Plan | Price | Tokens | Queries |
|---|---|---|---|
| Free | $0 | 1M | 10K |
| Pro | $19/mo | 3M | 100K |
| Scale | $399/mo | 80M | 20M |
| Enterprise | Custom | Unlimited | Unlimited |
Overages: $0.01 per 1K tokens processed, $0.10 per 1K queries.
Translation: At $19/month for 3M tokens, you're paying ~$6.33 per million tokens processed. For comparison, OpenAI's embedding API is ~$0.10 per million tokens, but that doesn't include storage, search, or graph relationships.
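The numbers above follow from straightforward arithmetic on the posted pricing. A small calculator makes the overage behavior concrete:

```python
def monthly_cost(plan_price, included_tokens, tokens_used, overage_per_1k=0.01):
    """Cost under the posted pricing: flat plan fee plus
    $0.01 per 1K tokens over the included allowance."""
    over = max(0, tokens_used - included_tokens)
    return plan_price + (over / 1_000) * overage_per_1k

# Pro plan: $19/mo with 3M tokens included
print(monthly_cost(19, 3_000_000, 3_000_000))  # within allowance: 19.0
print(monthly_cost(19, 3_000_000, 5_000_000))  # 2M tokens over: 19 + 20 = 39.0
print(round(19 / 3, 2))                        # effective $/1M tokens: 6.33
```

Note the overage rate ($0.01/1K = $10 per million tokens) is steeper than the Pro plan's effective in-allowance rate, so heavy users are better off jumping tiers than paying overages.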
The Ecosystem Play
Supermemory isn't just an API — they're building an ecosystem:
- Web app: Personal knowledge management at app.supermemory.ai
- Browser extension: Save from any webpage, ChatGPT, Claude, Twitter
- Raycast extension: Keyboard-shortcut access
- MCP integration: Works with Cursor, Claude Desktop, and other MCP-compatible tools
- Self-hosting: Enterprise can deploy on their own infrastructure
The Open Source Angle
The core engine is open source on GitHub. You can self-host if you don't want to pay. The repo includes:
- Next.js web app
- Processing pipelines
- Graph memory implementation
- API server
This is a smart move. Developers can try it free, build on it, and migrate to managed when they need scale.
Competitive Landscape
| Approach | Pros | Cons |
|---|---|---|
| Supermemory | Graph relationships, forgetting, managed | $19+/mo, vendor lock-in risk |
| Pinecone/Weaviate | Cheaper at scale, more control | Build your own graph layer |
| Letta (MemGPT) | Local-first, hierarchical memory | Smaller ecosystem, newer |
| Roll your own | Full control, no vendor risk | Months of dev time |
The Verdict
Supermemory is the most complete "memory as a service" product available. The graph relationships and automatic forgetting solve real problems that pure vector search doesn't touch. The pricing is reasonable for what you get — if you value your time at more than $100/hour, it's cheaper than building equivalent functionality.
That said, the $19/month entry point might push hobbyists toward open-source alternatives like Letta or rolling their own with Chroma + custom graph logic.
For production AI apps that need personalization, Supermemory is worth evaluating. The LongMemEval SOTA claim is legitimate — they solved retrieval across 115k+ tokens with temporal reasoning, which is genuinely hard.
Key Takeaways
- Graph > Vectors: The relationship layer distinguishes it from RAG-in-a-box
- Forgetting matters: Automatic expiration of temporary facts is a differentiator
- Open core model: Self-hostable, but managed is the monetization path
- Bundled ecosystem: Browser extensions, MCP, connectors — it's a platform play
- Research-backed: LongMemEval SOTA isn't marketing fluff; they published methodology
Is this the future of AI memory? Maybe. At minimum, it's the present — and it's well-executed.
Sources: supermemory.ai docs, GitHub repository, research paper on LongMemEval, pricing page as of Feb 2, 2026.