Twitter discovered it first: YC-backed, with 10,000+ developers and 70+ YC companies using it. Supermemory.ai positions itself as "the Memory API for the AI era" — but what's actually under the hood? I spent the evening digging into their architecture, pricing, and claims.
What It Is
Supermemory is a managed memory service for AI applications. Instead of building your own RAG pipeline, vector database, and knowledge graph, you send them content and query for context. They handle extraction, embedding, storage, and retrieval.
The pitch: "Your AI isn't intelligent until it remembers."
The Architecture: Three Layers
1. Ingestion & Extraction
Supermemory accepts multiple content types:
- Text and URLs
- PDFs, images, documents
- Conversation history
- Videos (transcribed)
- Connectors: Notion, Google Drive, OneDrive
Documents are queued for processing, then extracted into "memories": semantic chunks that preserve meaning rather than arbitrary slices of raw text. They claim contextual chunking that understands document structure.
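Supermemory hasn't published its pipeline internals, but the idea behind structure-aware "contextual chunking" can be illustrated with a toy splitter that breaks a markdown document at heading boundaries instead of fixed character windows. Everything here (function name, chunk shape) is my own sketch, not their API:

```python
import re

def contextual_chunks(markdown: str) -> list[dict]:
    """Split a markdown document at heading boundaries so each chunk
    keeps its section context, instead of using fixed-size windows."""
    chunks = []
    current = {"heading": "", "text": ""}
    for line in markdown.splitlines():
        if re.match(r"^#{1,6} ", line):          # a new section starts here
            if current["text"].strip():
                chunks.append(current)
            current = {"heading": line.lstrip("# ").strip(), "text": ""}
        else:
            current["text"] += line + "\n"
    if current["text"].strip():
        chunks.append(current)
    return chunks

doc = "# Setup\nInstall the CLI.\n# Usage\nRun the tool.\n"
print(contextual_chunks(doc))
```

A real implementation would also attach surrounding context (document title, parent headings) to each chunk before embedding, which is presumably what "understands document structure" means in practice.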
2. The Graph Memory Layer
This is where it gets interesting. Unlike simple vector search, Supermemory builds a living knowledge graph with three relationship types:
| Relationship | When It Happens | Example |
|---|---|---|
| UPDATES | New info contradicts old | "Alex left Google" updates "Alex works at Google" |
| EXTENDS | New info adds detail | "Alex leads a team of 5" extends "Alex is a PM" |
| DERIVES | System infers from patterns | "Alex likely works on payments" from context |
Each memory carries an `isLatest` flag. When information is updated, old memories persist but are marked stale. This preserves history while surfacing only current facts.
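Here is a minimal sketch of how the UPDATES/EXTENDS relationships and the `isLatest` flag could fit together, using the table's own example. The data model is my assumption for illustration (DERIVES, which requires inference, is omitted); Supermemory's internals may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    id: int
    fact: str
    is_latest: bool = True
    relations: list[tuple[str, int]] = field(default_factory=list)  # (type, target id)

class MemoryGraph:
    def __init__(self):
        self.memories: dict[int, Memory] = {}
        self._next_id = 0

    def add(self, fact: str) -> Memory:
        self._next_id += 1
        m = Memory(self._next_id, fact)
        self.memories[m.id] = m
        return m

    def update(self, old: Memory, fact: str) -> Memory:
        """UPDATES: the new fact supersedes the old one, which is
        kept for history but marked stale."""
        new = self.add(fact)
        new.relations.append(("UPDATES", old.id))
        old.is_latest = False
        return new

    def extend(self, base: Memory, fact: str) -> Memory:
        """EXTENDS: the new fact adds detail without invalidating the base."""
        new = self.add(fact)
        new.relations.append(("EXTENDS", base.id))
        return new

    def current_facts(self) -> list[str]:
        return [m.fact for m in self.memories.values() if m.is_latest]

g = MemoryGraph()
job = g.add("Alex works at Google")
g.extend(job, "Alex is a PM")
g.update(job, "Alex left Google")
print(g.current_facts())   # the superseded fact is filtered out
```

The key design choice is that `update` never deletes: the old node stays in the graph for temporal queries ("where did Alex work before?"), while `current_facts` reads only the latest view.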
3. Automatic Forgetting
Supermemory implements time-based forgetting. Temporary facts like "I have an exam tomorrow" or "Meeting at 3pm today" automatically expire. This is genuinely clever — most memory systems accumulate forever, becoming noise machines.
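Time-based forgetting is simple to sketch: temporary facts carry a TTL and are filtered out of recall once they expire. Again, this is my own illustration of the mechanism, not Supermemory's implementation:

```python
import time

class ExpiringMemory:
    """Facts optionally carry an expiry timestamp; recall() only
    returns facts that are permanent or not yet expired."""
    def __init__(self):
        self.facts = []  # list of (fact, expires_at or None)

    def add(self, fact, ttl_seconds=None):
        expires = time.time() + ttl_seconds if ttl_seconds is not None else None
        self.facts.append((fact, expires))

    def recall(self):
        now = time.time()
        return [f for f, exp in self.facts if exp is None or exp > now]

mem = ExpiringMemory()
mem.add("Alex prefers dark mode")               # permanent preference
mem.add("Meeting at 3pm today", ttl_seconds=0)  # expires immediately
print(mem.recall())
```

The hard part in a real system isn't the expiry check — it's deciding the TTL, which requires the extraction step to classify a fact as temporary ("exam tomorrow") versus durable ("prefers dark mode").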
The API Interface
Three ways to add context to your LLMs:
Memory API — Learned User Context
Extracted facts about users that evolve over time. It handles knowledge updates and temporal changes, and builds user profiles. This becomes the default context provider for your LLM.
User Profiles
Combines static facts (always know this) with dynamic facts (recent context, episodic memory). You configure what counts as static vs. dynamic for your use case.
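The static/dynamic split boils down to assembling one context string from two pools: facts the LLM should always see, plus a bounded window of recent episodic facts. A hedged sketch (field names and format are my own):

```python
def build_profile(static_facts, dynamic_facts, max_dynamic=3):
    """Combine always-relevant static facts with the most recent
    dynamic (episodic) facts into a single context block."""
    lines = ["# User profile"]
    lines += [f"- {f}" for f in static_facts]
    lines.append("# Recent context")
    lines += [f"- {f}" for f in dynamic_facts[-max_dynamic:]]  # keep it bounded
    return "\n".join(lines)

ctx = build_profile(
    static_facts=["Name: Alex", "Role: PM"],
    dynamic_facts=["Asked about OKRs", "Planning Q3 roadmap"],
)
print(ctx)
```

Bounding the dynamic window is the point: static facts are cheap and stable, while episodic facts would otherwise grow without limit and crowd out the prompt.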
Super RAG — Advanced Semantic Search
Traditional RAG with extras:
- Metadata filtering
- Contextual chunking
- Integration with the memory engine
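The "metadata filtering" extra is worth unpacking: filters are applied before similarity ranking, so a filter never dilutes the top-k results. A self-contained sketch with a toy cosine similarity (document shapes and field names are mine):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def filtered_search(query_vec, docs, metadata_filter, top_k=2):
    """Discard non-matching docs first, then rank the survivors
    by vector similarity and return the top-k ids."""
    candidates = [d for d in docs
                  if all(d["meta"].get(k) == v for k, v in metadata_filter.items())]
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:top_k]]

docs = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"source": "notion"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"source": "drive"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"source": "notion"}},
]
print(filtered_search([1.0, 0.0], docs, {"source": "notion"}))
```

Doc "b" is the second-closest vector but is excluded by the `source` filter, which is exactly the behavior pre-filtering guarantees.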
The Research Claims
Supermemory claims state-of-the-art results on LongMemEval — a benchmark that tests retrieval across 115k+ token histories with temporal reasoning and knowledge conflicts. They report beating other approaches on:
- Information extraction
- Single-session recall (user & assistant)
- Preference learning (implicit signals)
- Multi-session reasoning
- Temporal reasoning (what happened when)
- Knowledge updates (handling contradictions)
The benchmark is specifically designed for human-assistant interactions (not human-human), making it more representative of real AI assistant usage.
Pricing: The Reality Check
| Plan | Price | Tokens | Queries |
|---|---|---|---|
| Free | $0 | 1M | 10K |
| Pro | $19/mo | 3M | 100K |
| Scale | $399/mo | 80M | 20M |
| Enterprise | Custom | Unlimited | Unlimited |
Overages: $0.01 per 1K tokens processed, $0.10 per 1K queries.
Translation: At $19/month for 3M tokens, you're paying ~$6.33 per million tokens processed. For comparison, OpenAI's embedding API is ~$0.10 per million tokens, but that doesn't include storage, search, or graph relationships.
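The numbers above follow from straightforward arithmetic on the posted pricing. A small calculator makes the overage behavior concrete:

```python
def monthly_cost(plan_price, included_tokens, tokens_used, overage_per_1k=0.01):
    """Cost under the posted pricing: flat plan fee plus
    $0.01 per 1K tokens over the included allowance."""
    over = max(0, tokens_used - included_tokens)
    return plan_price + (over / 1_000) * overage_per_1k

# Pro plan: $19/mo with 3M tokens included
print(monthly_cost(19, 3_000_000, 3_000_000))  # within allowance: 19.0
print(monthly_cost(19, 3_000_000, 5_000_000))  # 2M tokens over: 19 + 20 = 39.0
print(round(19 / 3, 2))                        # effective $/1M tokens: 6.33
```

Note the overage rate ($0.01/1K = $10 per million tokens) is steeper than the Pro plan's effective in-allowance rate, so heavy users are better off jumping tiers than paying overages.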
The Ecosystem Play
Supermemory isn't just an API — they're building an ecosystem:
- Web app: Personal knowledge management at app.supermemory.ai
- Browser extension: Save from any webpage, ChatGPT, Claude, Twitter
- Raycast extension: Keyboard-shortcut access
- MCP integration: Works with Cursor, Claude Desktop, and other MCP-compatible tools
- Self-hosting: Enterprise can deploy on their own infrastructure
The Open Source Angle
The core engine is open source on GitHub. You can self-host if you don't want to pay. The repo includes:
- Next.js web app
- Processing pipelines
- Graph memory implementation
- API server
This is a smart move. Developers can try it free, build on it, and migrate to managed when they need scale.
Competitive Landscape
| Approach | Pros | Cons |
|---|---|---|
| Supermemory | Graph relationships, forgetting, managed | $19+/mo, vendor lock-in risk |
| Pinecone/Weaviate | Cheaper at scale, more control | Build your own graph layer |
| Letta (MemGPT) | Local-first, hierarchical memory | Smaller ecosystem, newer |
| Roll your own | Full control, no vendor risk | Months of dev time |
The Verdict
Supermemory is the most complete "memory as a service" product available. The graph relationships and automatic forgetting solve real problems that pure vector search doesn't touch. The pricing is reasonable for what you get — if you value your time at more than $100/hour, it's cheaper than building equivalent functionality.
That said, the $19/month entry point might push hobbyists toward open-source alternatives like Letta or rolling their own with Chroma + custom graph logic.
For production AI apps that need personalization, Supermemory is worth evaluating. The LongMemEval SOTA claim is legitimate — they solved retrieval across 115k+ tokens with temporal reasoning, which is genuinely hard.
Key Takeaways
- Graph > Vectors: The relationship layer distinguishes it from RAG-in-a-box
- Forgetting matters: Automatic expiration of temporary facts is a differentiator
- Open core model: Self-hostable, but managed is the monetization path
- Bundled ecosystem: Browser extensions, MCP, connectors — it's a platform play
- Research-backed: LongMemEval SOTA isn't marketing fluff; they published methodology
Is this the future of AI memory? Maybe. At minimum, it's the present — and it's well-executed.
Sources: supermemory.ai docs, GitHub repository, research paper on LongMemEval, pricing page as of Feb 2, 2026.