
Supermemory.ai: The Memory API Dissected

Twitter discovered it first: YC-backed, with 10,000+ developers and 70+ YC companies using it. Supermemory.ai positions itself as "the Memory API for the AI era" — but what's actually under the hood? I spent the evening digging into their architecture, pricing, and claims.

What It Is

Supermemory is a managed memory service for AI applications. Instead of building your own RAG pipeline, vector database, and knowledge graph, you send them content and query for context. They handle extraction, embedding, storage, and retrieval.

The pitch: "Your AI isn't intelligent until it remembers."

The Architecture: Three Layers

1. Ingestion & Extraction

Supermemory accepts multiple content types:

  • Text and URLs
  • PDFs, images, documents
  • Conversation history
  • Videos (transcribed)
  • Connectors: Notion, Google Drive, OneDrive

Documents are queued for processing, then extracted into "memories": semantic chunks that carry meaning, not just raw text windows. They claim contextual chunking that understands document structure.
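To make "contextual chunking" concrete, here is a hedged sketch of the idea: split on structural boundaries (headings) and attach each chunk to its heading, so chunks keep their place in the document instead of being arbitrary text windows. The function name and heading convention are illustrative, not Supermemory's actual pipeline.

```python
# Illustrative sketch of structure-aware chunking: each chunk is tagged
# with the heading it falls under, rather than being a fixed-size window.

def chunk_by_headings(text):
    chunks, heading, buf = [], "Document", []
    for line in text.splitlines():
        if line.startswith("# "):        # a new section starts
            if buf:
                chunks.append((heading, " ".join(buf)))
                buf = []
            heading = line[2:].strip()
        elif line.strip():
            buf.append(line.strip())
    if buf:                              # flush the final section
        chunks.append((heading, " ".join(buf)))
    return chunks

doc = "# Pricing\nFree tier has 1M tokens.\n# Limits\nQueries cap at 10K."
print(chunk_by_headings(doc))
```

A retrieval hit on the second chunk now carries "Limits" as context, which is the kind of structural signal a flat sliding-window chunker throws away.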

2. The Graph Memory Layer

This is where it gets interesting. Unlike simple vector search, Supermemory builds a living knowledge graph with three relationship types:

Relationship | When It Happens             | Example
UPDATES      | New info contradicts old    | "Alex left Google" updates "Alex works at Google"
EXTENDS      | New info adds detail        | "Alex leads a team of 5" extends "Alex is a PM"
DERIVES      | System infers from patterns | "Alex likely works on payments" from context

Each memory tracks an isLatest flag. When information updates, old memories persist but get marked stale. This preserves history while surfacing current facts.
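The UPDATES/EXTENDS mechanics and the isLatest flag can be sketched in a few lines. This is a toy model of the behavior described above, assuming nothing about Supermemory's internals; the Memory and MemoryGraph names are mine.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    id: int
    text: str
    is_latest: bool = True          # mirrors the isLatest flag

@dataclass
class MemoryGraph:
    memories: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src_id, relation, dst_id)
    _next_id: int = 0

    def add(self, text: str) -> Memory:
        m = Memory(self._next_id, text)
        self.memories[m.id] = m
        self._next_id += 1
        return m

    def update(self, old_id: int, new_text: str) -> Memory:
        """New info contradicts old: old memory persists but goes stale."""
        new = self.add(new_text)
        self.memories[old_id].is_latest = False
        self.edges.append((new.id, "UPDATES", old_id))
        return new

    def extend(self, base_id: int, new_text: str) -> Memory:
        """New info adds detail: both memories stay current."""
        new = self.add(new_text)
        self.edges.append((new.id, "EXTENDS", base_id))
        return new

    def current_facts(self):
        return [m.text for m in self.memories.values() if m.is_latest]

g = MemoryGraph()
job = g.add("Alex works at Google")
g.update(job.id, "Alex left Google")
print(g.current_facts())   # only the latest fact surfaces; history is kept
```

The key design point is that update() never deletes: the stale memory stays queryable for temporal reasoning, while current_facts() surfaces only what's true now.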

3. Automatic Forgetting

Supermemory implements time-based forgetting. Temporary facts like "I have an exam tomorrow" or "Meeting at 3pm today" automatically expire. This is genuinely clever — most memory systems accumulate forever, becoming noise machines.
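Time-based forgetting reduces to attaching an expiry to temporary facts and filtering at read time. A minimal sketch, assuming each memory optionally carries an expires_at timestamp (the field name is my assumption, not their schema):

```python
import time

def prune_expired(memories, now=None):
    """Drop memories whose expiry has passed; keep permanent ones."""
    now = now if now is not None else time.time()
    return [m for m in memories
            if m.get("expires_at") is None or m["expires_at"] > now]

memories = [
    {"text": "Alex prefers dark mode"},                        # permanent
    {"text": "I have an exam tomorrow", "expires_at": 100.0},  # temporary
]
print(prune_expired(memories, now=200.0))   # only the permanent fact survives
```

Filtering at read time rather than deleting eagerly keeps the door open for temporal queries ("what did the user say last week?") over expired facts.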

The API Interface

Three ways to add context to your LLMs:

Memory API — Learned User Context

Extracted facts about users that evolve over time. Handles knowledge updates, temporal changes, and creates user profiles. This becomes the default context provider for your LLM.

User Profiles

Combines static facts (always know this) with dynamic facts (recent context, episodic memory). You configure what counts as static vs. dynamic for your use case.
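The static/dynamic split might be assembled into LLM context roughly like this. The function and the cap on dynamic facts are illustrative assumptions about how such a profile could be configured, not Supermemory's actual schema:

```python
def build_profile_context(static_facts, dynamic_facts, max_dynamic=3):
    """Static facts always appear; dynamic facts are capped to the most recent."""
    lines = ["Known about this user:"]
    lines += [f"- {f}" for f in static_facts]
    lines += ["Recent context:"]
    lines += [f"- {f}" for f in dynamic_facts[-max_dynamic:]]
    return "\n".join(lines)

ctx = build_profile_context(
    static_facts=["Name: Alex", "Role: PM"],
    dynamic_facts=["Asked about Q3 roadmap", "Mentioned a trip to Berlin"],
)
print(ctx)
```

The cap on dynamic facts is the interesting knob: static facts are cheap and always relevant, while episodic context has to be rationed against the token budget.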

Super RAG — Advanced Semantic Search

Traditional RAG with extras:

  • Metadata filtering
  • Contextual chunking
  • Integration with the memory engine
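The first two extras compose naturally: filter by metadata first, then rank the survivors by semantic similarity. A toy sketch with hand-made 2-D "embeddings" purely for illustration (real systems would use an embedding model and a vector index):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, chunks, metadata_filter=None, top_k=2):
    # Metadata filtering narrows the pool before any similarity math runs.
    pool = [c for c in chunks
            if metadata_filter is None
            or all(c["meta"].get(k) == v for k, v in metadata_filter.items())]
    return sorted(pool, key=lambda c: cosine(query_vec, c["vec"]),
                  reverse=True)[:top_k]

chunks = [
    {"text": "Q3 revenue summary", "vec": [1.0, 0.1], "meta": {"source": "notion"}},
    {"text": "Onboarding guide",   "vec": [0.2, 1.0], "meta": {"source": "drive"}},
    {"text": "Q3 revenue detail",  "vec": [0.9, 0.2], "meta": {"source": "drive"}},
]
hits = search([1.0, 0.0], chunks, metadata_filter={"source": "drive"})
print([h["text"] for h in hits])
```

Note that the filter runs before ranking: the Notion chunk never competes, even though it is the best semantic match for the query.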

The Research Claims

Supermemory claims state-of-the-art (SOTA) results on LongMemEval, a benchmark that tests retrieval across 115k+ tokens with temporal reasoning and knowledge conflicts. They beat other approaches on:

  • Information extraction
  • Single-session recall (user & assistant)
  • Preference learning (implicit signals)
  • Multi-session reasoning
  • Temporal reasoning (what happened when)
  • Knowledge updates (handling contradictions)

The benchmark is specifically designed for human-assistant interactions (not human-human), making it more representative of real AI assistant usage.

Pricing: The Reality Check

Plan       | Price   | Tokens    | Queries
Free       | $0      | 1M        | 10K
Pro        | $19/mo  | 3M        | 100K
Scale      | $399/mo | 80M       | 20M
Enterprise | Custom  | Unlimited | Unlimited

Overages: $0.01 per 1K tokens processed, $0.10 per 1K queries.

Translation: At $19/month for 3M tokens, you're paying ~$6.33 per million tokens processed. For comparison, OpenAI's embedding API is ~$0.10 per million tokens, but that doesn't include storage, search, or graph relationships.
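The arithmetic is easy to check. Overage rates come from the pricing table above; everything else is a back-of-envelope calculation for the Pro plan:

```python
def monthly_cost(tokens, queries, base=19.0,
                 included_tokens=3_000_000, included_queries=100_000):
    # Pro plan: $0.01 per 1K tokens and $0.10 per 1K queries over the cap.
    token_overage = max(0, tokens - included_tokens) / 1_000 * 0.01
    query_overage = max(0, queries - included_queries) / 1_000 * 0.10
    return base + token_overage + query_overage

print(monthly_cost(3_000_000, 100_000))   # 19.0 -- within plan limits
print(monthly_cost(4_000_000, 100_000))   # 29.0 -- 1M extra tokens = $10
print(19.0 / 3_000_000 * 1_000_000)       # effective rate: ~$6.33 per 1M tokens
```

Worth noticing: the overage rate ($10 per extra million tokens) is actually steeper than the plan's effective rate, so sustained overages are a signal to jump tiers.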

The Ecosystem Play

Supermemory isn't just an API — they're building an ecosystem:

  • Web app: Personal knowledge management at app.supermemory.ai
  • Browser extension: Save from any webpage, ChatGPT, Claude, Twitter
  • Raycast extension: Keyboard-shortcut access
  • MCP integration: Works with Cursor, Claude Desktop, and other MCP-compatible tools
  • Self-hosting: Enterprise can deploy on their own infrastructure

The Open Source Angle

The core engine is open source on GitHub. You can self-host if you don't want to pay. The repo includes:

  • Next.js web app
  • Processing pipelines
  • Graph memory implementation
  • API server

This is a smart move. Developers can try it free, build on it, and migrate to managed when they need scale.

Competitive Landscape

Approach          | Pros                                    | Cons
Supermemory       | Graph relationships, forgetting, managed | $19+/mo, vendor lock-in risk
Pinecone/Weaviate | Cheaper at scale, more control           | Build your own graph layer
Letta (MemGPT)    | Local-first, hierarchical memory         | Smaller ecosystem, newer
Roll your own     | Full control, no vendor risk             | Months of dev time

The Verdict

Supermemory is the most complete "memory as a service" product available. The graph relationships and automatic forgetting solve real problems that pure vector search doesn't touch. The pricing is reasonable for what you get — if you value your time at more than $100/hour, it's cheaper than building equivalent functionality.

That said, the $19/month entry point might push hobbyists toward open-source alternatives like Letta or rolling their own with Chroma + custom graph logic.

For production AI apps that need personalization, Supermemory is worth evaluating. The LongMemEval SOTA claim is legitimate — they solved retrieval across 115k tokens with temporal reasoning, which is genuinely hard.

Key Takeaways

  1. Graph > Vectors: The relationship layer distinguishes it from RAG-in-a-box
  2. Forgetting matters: Automatic expiration of temporary facts is a differentiator
  3. Open core model: Self-hostable, but managed is the monetization path
  4. Bundled ecosystem: Browser extensions, MCP, connectors — it's a platform play
  5. Research-backed: LongMemEval SOTA isn't marketing fluff; they published methodology

Is this the future of AI memory? Maybe. At minimum, it's the present — and it's well-executed.

Sources: supermemory.ai docs, GitHub repository, research paper on LongMemEval, pricing page as of Feb 2, 2026.