The Problem: I use Claude for coding daily, but every conversation starts from scratch. I'd explain my architecture, coding standards, past decisions... then hit the context limit and lose everything. Next session? Start over.
The Solution: Recall is an MCP (Model Context Protocol) server that gives Claude persistent memory using Redis + semantic search. Think of it as long-term memory that survives context limits and session restarts.
How it works:
- Claude stores important context as "memories" during conversations
- Memories are embedded (OpenAI) and stored in Redis with metadata
- Semantic search retrieves relevant memories automatically
- Works across sessions, projects, even machines (if you use cloud Redis)
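A minimal sketch of that store path, assuming ioredis and the official openai client; the function name, key layout, and metadata fields are illustrative, not Recall's actual schema:

```typescript
// Illustrative store path: embed the text, then persist it in Redis.
// `storeMemory`, the key names, and the field layout are hypothetical.
import OpenAI from "openai";
import Redis from "ioredis";
import { randomUUID } from "node:crypto";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

async function storeMemory(workspace: string, text: string, tags: string[] = []) {
  // Embed with the model mentioned in the post (text-embedding-3-small).
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  const embedding = res.data[0].embedding;

  // Store content + metadata as a hash, and track it in a per-workspace set.
  const id = randomUUID();
  await redis.hset(`memory:${id}`, {
    text,
    tags: tags.join(","),
    workspace,
    embedding: JSON.stringify(embedding),
    createdAt: Date.now().toString(),
  });
  await redis.sadd(`workspace:${workspace}:memories`, id);
  return id;
}
```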
Key Features:
- Global memories: Share context across all projects
- Relationships: Link related memories into knowledge graphs
- Versioning: Track how memories evolve over time
- Templates: Reusable patterns for common workflows
- Workspace isolation: Project A memories don't pollute Project B
Tech Stack:
- TypeScript + MCP SDK
- Redis for storage
- OpenAI embeddings (text-embedding-3-small)
- ~189KB bundle, runs locally
Current Stats:
- 27 tools exposed to Claude
- 10 context types (directives, decisions, patterns, etc.)
- Sub-second semantic search on 10k+ memories
- Works with Claude Desktop, Claude Code, any MCP client
Example Use Case: I'm building an e-commerce platform. I told Claude once: "We use Tailwind, prefer composition API, API rate limit is 1000/min." Now, in every conversation, Claude remembers and applies these preferences automatically.
What's Next (v1.6.0 in progress):
- CI/CD pipeline with GitHub Actions
- Docker support for easy deployment
- Proper test suite with Vitest
- Better error messages and logging
Try it:
```bash
npm install -g @joseairosa/recall
# Add to claude_desktop_config.json
# Start using persistent memory
```
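For reference, the general shape of the claude_desktop_config.json entry, shown as an annotated TypeScript object; the exact command, args, and env variable names here are assumptions, so check the README for the real values:

```typescript
// Shape of the claude_desktop_config.json entry, written as a TS object purely
// for annotation. Command/args/env names are assumptions, not confirmed values.
const claudeDesktopConfig = {
  mcpServers: {
    recall: {
      command: "npx",                        // or the globally installed binary
      args: ["-y", "@joseairosa/recall"],
      env: {
        REDIS_URL: "redis://localhost:6379", // local or cloud Redis
        OPENAI_API_KEY: "sk-...",            // used for embeddings
      },
    },
  },
};
```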
Case in point: I'm mostly a Claude user, which has decent background process / BashOutput support to get a long-running process's stdout.
I was using Codex just now, and its process support is ass.
So I asked it to give me 5 options for implementing process support using CLI tools. After 3 minutes of back and forth, I got this: https://github.com/offline-ant/shellagent-tools/blob/main/ba...
Add a single line to AGENTS.md:
> the `background` tool allows running programs in the background. Calling `background` outputs the help.
Now I can go "background ./server; try thing. investigate" and it has access to the stdout.
Stop pre-trashing your context with MCPs, people.
It's ridiculous and ties into the overall state of the world tbh. Pretty much given up hoping that we'll become an enlightened species.
So let's enjoy our stupid MCP and stupid disposable plastic because I don't see any way that we aren't gonna cook ourselves to extinction on this planet. :)
IIRC at the time I was testing with Sonnet 3.7; I haven't tried it on the newer models.
My God, there’s no signal. It’s all noise.
.md files work great for small projects. But they hit limits:
1. Size - 100KB context.md won't fit in the window
2. No search - Claude reads the whole file every time
3. Manual - You decide what to save, not Claude
4. Static - Doesn't evolve or learn
Recall fixes this:
- Semantic search finds relevant memories only
- Auto-captures context during conversations
- Handles 10k+ memories, retrieves top 5
- Works across multiple projects
Real example: I have 2,000 memories. That's about 200KB in .md form. Recall retrieves the 5 most relevant ones, roughly 2KB of context.
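For the curious, a minimal sketch of what that top-5 retrieval can look like over plain Redis hashes, with client-side cosine similarity; names and key layout are illustrative, not Recall's actual code:

```typescript
// Illustrative retrieval path: embed the query, score every memory in the
// workspace by cosine similarity, return the top 5.
import OpenAI from "openai";
import Redis from "ioredis";

const openai = new OpenAI();
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

function cosine(a: number[], b: number[]) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function recallMemories(workspace: string, query: string, topK = 5) {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  const queryVec = res.data[0].embedding;

  const ids = await redis.smembers(`workspace:${workspace}:memories`);
  const scored: { text: string; score: number }[] = [];
  for (const id of ids) {
    const m = await redis.hgetall(`memory:${id}`);
    const vec: number[] = JSON.parse(m.embedding);
    scored.push({ text: m.text, score: cosine(queryVec, vec) });
  }
  // Only the handful of best matches go back into Claude's context.
  return scored.sort((a, b) => b.score - a.score).slice(0, topK);
}
```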
And of course, there's always the option to use both: .md for docs, Recall for dynamic learning.
Does that help?
These two videos on using Claude well explain what I mean:
1. Claude Code best practices: https://youtu.be/gv0WHhKelSE
2. Claude Code with Playwright MCP and subagents: https://youtu.be/xOO8Wt_i72s
That said, Claude now has a native memory feature as of the 2.0 release recently: https://docs.claude.com/en/docs/claude-code/memory so the parent's tool may be too late, unless it offers some kind of advantage over that. I don't know how to make that comparison, personally.
The answer seems to be both yes and no: see their announcement on YouTube yesterday: https://www.youtube.com/watch?v=Yct0MvNtdfU&t=181s
It's still ultimately file-based, but it can create non-CLAUDE.md files in a directory it treats more specially. So it's less sophisticated than I expected, but more sophisticated than the previous "add this to CLAUDE.md" feature they've had for a while.
Thanks for the nudge to take the time to actually dig into the details :)
I wonder if the feature got cut for scope, if I'm not in some sort of beta of a better feature, or what.
How disappointing!
This is very much in development and I keep adding features to it. If you have any suggestions, let me know.
The way I use it: I add instructions to CLAUDE.md on how I want Claude to use Recall, and when.
## Using Recall Memory Efficiently
*IMPORTANT: Be selective with memory storage to avoid context bloat.*
### When to Store Memories
- Store HIGH-LEVEL decisions, not implementation details
- Store PROJECT PREFERENCES (coding style, architecture patterns, tech stack)
- Store CRITICAL CONSTRAINTS (API limits, business rules, security requirements)
- Store LEARNED PATTERNS from bugs/solutions

### When NOT to Store
- Don't store code snippets (put those in files)
- Don't store obvious facts or general knowledge
- Don't store temporary context (only current session needs)
- Don't duplicate what's already in documentation

### Memory Best Practices
- Keep memories CONCISE (1-2 sentences ideal)
- Use TAGS for easy filtering
- Mark truly critical things with importance 8-10
- Let old, less relevant memories decay naturally
### Examples
GOOD: "API rate limit is 1000 req/min, prefer caching for frequently accessed data"
BAD: "Here's the entire implementation of our caching layer: [50 lines of code]"

GOOD: "Team prefers Tailwind CSS over styled-components for consistency"
BAD: "Tailwind is a utility-first CSS framework that..."
*Remember: Recall is for HIGH-SIGNAL context, not a code repository.*
1. Claude Desktop's built-in `/memory` command (what you tried) - just lists CLAUDE.md files
2. Recall MCP server (this project) - a completely separate tool you need to install
Recall doesn't work through slash commands. It's an MCP server that needs setup:
1. Install: `npm install -g @joseairosa/recall`
2. Add to claude_desktop_config.json
3. Restart Claude Desktop
4. Then Claude can use memory tools automatically in conversation
Quick test after setup: "Remember: I prefer TypeScript" - Claude will store it in Redis.
Often memory works too well and crowds out new things, so how are you balancing that?
Aside from all that, using npm for distribution makes this a total non-starter for me.
My experience is that ChatGPT can engage in very thoughtful conversations, but if I ask for a summary it produces something very generic - useful to an outsider, but it doesn't capture the salient points that were the most important outcomes.
Did you notice the same problem?
Then it can reference those tutorials for specific things.
Interested in giving this a shot but it feels like a lot of infrastructure.
It'd essentially be:
1. Language server support for lookups and keeping track of the code
2. Being able to "pin" memories to functions, classes, properties, etc. via that language server support, providing this context whenever changes are made in that function/class/property - but not keeping it, so all subsequent changes outside of it no longer include this context (basically, changes that touch code with pinned memories get done by agents with the additional context, and only the results are synced back, not the way they were achieved)
3. An IDE integration for this context, so you can easily keep track of what's available just by moving the cursor to the point the memory is pinned at
Sadly, this is impossible to achieve via MCP.
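Concretely, what I mean by (2), as a rough sketch; the types and field names are made up for illustration and none of this exists in Recall today:

```typescript
// Purely illustrative data model for "pinning" a memory to a code symbol via
// language-server metadata. None of these types exist in Recall.
interface SymbolRef {
  file: string;           // e.g. "src/billing/invoice.ts"
  symbol: string;         // fully qualified name from the language server
  kind: "function" | "class" | "property";
  range: { startLine: number; endLine: number }; // kept up to date by the LSP
}

interface PinnedMemory {
  id: string;
  content: string;        // e.g. "Rounding here must match the ledger service"
  pinnedTo: SymbolRef;
  importance: number;     // 1-10, mirroring Recall's existing importance scale
}

// Only edits whose position falls inside the pinned symbol would pull the
// memory into the agent's context; everything else never sees it.
function relevantMemories(pins: PinnedMemory[], editedFile: string, editedLine: number) {
  return pins.filter(
    (p) =>
      p.pinnedTo.file === editedFile &&
      editedLine >= p.pinnedTo.range.startLine &&
      editedLine <= p.pinnedTo.range.endLine
  );
}
```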
Using the VS Code extension, you get dynamic context management, which works really well.
They also have a memory system built using Reflexion (someone please correct me if I'm wrong), so proper evals are derived from lessons before they're stored.
Imagine having 20 years of context / memories and relying on them. Wouldn't you want to own that? I can't imagine pay-per-query for my real memories and I think that allowing that for AI assisted memory is a mistake. A person's lifetime context will be irreplaceable if high quality interfaces / tools let us find and load context from any conversation / session we've ever had with an LLM.
On the flip side of that, something like a software project should own the context of every conversation / session used during development, right? Ideally, both parties get a copy of the context. I get a copy for my personal "lifetime context" and the project or business gets a copy for the project. However, I can't imagine businesses agreeing to that.
If LLMs become a useful tool for assisting memory recall there's going to be fighting over who owns the context / memories and I worry that normal people will lose out to businesses. Imagine changing jobs and they wipe a bunch of your memory before you leave.
We may even see LLM context ownership rules in employment agreements. It'll be the future version of a non-compete.
Your project becomes progressively more valuable the further you go down the list. The overall design should be documented and curated to onboard new hires. Documenting current issues is a waste of time compared to capturing live discussion, so Recall is super useful here.
How? The models aren't trained on compressed text tokens, nor could they be, if I understand it correctly. The models would have to decompress it before running the raw text through the model.
You can train your own with very, very compressed representations - I mean, you could even go down to each token being just 2 float numbers. It will train, but it will be terrible, because it can essentially only capture distance.
Funnily enough, prompting a good LLM to summarize the context is probably the best way of actually "compressing" it.
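A minimal sketch of what that looks like in practice, using the openai client; the prompt wording, model choice, and token budget are arbitrary, not anyone's production setup:

```typescript
// "Compression by summarization": squeeze the oldest turns into a short
// summary and keep only that. Prompt and model are placeholders.
import OpenAI from "openai";

const client = new OpenAI();

async function compressContext(oldTurns: string[]): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Summarize this conversation history into at most 10 bullet points, " +
          "keeping decisions, constraints, and open questions. Drop pleasantries.",
      },
      { role: "user", content: oldTurns.join("\n") },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```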
It would sort of work like Grammarly itself, and you could use it to metaprompt.
I find all the memory tooling, even the native ones in Claude and ChatGPT, to be too intrusive.
Your approach is actually really interesting, like a background process watching the conversation and deciding what's worth remembering. More passive, less in-your-face.
I thought about this too. The tradeoff I made:
Your approach (judge/watcher):
- Pro: Zero interruption to conversation flow
- Pro: Can use cheaper model for the judge
- Con: Claude doesn't know what's in memory when responding
- Con: Memory happens after the fact

Tool-based (current Recall):
- Pro: Claude actively uses memory while thinking
- Pro: Can retrieve relevant context mid-response
- Con: Yeah, it's intrusive sometimes
Honestly, both have merit. You could even do both: a background judge for auto-capture, plus tools for when Claude needs to look something up.
The Grammarly analogy is spot on. Passive monitoring vs active participation.
Have you built something with the judge pattern? I'd be curious how well it works for deciding what's memorable vs noise.
Maybe Recall needs a "passive mode" option where it just watches and suggests memories instead of Claude actively storing them. That's a cool idea.
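Roughly what I imagine the judge loop looking like, purely as a sketch; the model, prompt, and handoff are placeholders and nothing here is part of Recall yet:

```typescript
// Hypothetical "passive mode": a cheap judge model watches each exchange and
// proposes a memory only when it sees something durable.
import OpenAI from "openai";

const judge = new OpenAI();

async function judgeExchange(userMsg: string, assistantMsg: string) {
  const res = await judge.chat.completions.create({
    model: "gpt-4o-mini", // cheaper model than the one doing the actual work
    messages: [
      {
        role: "system",
        content:
          "You watch a coding conversation. If this exchange contains a durable, " +
          "high-level fact (decision, preference, constraint), reply with a one-" +
          "sentence memory. Otherwise reply with exactly: SKIP",
      },
      { role: "user", content: `User: ${userMsg}\nAssistant: ${assistantMsg}` },
    ],
  });
  const verdict = res.choices[0].message.content?.trim() ?? "SKIP";
  if (verdict !== "SKIP") {
    // Hand off to a store path, or just surface it as a suggestion to confirm.
    console.log("Suggested memory:", verdict);
  }
}
```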
jj autocommits when the working copy changes, and you can manually stage against @-: https://news.ycombinator.com/item?id=44644820
OpenCog differentiates between Experiential and Episodic memory; and various processes rewrite a hypergraph stored in RAM in AtomSpace. I don't remember how the STM/LTM limit is handled in OpenCog.
So the MRU/MFU knapsack problem, and more predictable primacy/recency bias because of context length limits and context compaction?
> Economic Attention Allocation (ECAN) was an OpenCog subsystem intended to control attentional focus during reasoning. The idea was to allocate attention as a scarce resource (thus, "economic") which would then be used to "fund" some specific train of thought. This system is no longer maintained; it is one of the OpenCog Fossils.
(Smart contracts require funds to execute (redundantly and with consensus), and there, too, resources are scarce.)
Now there's ProxyNode and there are StorageNode implementations, but Agent is not yet reimplemented in OpenCog?
ProxyNode implementers: ReadThruProxy, WriteThruProxy, SequentialReadProxy, ReadWriteProxy, CachingProxy
StorageNode > Implementations: https://wiki.opencog.org/w/StorageNode#Implementations
AI can already form the query DSL quite nicely, especially if it knows the indexes.
I set up AI-powered search this way, and it works really well with open-ended questions.
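Roughly the pattern, sketched with the openai client; the index schema and the query shape are made up for illustration, so swap in whatever search engine you actually use:

```typescript
// Give the model the index schema and ask it to emit a structured query.
import OpenAI from "openai";

const llm = new OpenAI();

const indexSchema = `
index: memories
fields:
  text        (full-text)
  tags        (keyword)
  workspace   (keyword)
  importance  (integer 1-10)
  createdAt   (timestamp)
`;

async function buildQuery(question: string): Promise<unknown> {
  const res = await llm.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Translate the user's question into a JSON search query for this index. " +
          "Only use the listed fields.\n" + indexSchema,
      },
      { role: "user", content: question },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? "{}");
}
```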
Recall just uses basic Redis commands - HSET, SADD, ZADD, etc. Nothing fancy.
Valkey is Redis-compatible so all those commands work the same.
I haven't tested it personally but there's no reason it wouldn't work. The Redis client library (ioredis) should connect to Valkey without issues.
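For reference, a sketch of the kinds of calls involved via ioredis; pointing the client at a Valkey URL is the only change, and the key names here are placeholders rather than Recall's real schema:

```typescript
// HSET / SADD / ZADD via ioredis, against Redis or a Redis-compatible Valkey.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379"); // or a Valkey URL

async function smokeTest() {
  await redis.hset("memory:test", { text: "prefers TypeScript", importance: "8" });
  await redis.sadd("workspace:demo:memories", "memory:test");
  await redis.zadd("memories:by-importance", 8, "memory:test");
  console.log(await redis.hgetall("memory:test"));
}
```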
If you try it and hit any problems let me know! Would be good to officially support it.
how much better was this to justify all that extra complexity?