Building a better OpenClaw

Feb 22, 2026 • Yousef Amar • 4 min read • Parent project

I completely switched my digital right hand over to OpenClaw for most of February, and while it was quite fun (and sometimes dangerous -- ask my colleagues about the Monzo API), I quickly found the cracks. These were not related to security, as people might assume, but to reliability. Not long after the switch, I started logging mistakes in a MISTAKES.md, and the vast majority involve the agent ignoring my instructions. This was usually caused by spotty memory (exacerbated by the agent forgetting to remember things as instructed), as well as some deeper architectural issues.

Overall, I think this is mostly fixable, but I would have to rebuild it from scratch, in a much simpler and more principled way, based on what I've learned. I'm a big believer in a future where single-user apps proliferate, so I think it's more important to align on the right principles than on implementation-specific details like which integrations you use.

1: Text is king

I've written about this before, so I'll summarise with the Unix Philosophy:

Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams, because that is a universal interface.

An LLM's whole thing is text, so we should lean heavily into that.

2: Code is text

Therefore code is king too. Any abstraction over taking actions is unnecessary, besides the ability to use a CLI. General-purpose agents work better with a reduced instruction set. We don't need plugins and tools and MCPs and so on (although it's great that the MCP hype is pushing people to make their products machine-accessible).

An agent with a CLI can use cURL to talk to your REST API. It can learn new CLI tools faster than you can, through a man page or --help. It can write scripts to solve hard tasks. Code is already a problem-solving language, and there was a hell of a lot of training data, so agents are really good at it. They can also write tests and fix bugs iteratively. Don't kneecap them by inventing some new protocol because you think you're making things better.
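To make that concrete, here's a minimal sketch (in Python, with a helper name I've made up) of the "learn a tool from --help" loop: run the tool's help flag, capture the text, and hand it to the model as context.

```python
import subprocess
import sys

def read_help(cmd: list[str]) -> str:
    """Run a CLI tool's --help and return its output as text.

    An agent can feed this text straight into its context to learn
    a tool it has never seen before -- no plugin or MCP required.
    """
    result = subprocess.run(
        cmd + ["--help"],
        capture_output=True,
        text=True,
        timeout=10,
    )
    # Some tools print their help to stderr instead of stdout.
    return result.stdout or result.stderr

# Demo with the Python interpreter itself, which is guaranteed to exist.
help_text = read_help([sys.executable])
print(help_text.splitlines()[0])  # first line of the usage message
```

The same three lines of subprocess glue cover every well-behaved CLI tool ever written, which is the whole point: text in, text out.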

3: Use human products

OpenClaw did one thing very well, which is to have human chat apps as first-class integrations. You should not invent yet another web chat UI -- take the conversation to where humans already are and where they talk to other humans. Yes, WhatsApp doesn't have text streaming and markdown table rendering etc, but I suspect over time these spaces will be more agent-friendly.

However, OpenClaw did not go all the way. You should not use cron jobs for scheduling, you should use a calendar. You should not use a markdown file in some internal workspace directory to plan, you should use a Kanban board.

"But Yousef", you lament, "didn't you just say they should live in the CLI?". Yes, but they should use human products from the CLI. The products should be the same, but not necessarily the interfaces (even if, over time, interfaces are shifting towards conversation); these can differ between humans and agents. Agents can interact with these products via their APIs, not browser use. Some examples: scheduling through a calendar's API rather than crontab, or planning on a Kanban board through its API rather than a markdown file.
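As a sketch of what "human product, text interface" can look like: iCalendar (RFC 5545) is plain text, so an agent that can emit text can produce an event that most calendar products accept over their APIs. The helper below is illustrative, not any particular product's API.

```python
from datetime import datetime, timedelta, timezone

def make_vevent(summary: str, start: datetime, duration: timedelta) -> str:
    """Render a minimal iCalendar VEVENT (RFC 5545) as plain text."""
    fmt = "%Y%m%dT%H%M%SZ"
    end = start + duration
    # iCalendar mandates CRLF line endings.
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//agent//EN",
        "BEGIN:VEVENT",
        f"UID:{start.strftime(fmt)}-agent@example.invalid",
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(fmt)}",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ])

event = make_vevent("Review MISTAKES.md",
                    datetime(2026, 3, 1, 9, 0), timedelta(minutes=30))
print(event)
```

The human sees the event in their normal calendar UI; the agent only ever touches text.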

4: Policy beats memory

An agent's memory should just be the chat log. By all means store it, but you should never need to check it unless it's relevant to the task at hand -- just as a human will only search their chat history to retrieve a specific detail, but would find it a terrible way to stay organised or remember things. Chat should be ephemeral by nature.

Agents should act on anything important immediately and not rely on chat; remembering a new workflow, for example, means writing it down right away. Agents should have their own permanent notes, just as a human may have a personal knowledge base. And as with humans, quality beats quantity, so the Digital Garden should remain curated. It's not a journal or a blog.

Both the agent and the human should maintain policy together. Policy can exist in many forms, but for a general-purpose agent, it's good to organise these so that they're pulled on demand. An example of this is Claude's Agent Skills where each skill contains detailed descriptions and other resources for a specialised workflow.

The crucial part is that the main prompt should inform the agent where these skills live, how to unpack one, and most importantly, under which conditions to unpack one. The goal is to solve scaling a sprawl of skills (say that 10 times fast). A single skill may be as thin as "here's a new command line tool that does X"; it just doesn't need to be in context all the time.
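A minimal sketch of how that could work, assuming skills are just markdown files whose first line is a one-line description (the layout and names here are my invention, not Claude's actual Skills format): the index of descriptions stays in the main prompt at all times, and the full body is only read on demand.

```python
from pathlib import Path
import tempfile

def index_skills(skills_dir: Path) -> dict[str, str]:
    """Map skill name -> one-line description for the main prompt."""
    return {
        f.stem: f.read_text().splitlines()[0]
        for f in sorted(skills_dir.glob("*.md"))
    }

def unpack_skill(skills_dir: Path, name: str) -> str:
    """Load a skill's full body, only when its description matches the task."""
    return (skills_dir / f"{name}.md").read_text()

# Demo with a throwaway skills directory and one hypothetical skill.
skills_dir = Path(tempfile.mkdtemp())
(skills_dir / "invoice.md").write_text(
    "Generate monthly invoices with the `billing` CLI.\n\n"
    "1. Run `billing export --month <month>`\n"
    "2. Check the totals against the ledger before sending.\n"
)
index = index_skills(skills_dir)
print(index)                                # only this lives in the main prompt
print(unpack_skill(skills_dir, "invoice"))  # full body, pulled on demand
```

The context cost of a hundred skills is then a hundred one-liners, not a hundred workflows.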

Full disclosure: this is the exact problem we're solving at my startup -- getting policy (e.g. some niche knowledge about your domain) out of your head and into your agent, and keeping it internally consistent, without ambiguity or contradiction. Every mistake your agent makes is an opportunity for policy refinement.

There are other problems that still need solving, e.g. access control: can you guardrail your agents in a guaranteed way (enforced in software, not just prompts) that is granular and flexible enough, yet still easy to use and doesn't ask you for a million permissions? These are all subsets of strong policy.
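As a toy illustration of guardrails enforced in software: a policy check that every shell command the agent proposes must pass before execution. The specific rules below are invented for the sake of the example.

```python
import shlex

# An allowlist of commands the agent may run, plus denied subcommands.
# These rules are illustrative, not a recommendation.
ALLOWED = {"ls", "cat", "curl", "git"}
DENIED_SUBCOMMANDS = {("git", "push")}  # read-only git, say

def permitted(command: str) -> bool:
    """Return True if the command passes the (deliberately simple) policy.

    Because this runs as code, not as a prompt, the agent cannot
    talk its way around it.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        return False
    if tuple(argv[:2]) in DENIED_SUBCOMMANDS:
        return False
    return True

print(permitted("git status"))            # True
print(permitted("git push origin main"))  # False
print(permitted("rm -rf /"))              # False
```

The hard part isn't the check itself; it's making the rule set granular enough to be useful without turning into a million permission prompts.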

I'm starting to believe that anything on top of this (e.g. creating "planning modes" or orchestrator agents, or allowing the agent to spin up sub-agents, or even just compacting chat history instead of truncating it) just gets in the way and often backfires. I suspect all these attempts at juicing agents will become less and less useful over time as the models get better at a more foundational level.