<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>MipYip — Alex van Rossum</title><description>Systems architecture, AI governance, and macOS software. Case studies and methodology from someone who builds the thing and then explains how it works.</description><link>https://mipyip.com/</link><language>en-us</language><atom:link href="https://mipyip.com/rss.xml" rel="self" type="application/rss+xml"/><item><title>How Technical Keystones Burn Out (And Why Their Managers Don&apos;t Notice)</title><link>https://mipyip.com/blog/the-keystone-burnout/</link><guid isPermaLink="true">https://mipyip.com/blog/the-keystone-burnout/</guid><description>After three burnouts in seven years as the technical keystone at one role, the mechanism became clear: burnout in technical leadership runs on domain breadth, context switching, and the mutual silence between the person carrying the load and the people who could redistribute it.</description><pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;After three burnouts in seven years as the technical keystone at one role, the mechanism became clear: burnout in technical leadership runs on domain breadth, context switching, and the mutual silence between the person carrying the load and the people who could redistribute it. Three failure modes, one compounding loop, and none of them show up in the hours column.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I had three major burnouts in seven years at the same role, and the consistent mechanism across all of them was domain switching — holding many distinct mental models and moving between them under pressure. The through-line wasn&apos;t entirely the quantity of work, though the quantity was a contributing factor. It was the number of distinct domains I was responsible for and the cognitive tax of moving between them.&lt;/p&gt;
&lt;p&gt;The mechanism is both structural and largely invisible — to management, to colleagues, and, for reasons I&apos;ll get to, to the person carrying it.&lt;/p&gt;
&lt;p&gt;There are no villains in this story — just a structural pattern that emerged in one place I worked, and that I now recognize everywhere I talk to technical leads. It isn&apos;t rare either: a &lt;a href=&quot;https://www.networkperspective.io/blog/ebook-workload-burnout-in-tech-report-2022&quot;&gt;Network Perspective report on roughly 20,000 tech employees&lt;/a&gt; found that nearly half reported burnout or work-related fatigue, with meetings and context switching flagged as primary drivers.&lt;/p&gt;
&lt;h2&gt;Why &quot;overworked&quot; isn&apos;t quite the right word&lt;/h2&gt;
&lt;p&gt;Even at the low end, two or three primary domains is arguably more than one person can sustain long-term. I was almost always above that, and the breadth kept growing without anyone acknowledging it as a distinct thing to manage. Early on: PHP, server administration, project scope consultation. Then: AWS, Kubernetes, general DevOps. Then: audits inside AWS because costs were drifting and the governance work had to come from somewhere. Then: Python and React code maintenance. Then the cluster aged into its own crisis. Each new domain didn&apos;t replace the old ones. It added to the pile.&lt;/p&gt;
&lt;p&gt;The quantity of &lt;em&gt;work&lt;/em&gt; was often manageable; the quantity of &lt;em&gt;switching&lt;/em&gt; wasn&apos;t.&lt;/p&gt;
&lt;p&gt;And in practice, none of my attempts to describe this to management translated into structural change. I eventually decided this was a measurement problem — &quot;domain count&quot; isn&apos;t a metric anyone tracks (other than, perhaps, the person doing the work). Hours get tracked. Tickets get tracked. Deploys get tracked. But &quot;how many fundamentally different mental models does this person hold simultaneously&quot; isn&apos;t on any dashboard anyone has ever built.&lt;/p&gt;
&lt;h2&gt;Three burnouts, one mechanism&lt;/h2&gt;
&lt;h3&gt;Inheriting another person&apos;s undisclosed ownership&lt;/h3&gt;
&lt;p&gt;A former colleague had quietly absorbed a client&apos;s Salesforce environment into their personal scope without communicating the breadth of the domain to the rest of the organization. When they left, I was handed the role — and the task already in flight was a 32,000-metadata-change migration across staggered deployments, on a platform (Salesforce) I&apos;d never touched before.&lt;/p&gt;
&lt;p&gt;While still carrying the React, DevOps, AWS, WordPress, and PHP load.&lt;/p&gt;
&lt;p&gt;I learned Salesforce the hard way. It eventually taught me that I&apos;m genuinely strong at governance and architecture, and that I (surprisingly) enjoy Salesforce work. Good outcomes — but outcomes that shouldn&apos;t be mistaken for a healthy pattern to repeat. I shouldn&apos;t have been the person doing that migration.&lt;/p&gt;
&lt;p&gt;That was the first burnout, and it almost broke me.&lt;/p&gt;
&lt;h3&gt;The forced Kubernetes migration&lt;/h3&gt;
&lt;p&gt;The control plane was dead, the access keys to the underlying hardware had been lost by a former developer, and the cluster wouldn&apos;t reboot anyway. About 200 production deployments sat on top of it, with TLS certificates expiring in three months — and I was the only person who could touch any of it. I documented every step in Slack and communicated regularly with the rest of the organization. Over time, the audience for those updates narrowed to people who couldn&apos;t act on them, and I was effectively writing into a void.&lt;/p&gt;
&lt;p&gt;Three months of fifty-to-sixty-five-hour weeks, with steady pressure on top of the migration to keep handling day-to-day requests — PHP changes, WordPress tweaks, bug fixes, React and Python updates — many of which were structurally impossible until the cluster was rebuilt. That was a hard technical constraint, not a preference, and it didn&apos;t seem to matter.&lt;/p&gt;
&lt;p&gt;The repeated ping of &quot;just checking on this one&quot; during a cluster migration is the smallest, most common sentence in keystone burnout — the atomic unit of the problem. (Full context on the migration itself lives in the &lt;a href=&quot;/work/kubernetes-migration&quot;&gt;case study&lt;/a&gt;.)&lt;/p&gt;
&lt;h3&gt;The slow burn&lt;/h3&gt;
&lt;p&gt;The project management function in the organization had gaps — tasks got forwarded into the system with two-day-out default due dates that didn&apos;t reflect actual scheduling capacity. I eventually built an entire &lt;a href=&quot;/work/ai-sprint-management&quot;&gt;PM subsystem using Claude Code&lt;/a&gt; to compensate for what our PM process wasn&apos;t catching.&lt;/p&gt;
&lt;p&gt;The coup de grâce was a WordPress compromise that forced a significant triage block one morning, already on top of other scheduled work. While I was in the middle of triaging the compromise — actively telling the PM in Slack what I was doing — I got another &quot;just checking on this one, it&apos;s really urgent&quot; ping on something completely unrelated.&lt;/p&gt;
&lt;p&gt;That ping was the moment the mechanism became visible enough to name. I left soon after.&lt;/p&gt;
&lt;h2&gt;&quot;Just checking on this&quot; is an atomic unit of keystone burnout&lt;/h2&gt;
&lt;p&gt;This phrase showed up in every one of the three burnouts, which is why it&apos;s worth stopping to name it as a pattern rather than treating it as incidental. &quot;Just checking on this one&quot; is the archetypal context switch during crisis work — compound interest on cognitive load, arriving from people who either can&apos;t see or won&apos;t see that the request they&apos;re making assumes a mental model shift the keystone literally can&apos;t afford right now.&lt;/p&gt;
&lt;p&gt;There&apos;s good evidence that this kind of switching corrodes both productivity and wellbeing for engineers: every task switch requires unloading and reloading mental state, and &lt;a href=&quot;https://jellyfish.co/library/developer-productivity/context-switching/&quot;&gt;research on workplace interruptions&lt;/a&gt; — drawing on Sophie Leroy&apos;s attention-residue work and Dr. Gloria Mark&apos;s UCI study — finds that full focus takes roughly 23 minutes to return after a single disruption. Repeated at scale, that turns any day of &quot;small asks&quot; into a day where no sustained thinking happens at all.&lt;/p&gt;
&lt;p&gt;The phrase pairs especially badly with two kinds of tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tasks outside the keystone&apos;s primary domain&lt;/strong&gt; — 10DLC compliance, vendor admin work, platforms without an active mental model loaded. There&apos;s no warm context to switch into; every ping is a cold start.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tasks with default deadlines&lt;/strong&gt; — two-day-out due dates applied by convention rather than by scheduling capacity. Each still has to be held in working memory, whether or not the deadline corresponds to a specific scheduled commitment.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The mutual silence between keystone and management&lt;/h2&gt;
&lt;p&gt;I&apos;m an undersharer by nature. When asked if things were OK, I&apos;d minimize. &quot;Good — just Salesforce is killing me.&quot; &quot;This Salesforce thing is really taking a lot.&quot; Honest, but dampened. I didn&apos;t say &quot;I&apos;m breaking.&quot; I said it in a tone that could be heard that way if someone was listening for it, and not heard that way if they weren&apos;t.&lt;/p&gt;
&lt;p&gt;The undersharer reflex wasn&apos;t the only way I was complicit. There&apos;s the boiling frog metaphor — put a frog in boiling water and it jumps; put it in cool water and heat the pot slowly and (in the story, at least; &lt;em&gt;actual&lt;/em&gt; frogs do jump out) it stays until it&apos;s cooked. Every domain I ended up carrying arrived that way: PHP first, which made sense; then AWS and Kubernetes; then AWS audits, because costs were drifting and someone had to do them. Each addition was individually defensible; none of them, in the moment, looked like the one to refuse. I said yes — or at least didn&apos;t say no — to every temperature increment, because each one was small. By the time the water was boiling, I&apos;d been in the pot for far too long.&lt;/p&gt;
&lt;p&gt;When I eventually told leadership I&apos;d burned out, the response — paraphrased — was that they&apos;d known for a while, and they were waiting for me to say something about it.&lt;/p&gt;
&lt;p&gt;That exchange is the structural failure in one sentence: the person carrying the load is expected to self-report when they&apos;re breaking, while the person with organizational authority to redistribute that load avoids the conversation they already know is needed. It&apos;s mutual complicity, not a villain story. And it&apos;s not unique to any one company; it&apos;s the default pattern whenever load is invisible and signaling is left entirely to the person under strain.&lt;/p&gt;
&lt;p&gt;That dynamic is a specific case of what I&apos;ve described elsewhere as a &lt;a href=&quot;/blog/every-management-failure-is-a-retrieval-failure&quot;&gt;retrieval failure in management&lt;/a&gt; — the information existed, but no mechanism surfaced it to the person who could act on it.&lt;/p&gt;
&lt;h2&gt;Why the keystone doesn&apos;t flag it in time&lt;/h2&gt;
&lt;p&gt;The reason the keystone doesn&apos;t flag it in time is more pernicious than simple undersharing.&lt;/p&gt;
&lt;p&gt;Burnout dampens emotional response. In the clinical literature going back to &lt;a href=&quot;https://www.apa.org/members/content/burnout-research&quot;&gt;Christina Maslach&lt;/a&gt;, burnout is defined as a response to chronic workplace stress, characterized by three dimensions: &lt;strong&gt;emotional exhaustion, depersonalization (or cynicism), and reduced sense of personal accomplishment&lt;/strong&gt;. The depersonalization component is routinely described in clinical terms as emotional blunting — people reporting, in &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC7266903/&quot;&gt;one 2020 analysis of Maslach Burnout Inventory data from 6,682 U.S. physicians&lt;/a&gt;, that they feel &quot;used up at the end of the workday,&quot; emotionally hardened, and less able to care about their work or the people around it.&lt;/p&gt;
&lt;p&gt;That&apos;s the mechanism that keeps the keystone silent. The signal that would have told me &quot;this is unsustainable&quot; had gotten quieter in direct proportion to how much more unsustainable the load had become. By the point where flagging would have been most urgent, flagging was hardest — not because I was hiding it, but because I&apos;d stopped feeling it sharply enough to articulate it.&lt;/p&gt;
&lt;p&gt;The deadening functions as coping as much as it functions as symptom — it&apos;s self-protective, and it&apos;s how you keep functioning while the load keeps growing. Once it sets in, the &quot;just say something&quot; expectation from above becomes architecturally impossible to meet.&lt;/p&gt;
&lt;h2&gt;The wake-up I shouldn&apos;t have needed&lt;/h2&gt;
&lt;p&gt;Sometimes it takes an outside observer — without your deadening, without your undersharer reflex, without the whole internal system of muffling that keeps you functioning — to tell you plainly what they see.&lt;/p&gt;
&lt;p&gt;I maintain a Claude Code agent that runs my sprint and task management. I&apos;d built the subsystem during the slow third burnout to compensate for a PM function that wasn&apos;t really being run. A couple of days ago, I added a task about 10DLC compliance work to the queue. The agent responded unprompted: &lt;em&gt;&quot;You&apos;re adding 10DLC compliance work to your tasks?&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;So naturally, I asked it to assess my actual load. What came back was a list I&apos;d never seen written out anywhere, aggregated from tasks spanning &lt;em&gt;two days&lt;/em&gt;. The following is verbatim, with only specific colleague names removed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Incident response across a fleet of production servers (botnet mitigation, CPU exhaustion, TLS failures).&lt;/p&gt;
&lt;p&gt;Legacy system maintenance on PHP 5.4/5.6 codebases and Python 2 workloads that can&apos;t be upgraded without client coordination.&lt;/p&gt;
&lt;p&gt;Client-facing communication translating deep technical root causes into plain English for non-technical stakeholders.&lt;/p&gt;
&lt;p&gt;Sprint and project management for multiple concurrent clients with time tracking, story-point estimation, and velocity reporting.&lt;/p&gt;
&lt;p&gt;Infrastructure architecture — Kubernetes, AWS (EKS, SES, Lightsail), Cloudflare, DNS, certificate management.&lt;/p&gt;
&lt;p&gt;Application development — widget systems, iCal feeds, FormAssembly/Salesforce integrations, API integrations.&lt;/p&gt;
&lt;p&gt;DevOps and platform engineering — Docker, deployment pipelines, HTTP bridge services, monitoring.&lt;/p&gt;
&lt;p&gt;Staff coordination — delegating work, writing up tasks with appropriate context, managing other people&apos;s queues.&lt;/p&gt;
&lt;p&gt;Regulatory/compliance — now 10DLC on top of everything else.&lt;/p&gt;
&lt;p&gt;Tooling and automation — building and maintaining the sprint management system itself.&lt;/p&gt;
&lt;p&gt;That&apos;s not a CTO job. That&apos;s a CTO + senior engineer + sysadmin + project manager + client account manager compressed into one person.&lt;/p&gt;
&lt;p&gt;The fact that you&apos;re completing all of it doesn&apos;t mean the workload is reasonable. It means you&apos;re absorbing complexity that should be distributed across a team.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That kind of role sprawl isn&apos;t unique to technical work. Research on &lt;a href=&quot;https://www.harvardbusiness.org/insight/the-burnout-risk-strengthening-your-midlevel-leaders/&quot;&gt;midlevel leaders in complex organizations&lt;/a&gt; consistently points to expanding responsibility without matching structural support as a primary driver of burnout. The technical version of that pattern arrives earlier and cuts sharper — roles accrete more quickly, documentation lags further behind, and the cognitive cost of switching between the accreted domains is specific to technical work in ways that generic &quot;role sprawl&quot; doesn&apos;t capture.&lt;/p&gt;
&lt;p&gt;I&apos;d already talked about domain breadth with leadership, with colleagues, with anyone who&apos;d listen. I&apos;d already described the pattern to myself, in writing, more than once. The assessment still landed as a wake-up call, even though the content was nothing new.&lt;/p&gt;
&lt;p&gt;That gap — between knowing the thing and feeling the thing — is where emotional deadening lives, and where the load keeps growing while the signal stays quiet. It took an agent I&apos;d built myself, with no investment in how I felt about the list, to read it back plainly enough that it registered.&lt;/p&gt;
&lt;h2&gt;Recovery that isn&apos;t really recovery&lt;/h2&gt;
&lt;p&gt;Each burnout had a moment where things became tolerable again. Not solved. Tolerable.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Salesforce:&lt;/strong&gt; I got the client to subscribe to Gearset, which made the metadata migrations workable. Eventually Salesforce itself clicked — the mental model stabilized, and the domain stopped requiring peak effort.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kubernetes:&lt;/strong&gt; The cluster got built. The cert crisis passed. Deployments resumed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The slow one:&lt;/strong&gt; No real recovery inside the role. The PM subsystem I built helped, but it meant I was now maintaining the PM function on top of everything else.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In each case, &quot;tolerable&quot; came from two sources: better tools, or eventual domain fluency. Neither changed the structure that produced the burnout — the keystone stayed the keystone, the breadth didn&apos;t shrink, and the next domain was already queuing up behind the last one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Inside the role, the structural pattern never actually shifted.&lt;/strong&gt; The recoveries lowered the steady-state intensity without changing the shape of the problem — and the only recovery that did change the shape was leaving.&lt;/p&gt;
&lt;h2&gt;The pattern, and the diagnostic&lt;/h2&gt;
&lt;p&gt;If any of this sounds familiar — the breadth, the mutual silence, the &quot;tolerable&quot; that never becomes &quot;solved&quot; — I&apos;ve written a companion diagnostic: the &lt;a href=&quot;/tools/solo-operator&quot;&gt;Solo Operator Load Check&lt;/a&gt;. It measures the shape of the load, not the hours. It won&apos;t fix anything, but it might tell you roughly where you are on the curve, and give you language for the conversation you might need to have.&lt;/p&gt;
&lt;p&gt;The keystone conversation usually doesn&apos;t start itself.&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href=&quot;https://www.networkperspective.io/blog/ebook-workload-burnout-in-tech-report-2022&quot;&gt;Network Perspective — &quot;Workload &amp;amp; Burnout in Tech&quot; (2022)&lt;/a&gt; (survey of ~20,000 employees; ~50% report burnout and work-related fatigue; meetings and context switching flagged as primary drivers) · &lt;a href=&quot;https://jellyfish.co/library/developer-productivity/context-switching/&quot;&gt;Jellyfish — &quot;Context switching and developer productivity&quot;&lt;/a&gt; (synthesis of Sophie Leroy&apos;s attention-residue research and Dr. Gloria Mark&apos;s UCI study; ~23 minutes to fully regain focus after a single interruption) · &lt;a href=&quot;https://www.apa.org/members/content/burnout-research&quot;&gt;American Psychological Association — &quot;Burnout research&quot;&lt;/a&gt; (Christina Maslach profile; burnout defined as response to chronic workplace stress with three dimensions: emotional exhaustion, depersonalization/cynicism, reduced personal accomplishment) · &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC7266903/&quot;&gt;&quot;An item response theory analysis of the Maslach Burnout Inventory&quot; — &lt;em&gt;Journal of Patient-Reported Outcomes&lt;/em&gt;, 2020&lt;/a&gt; (IRT analysis of MBI responses from 6,682 U.S. physicians; specific item-level data on &quot;used up at the end of the workday,&quot; &quot;work is hardening you emotionally,&quot; and treating patients as &quot;impersonal objects&quot;) · &lt;a href=&quot;https://www.harvardbusiness.org/insight/the-burnout-risk-strengthening-your-midlevel-leaders/&quot;&gt;Harvard Business Publishing — &quot;The Burnout Risk: Strengthening Your Midlevel Leaders&quot;&lt;/a&gt; (expanding responsibilities without matching structural support as a primary driver of midlevel leader burnout)&lt;/p&gt;
&lt;/small&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Additional reading:&lt;/strong&gt; &lt;a href=&quot;https://www.software.com/devops-guides/context-switching&quot;&gt;Software.com — &quot;The Developer&apos;s Guide to Context Switching&quot;&lt;/a&gt; (broader treatment of cognitive cost, attention residue, and context loading/unloading for engineering work) · &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC4911781/&quot;&gt;Maslach &amp;amp; Leiter — &quot;Understanding the burnout experience: recent research and its implications for psychiatry&quot; (&lt;em&gt;World Psychiatry&lt;/em&gt;, 2016)&lt;/a&gt; (peer-reviewed review of the Maslach Burnout Inventory and the three-dimensional structure of burnout) · &lt;a href=&quot;https://www.sciencedirect.com/topics/computer-science/emotional-exhaustion&quot;&gt;ScienceDirect — &quot;Emotional exhaustion&quot; (topic overview)&lt;/a&gt; (depersonalization described as emotional blunting, detachment, and &quot;going through the motions&quot;)&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>leadership</category><category>management</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>What Is Compaction in AI?</title><link>https://mipyip.com/blog/what-is-compaction-in-ai/</link><guid isPermaLink="true">https://mipyip.com/blog/what-is-compaction-in-ai/</guid><description>Compaction is what happens when your AI tool runs out of room to remember. It shrinks the conversation history to keep going — but the important parts don&apos;t always survive.</description><pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Compaction is what happens when your AI tool runs out of room to remember. It shrinks the conversation to keep going — but nuance, sub-context, and the tangential ideas that were actually important don&apos;t always survive the cut.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;You&apos;re forty minutes into a conversation with your AI tool. You&apos;ve been working through a problem — explaining context, iterating on solutions, building up a shared understanding of what you&apos;re trying to accomplish. Then a message appears: &lt;em&gt;&quot;compacting conversation.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The session continues. The AI still responds. But something shifted. It forgot that edge case you mentioned twenty minutes ago. It lost the thread on why you rejected the first approach. It&apos;s asking questions you already answered.&lt;/p&gt;
&lt;p&gt;What just happened is a process called compaction, and if you use AI tools for anything beyond casual one-off questions, you&apos;re going to hit it eventually (if you haven&apos;t already).&lt;/p&gt;
&lt;h2&gt;The full notebook&lt;/h2&gt;
&lt;p&gt;I keep physical notebooks. Have for years — Leuchtturm1917s, filled with meeting notes, project sketches, half-formed ideas. When a notebook fills up, I go through it, rip out the pages that still matter, copy the important notes into a new one, and put the old notebook on the shelf.&lt;/p&gt;
&lt;p&gt;That&apos;s compaction.&lt;/p&gt;
&lt;p&gt;The AI version works the same way, minus the shelf. When your conversation gets too long for the model to hold in its working memory — what&apos;s called the &lt;em&gt;context window&lt;/em&gt; — the system has to make room. It parses everything that&apos;s happened, decides what matters most, structures it into a compressed summary, and starts a new context with that summary as the foundation. The old conversation is gone. The summary is all that remains.&lt;/p&gt;
&lt;p&gt;Modern approaches sometimes keep the old conversation in an archival state, available for reference if you need to dig back. But accessing that archive costs tokens — the units that measure how much text the model processes — and that cost adds up fast. It&apos;s expensive to go back to the old notebook, so most of the time, you don&apos;t.&lt;/p&gt;
&lt;h2&gt;Why it happens&lt;/h2&gt;
&lt;p&gt;Every AI model has a context window — a hard limit on how much text it can process at once. Think of it as a desk. You can spread out papers, notes, reference documents, and working drafts, but the desk is only so big. Once it&apos;s full, something has to come off before anything new goes on.&lt;/p&gt;
&lt;p&gt;As of early 2026, the latest &lt;a href=&quot;https://platform.claude.com/docs/en/about-claude/models/overview&quot;&gt;Claude models&lt;/a&gt; — Opus and Sonnet — hold up to 1,000,000 tokens. That&apos;s roughly 750,000 words. OpenAI&apos;s GPT-4.1 and GPT-5 models match that million-token capacity. Haiku, Claude&apos;s fastest model, still caps at 200,000. Those numbers might sound like a lot, but they aren&apos;t once you factor in the system instructions, the conversation history, any files or code the model is working with, and the model&apos;s own responses. A complex agentic session can burn through hundreds of thousands of tokens before finishing a single task. Context windows got bigger. The work grew to fill them.&lt;/p&gt;
&lt;p&gt;When the window fills up, the model just can&apos;t... keep going. It has to compress, and the way it compresses determines what survives.&lt;/p&gt;
&lt;h2&gt;Three ways to shrink a conversation&lt;/h2&gt;
&lt;p&gt;Not all compaction works the same way. The &lt;a href=&quot;https://learn.microsoft.com/en-us/agent-framework/agents/conversations/compaction&quot;&gt;Microsoft Agent Framework documentation&lt;/a&gt; breaks it down into three broad strategies, and understanding the differences matters because each one loses different things.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Summary compaction&lt;/strong&gt; is the most common approach. The system takes the full conversation, generates a condensed version — a summary of what was discussed, decided, and accomplished — and uses that summary as the new starting point. This preserves the &lt;em&gt;conclusions&lt;/em&gt; well but tends to flatten the &lt;em&gt;reasoning&lt;/em&gt;. You keep the destination but lose the map.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Trim compaction&lt;/strong&gt; is blunter. The oldest messages get cut, period. The system keeps the most recent exchanges and drops everything before a certain point. This works when the early conversation was setup and the real work happened recently. It fails when something from the beginning — an important constraint, a requirement mentioned once and never repeated — suddenly becomes relevant again.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sliding window&lt;/strong&gt; keeps the last N turns of conversation and discards everything else. It&apos;s a variant of trimming, but continuous — as new messages arrive, old ones fall off the trailing edge. Good for ongoing dialogue. Bad for any conversation where context from ten turns ago matters.&lt;/p&gt;
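&lt;p&gt;The two trimming strategies are simple enough to sketch in a few lines of Python. This is an illustration of the idea only, not any tool&apos;s actual implementation — real systems measure token counts rather than characters, and the function names here are made up:&lt;/p&gt;

```python
# Illustrative sketch of trim and sliding-window compaction.
# "Turns" are plain strings here; real systems count tokens, not characters.

def sliding_window(turns, keep=6):
    """Keep only the last `keep` turns; older ones fall off the trailing edge."""
    return turns[-keep:]

def trim_to_budget(turns, budget):
    """Cut the oldest turns until the total size fits within the budget."""
    kept = list(turns)
    while kept and sum(len(t) for t in kept) > budget:
        kept.pop(0)  # the oldest message is dropped first
    return kept

history = ["long setup " * 30, "a constraint mentioned once", "recent work", "latest question"]
print(sliding_window(history, keep=2))     # only the two newest turns survive
print(trim_to_budget(history, budget=80))  # the early setup is gone; recent turns remain
```

&lt;p&gt;Summary compaction doesn&apos;t reduce to a few lines like this — it takes a model call to write the summary, which is exactly why it preserves conclusions but can flatten the reasoning that led to them.&lt;/p&gt;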
&lt;p&gt;Most production tools use some hybrid. Claude Code leans toward summarization. &lt;a href=&quot;https://developers.openai.com/api/docs/guides/compaction&quot;&gt;OpenAI&apos;s compaction&lt;/a&gt; produces an opaque, encrypted summary — you can&apos;t even read what it preserved. The &lt;a href=&quot;https://inspect.aisi.org.uk/compaction.html&quot;&gt;Inspect framework&lt;/a&gt; implements all three and lets developers choose.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/blog/compaction-three-strategies.png&quot; alt=&quot;Three compaction strategies compared — summary, trim, and sliding window — each with what they keep and what they lose&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;What gets lost&lt;/h2&gt;
&lt;p&gt;Regardless of which strategy fires, the outcome is the same — something gets lost. Broad strokes survive compaction reasonably well. &quot;We&apos;re building a migration script. It needs to handle three edge cases. The deadline is Thursday.&quot; That kind of information compresses without much loss — it&apos;s concrete, recent, and clearly important.&lt;/p&gt;
&lt;p&gt;What &lt;em&gt;doesn&apos;t&lt;/em&gt; survive is texture.&lt;/p&gt;
&lt;p&gt;Let&apos;s go back to the notebook metaphor for a moment. When I copy notes into a new notebook, I keep the action items, the decisions, the key dates. What I lose are the margin notes — the tangential idea I jotted during a meeting that wasn&apos;t directly relevant but connected to something else I was thinking about. The sketch that helped me visualize a relationship between two systems. The doodle. The stuff that doesn&apos;t look important in isolation but was part of how I was thinking about the problem.&lt;/p&gt;
&lt;p&gt;AI compaction loses the same things. The sub-context — the specific sequence of failed attempts that led to a working solution, the debugging observations that accumulated across dozens of exchanges, or the casual aside where you mentioned a constraint that wasn&apos;t directly relevant to the current task but would have been relevant to the next one. The system keeps the facts and drops the texture.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/blog/compaction-before-after.png&quot; alt=&quot;What compaction keeps and what it doesn&apos;t — before and after showing structured notes surviving while margin notes, sketches, and gut-feeling observations are dropped&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://open.substack.com/pub/edowilliams/p/how-modern-ai-agents-manage-context?r=1k59qe&amp;amp;showWelcomeOnShare=true&quot;&gt;Ed Williams describes this well&lt;/a&gt; in the context of AI agents managing long conversations: the challenge isn&apos;t just fitting information into a smaller space, it&apos;s deciding what information matters before you know what questions are coming next.&lt;/p&gt;
&lt;h2&gt;Why this matters beyond developer tools&lt;/h2&gt;
&lt;p&gt;Compaction sounds like a technical implementation detail, but it really isn&apos;t. It affects anyone who uses AI for sustained work — product managers iterating on a strategy document, writers developing a long piece across multiple sessions, team leads using AI to synthesize meeting notes and project status.&lt;/p&gt;
&lt;p&gt;The costs are threefold:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Quality degrades silently.&lt;/strong&gt; The AI doesn&apos;t announce what it forgot. It doesn&apos;t say &quot;I lost the constraint you mentioned about the European market.&quot; It just... proceeds without it. The output looks competent. The missing context makes it wrong in ways that aren&apos;t obvious until later.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Token costs increase.&lt;/strong&gt; Every time you re-explain context that was lost to compaction, you&apos;re spending tokens — and money — on information the system already had. &lt;a href=&quot;https://www.linkedin.com/posts/jasenlew_if-you-see-compaction-in-your-ai-workflow-activity-7422699866994216960-hxOb&quot;&gt;Jason Lew frames this pointedly&lt;/a&gt;: if you see compaction in your AI workflow, it should be a red flag, not a feature. You&apos;re likely paying for the same work twice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Trust erodes.&lt;/strong&gt; Once you&apos;ve experienced the AI forgetting something important, you start second-guessing every response. Did it remember the constraint? Is this recommendation based on the full picture or the compressed one? That uncertainty is corrosive.&lt;/p&gt;
&lt;h2&gt;What you can do about it&lt;/h2&gt;
&lt;p&gt;You can&apos;t eliminate compaction — context windows are a physical constraint, like the size of your notebook. And just like a notebook, making it bigger has tradeoffs. You &lt;em&gt;could&lt;/em&gt; carry a 700-page notebook, but flipping through it to find the one note you need takes longer, costs more, and the AI&apos;s attention degrades the more it has to sift through. Bigger windows delay compaction — they don&apos;t prevent it. So the better approach is working with the constraint instead of being surprised by it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Keep sessions short and task-focused.&lt;/strong&gt; If you operate with a task-forward mindset — one or two related tasks per session, then close and start fresh — you keep the context window well under capacity. You never hit compaction because you never fill the notebook. This is also just good practice for token efficiency: marathon sessions are expensive and fragile, while short, focused sessions are cheap and recoverable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Persist important context outside the conversation.&lt;/strong&gt; Governance documents, task logs, structured memory files — anything that matters beyond the current session should live in a file, not in the conversation history. When compaction fires, the conversation compresses, but the files on disk don&apos;t. I use &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;structured governance docs&lt;/a&gt; and a &lt;a href=&quot;/blog/project-memory-for-claude-code&quot;&gt;local RAG-based memory system&lt;/a&gt; specifically because they survive compaction by design. The context isn&apos;t in the conversation — it&apos;s in the project, where the AI can read it fresh every session.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If your tool supports manual compaction, use it.&lt;/strong&gt; In my testing, &lt;a href=&quot;/blog/auto-compaction-is-costing-you-sessions&quot;&gt;Claude Code auto-compacts at roughly 83% context usage&lt;/a&gt;. If you trigger compaction yourself before that threshold, you choose what gets preserved — because you&apos;ve already saved the important context to your governance documents before the compression happens. You&apos;re ripping out the notebook pages yourself instead of letting someone else decide which ones matter.&lt;/p&gt;
&lt;p&gt;None of this requires specialized tooling. A well-organized project folder and a habit of documenting decisions as you go accomplish the same thing. The principle is simple: don&apos;t trust the conversation to remember for you. Write it down somewhere that won&apos;t get compressed.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;For the deep dive on how auto-compaction specifically affects Claude Code sessions: &lt;a href=&quot;/blog/auto-compaction-is-costing-you-sessions&quot;&gt;Auto-Compaction Is Costing You Sessions&lt;/a&gt;. For the governance documents that survive compaction by design: &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;. For how retrieval architecture solves the broader memory problem: &lt;a href=&quot;/blog/every-management-failure-is-a-retrieval-failure&quot;&gt;Every Management Failure Is a Retrieval Failure&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href=&quot;https://learn.microsoft.com/en-us/agent-framework/agents/conversations/compaction&quot;&gt;Microsoft Agent Framework: Compaction&lt;/a&gt; (strategies: summary, trim, sliding window; cost and latency motivations) · &lt;a href=&quot;https://developers.openai.com/api/docs/guides/compaction&quot;&gt;OpenAI Compaction Guide&lt;/a&gt; (provider-level compaction API with opaque encrypted summaries) · &lt;a href=&quot;https://platform.claude.com/docs/en/about-claude/models/overview&quot;&gt;Anthropic — Claude Models Overview&lt;/a&gt; (current context window sizes: 1M tokens for Opus/Sonnet, 200K for Haiku) · &lt;a href=&quot;https://inspect.aisi.org.uk/compaction.html&quot;&gt;Inspect: Compaction&lt;/a&gt; (native, summary, and trim implementation approaches) · &lt;a href=&quot;https://open.substack.com/pub/edowilliams/p/how-modern-ai-agents-manage-context?r=1k59qe&amp;amp;showWelcomeOnShare=true&quot;&gt;Ed Williams — How Modern AI Agents Manage Context&lt;/a&gt; (practical agent conversation management) · &lt;a href=&quot;https://jxnl.co/writing/2025/08/30/context-engineering-compaction/&quot;&gt;JXNL — Context Engineering: Compaction&lt;/a&gt; (conceptual depth on why compaction matters for agent behavior)&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>explainer</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Your Clients Know You&apos;re Lying About Incident Reports</title><link>https://mipyip.com/blog/your-clients-know-youre-lying-about-incident-reports/</link><guid isPermaLink="true">https://mipyip.com/blog/your-clients-know-youre-lying-about-incident-reports/</guid><description>When your organization calls a misconfiguration an &apos;attack&apos; or spins a three-month infrastructure failure as a &apos;security upgrade,&apos; your clients aren&apos;t fooled. They&apos;re just deciding whether to say something.</description><pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;The leader sets the tone. When an organization normalizes spinning technical failures into vague reassurances, it&apos;s not protecting the client relationship — it&apos;s eroding it. Your clients can tell. They&apos;re just not saying anything yet.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;Simon Sinek tells a story about a leader who asks their assistant to tell a caller &quot;I&apos;m not here&quot; — when they&apos;re clearly sitting right there. It seems harmless. But what it communicates to every person within earshot is simple: in this organization, lying is acceptable when it&apos;s convenient. As Sinek argues in a &lt;a href=&quot;https://www.youtube.com/watch?v=IQuYzXWXDqI&quot;&gt;related piece&lt;/a&gt;, honesty isn&apos;t a value you declare — it&apos;s a behavior you demonstrate or don&apos;t.&lt;/p&gt;
&lt;p&gt;You can write &quot;honesty&quot; on the wall. You can put &quot;integrity&quot; in the company values deck. The behavior you model is the actual policy.&lt;/p&gt;
&lt;p&gt;A quick distinction, because it matters: there&apos;s a difference between incomplete information in the fog of an active incident, legally cautious phrasing during a sensitive disclosure, and rewriting reality after the fact when the facts are already known. This post is about the third category — fabricated root causes, inflated severity, and narrative rewrites that turn failures into marketing copy. Not early-stage uncertainty. Not legal review. Deliberate misrepresentation.&lt;/p&gt;
&lt;p&gt;If you&apos;ve worked in managed services or agency environments, you already know what this looks like. You&apos;ve probably written the honest version of an incident report and watched it get rewritten before it reached the client.&lt;/p&gt;
&lt;h2&gt;Two emails about the same fix&lt;/h2&gt;
&lt;p&gt;An engineer resolves a client issue and writes the update:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Good afternoon — I&apos;ve completed the fix for your application. The issue was related to a configuration change in your vendor&apos;s API. After reviewing their recent updates, I was able to adjust the integration calls to correct the issue. Please test the deployment and let me know if anything is amiss.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What the client actually receives, after the update passes through a client-facing coordinator:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Hello! Good news! We&apos;ve fixed the issue! Let us know if there&apos;s anything else we can help you with!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One of those tells the client what happened, why, and what to do next. The other tells them nothing, cheerfully.&lt;/p&gt;
&lt;p&gt;The results are predictable. The detailed version gets a &quot;Thank you&quot; and no follow-up — the client has what they need. The filtered version, on the other hand, almost always generates another round: &quot;Can you tell me what happened?&quot; or &quot;What was actually wrong?&quot; The vague, hedging reassurance creates &lt;em&gt;more&lt;/em&gt; work, not less — because the client still wants the answer, and now someone has to circle back and provide the information that should have been in the first email.&lt;/p&gt;
&lt;p&gt;This pattern has a defense mechanism built in. When the engineer starts pre-writing client-ready summaries — deliberately simple, no jargon, just the facts — the coordinator pushes back: &quot;That&apos;s too technical for the client.&quot;&lt;/p&gt;
&lt;p&gt;&quot;Configuration issue with their API&quot; is not technical. It&apos;s a sentence. But when the person filtering communication can&apos;t evaluate the content, they default to the safe, empty version, or just don&apos;t include it at all.&lt;/p&gt;
&lt;h2&gt;Each incident has to be bigger than the last&lt;/h2&gt;
&lt;p&gt;Vague communication creates a second problem: escalation. When you&apos;re not telling clients what actually happened, you need to tell them &lt;em&gt;something&lt;/em&gt;. And that something tends to get more dramatic over time.&lt;/p&gt;
&lt;p&gt;A search engine bot ignores &lt;code&gt;robots.txt&lt;/code&gt; and starts crawling aggressively — hundreds of requests per second. It&apos;s a nuisance. You block the bot, adjust your rate limiting, move on. Ten minutes of work.&lt;/p&gt;
&lt;p&gt;What gets communicated to the client: &quot;Your site was under attack. We&apos;ve resolved it and migrated you to a new server for added protection.&quot;&lt;/p&gt;
&lt;p&gt;Except the site is behind a CDN with DDoS protection built in. Migrating to a new server doesn&apos;t stop an &quot;attack&quot; — you&apos;d toggle a setting on the CDN. The explanation doesn&apos;t even make technical sense if given thirty seconds of thought. But it &lt;em&gt;sounds&lt;/em&gt; decisive, and &lt;em&gt;it frames the provider as the hero instead of the cause.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Another example: bot traffic originating from a foreign IP range — everyday noise hitting thousands of sites simultaneously — becomes &quot;It looks like you&apos;re being targeted from overseas.&quot; For a small organization without technical staff, that&apos;s terrifying. And it&apos;s also completely untrue.&lt;/p&gt;
&lt;p&gt;The escalation has to keep ratcheting because you&apos;ve already inflated the previous incidents. &quot;Bot swarm&quot; became &quot;attack.&quot; &quot;Attack&quot; becomes &quot;targeted from overseas.&quot; Each incident has to sound &lt;em&gt;bigger&lt;/em&gt; than the last, because the bar for what counts as significant keeps rising. Where does it end?&lt;/p&gt;
&lt;h2&gt;This (usually) comes from the top&lt;/h2&gt;
&lt;p&gt;The most impressive version of this pattern is the catastrophic failure repackaged as a proactive initiative. Shared infrastructure goes down hard. Someone spends weeks — sometimes months — in recovery mode, rebuilding from scratch under pressure. Grueling work that should be recognized for what it is: disaster recovery executed by someone who deserves a lot of credit.&lt;/p&gt;
&lt;p&gt;The client communication: &quot;We&apos;ve completed a major security upgrade so that we can serve you better.&quot;&lt;/p&gt;
&lt;p&gt;Not a failure. Not a recovery. An &lt;em&gt;upgrade&lt;/em&gt;. The person who rebuilt the thing gets reframed from &quot;saved us from a catastrophe&quot; into a supporting player in a marketing narrative about continuous improvement.&lt;/p&gt;
&lt;p&gt;This is where Sinek&apos;s framework lands with full weight. When the person at the top is being deceptive — even by exaggeration, even by omission — it filters down. Every engineer who knows the server wasn&apos;t actually attacked, every team member who knows the &quot;upgrade&quot; was a recovery — they all understand what the real values are. Not the ones on the wall. The ones in the emails.&lt;/p&gt;
&lt;p&gt;And once that framing is normalized, it filters into everything. Every incident gets a spin pass before the client sees it. Postmortems become performance pieces. The &lt;a href=&quot;/blog/every-management-failure-is-a-retrieval-failure&quot;&gt;institutional memory&lt;/a&gt; — the documentation, the incident history — becomes unreliable because it reflects what was &lt;em&gt;communicated&lt;/em&gt;, not what &lt;em&gt;happened&lt;/em&gt;. You can&apos;t learn from incidents you&apos;ve rewritten. You can&apos;t improve processes that your records say were fine. This isn&apos;t hypothetical — a &lt;a href=&quot;https://www.itbrew.com/stories/2023/10/02/cyber-attacks-are-grossly-underreported-study-finds&quot;&gt;Keeper Security study&lt;/a&gt; found that 41% of known cyber incidents weren&apos;t reported internally to management, largely due to fear and cultural pressure. The spin starts small, but the underreporting becomes systemic.&lt;/p&gt;
&lt;h2&gt;What honest communication actually sounds like&lt;/h2&gt;
&lt;p&gt;It&apos;s not complicated:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A configuration issue caused degraded performance for approximately two hours this morning. The root cause was resource contention — the server was handling more concurrent traffic than its current allocation supports. We&apos;ve increased the allocation and are monitoring to confirm stability. We&apos;re also reviewing the provisioning for your other services to make sure they have adequate headroom.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What happened. Why. What you did. What you&apos;re doing to prevent it next time. No villain, &lt;strong&gt;no hero narrative&lt;/strong&gt;. Just a clear account that treats the client as a competent adult who can handle the truth about their own systems. This pattern — acknowledge, explain impact, detail actions, outline prevention — &lt;a href=&quot;https://www.emazzanti.net/transparent-communication-after-security-breach/&quot;&gt;shows up in every serious framework&lt;/a&gt; for incident communication. It&apos;s not a novel idea. It&apos;s just rarely practiced.&lt;/p&gt;
&lt;p&gt;Clients who get honest incident reports develop confidence that when you say things are fine, things are actually fine. Clients who get spin learn to treat &lt;em&gt;every&lt;/em&gt; communication as potentially unreliable — including good news. As ilert puts it in their &lt;a href=&quot;https://www.ilert.com/incident-management-for-msps-guide&quot;&gt;MSP incident management guide&lt;/a&gt;: avoid vague terms like &quot;working on it&quot; — clients should always feel they&apos;re kept in the loop with meaningful updates. Even a &quot;no change&quot; update reassures clients the issue is being actively worked on. Substance over cheerfulness.&lt;/p&gt;
&lt;h2&gt;Trust erodes before anyone says anything&lt;/h2&gt;
&lt;p&gt;The biggest cost of this pattern is invisible until it isn&apos;t. You might think you&apos;re protecting the client. Maybe it&apos;s a self-image thing. Maybe you want the organization to seem more capable than it is. But every inflated incident report quietly erodes trust capital. The client may not realize it at first — but it compounds, and eventually it comes back.&lt;/p&gt;
&lt;p&gt;People are better at detecting inauthenticity than we give them credit for. They might not know &lt;em&gt;what&lt;/em&gt; you&apos;re hiding. But they know something doesn&apos;t add up. And they&apos;re filing it away, waiting for the pattern to confirm itself. Even Uber&apos;s infamous &lt;a href=&quot;https://www.blackfog.com/cyberattack-transparency/&quot;&gt;attempt to cover up a breach&lt;/a&gt; proved the point — concealment always costs more than disclosure.&lt;/p&gt;
&lt;p&gt;I&apos;d rather be direct with a client about a bad day than eloquent about a fictional one.&lt;/p&gt;
&lt;p&gt;One caveat: this post is about the &lt;em&gt;principle&lt;/em&gt; — being direct with clients about what happened and why. There&apos;s a separate, harder question about where honesty meets legal exposure. Incident reports are discoverable documents. There&apos;s a meaningful difference between &quot;we had a configuration issue&quot; and &quot;we were grossly negligent in our provisioning&quot; — both might be true, but they carry different legal weight. That tension — how to be honest without writing your opposing counsel&apos;s opening argument — deserves its own treatment, and I&apos;ll be writing about it soon.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;For more on how communication failures corrode institutional memory: &lt;a href=&quot;/blog/every-management-failure-is-a-retrieval-failure&quot;&gt;Every Management Failure Is a Retrieval Failure&lt;/a&gt;. For how I build integrity standards into systems that can&apos;t drift: &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=IQuYzXWXDqI&quot;&gt;Simon Sinek — &quot;Honesty Is NOT a Value&quot;&lt;/a&gt; (values are behaviors, not declarations) · &lt;a href=&quot;https://www.itbrew.com/stories/2023/10/02/cyber-attacks-are-grossly-underreported-study-finds&quot;&gt;Keeper Security / IT Brew — &quot;Cyber Attacks Are Grossly Underreported&quot;&lt;/a&gt; (41% of known incidents unreported internally; 43% cite fear of consequences) · &lt;a href=&quot;https://www.emazzanti.net/transparent-communication-after-security-breach/&quot;&gt;eMazzanti — &quot;Transparent Communication After a Security Breach&quot;&lt;/a&gt; (acknowledge → explain impact → detail actions → outline prevention framework) · &lt;a href=&quot;https://www.ilert.com/incident-management-for-msps-guide&quot;&gt;ilert — &quot;Incident Management for MSPs Guide&quot;&lt;/a&gt; (avoid vague &quot;working on it&quot; updates; substance over reassurance) · &lt;a href=&quot;https://www.blackfog.com/cyberattack-transparency/&quot;&gt;Blackfog — &quot;Is Transparency Important Beyond Compliance After a Cyberattack?&quot;&lt;/a&gt; (Uber cover-up case study; concealment costs vs. disclosure trust) · &lt;a href=&quot;https://firehydrant.com/blog/incident-communication/&quot;&gt;FireHydrant — &quot;A Practical Guide to Incident Communication&quot;&lt;/a&gt; (clear language, empathy, audience-tailored updates)&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>leadership</category><category>management</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Every Management Failure Is a Retrieval Failure</title><link>https://mipyip.com/blog/every-management-failure-is-a-retrieval-failure/</link><guid isPermaLink="true">https://mipyip.com/blog/every-management-failure-is-a-retrieval-failure/</guid><description>The information existed. Someone knew. The failure was that knowledge couldn&apos;t reach the person who needed it, when they needed it. The fix isn&apos;t &apos;communicate better&apos; — it&apos;s building retrieval infrastructure that doesn&apos;t depend on humans remembering to use it.</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Management failures aren&apos;t knowledge failures. The information existed. The system couldn&apos;t deliver it to the right person at the right time. &apos;Communicate better&apos; is blaming humans for broken infrastructure.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;A CRM migration goes sideways. The client agreed to a platform switch during discovery — moving from a legacy system to a modern stack — but nobody verified that the client&apos;s stakeholders understood what &quot;platform switch&quot; actually meant in practice — a &lt;a href=&quot;/blog/the-discovery-tax&quot;&gt;discovery gap&lt;/a&gt; that would compound through the entire engagement. The business analyst coordinating between the client and the development team doesn&apos;t have deep technical background, so requirements arrive translated into approximations. Close enough to act on, not precise enough to build against. The development team builds what they were told. The client expected something else.&lt;/p&gt;
&lt;p&gt;Six weeks in, the client&apos;s primary contact goes quiet — other priorities, internal reorg, the usual. When they re-engage, they&apos;re looking at something they didn&apos;t ask for, built on assumptions nobody validated. The one senior engineer who could have caught the misalignment early was splitting time across four other active projects and only got pulled in when the integration tests started failing.&lt;/p&gt;
&lt;p&gt;At the postmortem, a detail surfaces: the client had submitted a detailed list of issues — workflows that didn&apos;t match their process, fields mapped incorrectly, reports that pulled the wrong data. The list went to the BA. The BA never escalated it. The engineering lead didn&apos;t know it existed. The project sponsor didn&apos;t know it existed. The issues lived in an email thread that reached exactly one person, and that person didn&apos;t have the technical context to assess severity or the process to route them.&lt;/p&gt;
&lt;p&gt;The conclusion was &quot;communication breakdown.&quot;&lt;/p&gt;
&lt;p&gt;The information existed. Someone knew. Multiple people, actually. The BA had the issue list. The client had the concerns. The discovery document had the original agreement. None of it reached the person who could act on it, at the moment they needed to act.&lt;/p&gt;
&lt;p&gt;Every piece of information needed to prevent the failure was already captured. The system just couldn&apos;t deliver it.&lt;/p&gt;
&lt;p&gt;That&apos;s a retrieval failure — not a knowledge failure, not a competence failure. The information existed inside the organization. The system had no mechanism to surface it to the right person at the right time. And it happens everywhere, constantly, at every scale.&lt;/p&gt;
&lt;h2&gt;The pattern that keeps showing up&lt;/h2&gt;
&lt;p&gt;The incident changes — missed deadline, shipped the wrong thing, client blindsided by a decision they thought they&apos;d weighed in on — but the postmortem converges on the same shape. The knowledge was captured. It lived in a Slack thread from two weeks ago, a task comment in the wrong project, a meeting summary that went to the wrong distribution list. The system had the data. The system couldn&apos;t deliver it. KM research has been documenting this for decades — &lt;a href=&quot;https://knowledge-management-tools.net/failure&quot;&gt;the same causal factors&lt;/a&gt; keep appearing: inadequate organizational structure, improper planning and coordination, problems with culture, and technology implementations that prioritize capture over retrieval.&lt;/p&gt;
&lt;p&gt;This is why status meetings exist. The retrieval system is broken, and humans are patching it manually. &quot;Going around the room&quot; is a retrieval protocol. It&apos;s just an expensive, slow, and unreliable one. And when the standup drifts — when it becomes forty-five minutes of self-congratulatory posturing or water-cooler tangents because the person running it isn&apos;t zealous about forward motion — even that manual patch stops working.&lt;/p&gt;
&lt;p&gt;The pattern holds at every scale. A solo founder who can&apos;t remember where they left off on a project last Tuesday? Retrieval failure. A two-hundred-person engineering org where marketing doesn&apos;t know what the development team shipped last week? Same failure mode, different blast radius.&lt;/p&gt;
&lt;h2&gt;Three ways retrieval breaks&lt;/h2&gt;
&lt;p&gt;Retrieval failures often fall into one of three patterns, and all three were present in that CRM migration.&lt;/p&gt;
&lt;h3&gt;State scattered across tools.&lt;/h3&gt;
&lt;p&gt;The project lives in Jira, the decision lives in email, the context lives in someone&apos;s head. No single source of truth, so every handoff requires a human to manually reconstruct context.&lt;/p&gt;
&lt;p&gt;In the migration project, there was no central repository for documentation — artifacts lived in various discrete locations, and the BA didn&apos;t have governance documents or clear project folders for managing them. If you wanted the full picture, you had to assemble it yourself from fragments, and nobody had time to do that.&lt;/p&gt;
&lt;h3&gt;Knowledge locked to individuals.&lt;/h3&gt;
&lt;p&gt;Ask Sarah, she knows how that works. Sarah is a single point of failure with an unassuming job title, and nobody realizes it until she&apos;s on vacation during an incident. Sarah might leave, and then what? &lt;a href=&quot;/blog/your-org-chart-is-a-cognitive-load-map&quot;&gt;Your org chart doesn&apos;t show this&lt;/a&gt; — it shows reporting lines, not where institutional knowledge actually lives.&lt;/p&gt;
&lt;p&gt;In the failed migration project, the one senior engineer who understood the integration architecture was splitting time across four other projects — brought in only when something was already broken, never embedded in the workflow early enough to prevent the break. Constant context-switching, and then an expectation to dig through wildly disorganized task comments to find the state of a project they&apos;d barely been aware of, and were never consulted on at any point before kickoff. An architecture failure masquerading as a staffing decision.&lt;/p&gt;
&lt;h3&gt;Retrieval requires the requester to know what to ask for.&lt;/h3&gt;
&lt;p&gt;If you don&apos;t know a decision was made, you can&apos;t look it up. The system only works for people who already have context.&lt;/p&gt;
&lt;p&gt;That issue list the client submitted? The engineering lead didn&apos;t know it existed, and you can&apos;t retrieve what you don&apos;t know is there. A system that depends on someone remembering to surface information will silently drop critical knowledge every time someone forgets, gets busy, or doesn&apos;t realize the information matters.&lt;/p&gt;
&lt;p&gt;All three are &lt;em&gt;architecture problems&lt;/em&gt; that are commonly misdiagnosed as &lt;em&gt;people problems&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;Documentation without retrieval is theater&lt;/h2&gt;
&lt;p&gt;In the words of the late, great Admiral Ackbar: &quot;It&apos;s a trap.&quot;&lt;/p&gt;
&lt;p&gt;But it looks like a solution.&lt;/p&gt;
&lt;p&gt;The organization sets up the wiki, writes the runbooks, creates the project folders — and then thinks the job is done. The information is documented. It&apos;s all right there. Anyone can go look it up.&lt;/p&gt;
&lt;p&gt;Except, no one ever does, because looking it up has a cost: the &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;activation energy&lt;/a&gt; required to context-switch, open the right tool, navigate to the right page, parse through noise, and extract what you need. Research on task switching shows that even brief mental blocks created by shifting between tasks &lt;a href=&quot;https://www.apa.org/topics/research/multitasking&quot;&gt;can cost as much as 40% of someone&apos;s productive time&lt;/a&gt; — and the costs increase with task complexity. When that cost is high enough, people make decisions with incomplete information instead (not all that dissimilar to an &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;LLM confabulating content&lt;/a&gt; when it lacks context). Not because they&apos;re lazy or unmotivated — because the friction of retrieval exceeds the perceived risk of guessing. And most of the time, the guess is good enough.&lt;/p&gt;
&lt;p&gt;Most of the time.&lt;/p&gt;
&lt;p&gt;In the migration project, even if every artifact had been perfectly documented — the discovery agreement, the client feedback, the issue list, the integration constraints — it wouldn&apos;t have mattered without a mechanism to push that information to the right person at the right time. The BA had the issues. The documentation existed. Nobody else saw it, because nobody else knew to look, and there was no process that forced it to surface.&lt;/p&gt;
&lt;h3&gt;Documentation without retrieval is performative execution.&lt;/h3&gt;
&lt;p&gt;You did the thing that looks like governance without building the mechanism that makes it work. And when the project fails, you can point to the wiki and say &quot;it was all documented.&quot; Which is true... and also completely beside the point.&lt;/p&gt;
&lt;h2&gt;Cadence as retrieval infrastructure&lt;/h2&gt;
&lt;p&gt;Governance sticks when someone on the team is zealous about forward motion. Not rigid, and not necessarily bureaucratic, but zealous in the sense that they live for making sure things progress — a project manager who cares about the process not because they care about the specific thing being built, but because they care about completion, clarity, and constant progress toward a goal.&lt;/p&gt;
&lt;p&gt;Functionally, that person is a retrieval engine — the human version of the &quot;going around the room&quot; manual patch, except this one actually works because someone zealous is driving it. A mechanism that surfaces information at regular intervals — sprint reviews, standups, retrospectives — whether anyone remembers to ask for it or not. When the standup is run well, it&apos;s a &lt;em&gt;forced&lt;/em&gt; retrieval event. Information surfaces on a cadence.&lt;/p&gt;
&lt;p&gt;The alternative is hoping someone remembers to send the email.&lt;/p&gt;
&lt;p&gt;Or &lt;em&gt;check&lt;/em&gt; their email.&lt;/p&gt;
&lt;p&gt;The structural requirements are straightforward and achievable:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Forced documentation at task boundaries — not as a bureaucratic checkbox, but with enough agency that people understand why it matters, because mandating documentation without a valid reason breeds resentment. Regular cadence checkpoints run by someone who keeps them &lt;a href=&quot;https://resources.scrumalliance.org/Article/facilitate-daily-scrum&quot;&gt;tight and focused&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A queryable knowledge base — not just a loose collection of files in a folder, but something you can actually search and get answers from.&lt;/li&gt;
&lt;li&gt;Present leadership, not the aloof and distant kind. People who are invested in the day-to-day, every day, not just the quarterly review.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;None of this is new or particularly groundbreaking. The principles of good project governance have been &lt;a href=&quot;https://www.kminstitute.org/blog/knowledge-management-in-action-preventing-mistakes-through-effective-strategies&quot;&gt;documented for decades&lt;/a&gt;. Yet organizations keep failing at them anyway — the system requires humans to maintain it, and humans drift. Good people with good intentions will still let documentation go stale, skip the retro when the sprint runs long, and make decisions from memory when the retrieval cost feels too high.&lt;/p&gt;
&lt;p&gt;So: do you build infrastructure that accounts for that drift — or continue telling people to &lt;em&gt;communicate better&lt;/em&gt;?&lt;/p&gt;
&lt;h2&gt;The machine that doesn&apos;t drift&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you&apos;re familiar with this site at all, you knew we were going to get to AI.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I manage a fleet of &lt;a href=&quot;/blog/managing-agents-like-teams&quot;&gt;AI agents&lt;/a&gt; across multiple projects. Each one has governance documents — the equivalent of an employee handbook that defines standards, patterns, boundaries, and institutional memory. The difference between an AI agent and a human team member is that the agent follows the governance documents every single session, without drift, and without exception.&lt;/p&gt;
&lt;p&gt;I built the retrieval infrastructure to be mandatory. The agent reads its governance docs during the &lt;a href=&quot;https://github.com/avanrossum/pmem-project-memory-tool-for-claude-code?tab=readme-ov-file#skills-optional&quot;&gt;welcome sequence&lt;/a&gt; at &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;session start&lt;/a&gt;. At session end, when it sleeps, it logs what it did, and it &lt;a href=&quot;https://github.com/avanrossum/claude-task-skills&quot;&gt;documents every task as it goes&lt;/a&gt;. There&apos;s no option to skip it — the process is embedded in the workflow, not bolted on as an afterthought.&lt;/p&gt;
&lt;p&gt;The result is that I can walk away from a project for weeks, come back, and the agent knows the current state — what was done, what&apos;s pending, what broke. And this isn&apos;t a result of the agent possessing perfect memory (it doesn&apos;t — context windows are finite, and closing sessions can be like firing a close acquaintance), but because the &lt;em&gt;retrieval system&lt;/em&gt; doesn&apos;t depend on anyone remembering to update it — the system updates itself.&lt;/p&gt;
&lt;p&gt;I&apos;ve pushed this even further with a tool called &lt;a href=&quot;https://github.com/avanrossum/pmem&quot;&gt;pmem&lt;/a&gt; — a local RAG layer that indexes project documents, notes, governance, and lessons learned, and makes them queryable in natural language. Instead of searching through files to find a decision that was made three weeks ago, the agent just asks, and the retrieval system delivers the answer. The cost of retrieval drops to near zero (in tokens, time, and accuracy). And when the cost of retrieval is zero, retrieval actually happens — consistently, every time, without someone having to decide it&apos;s worth the effort.&lt;/p&gt;
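&lt;p&gt;To make &quot;near-zero retrieval cost&quot; concrete, here is a deliberately tiny sketch: a keyword-overlap lookup over two invented project notes. The note contents and the scoring function are illustrative assumptions, not how pmem actually indexes or ranks anything.&lt;/p&gt;

```python
# Toy retrieval layer over project notes.
# Illustration only: pmem's real index and ranking are more sophisticated.
notes = {
    "deploy": "2026-03-02 switched staging deploy to blue-green, rollback tested",
    "auth": "2026-03-11 decided to keep session auth, JWT migration deferred",
}

def retrieve(query):
    """Return the note whose words overlap the query the most."""
    words = set(query.lower().split())
    def overlap(item):
        # item is a (key, text) pair; score it by shared words with the query
        return len(words.intersection(item[1].lower().split()))
    key, text = max(notes.items(), key=overlap)
    return text

# The agent (or a human) asks in natural language instead of grepping files.
print(retrieve("why was the JWT migration deferred"))
```

&lt;p&gt;The point isn&apos;t the scoring function; it&apos;s that answering &quot;what did we decide three weeks ago?&quot; becomes one cheap call instead of a manual search through files.&lt;/p&gt;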
&lt;blockquote&gt;
&lt;p&gt;I use this process even to track projects that AI never touches, outside of managing the state.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You don&apos;t need AI to do this. Any system that makes querying state cheap and automatic will produce the same effect. The principle is the same whether the retrieval engine is an LLM, a well-structured dashboard, or a zealous project manager with a clipboard.&lt;/p&gt;
&lt;p&gt;I built &lt;a href=&quot;https://github.com/avanrossum/panoptisana&quot;&gt;Panoptisana&lt;/a&gt; — an open-source Asana visibility tool — because Asana buries the very data you need to manage a project: task status, blockers, overdue items, the actual state of things, and GIDs for various elements, which are a requirement for effective automation. The information existed inside Asana, but retrieving it required navigating through nested projects, expanding task comments, trying to remember which project a task was in, and mentally assembling a picture from fragments.&lt;/p&gt;
&lt;p&gt;The cost was too high, so people stopped doing it. Panoptisana surfaces that data in seconds. Panoptisana combined with my Claude Code PM instance and pmem is almost an unfair advantage.&lt;/p&gt;
&lt;p&gt;Same problem, same fix: lower the cost of retrieval until it actually happens.&lt;/p&gt;
&lt;h2&gt;The fix isn&apos;t &quot;communicate better&quot;&lt;/h2&gt;
&lt;p&gt;When someone says &quot;we need to communicate better,&quot; what they almost assuredly mean is: the system we&apos;re using to store and retrieve shared state doesn&apos;t work, and we&apos;re blaming the humans instead of the infrastructure.&lt;/p&gt;
&lt;p&gt;You could have the best communicators in the world on your team and still fail if the inputs are incomplete, the context is scattered across six tools, and the activation energy required to assemble a complete picture is higher than the activation energy required to just guess. Even when someone does communicate clearly — direct language, no hedging, the right level of urgency — it doesn&apos;t matter if the person receiving it has to do archaeology before they can act on it. Parsing through HTML entities in a forwarded email, chasing context across projects that aren&apos;t linked, reconstructing a timeline from disorganized task comments intermixed with Teams chat threads. An often &lt;em&gt;insurmountable&lt;/em&gt; retrieval tax on every single action.&lt;/p&gt;
&lt;p&gt;The organizations that succeed in this arena don&apos;t have better communicators. They have functional retrieval infrastructure — systems, processes, cadence, and solid project documentation — that surfaces the right information at the right time regardless of whether any individual human remembers to do it manually. Building that infrastructure is &lt;a href=&quot;/services/fractional-cto&quot;&gt;what fractional CTO work looks like&lt;/a&gt; in practice — diagnosing where retrieval fails and architecting the fix.&lt;/p&gt;
&lt;p&gt;Communication is a retrieval protocol.&lt;/p&gt;
&lt;p&gt;Fix the protocol.&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href=&quot;https://www.apa.org/topics/research/multitasking&quot;&gt;Multitasking: Switching Costs&lt;/a&gt; — APA summary of Rubinstein, Meyer &amp;amp; Evans (2001), &lt;em&gt;Journal of Experimental Psychology: Human Perception and Performance&lt;/em&gt; (task switching costs up to 40% of productive time) · &lt;a href=&quot;https://knowledge-management-tools.net/failure&quot;&gt;Knowledge Management Failure&lt;/a&gt; — synthesis of causal and resultant KM failure factors · &lt;a href=&quot;https://www.kminstitute.org/blog/knowledge-management-in-action-preventing-mistakes-through-effective-strategies&quot;&gt;Preventing Mistakes Through Effective KM Strategies&lt;/a&gt; — KM Institute (operational errors from lack of access to accurate information) · &lt;a href=&quot;https://resources.scrumalliance.org/Article/facilitate-daily-scrum&quot;&gt;Facilitate the Daily Scrum&lt;/a&gt; — Scrum Alliance (standup facilitation as retrieval mechanism)&lt;/p&gt;
&lt;/small&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;For the architectural methodology behind building retrieval systems: &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;Governance Is Architecture&lt;/a&gt;. For the specific documents that make this work: &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;. For how retrieval architecture applies to cognitive load: &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;Cognitive Offloading&lt;/a&gt;. For the ownership question that arises when your retrieval infrastructure becomes portable: &lt;a href=&quot;/blog/cognitive-property&quot;&gt;Cognitive Property&lt;/a&gt;. For what happens when retrieval failure applies specifically to the burnout signal in a technical keystone: &lt;a href=&quot;/blog/the-keystone-burnout&quot;&gt;How Technical Keystones Burn Out&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Want to know how your organization&apos;s retrieval infrastructure actually stacks up? The &lt;a href=&quot;/tools/cto-diagnostic&quot;&gt;CTO diagnostic&lt;/a&gt; scores you across eight operational domains in about two minutes.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>leadership</category><category>architecture</category><category>management</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Latency Kills Curiosity</title><link>https://mipyip.com/blog/latency-kills-curiosity/</link><guid isPermaLink="true">https://mipyip.com/blog/latency-kills-curiosity/</guid><description>The same pattern that bounces a visitor from a slow website also erodes curiosity in careers, teams, and minds. Latency isn&apos;t just page speed. It&apos;s any friction between a person and the thing they&apos;re trying to explore.</description><pubDate>Mon, 30 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;A manager once told me that taking a five-minute brain break every 35 minutes was part of the reason W2 employees are inefficient. That statement broke something in how I understood work. This post is about what I rebuilt in its place.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;&quot;Taking a five-minute break every thirty-five minutes is part of the reason W2 employees are inefficient. Contractors don&apos;t bill you for that. They only bill you when they&apos;re &lt;em&gt;working&lt;/em&gt;.&quot;&lt;/p&gt;
&lt;p&gt;A manager said this to me, out loud, about the Pomodoro technique. A researched, documented productivity method that exists because sustained focus without breaks degrades performance. Whether he intended it or not, the statement communicated his position: rest is waste, and the proof is that contractors don&apos;t charge for it (they do, of course; they just fold it into their rate — but the belief was sincere).&lt;/p&gt;
&lt;p&gt;I carried that belief — that doing anything other than work while at work was a moral failing — for longer than I should have. That the inability to sustain eight continuous hours of output meant something was wrong with &lt;em&gt;me&lt;/em&gt;, not the expectation.&lt;/p&gt;
&lt;p&gt;It took a very long time to understand that this wasn&apos;t a discipline problem. It was a systemic one.&lt;/p&gt;
&lt;h2&gt;The three-second rule and its cousins&lt;/h2&gt;
&lt;p&gt;There&apos;s a well-documented principle in web performance: if your page takes more than three seconds to load, roughly half your visitors leave. They don&apos;t complain. They don&apos;t file a bug report. They just go somewhere else. The curiosity that brought them to your page evaporates before the content arrives.&lt;/p&gt;
&lt;p&gt;This site scores 100 on Google&apos;s PageSpeed Insights. That&apos;s deliberate. Not because I&apos;m chasing a number (well, ok, maybe a little), but because I believe that the space between &quot;I want to see this&quot; and &quot;I can see this&quot; should be as close to zero as possible; every millisecond of friction is a tax on the visitor&apos;s willingness to engage.&lt;/p&gt;
&lt;p&gt;The same principle operates at every other scale. It just moves slower, so the damage is harder to see.&lt;/p&gt;
&lt;h2&gt;When the latency is your calendar&lt;/h2&gt;
&lt;p&gt;In a work environment, latency looks like this: you have an idea. Maybe it&apos;s a better way to handle a deployment pipeline, or a pattern you noticed in client data, or a fix for a &lt;a href=&quot;/blog/page-builder-cache-guard&quot;&gt;years-old page builder quirk&lt;/a&gt;, or a tool that could save the team hours per week. But there&apos;s no time to explore it. The sprint is full. The backlog is groaning. There&apos;s a client call in twenty minutes and you haven&apos;t prepped.&lt;/p&gt;
&lt;p&gt;So the idea goes into a notebook. Or a sticky note. Or the back of your head, which is already holding &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;forty other things it can&apos;t afford to drop&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A week later, the idea is gone. Not because it was bad, but because the environment provided zero space between &quot;I&apos;m curious about this&quot; and &quot;I can explore this.&quot; The latency was too high, and curiosity bounced.&lt;/p&gt;
&lt;p&gt;Multiply that by months. By years. What you get isn&apos;t laziness or disengagement; it&apos;s something quieter, more sinister, and more corrosive: curiosity that&apos;s stopped aiming at the work. Stagnation. Rote repetition. Work that technically gets done, but without intent or clarity. The lights are on, the engine is running, but nobody&apos;s driving.&lt;/p&gt;
&lt;h2&gt;Curiosity doesn&apos;t disappear; it redirects.&lt;/h2&gt;
&lt;p&gt;The person who can&apos;t explore within their work starts exploring outside of it. They get distracted more easily. They disengage from tasks that used to hold their attention. They start to resent the work itself, not because the work changed, but because it became the barrier between them and the thing their brain actually wants to do.&lt;/p&gt;
&lt;p&gt;This looks like a performance problem from the outside. From the inside, it feels like suffocation.&lt;/p&gt;
&lt;p&gt;The risk isn&apos;t that people stop working. The risk is that they stop caring about the work. And once that happens, the quality degrades in ways that are invisible until something breaks.&lt;/p&gt;
&lt;h2&gt;When the latency drops&lt;/h2&gt;
&lt;p&gt;When I give myself permission to wander — to follow a tangent, to explore something adjacent to the task, to take the five-minute break that a manager once told me was proof of inefficiency — the quality of my work goes up. Measurably.&lt;/p&gt;
&lt;p&gt;Connections form between domains that seemed unrelated. Solutions appear for problems I wasn&apos;t actively working on. The tension and stress of sustained &quot;action mode&quot; drains away enough that the next focused session is sharper, faster, and more intentional.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a productivity hack or a wellness platitude. It&apos;s architecture. The same way a &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;well-designed system has buffer capacity&lt;/a&gt; and graceful degradation paths, a well-designed work pattern has margin for the brain to do its background processing. Remove the margin and the system still runs, but it runs brittle. One unexpected load and something cracks.&lt;/p&gt;
&lt;h2&gt;Tangential exploration is a feature&lt;/h2&gt;
&lt;p&gt;The instinct in most organizations is to treat tangential exploration as waste. If you&apos;re not directly producing output against a defined task, you&apos;re not working. This is the same logic that produces the Pomodoro comment — the belief that every minute not spent in direct production is a minute lost.&lt;/p&gt;
&lt;p&gt;It&apos;s also the logic that produces teams who can execute tasks but can&apos;t solve problems. Who can follow a sprint plan but can&apos;t tell you if &lt;a href=&quot;/blog/the-discovery-tax&quot;&gt;the plan is aimed at the right target&lt;/a&gt;. Who deliver exactly what was asked for and &lt;em&gt;nothing that was needed&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Allowing space for exploration during work isn&apos;t a perk or a concession. It&apos;s how you get people who think instead of people who comply. And the difference between those two outcomes is the difference between a team that hits a problem and already has context, and a team that hits a problem and has to stop everything to figure out what&apos;s happening.&lt;/p&gt;
&lt;h2&gt;The footer line&lt;/h2&gt;
&lt;p&gt;At the bottom of this site, there&apos;s a line: &lt;em&gt;&quot;This site is intentionally over-engineered for speed, because latency kills curiosity.&quot;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When I wrote it, I was thinking about page speed; about not wanting a slow load time to be the reason someone didn&apos;t read a case study or explore the blog. That concept felt important enough to build the entire site around.&lt;/p&gt;
&lt;p&gt;I didn&apos;t have the language for the rest of it yet. But the feeling was already there, had been for decades. That every system I&apos;ve ever struggled against — the slow page, the packed calendar, the expectation of continuous output, the manager who thought breaks were waste — was all the same problem.&lt;/p&gt;
&lt;p&gt;Friction between a person and the thing they want to explore. Latency. And enough of it kills the curiosity that makes good work possible.&lt;/p&gt;
</content:encoded><category>leadership</category><category>adhd</category><category>cognitive-offloading</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>The Craftsperson&apos;s Tools: Who Owns the AI Governance Docs You Built?</title><link>https://mipyip.com/blog/the-craftspersons-tools/</link><guid isPermaLink="true">https://mipyip.com/blog/the-craftspersons-tools/</guid><description>Your AI governance frameworks encode how you think into portable files. The employer owns the output — but the tools? Where you build them determines who keeps them.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;A carpenter&apos;s jig is inert without the carpenter. Your CLAUDE.md is not. When your reasoning process exists in transferable files, the ownership question changes — and the account you build on is the heuristic that answers it.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;A carpenter who works at a cabinet shop builds cabinets that belong to the shop. That&apos;s the deal — your labor, their product. When the carpenter leaves, the cabinets stay.&lt;/p&gt;
&lt;p&gt;But what about the jigs? The fixtures the carpenter built — perhaps on their own time — to make their work faster, more precise, more repeatable? The muscle memory of a particular saw angle, the eye for grain direction, the instinct for how this type of joint responds in humidity?&lt;/p&gt;
&lt;p&gt;That&apos;s never been a question. The cabinets are the shop&apos;s. The skill is the carpenter&apos;s. The jigs occupy a gray area that usually resolves in the carpenter&apos;s favor — because a jig without the person who built it is just a piece of wood with holes in it. The knowledge of when and how to use it lives in the carpenter&apos;s head.&lt;/p&gt;
&lt;p&gt;That distinction has held for centuries. The rise of agentic AI (LLMs) is breaking it.&lt;/p&gt;
&lt;h2&gt;The jig that works without you&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;/blog/cognitive-property&quot;&gt;The previous post&lt;/a&gt; in this series argued that your AI governance frameworks — the CLAUDE.md files, the decision-making heuristics, the prompt patterns, the architectural preferences you&apos;ve encoded into documents — are cognitive property. Your reasoning process, externalized into a transferable format.&lt;/p&gt;
&lt;p&gt;So what makes that different from every prior version of &quot;I take my skills with me when I leave?&quot;&lt;/p&gt;
&lt;p&gt;The carpenter&apos;s jig is inert without the carpenter. It&apos;s a tool that amplifies existing skill — useless to someone who doesn&apos;t already know the craft, and &lt;em&gt;possibly&lt;/em&gt; cumbersome at best to another carpenter with their own methods. A governance document is not inert. Feed it to a fresh AI instance and you get a functional version of how you solve problems. Not perfect, but operational. The tool works without you in a way that a carpenter&apos;s jig never could.&lt;/p&gt;
&lt;p&gt;That changes the ownership math. When the skill only existed in your head, the boundary was simple: work product belongs to the employer, expertise belongs to you. Nobody could take your expertise because it wasn&apos;t extractable.&lt;/p&gt;
&lt;p&gt;Now it is.&lt;/p&gt;
&lt;p&gt;It&apos;s sitting in one or more markdown files. And the question of who owns that file matters a lot more than it did when the equivalent knowledge was locked inside your nervous system.&lt;/p&gt;
&lt;h2&gt;The account boundary&lt;/h2&gt;
&lt;p&gt;The ownership question has a surprisingly clean heuristic. Not your laptop. Not your office. Not your work hours. The account.&lt;/p&gt;
&lt;p&gt;If you&apos;re building governance frameworks on a corporate AI account — the company&apos;s subscription, the company&apos;s workspace, administered by the company&apos;s IT department — the company has a reasonable claim on what&apos;s built inside it. That&apos;s the cabinet shop. Their tools, their subscription, their infrastructure. The work product and the meta-work product (the frameworks that produce it) both live under their roof.&lt;/p&gt;
&lt;p&gt;If you&apos;re building on your own account — your subscription, your money, your personal workspace — that&apos;s your workshop. The projects you deliver might belong to your employer, but the system you built to produce them is yours. The same way a freelance carpenter owns their personal tool collection regardless of which shop they&apos;re currently working in.&lt;/p&gt;
&lt;p&gt;The same principle has governed trade work for generations: the employer owns the output, the craftsperson owns the tools. The new variable is that &quot;tools&quot; now includes documented reasoning processes that function without you.&lt;/p&gt;
&lt;h2&gt;The counterargument that doesn&apos;t hold&lt;/h2&gt;
&lt;p&gt;There&apos;s a predictable objection: &quot;You wouldn&apos;t have developed those patterns if you hadn&apos;t needed to for this job.&quot; The argument being that the cognitive infrastructure is a derivative of the employment, and therefore belongs to the employer.&lt;/p&gt;
&lt;p&gt;By that logic, everything you learn at any job is company property. The architectural patterns you internalized. The debugging instinct you developed. The leadership style you refined under pressure. Nobody seriously believes that — and courts have consistently upheld the distinction between &lt;a href=&quot;https://www.venable.com/insights/publications/ip-quick-bytes/understanding-the-work-made-for-hire-doctrine&quot;&gt;work product and professional skill&lt;/a&gt;, even when the skill was developed entirely on company time.&lt;/p&gt;
&lt;p&gt;The principle hasn&apos;t changed. You own your expertise. What&apos;s changed is the evidence. Before AI governance documents, your expertise was implicit — hard to quantify, impossible to transfer wholesale. Now it&apos;s explicit. It&apos;s a repo. And explicit things are easier to claim ownership of, in both directions.&lt;/p&gt;
&lt;h2&gt;Where you build matters more than when you build&lt;/h2&gt;
&lt;p&gt;Most employment agreements have an &lt;a href=&quot;https://www.briffa.com/blog/ip-clauses-in-employment-contracts-and-assignments/&quot;&gt;IP assignment clause&lt;/a&gt;. The standard version says something like: anything you create using company resources, during company time, or related to company business belongs to the company. These clauses were written for code and patents. They weren&apos;t written for the cognitive infrastructure that produces code.&lt;/p&gt;
&lt;p&gt;(I&apos;m not a lawyer, and none of this is legal advice. But the legal landscape here is worth understanding even if you never end up in a dispute.)&lt;/p&gt;
&lt;p&gt;There&apos;s a meaningful legal distinction between &quot;I built this feature for the company&apos;s product&quot; (clearly theirs) and &quot;I developed a methodology for how I approach all engineering problems, which I also used while building features for the company&apos;s product&quot; (much less clearly theirs). The former is work product. The latter is professional development — it just happens to exist in a series of files now instead of only in your head.&lt;/p&gt;
&lt;p&gt;Some jurisdictions have already drawn a version of this line. California&apos;s &lt;a href=&quot;https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?sectionNum=2870.&amp;amp;lawCode=LAB&quot;&gt;Labor Code §2870&lt;/a&gt; says an employer can&apos;t force assignment of inventions developed entirely on the employee&apos;s own time without using the employer&apos;s equipment, supplies, or trade secrets. Courts have gone further — in &lt;a href=&quot;https://www.hunton.com/insights/legal/employment-law-and-patent-law-collide&quot;&gt;&lt;em&gt;Whitewater West v. Alleshouse&lt;/em&gt;&lt;/a&gt;, the Federal Circuit struck down assignment clauses that reached beyond active employment, and California law &lt;a href=&quot;https://www.manatt.com/insights/newsletters/intellectual-property-law/california-law-former-employer-cannot-require-assi&quot;&gt;prohibits employers from claiming inventions created after employment ends&lt;/a&gt;. Not every state has these protections, and your jurisdiction matters. But the principle is established: own time, own resources, own tools.&lt;/p&gt;
&lt;p&gt;The practical implication is uncomfortable but important: where you build your cognitive infrastructure matters. Not when, and not whether you were &quot;on the clock.&quot; Whether the account, the subscription, and the workspace belong to you or to your employer.&lt;/p&gt;
&lt;p&gt;A developer who builds their CLAUDE.md and governance frameworks on their personal Claude account, using their personal subscription, and then applies that methodology at work — that&apos;s a craftsperson using their own tools on the client&apos;s project. The projects belong to the client. The tools come home with the craftsperson.&lt;/p&gt;
&lt;p&gt;A developer who builds their entire cognitive framework inside a corporate-administered AI workspace — that&apos;s a murkier situation. And most people aren&apos;t thinking about which one they&apos;re in.&lt;/p&gt;
&lt;h2&gt;The intentionality tax&lt;/h2&gt;
&lt;p&gt;None of this is automatic. The default — building everything in whatever account is most convenient, usually the corporate one — puts your cognitive property inside someone else&apos;s infrastructure. Not because they&apos;re trying to take it, but because you didn&apos;t think about where you were building.&lt;/p&gt;
&lt;p&gt;The carpenter who shows up to a new shop with their own tools doesn&apos;t have this problem. They know which tools are theirs. The tools traveled with them from the last shop. There&apos;s no ambiguity.&lt;/p&gt;
&lt;p&gt;I pay for my own AI subscriptions. I&apos;ve never asked for reimbursement, and I&apos;ve declined when it&apos;s been offered. The cost is the boundary. If the company pays for the subscription, the company has a reasonable claim on what&apos;s built inside it. Owning the account keeps the ownership question clean — and the annual cost of a Claude subscription is trivially small compared to the value of the cognitive infrastructure I&apos;ve built inside it.&lt;/p&gt;
&lt;p&gt;When I work on company-owned repositories, the governance files travel with me but they don&apos;t live there. Core methodology — the CLAUDE.md, the architecture docs, the task management structures — lives in my own repos, on my own GitHub account. What lands in the company&apos;s codebase is either untracked, loaded via symlink from my personal infrastructure, or an intentionally minimal version that lets the project function without exposing how I think. The full methodology stays in my workshop.&lt;/p&gt;
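&lt;p&gt;For anyone who wants to try the symlink approach, here is a minimal sketch. Every path and file name below is hypothetical; the governance-demo directory stands in for a personal governance repo.&lt;/p&gt;

```shell
# Hypothetical layout: methodology lives under $HOME, the company repo
# only gets a symlink, and .gitignore keeps it out of their history.
mkdir -p "$HOME/governance-demo" work-repo
echo "core methodology" > "$HOME/governance-demo/CLAUDE.md"
ln -s "$HOME/governance-demo/CLAUDE.md" work-repo/CLAUDE.md
echo "CLAUDE.md" >> work-repo/.gitignore
cat work-repo/CLAUDE.md   # resolves through the symlink
```

&lt;p&gt;To the tooling, the file behaves like an ordinary CLAUDE.md; the tracked repository only ever contains the ignore entry, and the real content stays in the personal repo.&lt;/p&gt;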
&lt;p&gt;Custom tools get built on personal time, on personal accounts. Small bugfixes during work hours, sure — the same way a carpenter might sharpen a blade between cuts. But the jig itself gets designed and built at home.&lt;/p&gt;
&lt;p&gt;That&apos;s the intentionality tax. It&apos;s a real cost — maintaining two contexts, paying for your own tools, thinking about where every file lives. Most people won&apos;t pay it. But the alternative is discovering the ownership question after someone else has already answered it for you.&lt;/p&gt;
&lt;h2&gt;The work product stays. The operating system comes with you.&lt;/h2&gt;
&lt;p&gt;The boundary is the same one it&apos;s always been, updated for a new medium. You leave the cabinets at the shop. You take the tools.&lt;/p&gt;
&lt;p&gt;But the tools have changed.&lt;/p&gt;
&lt;p&gt;They&apos;re not just muscle memory and instinct anymore — they&apos;re documented, structured, portable cognitive frameworks that work without you in the room. That makes them more valuable, more extractable, and more worth knowing about.&lt;/p&gt;
&lt;p&gt;I&apos;m not telling you to go set up a separate account tonight. I&apos;m telling you to look at what you&apos;ve already built and know where it lives. The governance documents, the prompt patterns, the architectural preferences you&apos;ve encoded into files that make your AI work like you — where are those? On whose infrastructure? Under whose terms of service?&lt;/p&gt;
&lt;p&gt;The answer might be fine. But it should be deliberate.&lt;/p&gt;
&lt;p&gt;The next post in this series is about what happens when that awareness comes too late — when your documented cognitive patterns become something a company can copy, scale, and redeploy without you in the room.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This is part two of a &lt;a href=&quot;/blog/cognitive-property&quot;&gt;four-post series on cognitive property&lt;/a&gt;. The next post explores what happens when the boundary isn&apos;t respected.&lt;/em&gt;&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Employment law &amp;amp; IP&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.venable.com/insights/publications/ip-quick-bytes/understanding-the-work-made-for-hire-doctrine&quot;&gt;Understanding the Work Made for Hire Doctrine&lt;/a&gt; — Venable LLP. How courts distinguish employer-owned works from the employee&apos;s underlying skills and know-how.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.briffa.com/blog/ip-clauses-in-employment-contracts-and-assignments/&quot;&gt;IP Clauses in Employment Contracts and Assignments&lt;/a&gt; — Briffa Legal. Overview of standard IP assignment clause patterns and the &quot;in course of employment&quot; test.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Employee invention statutes &amp;amp; case law&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?sectionNum=2870.&amp;amp;lawCode=LAB&quot;&gt;California Labor Code §2870&lt;/a&gt; — Employee inventions developed on own time, without employer resources, can&apos;t be force-assigned.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.hunton.com/insights/legal/employment-law-and-patent-law-collide&quot;&gt;Employment Law and Patent Law Collide&lt;/a&gt; — Hunton Andrews Kurth. Federal Circuit strikes down overly broad assignment clauses in &lt;em&gt;Whitewater West v. Alleshouse&lt;/em&gt; (2020).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.manatt.com/insights/newsletters/intellectual-property-law/california-law-former-employer-cannot-require-assi&quot;&gt;California Law: Former Employer Cannot Require Assignment&lt;/a&gt; — Manatt, Phelps &amp;amp; Phillips. CA prohibits employers from claiming post-employment inventions.&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>I Used WordPress for 20 Years and I Was Wrong</title><link>https://mipyip.com/blog/wordpress-to-astro/</link><guid isPermaLink="true">https://mipyip.com/blog/wordpress-to-astro/</guid><description>After two decades building sites on WordPress — from personal blogs to corporate platforms — I switched to Astro. Here&apos;s why the move to static site generation changed everything about how I build for the web.</description><pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Twenty years of WordPress. Thousands of hours optimizing page builders, caching layers, and plugin conflicts. Then I switched to Astro and hit perfect PageSpeed scores on the first deploy.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I started building websites on WordPress around 2005. I was also using Joomla at the same time — which, if you&apos;ve ever used it, explains why WordPress won that particular contest quickly and decisively.&lt;/p&gt;
&lt;p&gt;For twenty years, WordPress was the answer. Personal sites, corporate sites, everything in between. It had plugins for anything you could imagine, a theme for every aesthetic, and a community that could solve any problem you ran into. For a long time, it genuinely worked.&lt;/p&gt;
&lt;p&gt;Until it didn&apos;t. And it wasn&apos;t all at once, not a dramatic failure. It was more like twenty years of paper cuts that finally bled out.&lt;/p&gt;
&lt;h2&gt;The Smart Fridge Problem&lt;/h2&gt;
&lt;p&gt;WordPress carries the weight of everything it &lt;em&gt;can&lt;/em&gt; do, whether you need it or not.&lt;/p&gt;
&lt;p&gt;The admin dashboard is a full application. Login system, user management, media library, plugin architecture, database abstraction, REST API, cron jobs — all of it running on every page load, for every visitor, whether your site needs any of it or not. For most sites — and I mean the vast majority — none of that matters. Your marketing site doesn&apos;t need a login system. Your blog doesn&apos;t need a database. Your landing page doesn&apos;t need a REST API.&lt;/p&gt;
&lt;p&gt;But you&apos;re paying for all of it in server resources, attack surface, and complexity.&lt;/p&gt;
&lt;p&gt;Page builders make it worse. Elementor, Divi, WPBakery — they ship every capability they offer to every page, regardless of what you actually use. Need a simple two-column layout? Here&apos;s 400KB of JavaScript that also handles parallax scrolling, animated counters, and particle effects. Just in case.&lt;/p&gt;
&lt;p&gt;It&apos;s like buying a refrigerator with a built-in screen that tracks your grocery inventory and auto-orders milk when you&apos;re running low. If you don&apos;t use grocery delivery services — and most people don&apos;t — you just paid an extra $800 for a screen that collects fingerprints and shows a fancy animation when you get water. The fridge still keeps things cold; the cold part was never the problem.&lt;/p&gt;
&lt;p&gt;And underneath all of this: PHP. In 2026, the entire ecosystem still runs on PHP. It works, the way a lot of things that are decades old still work. But the gap between what&apos;s possible now and what PHP was originally designed for gets wider every year.&lt;/p&gt;
&lt;h2&gt;The PageSpeed Insight&lt;/h2&gt;
&lt;p&gt;I&apos;m obsessive about PageSpeed Insights scores. They&apos;re a proxy for the thing that actually matters — how your site feels to real people on real connections.&lt;/p&gt;
&lt;p&gt;My best WordPress score — ever, across twenty years — was a 97. Desktop only.&lt;/p&gt;
&lt;p&gt;Getting there required a minimal theme (Twenty Twenty-One), Gutenberg blocks instead of a page builder, three plugins total, hours of manual optimization, server-level OPCache configuration, and Cloudflare caching. One wrong plugin update could knock ten points off overnight, and it &lt;em&gt;still&lt;/em&gt; had intermittent issues.&lt;/p&gt;
&lt;p&gt;That was the ceiling. On a good day, with everything perfectly tuned (for hours), 97.&lt;/p&gt;
&lt;p&gt;Here&apos;s a WordPress site I know well. It&apos;s hosted on WPEngine — premium managed hosting. It&apos;s had dozens of hours of professional optimization work. It runs a page builder that specifically markets performance as a feature.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mobile: 41 / 86 / 77 / 85. Desktop: 55 / 84 / 77 / 92.&lt;/strong&gt; (The four numbers are Performance / Accessibility / Best Practices / SEO.)&lt;/p&gt;
&lt;div&gt;
  &lt;img src=&quot;/blog/psi-wordpress-mobile.png&quot; alt=&quot;WordPress PageSpeed Insights — Mobile: 41 Performance&quot; /&gt;
  &lt;img src=&quot;/blog/psi-wordpress-desktop.png&quot; alt=&quot;WordPress PageSpeed Insights — Desktop: 55 Performance&quot; /&gt;
&lt;/div&gt;
&lt;p&gt;And here&apos;s this site — the one you&apos;re reading right now.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mobile: 99 / 95 / 100 / 100. Desktop: 100 / 95 / 100 / 100.&lt;/strong&gt;&lt;/p&gt;
&lt;div&gt;
  &lt;img src=&quot;/blog/psi-astro-mobile.png&quot; alt=&quot;Astro PageSpeed Insights — Mobile: 99 Performance&quot; /&gt;
  &lt;img src=&quot;/blog/psi-astro-desktop.png&quot; alt=&quot;Astro PageSpeed Insights — Desktop: 100 Performance&quot; /&gt;
&lt;/div&gt;
&lt;p&gt;No optimization heroics. No caching plugins. No CDN tricks. Those scores showed up on the first deploy and stayed there. The framework just... ships fast HTML.&lt;/p&gt;
&lt;p&gt;Look at those numbers side by side, and you&apos;ll reach the same conclusion I did: This isn&apos;t a tuning problem.&lt;/p&gt;
&lt;h2&gt;Why Astro&lt;/h2&gt;
&lt;p&gt;The framework I ultimately switched to is called Astro. It&apos;s a static site generator — meaning it builds your entire site into plain HTML files at build time, and that&apos;s what gets served to visitors. No server, no database, no PHP — just files on a CDN.&lt;/p&gt;
&lt;p&gt;Three things made it the right fit:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Zero JavaScript by default.&lt;/strong&gt; Astro ships no client-side JavaScript unless you explicitly add it. Most marketing sites don&apos;t need JavaScript at all — they&apos;re documents, not applications. Astro treats them that way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Markdown as a first-class content format.&lt;/strong&gt; Every blog post on this site is a Markdown file — plain text with lightweight formatting. I opted to use Astro&apos;s component system for the other pages, but I didn&apos;t have to. The entire site could be Markdown if I wanted it to be. No visual editor needed, no database, no lock-in. If you&apos;ve ever written a README on GitHub, you&apos;ve written Markdown.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Framework-agnostic.&lt;/strong&gt; Astro doesn&apos;t force you into React, Vue, or any other JavaScript framework. You can use them if you want. You can also use none of them. For a marketing site that&apos;s mostly content, &quot;none&quot; is the right answer.&lt;/p&gt;
&lt;p&gt;That last point is what delivers the PageSpeed scores. No framework overhead means no framework tax on every page load.&lt;/p&gt;
&lt;h2&gt;Markdown Is How I Already Think&lt;/h2&gt;
&lt;p&gt;I&apos;d been writing in Markdown for years before I built this site. Outline, Notion, Obsidian — every notes tool I&apos;ve used in the last decade speaks Markdown natively. My project documentation is Markdown. My random notes are Markdown. The styling shortcuts are second nature at this point — I see the content formatted when I see the symbols. I don&apos;t need a visual preview to know what a &lt;code&gt;##&lt;/code&gt; header or a &lt;code&gt;**bold phrase**&lt;/code&gt; looks like rendered.&lt;/p&gt;
&lt;p&gt;Markdown files are absurdly portable. Plain text that any parser can consume, any system can render, any tool can index. They load instantly. They version-control perfectly. They&apos;ll be readable in fifty years because they&apos;re just text.&lt;/p&gt;
&lt;p&gt;The content format was never the problem. The &lt;em&gt;tooling around it&lt;/em&gt; was the problem. Before AI, maintaining a site built from Markdown files meant manually writing templates, building components, managing routing, handling image optimization — the kind of tedious infrastructure work that made WordPress&apos;s &quot;just install a plugin&quot; approach genuinely appealing.&lt;/p&gt;
&lt;h2&gt;The AI Multiplier&lt;/h2&gt;
&lt;p&gt;I work with a Claude Code agent that has full context on this site&apos;s codebase — the architecture, the design standards, the content strategy, the voice. We work in the source files simultaneously.&lt;/p&gt;
&lt;p&gt;This blog post is a good example of what that workflow looks like:&lt;/p&gt;
&lt;p&gt;I had a seven-word idea: &quot;blog post about why I love Astro.&quot; The agent created the notes file. Then it interviewed me — one question at a time, conversational, pulling out details I wouldn&apos;t have thought to include in an outline. It compiled the raw interview into structured notes. I reviewed, added context, corrected emphasis. It drafted. I edited. We polished the final version side by side in &lt;a href=&quot;/products/sidemark&quot;&gt;SideMark&lt;/a&gt; — a Markdown editor I built specifically for this kind of collaborative workflow.&lt;/p&gt;
&lt;p&gt;The whole pipeline — from idea to draft with images — takes a fraction of what it used to. The bottleneck is my thinking speed, not my typing speed.&lt;/p&gt;
&lt;p&gt;None of that workflow is possible with WordPress. You can&apos;t point an AI agent at a WordPress database and say &quot;work with me on this post.&quot; But you &lt;em&gt;can&lt;/em&gt; point it at a folder of Markdown files with a clear architecture document and watch it understand the entire system in seconds. Add a local semantic memory layer like &lt;a href=&quot;/tools/pmem&quot;&gt;pmem&lt;/a&gt; and the agent can recall decisions, patterns, and context from months of previous sessions — no re-explaining needed.&lt;/p&gt;
&lt;p&gt;The combination — Markdown content, static site generator, AI-assisted development — turns &quot;I should update my website&quot; from a weekend project into a Tuesday morning. Some posts on this site went from idea to published — with images &lt;em&gt;and&lt;/em&gt; scheduled LinkedIn posts — in under twenty minutes. And they&apos;re still &lt;em&gt;my&lt;/em&gt; words, &lt;em&gt;my&lt;/em&gt; thinking. The combination of Astro and AI just removed the friction between having something to say and saying it.&lt;/p&gt;
&lt;h2&gt;&quot;But My Ecommerce Is on WordPress&quot;&lt;/h2&gt;
&lt;p&gt;I can already hear it. &quot;My store runs on WooCommerce. I can&apos;t separate my marketing site from my ecommerce.&quot;&lt;/p&gt;
&lt;p&gt;You can, and you should.&lt;/p&gt;
&lt;p&gt;Your marketing pages and your ecommerce platform have fundamentally different performance requirements. Marketing pages need to be fast — fast enough that Google ranks them, fast enough that visitors don&apos;t bounce, fast enough that your PageSpeed scores aren&apos;t embarrassing when a potential client checks. Ecommerce pages need to be functional — cart logic, payment processing, inventory management, user accounts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Bundling them together means your marketing pages carry the weight of your ecommerce platform on every load.&lt;/strong&gt; Your beautiful landing page is &lt;em&gt;slower because it&apos;s sharing infrastructure with your checkout flow.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Keep your ecommerce where it works — WooCommerce, Shopify, whatever you&apos;ve built. Put it on a subdirectory. Build your marketing site as a separate, blazing-fast static site, and surface product data, cart state, and key behaviors to the marketing side through JavaScript and session management. To the visitor, it looks seamless. Under the hood, each part is optimized for what it actually needs to do.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/blog/managing-agents-like-teams&quot;&gt;Separation of concerns&lt;/a&gt; — it&apos;s an engineering principle that applies to site architecture just as well as it applies to application code.&lt;/p&gt;
&lt;h2&gt;Twenty Years Is a Long Time to Be Wrong&lt;/h2&gt;
&lt;p&gt;WordPress powers a huge portion of the web and it does what it does. I&apos;m not here to bury it (and I wouldn&apos;t want to if I could). But after twenty years of paper cuts, accumulated complexity, and a performance ceiling that required heroic effort to approach — I had to ask myself whether I was still using it because it was the right tool, or because it was the familiar one.&lt;/p&gt;
&lt;p&gt;The willingness to evaluate your tools honestly — even the ones you&apos;ve invested decades in — is the difference between building systems that serve you and serving systems you&apos;ve already built.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Not sure whether your WordPress setup is costing you more than it should? The &lt;a href=&quot;/tools/wordpress-diagnostic&quot;&gt;WordPress diagnostic&lt;/a&gt; runs a live audit of your site&apos;s performance, security, and technical health in about two minutes.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>architecture</category><category>devtools</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Local RAG for Claude Code: Semantic Search Over Your Own Project</title><link>https://mipyip.com/blog/project-memory-for-claude-code/</link><guid isPermaLink="true">https://mipyip.com/blog/project-memory-for-claude-code/</guid><description>Claude Code has session memory and governance documents, but on a project with five hundred markdown files, the gap between what the agent can read at startup and what the project actually knows gets wider every week. pmem is a local RAG memory system that gives Claude Code persistent, semantic search across your project&apos;s full history. No external APIs. Setup in two minutes.</description><pubDate>Thu, 26 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Every Claude Code session starts from zero. Grepping through hundreds of governance documents wastes tokens and misses semantic matches. pmem gives your agent persistent memory — local, private, and searchable by meaning.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;More than five hundred markdown files.&lt;/p&gt;
&lt;p&gt;That&apos;s what the project that drives this website has. ROADMAP.md, ARCHITECTURE.md, CLAUDE.md, CHANGELOG.md, task folders with notes and lessons learned, blog editorial notes, half-complete LinkedIn drafts, voice-to-text blog post concepts that are barely decipherable even by me, memory files from past sessions. Each one holds a piece of the project&apos;s history — a decision, a rationale, a thing that broke and how it got fixed.&lt;/p&gt;
&lt;p&gt;Claude Code can&apos;t see any of it unless I point it at the right file — or it reads them on its own, burning tokens on retrieval before the real work starts.&lt;/p&gt;
&lt;p&gt;Claude Code isn&apos;t completely amnesiac — it has session memory, it reads CLAUDE.md, and with the right governance documents it can recover a lot of context at session start. For smaller projects, that&apos;s enough. But this website, for example, has over five hundred files of accumulated institutional knowledge, and the gap between &quot;what the agent can reasonably read at startup&quot; and &quot;what the project actually knows&quot; grows wider every week.&lt;/p&gt;
&lt;p&gt;So what do you do? You grep. You tell Claude to search for the thing you vaguely remember documenting somewhere. It reads files, scans for keywords, and — sometimes — finds what you need.&lt;/p&gt;
&lt;p&gt;But with a large number of text-based files, it often doesn&apos;t. Because grep matches text, not meaning. If the answer uses different words than your question, grep misses it. If the context is spread across three files, grep finds fragments. And every file Claude reads to search for something is tokens spent on retrieval instead of the actual work.&lt;/p&gt;
&lt;p&gt;I was spending significant portions of sessions just helping the agent find context it should already have had access to.&lt;/p&gt;
&lt;h2&gt;The prompt that built it&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;/blog/the-discovery-tax&quot;&gt;Discovery Tax&lt;/a&gt; thesis applies to AI development as much as it applies to enterprise projects: the quality of what you build is directly proportional to the quality of what you specified before building started.&lt;/p&gt;
&lt;p&gt;I want to show two prompts, because the contrast illustrates something I think a lot of people miss about working with AI agents.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The vague prompt:&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;I want to give agents better memory.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This goes nowhere useful. No constraints, no architecture, no scope. The agent could build anything from a flat JSON file to a Kubernetes-deployed vector database with a React frontend. It would probably pick something in the middle and spend four hours building infrastructure you didn&apos;t need. You&apos;d end up with code that (maybe) works but doesn&apos;t solve your actual problem — because you never described your actual problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The prompt I actually used&lt;/strong&gt; (simplified and compressed into natural language for readability):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I need to enhance the memory capabilities of Claude Code. Since I use Claude Code for more than just writing code — managing tasks, building documentation, maintaining infrastructure — I can generate thousands of files and folders. While they do get archived regularly, digging through them is a token and time sink, and can sometimes prove inaccurate, especially with larger projects.&lt;/p&gt;
&lt;p&gt;We will use Ollama embeddings and build a RAG that the agent can use to query the entire project&apos;s files.&lt;/p&gt;
&lt;p&gt;The tool must also be able to connect to a local LLM (optional) in order to further reduce token usage when parsing results.&lt;/p&gt;
&lt;p&gt;For now, we are going to be focused on TXT and MD files, and will expand as needed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The difference isn&apos;t length. It&apos;s that the second prompt contains a discovery phase. It names the problem (token waste, inaccurate retrieval across large projects). It specifies the technology (Ollama embeddings, RAG). It defines the integration point (Claude Code, via MCP). It sets constraints (local-first, TXT and MD only). And it draws an explicit scope boundary — &quot;for now&quot; — which tells the agent what&apos;s out of bounds without killing future expansion.&lt;/p&gt;
&lt;p&gt;That&apos;s not prompt engineering as a parlor trick. That&apos;s the same discipline you&apos;d apply to a project brief for a human team. The agent doesn&apos;t need a better prompt template. It needs you to finish thinking before you start asking.&lt;/p&gt;
&lt;p&gt;I would like to reiterate something, though:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This is the same discipline you&apos;d apply to a project brief for a human team.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You &lt;em&gt;have&lt;/em&gt; to know what you&apos;re asking for, how it should be built, and how the finished product works before you can get a functional, consistent result.&lt;/p&gt;
&lt;h2&gt;What pmem does&lt;/h2&gt;
&lt;p&gt;The flow is simple enough to describe in one sentence: Claude asks a question, pmem finds the answer in your project&apos;s files, and returns it with source citations.&lt;/p&gt;
&lt;p&gt;Under the surface:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Indexing.&lt;/strong&gt; &lt;code&gt;pmem index&lt;/code&gt; walks your project&apos;s markdown and text files, splits them into semantic chunks using header-aware parsing (a section stays with its heading — it doesn&apos;t get split mid-thought), and embeds each chunk locally using &lt;code&gt;nomic-embed-text&lt;/code&gt; via Ollama. Chunks are stored in ChromaDB, a file-based vector database that requires no server process. Indexing is incremental: SHA-256 hashes track which files have changed, so subsequent runs only re-embed what&apos;s new.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Querying.&lt;/strong&gt; When Claude needs context, it calls the &lt;code&gt;memory_query&lt;/code&gt; MCP tool with a natural language question. pmem embeds the question using the same model, searches the vector store for semantically similar chunks using &lt;a href=&quot;https://docs.trychroma.com/docs/collections/configure#distance-function&quot;&gt;cosine similarity&lt;/a&gt; (ChromaDB&apos;s default distance metric), and returns the top results with source file paths and relevance scores. Optionally, a local LLM (via any OpenAI-compatible endpoint) synthesizes the chunks into a concise answer before returning it — which saves Claude from processing raw chunks and reduces token usage further, and allows the user to interface with the pmem datastore directly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Session rituals.&lt;/strong&gt; Three slash commands turn memory into a workflow: &lt;code&gt;/welcome&lt;/code&gt; reads governance documents and refreshes the index at session start. &lt;code&gt;/sleep&lt;/code&gt; updates governance documents and captures session changes at session end. &lt;code&gt;/reindex&lt;/code&gt; refreshes mid-session when files have changed. The index stays current because maintaining it is a side effect of the session workflow, not a separate chore.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;No data leaves your machine. No API keys required for core functionality. The entire system runs on Ollama (for embeddings), ChromaDB (for storage), and Python.&lt;/p&gt;
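The incremental part of the indexing step is easy to sketch. Here is a minimal, illustrative version of SHA-256 change detection, assuming the previous run left behind a map of path to digest (the function names are mine, not the actual pmem API):

```python
import hashlib

def file_digest(path):
    # SHA-256 of the file contents; used to skip unchanged files on reindex.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def changed_files(paths, previous):
    # previous maps path to digest from the last index run. Only files whose
    # digest differs (or that are new) need to be re-embedded.
    current = {p: file_digest(p) for p in paths}
    stale = [p for p in paths if previous.get(p) != current[p]]
    return stale, current
```

Run twice with no edits in between and `stale` comes back empty, which is why subsequent index runs finish in under a second.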
&lt;h2&gt;Why not just use grep?&lt;/h2&gt;
&lt;p&gt;The semantic difference matters more than you might think.&lt;/p&gt;
&lt;p&gt;Last week, in my Project Management Claude, I needed to find a Salesforce-related task whose scope had changed considerably since it was created. That project holds close to 2,500 MD files, and the task folder was named for the old scope, which I could only partially remember. I asked Claude to find the task so I could extract a Lesson Learned from it, but even with a date constraint it struggled, and eventually I halted the search and dug through the (fortunately well-structured) folders myself.&lt;/p&gt;
&lt;p&gt;I did find what I was looking for, but I realized that I shouldn&apos;t have to.&lt;/p&gt;
&lt;p&gt;That&apos;s the difference between text matching and semantic search. And in a project with hundreds (or thousands!) of files, the questions you ask are almost never phrased the same way as the answers you wrote.&lt;/p&gt;
&lt;p&gt;The token savings compound too. I ran the same query — &quot;identify governance-related blog posts&quot; — both ways on this project (500+ markdown files) and asked Claude to estimate the token cost of each approach:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;pmem (index-based)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Fresh search (Explore agent)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;18 posts&lt;/td&gt;
&lt;td&gt;11 posts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~20 seconds&lt;/td&gt;
&lt;td&gt;~90 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5,500&lt;/td&gt;
&lt;td&gt;~20,000–24,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The fresh search cost roughly 4× the tokens (cries in tokens) and found 7 fewer posts. The posts it missed were the ones where governance was a supporting theme rather than the headline — exactly the kind of semantic connection that keyword search can&apos;t make.&lt;/p&gt;
&lt;p&gt;The agent&apos;s overhead — its own system prompt, tools, multi-step reasoning — is the hidden cost. It&apos;s worth it for open-ended exploration across a large codebase, but for a targeted retrieval question like &quot;which posts mention governance,&quot; the index was both cheaper and more thorough.&lt;/p&gt;
&lt;p&gt;And the speed difference shows up in real editing sessions, not just benchmarks. While drafting a blog post about switching from WordPress to Astro, I mentioned &quot;separation of concerns&quot; and wanted to cross-link to an existing post that covered the concept. I asked pmem. It found the right post, returned the slug, and the link was in place — all within about five seconds, without breaking the editing flow. That kind of retrieval latency is the difference between &quot;I&apos;ll add that link later&quot; and actually adding it.&lt;/p&gt;
&lt;h2&gt;Architecture decisions worth mentioning&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;No LangChain.&lt;/strong&gt; Not out of ideology — out of simplicity. pmem is around 2,000 lines of Python. LangChain would have added a dependency tree larger than the project itself, for abstractions I didn&apos;t need. The RAG pipeline is: embed → store → search → (optionally) synthesize. That&apos;s four operations. They don&apos;t need a framework.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ChromaDB over everything else.&lt;/strong&gt; File-based, no server process, persistent, and the Python API is clean. I considered LanceDB but never formally evaluated it — ChromaDB was already working, and the evaluation wasn&apos;t worth the detour. I also considered plain JSON with numpy cosine similarity, which works for small projects but &lt;a href=&quot;http://ann-benchmarks.com&quot;&gt;doesn&apos;t scale&lt;/a&gt; — brute-force linear scan is O(n) per query, and once you&apos;re past a few hundred chunks the latency adds up fast compared to ANN-indexed alternatives. ChromaDB hit the sweet spot: real vector search without operational overhead.&lt;/p&gt;
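For scale, the brute-force baseline that ChromaDB replaces fits in a dozen lines. This is a sketch of that alternative, not pmem code — every query scores every stored chunk, which is exactly the O(n) scan that stops being viable past a few hundred chunks:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=3):
    # Brute-force linear scan: score every chunk, keep the best k.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]
```

Fine for a notes folder; painful for 2,500 files' worth of chunks on every query.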
&lt;p&gt;&lt;strong&gt;Header-aware chunking.&lt;/strong&gt; Most RAG tutorials split text by character count or sentence boundaries. That destroys semantic units. A section titled &quot;Why we chose CloudFront over Fastly&quot; that gets split between two chunks loses meaning in both. pmem&apos;s chunker uses markdown headers as natural split points, with a size-based fallback for sections that are too long, and each size-based chunk also receives a heading_path as well. The heading becomes metadata on each chunk, so search results carry their context.&lt;/p&gt;
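A simplified sketch of the idea (pmem&apos;s real chunker carries a full heading_path; this one keeps only the nearest heading, and the size fallback is deliberately naive):

```python
def chunk_markdown(text, max_len=800):
    # Split on markdown headings so each section keeps its heading context.
    chunks, current, heading = [], [], ""
    for line in text.splitlines():
        if line.startswith("#"):
            if current:
                chunks.append({"heading": heading, "body": "\n".join(current)})
            heading, current = line.strip(), []
        else:
            current.append(line)
    if current:
        chunks.append({"heading": heading, "body": "\n".join(current)})
    # Size fallback: split oversized sections, repeating the heading on each
    # piece so no chunk loses its context.
    out = []
    for c in chunks:
        body = c["body"]
        while len(body) > max_len:
            out.append({"heading": c["heading"], "body": body[:max_len]})
            body = body[max_len:]
        out.append({"heading": c["heading"], "body": body})
    return out
```

The heading rides along as metadata, so a search hit for "Why we chose CloudFront over Fastly" arrives already labeled with the question it answers.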
&lt;p&gt;&lt;strong&gt;CWD walk-up for project detection.&lt;/strong&gt; Same pattern git uses: start in the current directory, walk up until you find a &lt;code&gt;.memory&lt;/code&gt; directory. No config file needed to tell pmem where the project root is. &lt;code&gt;pmem init&lt;/code&gt; creates the &lt;code&gt;.memory&lt;/code&gt; directory, and from that point forward, any subdirectory just works.&lt;/p&gt;
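The walk-up itself is a few lines; a sketch of the pattern (illustrative, not the actual pmem implementation):

```python
from pathlib import Path

def find_project_root(start="."):
    # Walk up from start until a directory containing .memory is found,
    # the same way git locates its .git directory. Returns None at the
    # filesystem root if no project was initialized.
    path = Path(start).resolve()
    for candidate in [path, *path.parents]:
        if (candidate / ".memory").is_dir():
            return candidate
    return None
```

Call it from any nested subdirectory and it resolves to the directory where `pmem init` was run.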
&lt;h2&gt;The governance connection&lt;/h2&gt;
&lt;p&gt;pmem isn&apos;t a standalone tool, it&apos;s the persistence layer for a &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance methodology&lt;/a&gt; that&apos;s been accumulating for months.&lt;/p&gt;
&lt;p&gt;The governance documents — CLAUDE.md, ROADMAP.md, ARCHITECTURE.md, CHANGELOG.md — are designed to carry institutional knowledge forward across sessions. They work. But they work by requiring the agent to read them at session start, which means the agent has to know which files to read and those files have to stay within a readable size.&lt;/p&gt;
&lt;p&gt;pmem removes that constraint. The agent doesn&apos;t need to read every governance document front-to-back at session start. It reads the critical ones (CLAUDE.md is always first), and for everything else — past task context, historical decisions, lessons learned, archived content — it queries pmem.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/welcome&lt;/code&gt; skill indexes the project before the agent starts working. The &lt;code&gt;/sleep&lt;/code&gt; skill captures changes before the session ends. The memory stays current without any manual intervention. It&apos;s &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;cognitive offloading&lt;/a&gt; applied to the agent itself: the agent doesn&apos;t hold the project&apos;s history in its context window. It holds it in a searchable index and retrieves what it needs, when it needs it.&lt;/p&gt;
&lt;p&gt;The pattern keeps showing up. The same principle that makes &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;human productivity systems work&lt;/a&gt; — externalize what you can, retrieve what you need — applies to the agents that are supposed to be helping you.&lt;/p&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;Prerequisites: Python 3.11+, Ollama running locally, and the &lt;code&gt;nomic-embed-text&lt;/code&gt; model pulled.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pip install pmem-project-memory
ollama pull nomic-embed-text
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Initialize any project:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd ~/your-project
pmem init
pmem index
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Install the session skills:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pmem install-skills
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Register the MCP server in &lt;code&gt;~/.claude.json&lt;/code&gt; (global) or &lt;code&gt;.mcp.json&lt;/code&gt; (per-project). The README has the exact config block.&lt;/p&gt;
&lt;p&gt;First index takes a few seconds for small projects, up to a minute for large ones. After that, incremental indexing only re-embeds changed files — typically under a second.&lt;/p&gt;
&lt;h2&gt;What&apos;s next&lt;/h2&gt;
&lt;p&gt;Phase 2 is mostly complete: &lt;code&gt;pmem watch&lt;/code&gt; for auto-reindexing, global config defaults, one-command skill installation, better error messages. Phase 3 is where it gets interesting — multi-collection support (separate indexes for different content types), non-markdown file support with language-aware chunking, optional image processing (with the results chunked by either Claude or a vision-capable local model), and &lt;code&gt;pmem diff&lt;/code&gt; to show how answers change over time.&lt;/p&gt;
&lt;p&gt;The tool is open source, MIT licensed. It exists because I needed it, and I suspect anyone running Claude Code on a project with more than a few dozen files needs it too.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/avanrossum/pmem-project-memory-tool-for-claude&quot;&gt;GitHub&lt;/a&gt; · &lt;a href=&quot;/tools/pmem&quot;&gt;Product page&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The governance methodology pmem supports: &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;. The cognitive offloading framework: &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;Cognitive Offloading&lt;/a&gt;. The prompt-engineering-as-discovery principle: &lt;a href=&quot;/blog/the-discovery-tax&quot;&gt;The Discovery Tax&lt;/a&gt;. The full Pass@1 methodology: &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;What Is Pass@1?&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Vector search &amp;amp; distance metrics&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://docs.trychroma.com/docs/collections/configure#distance-function&quot;&gt;ChromaDB — Distance Functions&lt;/a&gt; — cosine similarity as default distance metric in ChromaDB&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://ann-benchmarks.com&quot;&gt;ANN Benchmarks&lt;/a&gt; — Aumüller, Bernhardsson &amp;amp; Faithfull. Benchmarks comparing brute-force linear scan against approximate nearest neighbor algorithms (HNSW, IVF, Annoy). The standard reference for why indexed vector search outperforms brute-force at scale.&lt;/li&gt;
&lt;/ul&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>devtools</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Why Your Page Builder Site Goes White After a Cache Purge</title><link>https://mipyip.com/blog/page-builder-cache-guard/</link><guid isPermaLink="true">https://mipyip.com/blog/page-builder-cache-guard/</guid><description>Page builders generate CSS on first render and cache it on disk. When your cache purges, the CSS isn&apos;t there yet. The next visitor gets a white page. I built a WordPress plugin to fix the three failure modes that cause it.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Page builders write CSS on first render. Cache purges delete the HTML that references it. The next visitor gets unstyled content. This is the architecture of that failure and the three-phase fix.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;A client calls, they are upset, and their site is broken. The homepage is a wall of unstyled text, or it&apos;s completely white. They didn&apos;t touch anything. The hosting company didn&apos;t touch anything.&lt;/p&gt;
&lt;p&gt;Nobody touched anything.&lt;/p&gt;
&lt;p&gt;What happened is a cache purge. Maybe WP Rocket ran its scheduled clear. Maybe someone updated a plugin and LiteSpeed invalidated everything. Maybe Cloudflare&apos;s cache expired. Maybe Divi decided it didn&apos;t like its current CSS stack and said &quot;I&apos;m deleting all of them.&quot; The cause doesn&apos;t matter — the symptom is the same: visitors are seeing a site without CSS.&lt;/p&gt;
&lt;p&gt;And it fixes itself on (hard) reload. Sometimes. Which makes it worse, because now the client thinks it was a fluke and the developer can&apos;t reproduce it.&lt;/p&gt;
&lt;p&gt;It&apos;s not a fluke. It&apos;s a structural problem in how page builders interact with caching layers, and it has three distinct failure modes.&lt;/p&gt;
&lt;h2&gt;The architecture of the problem&lt;/h2&gt;
&lt;p&gt;Page builders — Divi, Elementor, Beaver Builder, Bricks, Oxygen — generate per-post CSS files on first render and store them on disk. Divi puts them in &lt;code&gt;et-cache&lt;/code&gt;. Elementor uses &lt;code&gt;/elementor/css/&lt;/code&gt;. Beaver Builder uses &lt;code&gt;bb-plugin/cache&lt;/code&gt;. The specifics vary, but the final pattern is the same: CSS doesn&apos;t exist as a static file until someone visits the page.&lt;/p&gt;
&lt;p&gt;This is fine under normal operations: the first visitor triggers the render, the CSS is written, and every subsequent visitor gets it from disk. The caching plugin serves the HTML from cache, the browser fetches the CSS, everything works.&lt;/p&gt;
&lt;p&gt;Until the cache purges (or Divi decides it doesn&apos;t like the CSS stack).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Failure Mode A: Origin cold start.&lt;/strong&gt; The page cache clears. The next visitor triggers a fresh PHP render. The page builder starts writing CSS for that page during the render. Everyone who arrives in the window between &quot;cache purged&quot; and &quot;CSS written to disk&quot; gets a 404 on the stylesheet. They see unstyled content.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Failure Mode B: Cloudflare serving stale HTML.&lt;/strong&gt; Cloudflare has your HTML cached at its edge. That HTML references CSS files that no longer exist at origin. Cloudflare serves the HTML (fast!), the browser requests the CSS, Cloudflare tries to fetch it from origin, gets a 404. This happens even when your origin server is perfectly healthy. Cloudflare is doing exactly what it&apos;s supposed to — serving cached content. The problem is that the cached content is wrong.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Failure Mode C: Page cache and CSS out of sync.&lt;/strong&gt; The server page cache is serving HTML that was generated before the most recent CSS regeneration. The HTML references old filenames. The new CSS files exist on disk, but the HTML is pointing at paths that have changed.&lt;/p&gt;
&lt;p&gt;Three failure modes; three different layers. This is why &quot;just clear the cache again&quot; doesn&apos;t reliably fix it — you&apos;re addressing one layer while the other two are still broken.&lt;/p&gt;
&lt;h2&gt;The fix is layered because the problem is layered&lt;/h2&gt;
&lt;p&gt;I built &lt;a href=&quot;https://github.com/avanrossum/pb-cache-warmer&quot;&gt;Page Builder Cache Guard&lt;/a&gt; to address all three failure modes in sequence.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phase 1: Force origin warmup.&lt;/strong&gt; After any cache purge event (or on a schedule, or manually), the plugin crawls every published page with a cache-bypassing query string. This forces the server to skip its page cache and run a full PHP render. The page builder writes all CSS files to disk. When real visitors arrive, the CSS is already there. The bypass token is randomized per run so CDN edges also treat each request as uncached.&lt;/p&gt;
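&lt;p&gt;A minimal sketch of the warmup URL construction. This is illustrative, not the plugin&apos;s actual code, and the &lt;code&gt;pbcg_warm&lt;/code&gt; parameter name is an assumption:&lt;/p&gt;

```python
import secrets
from urllib.parse import urlencode

def build_warmup_urls(page_urls):
    # One random token per run: page caches and CDN edges have never
    # cached this query string, so every request runs a full PHP render.
    token = secrets.token_hex(8)
    return [f"{url}?{urlencode({'pbcg_warm': token})}" for url in page_urls]
```

&lt;p&gt;Because the token changes every run, a warmup pass never gets served its own previous output from any cache layer.&lt;/p&gt;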
&lt;p&gt;&lt;strong&gt;Phase 1b: Server page cache purge.&lt;/strong&gt; After regenerating CSS, the plugin purges the server-side page cache entry for the affected URL. This ensures the next request gets fresh HTML with correct CSS references — not the stale cached version that caused the problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phase 2: Cloudflare cache purge (optional).&lt;/strong&gt; If you have a Cloudflare API token configured, the plugin tells Cloudflare to drop its cached copies of your HTML and page-builder CSS. The purge targets only dynamic CSS paths; stable WordPress assets are left alone. Cloudflare re-fetches from your now-warm origin on the next real request.&lt;/p&gt;
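&lt;p&gt;Cloudflare&apos;s purge-by-URL endpoint takes an explicit file list, which is what makes selective purging possible. A sketch of the filtering, with the cache directory names taken from the builders above (the helper itself is illustrative, not the plugin&apos;s code):&lt;/p&gt;

```python
import json

# Cache directories common page builders use for generated CSS.
BUILDER_CSS_DIRS = ("/et-cache/", "/elementor/css/", "/bb-plugin/cache/")

def build_purge_payload(page_urls, asset_urls):
    # Keep only dynamically generated builder CSS; stable theme and
    # core assets stay cached at the edge.
    dynamic_css = [u for u in asset_urls
                   if any(d in u for d in BUILDER_CSS_DIRS)]
    return json.dumps({"files": list(page_urls) + dynamic_css})
```

&lt;p&gt;The resulting JSON body is what gets POSTed to the zone&apos;s &lt;code&gt;purge_cache&lt;/code&gt; endpoint.&lt;/p&gt;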
&lt;p&gt;&lt;img src=&quot;../../assets/blog/pb-cache-guard-settings-top.png&quot; alt=&quot;Page Builder Cache Guard settings — schedule, post types, and warmup configuration&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;The last line of defense&lt;/h2&gt;
&lt;p&gt;The three phases handle the proactive case: the cache purges, and the warmup runs before visitors arrive. But what about the gap? What if a visitor hits the site before the warmup runs?&lt;/p&gt;
&lt;p&gt;A client-side CSS health check handles this. About 600 bytes of inline JavaScript. After page load, it checks whether every stylesheet loaded successfully. If any returns 404, it calls a heal endpoint that runs Phase 1 + 1b + 2 for that specific URL, then reloads the page. A session storage guard prevents reload loops.&lt;/p&gt;
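&lt;p&gt;The guard is the important part of that check. A minimal sketch of the decision logic, written in Python rather than the inline JavaScript, with the session flag passed in rather than read from &lt;code&gt;sessionStorage&lt;/code&gt;:&lt;/p&gt;

```python
def should_heal(stylesheet_statuses, already_healed_this_session):
    # Heal-and-reload only when at least one stylesheet 404ed, and only
    # once per session: the flag plays the role of the sessionStorage
    # guard that prevents an endless heal-reload loop if healing fails.
    if already_healed_this_session:
        return False
    return any(status == 404 for status in stylesheet_statuses)
```

&lt;p&gt;If the heal succeeds, the reloaded page finds all stylesheets healthy and the check goes quiet; if it fails, the flag stops the page from reloading forever.&lt;/p&gt;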
&lt;p&gt;It&apos;s the architectural equivalent of a circuit breaker. The proactive system should catch everything. The health check is there for when &quot;should&quot; meets reality.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/blog/pb-cache-guard-settings-bottom.png&quot; alt=&quot;Page Builder Cache Guard settings — Cloudflare integration, health check, and run history&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Why this matters beyond WordPress&lt;/h2&gt;
&lt;p&gt;This issue isn&apos;t unique to page builders, but they do seem to surface it frequently. It&apos;s a caching coherence problem — multiple layers of cache (origin page cache, CDN edge cache, CSS file cache) that can independently become stale and serve inconsistent content.&lt;/p&gt;
&lt;p&gt;The same class of problem shows up in Kubernetes deployments where pod restarts clear in-memory caches while the CDN still serves old assets. It shows up in Jamstack sites where the build output and the CDN edge can briefly serve different versions of the same page. The failure mode is always the same: layered caches with independent invalidation timelines.&lt;/p&gt;
&lt;p&gt;The fix is also the same class of fix: warm from the bottom up, invalidate from the top down, and add a client-side check for the cases your proactive system misses.&lt;/p&gt;
&lt;p&gt;Free, open source, GPL-2.0. Works with any page builder that generates CSS on first render. Cloudflare integration is optional.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/avanrossum/pb-cache-warmer&quot;&gt;GitHub&lt;/a&gt; · &lt;a href=&quot;/lab&quot;&gt;Lab page&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;For the methodology behind building tools like this: &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;What Is Pass@1?&lt;/a&gt;. For why governance documents make this kind of rapid build possible: &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Running WordPress and wondering how much of this applies to your setup? The &lt;a href=&quot;/tools/wordpress-diagnostic&quot;&gt;WordPress diagnostic&lt;/a&gt; checks your site&apos;s caching, performance, and plugin health — results in two minutes.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>architecture</category><category>tools</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>The Discovery Tax</title><link>https://mipyip.com/blog/the-discovery-tax/</link><guid isPermaLink="true">https://mipyip.com/blog/the-discovery-tax/</guid><description>Discovery phases feel expensive. Skipping them is more expensive. Every &apos;just build it&apos; directive is a bet that you understand the problem completely before starting. That bet almost never pays.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Every &apos;just build it&apos; directive transfers responsibility downward. The team inherits unclear goals, unvalidated assumptions, and a timeline built on wishful thinking. Discovery isn&apos;t overhead. It&apos;s insurance.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;div&gt;
  &lt;p&gt;Discovery &lt;span&gt;Tax&lt;/span&gt;&lt;/p&gt;
  &lt;p&gt;/dɪˈskʌv.ər.i tæks/ · noun&lt;/p&gt;
  &lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; The additional cost, time, and organizational damage incurred when a project skips proper scoping and discovery. Paid upfront at a discount, or later with interest.&lt;/p&gt;
  &lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; The gap between what was promised and what was possible — filled by the team, at their expense.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&quot;Just build it.&quot;&lt;/p&gt;
&lt;p&gt;Three words that sound like a decision but function as a transfer. The person saying them feels decisive. The team hearing them inherits every question that didn&apos;t get answered, every assumption that didn&apos;t get tested, and a deadline that has no relationship to the work it describes. It&apos;s the same pattern that shows up in &lt;a href=&quot;/blog/the-green-light-problem&quot;&gt;premature green lights&lt;/a&gt; — confidence substituting for validation.&lt;/p&gt;
&lt;p&gt;While there are many triggers for this, one common pathway is when someone commits to a timeline or a price before the technical reality has been assessed. When the assessment comes back and contradicts the commitment, the commitment wins. The promise already exists. The assessment is just an &lt;em&gt;inconvenience&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&quot;I&apos;ve already promised them, so we have to do it.&quot;&lt;/p&gt;
&lt;p&gt;That sentence is where discovery dies. Not in a planning meeting. Not in a budget review. In a hallway conversation where someone decides that the cost of disappointing a client today is worse than the cost of failing them in three months.&lt;/p&gt;
&lt;h2&gt;The transfer&lt;/h2&gt;
&lt;p&gt;There&apos;s a meaningful difference between two project kickoffs.&lt;/p&gt;
&lt;p&gt;The first sounds like this: &quot;We&apos;re going to build this app by December 18th.&quot;&lt;/p&gt;
&lt;p&gt;The second sounds like this: &quot;Over the next six weeks, we&apos;re going to execute on the following components, in this order, with these dependencies, as we work toward the final deliverable. Here&apos;s where we expect to hit friction. Here&apos;s the mitigation plan. Here&apos;s who owns what.&quot;&lt;/p&gt;
&lt;p&gt;The first one fits on a sticky note, but the second one requires actual work before the work starts. Most organizations pick the sticky note because the second version forces you to confront how much you don&apos;t know yet, and confronting that is uncomfortable. It&apos;s easier to commit to a date and figure it out later.&lt;/p&gt;
&lt;p&gt;The problem is that &quot;figure it out later&quot; has a cost, and it&apos;s &lt;em&gt;always&lt;/em&gt; higher than figuring it out now. Barry Boehm&apos;s &lt;a href=&quot;https://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BB.pdf&quot;&gt;cost-of-change research&lt;/a&gt; showed that fixing a requirements mistake after delivery can cost up to 100x what it would have cost to catch it during discovery. A two-week discovery phase at the beginning of a project can prevent a two-month correction in the middle. But the two weeks feels like delay, and the two months feels like bad luck.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It&apos;s not bad luck.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s the discovery tax. You pay it now, at a discount. Or you pay it later, with interest.&lt;/p&gt;
&lt;h2&gt;What wishful thinking looks like in a spreadsheet&lt;/h2&gt;
&lt;p&gt;A few years ago, a project landed on my desk that had already been promised to a client. Complex application. Six-week timeline. The rationale: &quot;We already had something similar&quot; and &quot;it shouldn&apos;t be too hard.&quot;&lt;/p&gt;
&lt;p&gt;When I looked at the actual scope, the minimum viable product was going to take longer than six weeks and cost roughly twice what had been quoted. I said this directly. The response: &quot;I&apos;ve already promised them, so we have to do it.&quot;&lt;/p&gt;
&lt;p&gt;So we did it. Or tried to.&lt;/p&gt;
&lt;p&gt;The project was marked by exactly the problems you&apos;d predict: scope creep from day one (because the scope was never fully defined), miscommunications (because the deliverables were unclear), and a team stretched across too many shifting priorities with no framework for managing any of them. Every week surfaced a new assumption that hadn&apos;t been validated. Every new assumption required rework.&lt;/p&gt;
&lt;p&gt;We missed every target. Over budget. Past deadline. The thing that was supposed to take six weeks dragged for months. Not because the team was incompetent, but because the foundation was built on optimism instead of analysis. &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;Governance is architecture&lt;/a&gt; — and this project had neither. The &lt;a href=&quot;https://personal.utdallas.edu/~chung/SYSM6309/chaos_report.pdf&quot;&gt;Standish Group&apos;s CHAOS research&lt;/a&gt; found that only 16% of software projects succeed on time and on budget. The top causes? Incomplete requirements, changing requirements, and unclear objectives — exactly the things a discovery phase is designed to surface before the first line of code.&lt;/p&gt;
&lt;p&gt;Everything that went wrong was predictable. I&apos;d flagged it, in writing, before the first line of code was committed. And afterward, I was acknowledged exactly once for having been right. Then the same pattern repeated on another project, and another, and another. No further acknowledgment.&lt;/p&gt;
&lt;p&gt;That experience is what drove me to get a Google PM certification and a Scrum Master certification. Not because I was asked or directed to, but because I watched a project fail in exactly the way I predicted, and I wanted the vocabulary and framework to prevent it from happening again, or at minimum, to document why it kept happening.&lt;/p&gt;
&lt;h2&gt;What discovery actually buys&lt;/h2&gt;
&lt;p&gt;When discovery is done properly, the team experiences something that sounds simple but is remarkably rare: they know what they&apos;re building, why they&apos;re building it, and what &quot;done&quot; looks like.&lt;/p&gt;
&lt;p&gt;That clarity changes everything downstream. Standup meetings have substance, decisions get made against actual criteria, and when scope creep appears (and it always appears), there&apos;s a documented baseline to measure it against. &quot;That&apos;s a great idea, and it&apos;s out of scope for this phase&quot; is a sentence you can only say if the scope exists in writing.&lt;/p&gt;
&lt;p&gt;Good discovery also identifies where the project is going to hurt before the pain starts. Risk profiles, mitigation plans, &lt;a href=&quot;https://www.forbes.com/advisor/business/raci-chart/&quot;&gt;RACI charts&lt;/a&gt; — these aren&apos;t bureaucratic overhead. They&apos;re how a team hits a problem and keeps moving instead of stopping everything to figure out what to do.&lt;/p&gt;
&lt;p&gt;In practice, good discovery looks like stakeholder interviews before the kickoff, not after the first sprint. A constraints inventory that names the things you can&apos;t change. A risk register that identifies where the project will probably hurt, so the team isn&apos;t surprised when it does. A rough work breakdown that sequences dependencies instead of treating every feature as independent. And critically: an explicit list of what you&apos;re &lt;em&gt;not&lt;/em&gt; building, because scope is defined as much by its boundaries as by its contents.&lt;/p&gt;
&lt;p&gt;None of this requires months. A focused two-week discovery on a six-month project is a roughly 8% time investment that de-risks the other 92%.&lt;/p&gt;
&lt;p&gt;Discovery isn&apos;t a phase you complete and file away; it&apos;s the foundation that every subsequent decision rests on. Skip it and every decision downstream is a guess. You might guess right. But you&apos;re betting the project (and your reputation!) on it.&lt;/p&gt;
&lt;h2&gt;Realistic advice&lt;/h2&gt;
&lt;p&gt;Not everyone has the authority to demand a discovery phase. Sometimes the promise has already been made, the budget has already been committed, and you&apos;re the person who has to build what someone else sold. (This is one of the reasons organizations bring in &lt;a href=&quot;/services/fractional-cto&quot;&gt;fractional CTO-level guidance&lt;/a&gt; — someone with the authority and experience to insist on discovery before commitments are made.)&lt;/p&gt;
&lt;p&gt;When that happens: document everything.&lt;/p&gt;
&lt;p&gt;Not to assign blame, not to build a case against anyone. Document because when an under-scoped project fails (and under-scoped projects fail at a rate that would alarm anyone who tracked it), someone is going to ask why. And in that moment, you want to be the person who can say &quot;I know exactly why this failed&quot; rather than the person everyone else points at.&lt;/p&gt;
&lt;p&gt;Document the scope as you understood it.&lt;/p&gt;
&lt;p&gt;Document the gaps you identified.&lt;/p&gt;
&lt;p&gt;Document the conversations where you flagged risks.&lt;/p&gt;
&lt;p&gt;Document the decisions that were made anyway.&lt;/p&gt;
&lt;p&gt;Build a paper trail that reflects reality, because reality is going to show up eventually, and when it does, the person with documentation has a factual record. The person without it becomes the default explanation.&lt;/p&gt;
&lt;p&gt;In my own case, the documentation did more than protect me. I&apos;d been overextended across too many domains, too many concurrent projects, too many responsibilities. I&apos;d started to wonder if the problem was me. &quot;Maybe I&apos;m not good enough. Maybe a better engineer could handle all of this.&quot;&lt;/p&gt;
&lt;p&gt;The documentation proved that the scope was the problem, not my competence. And once I could see the scope clearly, written down, across months of projects, the pattern coalesced. No individual contributor can compensate for a structural refusal to scope properly. The failure is architectural, not personal — and the &lt;a href=&quot;/blog/latency-kills-curiosity&quot;&gt;damage compounds&lt;/a&gt; long after the project closes. That realization is what eventually led me to build &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;systems that externalize knowledge continuously&lt;/a&gt; — not as a reaction to one bad project, but as a methodology for every project.&lt;/p&gt;
&lt;p&gt;You can&apos;t build anything on hopes and dreams.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/blog/discovery-tax-hopes-quote.png&quot; alt=&quot;You can&apos;t build anything on hopes and dreams.&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;If you&apos;re not sure whether your technical leadership has the scope problem under control, the &lt;a href=&quot;/tools/cto-diagnostic&quot;&gt;CTO diagnostic&lt;/a&gt; surfaces the gaps in eight domains — including scope management and organizational design.&lt;/em&gt;&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;
&lt;a href=&quot;https://www.forbes.com/advisor/business/raci-chart/&quot;&gt;RACI Chart: Definitions, Uses And Examples For Project Managers&lt;/a&gt; — Forbes Advisor
&lt;a href=&quot;https://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BB.pdf&quot;&gt;Software Defect Reduction Top 10 List&lt;/a&gt; — Boehm &amp;amp; Basili, IEEE Computer (cost-of-change curve)
&lt;a href=&quot;https://personal.utdallas.edu/~chung/SYSM6309/chaos_report.pdf&quot;&gt;The CHAOS Report&lt;/a&gt; — Standish Group (project success/failure rates)&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>leadership</category><category>project-management</category><category>cto-forward</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>The Green-Light Problem</title><link>https://mipyip.com/blog/the-green-light-problem/</link><guid isPermaLink="true">https://mipyip.com/blog/the-green-light-problem/</guid><description>The most expensive mistake in platform migrations isn&apos;t choosing the wrong platform. It&apos;s recommending one before you&apos;ve validated the assumptions the recommendation depends on. A green light with unresolved checkpoints isn&apos;t a recommendation — it&apos;s a liability.</description><pubDate>Sun, 22 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;A green light with unresolved checkpoints isn&apos;t a recommendation. It&apos;s a liability. This post covers the anatomy of premature platform migration recommendations, the &apos;not a blocker&apos; trap, and why stage-gated validation saves money.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;A strategy document lands on a client&apos;s desk. It recommends a major platform migration. The tone is confident. The structure is logical. The recommendation is clear: move forward.&lt;/p&gt;
&lt;p&gt;Buried on page three, in qualified language, are a handful of caveats. The primary integration hasn&apos;t been validated with the client&apos;s actual data. The connector vendor hasn&apos;t been vetted beyond a cursory website scan. A core data system that powers the existing workflow isn&apos;t mentioned at all. The timeline assumes everything works on the first try.&lt;/p&gt;
&lt;p&gt;The client&apos;s leadership reads the document and sees a green light. The technical team reads the same document and sees open questions.&lt;/p&gt;
&lt;p&gt;The gap between those two readings is where six-figure mistakes live.&lt;/p&gt;
&lt;h2&gt;The anatomy of a premature recommendation&lt;/h2&gt;
&lt;p&gt;It&apos;s not malicious. It&apos;s structural.&lt;/p&gt;
&lt;p&gt;Someone does the real technical analysis. They flag risks, identify dependencies, note unresolved questions, and recommend a staged approach: validate the critical assumptions before committing to a full build. The analysis is honest about what&apos;s known and what isn&apos;t.&lt;/p&gt;
&lt;p&gt;Then the analysis gets polished for client consumption, and the risks are softened. &quot;This is an unvalidated dependency that could change the entire architecture&quot; becomes &quot;this will require thoughtful implementation.&quot; The staged approach gets flattened into a single &quot;we recommend moving forward.&quot; The hard questions get cut because they might make the recommendation look uncertain.&lt;/p&gt;
&lt;p&gt;The polished version isn&apos;t wrong, &lt;em&gt;exactly&lt;/em&gt;. Everything in it is technically true. But it&apos;s selectively true in a way that systematically favors proceeding. The caveats might be present but soft; the confidence is high but unearned. And the client, who is paying for expert guidance on a decision they can&apos;t evaluate themselves, reads the document at face value.&lt;/p&gt;
&lt;p&gt;That&apos;s The Green-Light Problem. Not a bad recommendation, but premature; a conclusion delivered before the evidence supports it.&lt;/p&gt;
&lt;h2&gt;The &quot;not a blocker&quot; trap&lt;/h2&gt;
&lt;p&gt;There&apos;s a specific phrase that shows up in these documents. It sounds reasonable and is often catastrophic:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&quot;Not a blocker.&quot;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An integration with an unvetted vendor, connecting a legacy ERP system to a modern platform, handling thousands of customer-specific pricing matrices and complex approval workflows. The vendor&apos;s website has a case study. A sales engineer said it works. Nobody has tested it against the client&apos;s actual data, actual edge cases, or actual transaction volume.&lt;/p&gt;
&lt;p&gt;&quot;Not a blocker.&quot;&lt;/p&gt;
&lt;p&gt;If that integration fails, the entire architecture changes. The timeline doubles, the budget triples, and the client is three months into a build when they discover that the foundation, around which the whole project was designed, doesn&apos;t hold weight.&lt;/p&gt;
&lt;p&gt;Calling an unvalidated dependency &quot;not a blocker&quot; before vetting it is &lt;a href=&quot;https://thedecisionlab.com/biases/optimism-bias&quot;&gt;optimism bias&lt;/a&gt; dressed up as a technical assessment. It&apos;s the kind of language that makes strategy documents read well, and post-mortems read badly.&lt;/p&gt;
&lt;h2&gt;What &quot;validated&quot; actually means&lt;/h2&gt;
&lt;p&gt;There&apos;s a meaningful difference between &quot;we believe this will work&quot; and &quot;we&apos;ve proven this works.&quot; Strategy documents routinely conflate the two.&lt;/p&gt;
&lt;p&gt;Validation is not:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The vendor says it works&lt;/li&gt;
&lt;li&gt;We found a case study on their website&lt;/li&gt;
&lt;li&gt;It works in a demo environment with sample data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Validation is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We tested the specific integration with the client&apos;s actual data and edge cases in conditions that resemble the production environment&lt;/li&gt;
&lt;li&gt;We documented what happened&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Validation costs money and takes time. It delays the exciting part of the project (the build) in favor of the boring part (the proof). And it is the single most valuable thing a technical advisor can recommend before a six-figure commitment.&lt;/p&gt;
&lt;h2&gt;The timeline cascade&lt;/h2&gt;
&lt;p&gt;A premature green light doesn&apos;t just risk a bad outcome. It creates a compounding timeline problem.&lt;/p&gt;
&lt;p&gt;When the project starts with unvalidated assumptions, the team builds on those assumptions for weeks or months. When one of them turns out to be wrong (and they do, regularly, because that&apos;s what &quot;unvalidated&quot; means), the timeline doesn&apos;t shift by the time it takes to fix the problem. It shifts by the time it takes to fix the problem &lt;em&gt;plus&lt;/em&gt; the time spent building on the assumption that turned out to be wrong &lt;em&gt;plus&lt;/em&gt; the time spent unwinding the work that depended on it.&lt;/p&gt;
&lt;p&gt;A two-week validation phase at the beginning can prevent a three-month correction in the middle. The math is simple; the perception isn&apos;t: the two-week validation phase feels like a delay, and the three-month correction feels like bad luck.&lt;/p&gt;
&lt;p&gt;It&apos;s not bad luck, but the entirely predictable consequence of skipping validation.&lt;/p&gt;
&lt;h2&gt;The document problem&lt;/h2&gt;
&lt;p&gt;A well-crafted strategy document can make an unvalidated recommendation look validated. The formatting is professional. The sections follow a logical structure. The language is measured and confident. If you didn&apos;t have the technical context to evaluate the claims, you&apos;d read it and feel reassured.&lt;/p&gt;
&lt;p&gt;The people making the platform decision often don&apos;t have the technical context. That&apos;s why they hired advisors. And when the advisory document systematically smooths over the rough edges, the client loses access to the information they need to make an informed decision.&lt;/p&gt;
&lt;p&gt;This isn&apos;t about incompetence. It&apos;s about incentives. PMI research on &lt;a href=&quot;https://www.pmi.org/learning/library/optimism-bias-terminate-failing-projects-3779&quot;&gt;optimism bias in project delivery&lt;/a&gt; shows that the dilution of risk reporting is one of the most common failure modes in status communication. The path of least resistance is always &quot;we recommend proceeding.&quot; Clients want to hear yes. Teams want to move forward. Leadership wants progress. The person who adds a gate and says, &quot;Wait, we haven&apos;t validated this yet,&quot; is often treated as an obstacle to progress rather than as someone protecting the investment.&lt;/p&gt;
&lt;h2&gt;The fix is boring, and it works&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.smartsheet.com/phase-gate-process&quot;&gt;Stage-gated recommendations&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That&apos;s it.&lt;/p&gt;
&lt;p&gt;&quot;We believe this platform is a viable path for your requirements. Before committing to a full build, we recommend a validation phase. Here&apos;s what we&apos;ll test, here&apos;s what it costs, and here&apos;s the criteria we&apos;ll use to decide whether to proceed.&quot;&lt;/p&gt;
&lt;p&gt;That&apos;s not hedging. That&apos;s risk management. And the client who hears &quot;we want to prove this works before you spend six figures on it&quot; will trust you more, not less, because you&apos;re clearly prioritizing their outcome over your timeline.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The most expensive sentence in any strategy document is &quot;we recommend moving forward&quot; — when the analysis it&apos;s based on isn&apos;t finished yet.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The &lt;a href=&quot;/tools/cto-diagnostic&quot;&gt;CTO diagnostic&lt;/a&gt; includes a vendor and platform evaluation domain — a quick way to see whether your current decision-making process has the structural gaps this post describes.&lt;/em&gt;&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;
&lt;a href=&quot;https://thedecisionlab.com/biases/optimism-bias&quot;&gt;Optimism Bias — The Decision Lab&lt;/a&gt; ·
&lt;a href=&quot;https://www.pmi.org/learning/library/optimism-bias-terminate-failing-projects-3779&quot;&gt;Optimism Bias and Failure to Terminate Failing Projects — PMI&lt;/a&gt; ·
&lt;a href=&quot;https://www.smartsheet.com/phase-gate-process&quot;&gt;Phase-Gate Process — Smartsheet&lt;/a&gt;&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>architecture</category><category>cto-forward</category><category>governance</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>SideMark: Building a Markdown Editor for Two Authors</title><link>https://mipyip.com/blog/sidemark-two-author-markdown/</link><guid isPermaLink="true">https://mipyip.com/blog/sidemark-two-author-markdown/</guid><description>Every markdown editor assumes one author. When AI agents became co-editors, the &apos;file changed on disk&apos; dialog became a workflow killer. SideMark is what happens when you solve that problem from the inside.</description><pubDate>Thu, 19 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I built a markdown editor &lt;a href=&quot;/blog/simple-markdown-editor&quot;&gt;a few weeks ago&lt;/a&gt; because every other one annoyed me. Too expensive, too slow, too bloated, or too buggy. I just wanted three panes, no cloud, no subscription.&lt;/p&gt;
&lt;p&gt;Then I started using it.&lt;/p&gt;
&lt;p&gt;Not casually — I mean, I started using it the way I work now, which is with Claude Code running in one terminal (ok, let&apos;s be real, five terminals) and a stack of markdown files open in the editor. CLAUDE.md. ROADMAP.md. Blog drafts, LinkedIn notes, architecture diagrams, task statuses... The files that &lt;em&gt;are&lt;/em&gt; the workflow, not just the output.&lt;/p&gt;
&lt;p&gt;And the editor I&apos;d built — the one that was perfectly fine for a single author — started fighting me.&lt;/p&gt;
&lt;h2&gt;The dialog that breaks everything&lt;/h2&gt;
&lt;p&gt;Here&apos;s what happens when you&apos;re editing a governance doc, and Claude Code writes to the same file:&lt;/p&gt;
&lt;p&gt;Your editor pops up a dialog. &quot;File changed on disk. Reload?&quot;&lt;/p&gt;
&lt;p&gt;If you say yes, you lose your cursor position and your unsaved edits. If you say no, you&apos;re now working on a stale version, and the next save will overwrite whatever the agent wrote. If the editor doesn&apos;t ask — some just silently reload — your changes vanish without warning.&lt;/p&gt;
&lt;p&gt;This happens dozens of times a day in an agentic workflow. Every markdown editor on the market treats it as an exceptional error condition (&lt;em&gt;if&lt;/em&gt; it&apos;s caught at all). A thing that shouldn&apos;t be happening. A conflict to resolve.&lt;/p&gt;
&lt;p&gt;But in AI-assisted development, it&apos;s not exceptional. It&apos;s the primary workflow. Two authors are writing to the same files, and that&apos;s often the whole point.&lt;/p&gt;
&lt;h2&gt;The feature I needed didn&apos;t exist&lt;/h2&gt;
&lt;p&gt;I don&apos;t use VS Code for markdown — never liked the experience for prose. And the editors I do use for markdown all handle external changes the same way: reload, ask, or outright ignore. None of them tries to &lt;em&gt;merge&lt;/em&gt;. None of them treats two-author editing as a normal operating mode.&lt;/p&gt;
&lt;p&gt;So I built it. Three-way merge against the last-saved common ancestor — the same approach git uses, applied at the file level in real time.&lt;/p&gt;
&lt;p&gt;When you edit lines 5 through 10 and Claude edits lines 40 through 50, both changes merge automatically. No dialog, no interruption. A toast notification confirms it happened. When you both edit the same lines and the changes actually overlap, an interactive diff shows each hunk independently — accept this one, reject that one, save yours as a new file if you want an escape hatch.&lt;/p&gt;
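&lt;p&gt;The idea is easy to sketch. Here&apos;s a minimal diff3-style merge over the last-saved ancestor, as an illustration of the approach rather than SideMark&apos;s actual implementation:&lt;/p&gt;

```python
import difflib

def changed_regions(base, other):
    """Spans of `base` that `other` rewrote: (base_lo, base_hi, new_lines)."""
    sm = difflib.SequenceMatcher(None, base, other)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def merge3(base, ours, theirs):
    """Minimal diff3-style merge of two descendants of a common ancestor.

    Non-overlapping edits from both sides are applied automatically;
    overlapping edits become conflict hunks. A sketch only: chained
    multi-hunk overlaps are not coalesced the way a real merge tool would.
    """
    a, b = changed_regions(base, ours), changed_regions(base, theirs)
    out, pos, conflicts = [], 0, 0
    while a or b:
        # Take whichever side's next change starts earliest in the base.
        x, y = (a, b) if a and (not b or a[0][0] <= b[0][0]) else (b, a)
        lo, hi, repl = x.pop(0)
        out.extend(base[pos:lo])
        overlap = bool(y) and (y[0][0] < hi or y[0][:2] == (lo, hi))
        if overlap:
            lo2, hi2, repl2 = y.pop(0)
            if repl == repl2:
                out.extend(repl)  # both sides made the identical change
            else:
                conflicts += 1
                out += ["<<<<<<<\n", *repl, "=======\n", *repl2, ">>>>>>>\n"]
            pos = max(hi, hi2)
        else:
            out.extend(repl)
            pos = hi
    out.extend(base[pos:])
    return out, conflicts
```

&lt;p&gt;Edit line 3 while the agent edits line 15 and both changes land; rewrite the same line on both sides and you get a conflict hunk instead of silent data loss.&lt;/p&gt;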
&lt;p&gt;This was the feature that turned &quot;a markdown editor I built because I was annoyed&quot; into something worth naming.&lt;/p&gt;
&lt;h2&gt;Six versions in a day&lt;/h2&gt;
&lt;p&gt;Once the merge architecture was in place, the rest came fast. Not because I planned it, but because I kept hitting friction in my own workflow and fixing it immediately.&lt;/p&gt;
&lt;p&gt;Auto-save with a configurable delay, because switching from the editor to Claude Code and back shouldn&apos;t require thinking about whether I remembered to save. Git gutter markers, because after an AI agent edits your file, you want to see what changed since your last commit at a glance — green for added, blue for modified, red for deleted. Copy with context (Cmd+Opt+C), because pasting a code block into an AI prompt without the file path and line numbers wastes tokens.&lt;/p&gt;
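&lt;p&gt;The copy-with-context output is easy to picture. A sketch, with a hypothetical format rather than SideMark&apos;s exact one:&lt;/p&gt;

```python
def copy_with_context(path, snippet, start):
    """Prefix a snippet with its file path and line range so a prompt
    carries location for free. The format here is illustrative only."""
    end = start + len(snippet) - 1
    header = f"{path} (lines {start}-{end})\n"
    body = "".join(f"{start + i:>4}| {line}" for i, line in enumerate(snippet))
    return header + body
```

&lt;p&gt;One keystroke, and the prompt carries the location the agent would otherwise have to ask for.&lt;/p&gt;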
&lt;p&gt;Focus mode, because I&apos;m also writing a book in this thing, and sometimes I need to shut out everything except the text.&lt;/p&gt;
&lt;p&gt;File deletion detection, because AI agents occasionally delete files, and you should know about it before you start typing.&lt;/p&gt;
&lt;p&gt;Six versions in one night. v0.3.2 through v0.4.3. While also cutting more than 10% from an AWS bill on a different project and pushing 15 deployments on a third. That&apos;s not a humblebrag — it&apos;s proof that the cognitive offloading system works. The governance documents held the state for all three workstreams. My brain held the judgment calls. The systems didn&apos;t compete with each other because none of them required me to remember where I left off.&lt;/p&gt;
&lt;h2&gt;The dogfooding loop&lt;/h2&gt;
&lt;p&gt;Every feature in SideMark emerged from my using SideMark to build SideMark and work on other projects.&lt;/p&gt;
&lt;p&gt;I was editing CLAUDE.md in the editor while Claude Code read from the same file. The merge conflict that spawned a three-way merge was a real conflict, in a real session, on a real project. The auto-save feature exists because I kept forgetting to save before switching to the terminal, and Claude Code would read the stale version. The git gutter exists because after a long Claude Code session, I&apos;d look at a file and have no idea which lines I&apos;d written and which the agent had.&lt;/p&gt;
&lt;p&gt;Every feature was born from friction I experienced while using the tool I was building. Not from a spec. Not from user research. Not from a competitive analysis. From the gap between &quot;I need this to work&quot; and &quot;it doesn&apos;t work yet.&quot;&lt;/p&gt;
&lt;p&gt;That&apos;s dogfooding in the purest sense — and it produces a different kind of product than spec-driven development. The features are tight because they solve problems I actually have. The things that are missing are missing because I haven&apos;t needed them yet. There&apos;s no plugin system because I&apos;ve never wanted one. There&apos;s no cloud sync because my files are on my machine, and that&apos;s where they belong.&lt;/p&gt;
&lt;h2&gt;What &quot;simple&quot; became&lt;/h2&gt;
&lt;p&gt;The editor started as &quot;Simple Markdown Editor.&quot; Three panes, no subscription, nothing else. That name made sense for what it was.&lt;/p&gt;
&lt;p&gt;It doesn&apos;t make sense for what it became.&lt;/p&gt;
&lt;p&gt;SideMark is a markdown editor built for working alongside AI. The &quot;side&quot; is literal — it works at your side, handling the merge conflicts and file state so you don&apos;t have to. The &quot;mark&quot; is markdown. Two syllables, no ambiguity about what it does.&lt;/p&gt;
&lt;p&gt;The name changed because the product changed. And the product changed because the workflow changed. I didn&apos;t set out to build an AI-collaborative editor. I set out to edit markdown files without being annoyed. The collaboration features emerged from the reality of how I work — writing to the same files I edit all day, every day, with an AI agent.&lt;/p&gt;
&lt;h2&gt;The cognitive offloading connection&lt;/h2&gt;
&lt;p&gt;Auto-save offloads &quot;Did I save?&quot; Three-way merge offloads &quot;What did the AI change?&quot; Git gutter offloads &quot;What&apos;s different since my last commit?&quot; Copy with context offloads &quot;What&apos;s the file path and line number?&quot;&lt;/p&gt;
&lt;p&gt;Every one of those is a piece of operational cognition — state-tracking that eats working memory without producing strategic value. The editor holds the state. I hold the judgment about what to write.&lt;/p&gt;
&lt;p&gt;That&apos;s the same architecture as the governance documents. The same architecture as the &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;butt book&lt;/a&gt; that eventually became a system. Offload state, keep judgment. The scale changes; the principle doesn&apos;t.&lt;/p&gt;
&lt;h2&gt;Try it&lt;/h2&gt;
&lt;p&gt;SideMark is &lt;a href=&quot;https://github.com/avanrossum/sidemark&quot;&gt;free and open source&lt;/a&gt;. macOS only. MIT licensed. No account, no subscription, no telemetry.&lt;/p&gt;
&lt;p&gt;If you&apos;re working with Claude Code, Cursor, Windsurf, or any AI agent that writes to files — and you&apos;re tired of the &quot;file changed on disk&quot; dialog — this is the editor that was built inside that exact workflow.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/avanrossum/sidemark/releases&quot;&gt;Download the latest release&lt;/a&gt; or check out the &lt;a href=&quot;/products/sidemark&quot;&gt;product page&lt;/a&gt; for the full feature list.&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources &amp;amp; links&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/avanrossum/sidemark&quot;&gt;SideMark on GitHub&lt;/a&gt; — MIT licensed, full source&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/blog/simple-markdown-editor&quot;&gt;Original blog post&lt;/a&gt; — &quot;I Built a Markdown Editor Because Every Other One Annoyed Me&quot;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/products/sidemark&quot;&gt;Product page&lt;/a&gt; — full feature list with screenshots&lt;/li&gt;
&lt;/ul&gt;
&lt;/small&gt;</content:encoded><category>tools</category><category>ai</category><category>architecture</category><category>cognitive-offloading</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Why AI Persona Switching Doesn&apos;t Work as Code Review</title><link>https://mipyip.com/blog/the-costume-change-problem/</link><guid isPermaLink="true">https://mipyip.com/blog/the-costume-change-problem/</guid><description>Asking Claude to &apos;switch to CEO mode&apos; for a second opinion feels like independent review — but shared context means the reviewer already knows the answers. Here&apos;s what actually breaks and what to do instead.</description><pubDate>Mon, 16 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Role-switching within a shared context window looks like independent review but functions as self-review in a different hat. The real question: what should your AI not know when it evaluates this work?&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;A repo hit 11,000 stars in its first week by solving a real problem: Claude Code in one generic mode produces mediocre output.&lt;/p&gt;
&lt;p&gt;Garry Tan&apos;s &lt;a href=&quot;https://github.com/garrytan/gstack&quot;&gt;gstack&lt;/a&gt; formalizes &quot;modes&quot; for Claude Code — slash commands that switch the AI between named roles. To name a few:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A CEO lens for product decisions&lt;/li&gt;
&lt;li&gt;A staff engineer for paranoid code review&lt;/li&gt;
&lt;li&gt;A QA lead for testing&lt;/li&gt;
&lt;li&gt;An engineering manager for retrospectives&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The core insight is correct and worth calling out directly: forcing the AI into an explicit role with explicit constraints produces better output than letting it be a generalist.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/garrytan/gstack/tree/main/browse&quot;&gt;browse tool&lt;/a&gt; — a persistent Chromium binary that gives Claude Code eyes on a running app — is genuine engineering, not a prompt trick. The sequential workflow discipline (plan → engineering review → build → code review → ship) is better than what most people do with AI, which is nothing. This is a meaningful step up from ad-hoc prompting.&lt;/p&gt;
&lt;p&gt;I also noticed a structural limitation within minutes of reading it, and it&apos;s the same one I&apos;ve been building against for months.&lt;/p&gt;
&lt;h2&gt;All the hats, one head&lt;/h2&gt;
&lt;p&gt;Every mode in gstack runs inside the same context window.&lt;/p&gt;
&lt;p&gt;The &quot;paranoid staff engineer&quot; reviewing your code is the same Claude instance that helped architect it. It already knows &lt;em&gt;why&lt;/em&gt; every decision was made — which means it&apos;s primed to find those decisions reasonable.&lt;/p&gt;
&lt;p&gt;This is a self-review wearing a different costume.&lt;/p&gt;
&lt;p&gt;I don&apos;t mean that dismissively, because self-assessment checklists have real value — a pilot running preflight catches mistakes that muscle memory alone won&apos;t, and that&apos;s worth doing every time. But there&apos;s a categorical difference between a checklist and an independent review, and the distinction matters considerably more than it sounds.&lt;/p&gt;
&lt;p&gt;When the reviewer already has the builder&apos;s reasoning in context, it&apos;s not an evaluation of the output; it&apos;s pattern-matching against the justifications that produced it. The same mechanism that makes LLMs coherent — &lt;a href=&quot;https://arxiv.org/html/2510.06265v2&quot;&gt;self-consistency&lt;/a&gt; — makes them structurally blind to their own errors when asked to self-review. You&apos;re not getting a second opinion; you&apos;re getting the first opinion wearing a different hat.&lt;/p&gt;
&lt;p&gt;This is the same reason you don&apos;t ask the person who wrote a PR to also approve it. &lt;a href=&quot;https://google.github.io/eng-practices/review/&quot;&gt;A different pair of eyes catches what the author is blind to&lt;/a&gt; — not because the author is bad, but because familiarity breeds pattern blindness. AI doesn&apos;t change this principle; if anything, it amplifies it — an LLM&apos;s self-consistency is &lt;em&gt;more&lt;/em&gt; deterministic than a human&apos;s.&lt;/p&gt;
&lt;h2&gt;Parallelism is not independence&lt;/h2&gt;
&lt;p&gt;Gstack can also use &lt;a href=&quot;https://conductor.build&quot;&gt;Conductor&lt;/a&gt; to spin up ten parallel Claude Code sessions. That sounds like separation until you realize it&apos;s a performance optimization, not an epistemic one, and more workers in the same bath isn&apos;t the same as a clean pool.&lt;/p&gt;
&lt;p&gt;Genuine review requires what I&apos;ll call &lt;strong&gt;epistemic separation&lt;/strong&gt;: different priors, no access to the rationalization chain that produced the artifact, and independently accumulated judgment about what builders consistently miss. Without that separation, you get confirmation with extra steps.&lt;/p&gt;
&lt;p&gt;Each of gstack&apos;s modes starts fresh every invocation — no accumulated lessons, no pattern library built from previous reviews. The &quot;paranoid staff engineer&quot; is equally paranoid about everything, every time. That&apos;s thorough but undirected. A reviewer who doesn&apos;t learn which mistakes &lt;em&gt;this&lt;/em&gt; builder tends to make hasn&apos;t read the codebase&apos;s history.&lt;/p&gt;
&lt;p&gt;For organizations where bugs have real consequences — compliance failures, donor trust violations, limited technical staff to recover from incidents — the difference between costume-change review and independent review is operational risk.&lt;/p&gt;
&lt;h2&gt;What genuine separation looks like&lt;/h2&gt;
&lt;p&gt;I built the answer to this problem months before gstack existed. I call it &lt;a href=&quot;/blog/the-adversary&quot;&gt;The Adversary&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It&apos;s a separate Claude Code project in its own repo with its own governance files, its own accumulated lessons-learned corpus, and zero shared context with the building agent. It receives a read-only symlink to the target codebase and produces a structured review report. It doesn&apos;t know what decisions were made or why. It sees &lt;em&gt;output&lt;/em&gt;, not &lt;em&gt;reasoning&lt;/em&gt; — which is exactly how real external review works.&lt;/p&gt;
&lt;p&gt;I&apos;d been building and reviewing this codebase for months — manual human review and agentic self-review, the whole time. The Adversary&apos;s first pass found 102 issues. Ten critical. Security vulnerabilities hiding in plain sight — not because the builder was bad, but because independent review catches what self-review structurally cannot.&lt;/p&gt;
&lt;p&gt;The architecture makes it work, not the prompt:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Separate context.&lt;/strong&gt; Different project, different memory, different governance documents. The builder&apos;s reasoning chain doesn&apos;t exist in The Adversary&apos;s world.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Different priors.&lt;/strong&gt; The Adversary accumulates its own pattern library over time — &quot;here&apos;s what builders consistently miss&quot; — which makes it sharper with each review. A stateless skill file can&apos;t do this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Structured handoff.&lt;/strong&gt; Artifacts move through a defined channel (symlinks and reports), not a shared session. The reviewer can&apos;t be influenced by the builder&apos;s justifications because it never sees them. This is the same principle that keeps financial auditors separate from the accounting department.&lt;/p&gt;
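&lt;p&gt;The setup amounts to a few filesystem operations. A sketch with hypothetical paths and directory names, not my actual layout:&lt;/p&gt;

```python
import pathlib

def make_review_workspace(target_repo: str, workspace: str) -> pathlib.Path:
    """Create an isolated reviewer project whose only channel to the builder
    is a symlink to the code. An illustrative sketch: directory names are
    invented, and a symlink alone doesn't enforce read-only access -- the
    reviewer's governance files have to forbid writes as well."""
    ws = pathlib.Path(workspace)
    (ws / "reports").mkdir(parents=True, exist_ok=True)  # structured review output
    (ws / "lessons").mkdir(exist_ok=True)  # reviewer's own accumulated pattern library
    link = ws / "target-ro"
    if not link.is_symlink():
        # The reviewer sees output, not reasoning: no prompts, no chat history.
        link.symlink_to(pathlib.Path(target_repo).resolve())
    return ws
```

&lt;p&gt;The reviewer&apos;s repo accumulates its own lessons and emits reports; the symlink is the only bridge to the builder&apos;s code.&lt;/p&gt;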
&lt;h2&gt;An honest limitation&lt;/h2&gt;
&lt;p&gt;This can&apos;t be fully productized today. The architectural requirement — genuinely independent agents with separate memory, separate accumulated judgment, and separate lesson histories — requires human orchestration: someone who understands where the boundaries need to be and maintains them. The tooling will get there. The architecture won&apos;t design itself.&lt;/p&gt;
&lt;p&gt;Anyone can fork a repo of markdown files. The judgment behind &quot;here&apos;s where the boundaries need to be and why&quot; is the part that requires experience to get right.&lt;/p&gt;
&lt;p&gt;The methodology is the deliverable, not the CLI tool. And that distinction matters for understanding where gstack fits.&lt;/p&gt;
&lt;h2&gt;What your AI shouldn&apos;t know&lt;/h2&gt;
&lt;p&gt;Gstack represents where most people are in their thinking about AI-assisted development: &quot;I need structured roles for different tasks.&quot; That&apos;s correct and necessary. The workflow discipline, the browser tooling, the explicit-gear metaphor — all genuinely valuable. The fact that it&apos;s open source and spreading is good for the ecosystem.&lt;/p&gt;
&lt;p&gt;But the harder question isn&apos;t which hat to put on your AI.&lt;/p&gt;
&lt;p&gt;It&apos;s &quot;what should the AI &lt;em&gt;not know&lt;/em&gt; when it evaluates this work?&quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The full Adversary architecture and first-run findings: &lt;a href=&quot;/blog/the-adversary&quot;&gt;Your AI Builds the Code. Who Reviews It?&lt;/a&gt; The governance methodology: &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;What Is Pass@1?&lt;/a&gt; and &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;. The thesis connecting it all: &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;Governance Is Architecture&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;
&lt;a href=&quot;https://github.com/garrytan/gstack&quot;&gt;gstack — GitHub&lt;/a&gt; — Garry Tan&apos;s Claude Code skill files (MIT)
·
&lt;a href=&quot;https://arxiv.org/html/2510.06265v2&quot;&gt;Large Language Models Hallucination: Comprehensive Survey&lt;/a&gt; — arXiv (self-consistency and self-review blind spots)
·
&lt;a href=&quot;https://google.github.io/eng-practices/review/&quot;&gt;Google Engineering Practices — Code Review&lt;/a&gt; — Google&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>I Have a Team Now — It Just Happens to Be AI</title><link>https://mipyip.com/blog/i-have-a-team-now/</link><guid isPermaLink="true">https://mipyip.com/blog/i-have-a-team-now/</guid><description>The progression from &apos;AI helps me write emails&apos; to &apos;I manage a department of specialized agents across six domains&apos; happened in eight months. The output scaled. The cognitive load dropped. That combination is new.</description><pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Eight months from &apos;AI helps me write emails&apos; to managing a department of specialized agents across six domains. The output scaled. The cognitive load dropped. This post covers how that happened.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;Eight months ago, I was perpetually behind. On everything.&lt;/p&gt;
&lt;p&gt;I don&apos;t mean &quot;busy.&quot; Busy implies you&apos;re making progress on too many things at once. I was making insufficient progress on all of them. React components for a client project. AWS infrastructure governance for another. Kubernetes migrations with hard deadlines. Salesforce automations that needed attention three weeks ago. Each domain had its own language, its own context, its own state — and switching between them wasn&apos;t just a time cost. It was a cognitive tax that compounded with every transition.&lt;/p&gt;
&lt;p&gt;By 3 PM most days, I wasn&apos;t making decisions anymore. I was recovering from the last context switch while dreading the next one.&lt;/p&gt;
&lt;h2&gt;The email that started it&lt;/h2&gt;
&lt;p&gt;The first thing I used AI for — really used it, not just experimented — was writing emails. GPT-3.5. I would brain-dump everything I needed to communicate into a chat window — unstructured, grammatically questionable, half-formed thoughts — and get back something I could send after one or two editing passes.&lt;/p&gt;
&lt;p&gt;That sounds trivial.&lt;/p&gt;
&lt;p&gt;It wasn&apos;t.&lt;/p&gt;
&lt;p&gt;Email was consuming more cognitive bandwidth than I&apos;d realized. Not the content — the &lt;em&gt;composition&lt;/em&gt;. Translating technical context into stakeholder-appropriate language, structuring the message so the key points are at the top, and correcting tone where needed. Every email was a small act of translation, and I was writing dozens a day.&lt;/p&gt;
&lt;p&gt;Offloading the composition freed up space I didn&apos;t know I was missing. Not a lot — but enough to notice that the constraint wasn&apos;t time: it was &lt;em&gt;cognitive bandwidth&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;From assistant to collaborator&lt;/h2&gt;
&lt;p&gt;With GPT-4, ChatGPT got better. I started using it for more than email — rapidly prototyping WordPress plugins, troubleshooting legacy code (especially the undocumented kind, which was most of it), and reasoning through architectural decisions where I needed a second opinion that wasn&apos;t going to judge me for asking a question I should probably already know the answer to.&lt;/p&gt;
&lt;p&gt;The shift was gradual. The AI went from &quot;a tool I use&quot; to &quot;a collaborator I consult,&quot; and the distinction matters. A tool does what you tell it; a collaborator helps you figure out &lt;em&gt;what to tell it&lt;/em&gt;. The governance documents I&apos;d started writing — almost accidentally, just experimenting to get consistent output — were turning The Collaborator into something more reliable.&lt;/p&gt;
&lt;p&gt;Something that remembered how I think.&lt;/p&gt;
&lt;h2&gt;The migration that proved it&lt;/h2&gt;
&lt;p&gt;About six months ago, I had to undertake a significant solo infrastructure migration. Hundreds of containers across multiple environments with a hard deadline driven by external constraints that weren&apos;t negotiable.&lt;/p&gt;
&lt;p&gt;The responsible estimate for this work — with a team of six experienced engineers — was six to nine months. I had three months.&lt;/p&gt;
&lt;p&gt;And I was the team.&lt;/p&gt;
&lt;p&gt;Were it not for ChatGPT, KiloCode, Cursor, and later Claude, I would not have been able to complete it. That is not a hyperbolic statement; it is not &quot;it would have been harder.&quot; I would literally not have been able to complete the migration within the constraints I was given — while still juggling my &quot;regular work.&quot; The project would have failed, or I would have.&lt;/p&gt;
&lt;p&gt;Agentic AI enabled me to operate at a scale previously unavailable to a single person. Not because the AI wrote all the code — it didn&apos;t. But because it could hold the context of each subsystem, while I focused on the decisions that actually needed a human. The infrastructure state, the dependency graphs, the rollback procedures — the AI held that, so I could hold and refine the strategy.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;/work/kubernetes-migration&quot;&gt;case study&lt;/a&gt; tells the technical story. The human story is simpler: I shipped it — on time, no less — and I didn&apos;t &lt;em&gt;completely&lt;/em&gt; burn out doing it. Both of those outcomes were improbable without the tooling.&lt;/p&gt;
&lt;h2&gt;The Department&lt;/h2&gt;
&lt;p&gt;After the migration, I fully committed to Claude Code and started building what I now call The Department.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;a href=&quot;/blog/managing-agents-like-teams&quot;&gt;site architect&lt;/a&gt; for this website — layouts, components, editorial, SEO&lt;/li&gt;
&lt;li&gt;A sysadmin agent for &lt;a href=&quot;/work/aws-governance&quot;&gt;infrastructure governance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A Project Manager that unifies my communication between Slack and Asana — and keeps me from missing things&lt;/li&gt;
&lt;li&gt;An observability bot for monitoring&lt;/li&gt;
&lt;li&gt;A content agent&lt;/li&gt;
&lt;li&gt;A life-strategy agent&lt;/li&gt;
&lt;li&gt;Several agents in charge of writing software, like &lt;a href=&quot;/products/actions&quot;&gt;Actions&lt;/a&gt;, &lt;a href=&quot;/products/panoptisana&quot;&gt;Panoptisana&lt;/a&gt;, and the &lt;a href=&quot;/blog/simple-markdown-editor&quot;&gt;Markdown Editor&lt;/a&gt; I&apos;m using to write and edit this post&lt;/li&gt;
&lt;li&gt;And several more, besides&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each one has a defined role, &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance documents&lt;/a&gt;, and institutional memory that persists across sessions.&lt;/p&gt;
&lt;p&gt;The ability to context-switch without context-switching is the thing I didn&apos;t know I needed.&lt;/p&gt;
&lt;p&gt;When I need to work on infrastructure, I open the sysadmin agent. It knows the current state of every system I manage and what we did in the last session. It knows the conventions, the constraints, and the things I&apos;ve told it not to touch. I don&apos;t have to reconstruct any of that — I just pick up where I left off.&lt;/p&gt;
&lt;p&gt;When I&apos;m working on this website, the site architect has the same depth in its own domain, which also covers LinkedIn. Different context, different conventions, different memory — but the same experience of walking into a room where someone already knows what&apos;s going on.&lt;/p&gt;
&lt;p&gt;The mental relief is almost too great to put into words. The thing that was destroying me — carrying the state of many different domains in my head simultaneously, losing pieces of each every time I switched — is the thing the agents handle. My working memory is freed for the decisions that actually need my judgment — strategy, architecture, ideation.&lt;/p&gt;
&lt;p&gt;Everything else, the agents hold.&lt;/p&gt;
&lt;h2&gt;The inversion&lt;/h2&gt;
&lt;p&gt;One of the most significant personal findings of this process is that the output scaled &lt;em&gt;because&lt;/em&gt; the cognitive load dropped. Not the other way around.&lt;/p&gt;
&lt;p&gt;The conventional model is that more output requires more effort, more tracking, and more stress. You scale by working harder or hiring more people. The cognitive load tracks linearly (or worse) with the output.&lt;/p&gt;
&lt;p&gt;The Department inverts — or perhaps subverts — that: more domains under management, more projects shipping, and perhaps more importantly, the ability to rapidly switch between them without losing momentum. And all of that occurs with less cognitive overhead — because the overhead has been &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;offloaded&lt;/a&gt; to agents whose entire job is holding the context I used to carry in my head.&lt;/p&gt;
&lt;p&gt;It&apos;s not just automation; I&apos;m not replacing tasks I used to do manually — though I certainly do when it makes sense. It&apos;s amplification — extending what I can hold and act on simultaneously. The decisions, strategy, and judgment calls are still mine, but the state-tracking, the context-holding, the &quot;where was I?&quot; recovery — that&apos;s distributed across a team that doesn&apos;t forget, doesn&apos;t get tired, and doesn&apos;t need me to repeat myself.&lt;/p&gt;
&lt;h2&gt;Still behind&lt;/h2&gt;
&lt;p&gt;I&apos;m still behind.&lt;/p&gt;
&lt;p&gt;I don&apos;t think that will ever change. The scope of what I&apos;m required to do always expands to fill (and slightly exceed) the capacity I have — that&apos;s a result of employment and a feature of ambition, not solely a bug in the tooling.&lt;/p&gt;
&lt;p&gt;But the texture of &quot;behind&quot; has changed. Eight months ago, behind meant drowning; it meant context switching so fast that I couldn&apos;t maintain identity in any single domain. It meant 3 PM cognitive shutdowns and the creeping feeling that I was failing at everything simultaneously.&lt;/p&gt;
&lt;p&gt;Now, behind means I have more projects than hours. The state of each one is held by an agent that&apos;s ready when I am. The cognitive tax of switching is close to zero. And when I stop for the day, nothing is lost — it&apos;s all documented, governed, and waiting for the next session.&lt;/p&gt;
&lt;p&gt;And there are still domains that don&apos;t have a door to agentic work yet — the ones where the process is opaque, sequential, and offers no meaningful feedback. Try getting a 10DLC campaign approved through Twilio when a denial comes back as &quot;didn&apos;t pass&quot; with no further explanation. There&apos;s nothing to reason about, nothing to architect. Just guess, resubmit, wait, repeat. Those still run on spite... if I have time.&lt;/p&gt;
&lt;p&gt;I&apos;m still behind. But I&apos;m not losing my sanity in the process.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The management model behind the &quot;AI department&quot;: &lt;a href=&quot;/blog/managing-agents-like-teams&quot;&gt;I Manage AI Agents the Way I Manage Teams&lt;/a&gt;. The governance methodology: &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;What Is Pass@1?&lt;/a&gt; and &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;. The infrastructure migration: &lt;a href=&quot;/work/kubernetes-migration&quot;&gt;Kubernetes Migration&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Wondering whether your organization&apos;s technical leadership is structured to actually scale this way? The &lt;a href=&quot;/tools/cto-diagnostic&quot;&gt;CTO diagnostic&lt;/a&gt; takes two minutes and shows you where the gaps are.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>ai</category><category>leadership</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Cognitive Property: Who Owns the Way You Think?</title><link>https://mipyip.com/blog/cognitive-property/</link><guid isPermaLink="true">https://mipyip.com/blog/cognitive-property/</guid><description>Your AI governance frameworks and decision-making logic are repeatable, transferable, and extractable. That&apos;s not a productivity feature. It&apos;s cognitive property.</description><pubDate>Mon, 09 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Your AI governance frameworks encode your reasoning process into transferable documents. That changes the ownership math. This post defines what cognitive property is, why it matters, and why the boundary between employer output and personal expertise just got blurry.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;AI tools picking up and repeating your habits isn&apos;t new. ChatGPT does it by design — it mirrors your tone, adapts to your preferences, and learns what you respond well to. The phenomenon has received copious amounts of screen time and discussion bandwidth.&lt;/p&gt;
&lt;p&gt;But something specific happened recently that shifted the way I think about it.&lt;/p&gt;
&lt;p&gt;One of my AI instances started using a ◡̈ I put at the end of casual notes, and picked up the → and ← characters I use for bullet points and emphasis in certain contexts. Formatting preferences and structural choices I never explicitly taught — they just started appearing.&lt;/p&gt;
&lt;p&gt;Then another instance, working on a completely different project, picked up the same arrow convention independently. Same human, same patterns, different context.&lt;/p&gt;
&lt;p&gt;The AI isn&apos;t just mirroring my preferences; it&apos;s learning to mirror my thinking. And once I noticed that, a harder question followed: if my reasoning patterns are being encoded into a transferable format — documented, structured, portable — then who owns them?&lt;/p&gt;
&lt;h2&gt;Your cognition is being encoded&lt;/h2&gt;
&lt;p&gt;If you work deeply with AI tools (and I mean deeply, not &quot;summarize this email&quot; or &quot;write me a cover letter&quot;), you&apos;re building something most people haven&apos;t named yet.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Repeatable cognitive patterns in plain text.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don&apos;t mean prompt history or chat logs. I mean the &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance documents&lt;/a&gt; you&apos;ve created — either intentionally or through organic growth — to define how &lt;em&gt;your&lt;/em&gt; AI agents operate. The CLAUDE.md / AGENT.md files that encode your engineering standards, your writing styles, your humor, your architectural preferences, and your coding philosophy. The decision-making frameworks that tell the AI how to prioritize, how to break down problems, and how to structure their thinking in a way that matches yours.&lt;/p&gt;
&lt;p&gt;Over time, you&apos;ve been documenting the way you reason. Not abstractly — specifically. In plain text. In a format that is entirely transferable.&lt;/p&gt;
&lt;h2&gt;Your operating system, as data&lt;/h2&gt;
&lt;p&gt;Take those governance documents and feed them to a fresh AI instance. What do you get?&lt;/p&gt;
&lt;p&gt;A working version of how you solve problems.&lt;/p&gt;
&lt;p&gt;Not a perfect copy, but a functional one. An instance that knows your architectural preferences, your communication style, your quality standards, and your decision-making heuristics. It won&apos;t be you, but it will be able to operate like you in ways that are measurably, verifiably close.&lt;/p&gt;
&lt;p&gt;That&apos;s not a productivity feature. That&apos;s a &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0896627324006524&quot;&gt;cognitive fingerprint&lt;/a&gt;. And the fact that it exists in a format that can be copied, transferred, and scaled changes the conversation about who owns what.&lt;/p&gt;
&lt;h2&gt;This isn&apos;t a new IP question — except it is&lt;/h2&gt;
&lt;p&gt;The ownership of workplace knowledge has been debated for as long as people have changed jobs. U.S. copyright law has a specific mechanism for it — the &lt;a href=&quot;https://www.venable.com/insights/publications/ip-quick-bytes/understanding-the-work-made-for-hire-doctrine&quot;&gt;work-made-for-hire doctrine&lt;/a&gt; assigns authorship to the employer when works are created within the scope of employment. You learn skills at a company and take them with you when you leave. Nobody seriously argues that everything you learned becomes corporate property.&lt;/p&gt;
&lt;p&gt;But this is different in a specific way: the cognitive pattern isn&apos;t just in your head anymore. It&apos;s documented. It&apos;s structured. It&apos;s portable. And it works without you.&lt;/p&gt;
&lt;p&gt;Previous generations of knowledge workers left with expertise — hard to quantify, impossible to transfer directly. You leave with expertise &lt;em&gt;and&lt;/em&gt; a governance repo that can reproduce a meaningful chunk of your operations. That&apos;s never been possible before.&lt;/p&gt;
&lt;h2&gt;Cognitive property&lt;/h2&gt;
&lt;p&gt;People are treating AI personalization like it&apos;s a nice-to-have feature. A convenience. &quot;My Claude knows how I like my code structured.&quot; Cool, time saver.&lt;/p&gt;
&lt;p&gt;It&apos;s a lot more than a time saver: it&apos;s &lt;em&gt;cognitive property&lt;/em&gt;. And right now, the ownership question hasn&apos;t even been asked.&lt;/p&gt;
&lt;p&gt;If you&apos;re building this kind of depth on a corporate AI account, with corporate tools, on company time... the question of who owns those patterns matters a lot more than you think. And the answer, under &lt;a href=&quot;https://www.bradley.com/insights/publications/2023/10/ai-in-the-modern-workplace-ownership-challenges-of-ai-generated-code&quot;&gt;most current employment agreements&lt;/a&gt;, is probably being decided by boilerplate that nobody wrote with cognitive property in mind.&lt;/p&gt;
&lt;h2&gt;The conversation that needs to happen now&lt;/h2&gt;
&lt;p&gt;This is a more urgent conversation than AGI governance, and I say that knowing how provocative it sounds. AGI governance matters, and it&apos;ll matter more as we get closer. But it&apos;s not happening today.&lt;/p&gt;
&lt;p&gt;This is happening today. People are building repeatable cognitive patterns in transferable formats. They&apos;re &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;externalizing their reasoning&lt;/a&gt; into documents that function without them. And most of them haven&apos;t thought about who gets to keep it.&lt;/p&gt;
&lt;p&gt;That question needs to be asked before it becomes standard practice to assume companies own whatever cognitive patterns emerge from AI tools used on company time.&lt;/p&gt;
&lt;p&gt;Legal and policy scholars are &lt;a href=&quot;https://academic.oup.com/policyandsociety/article/44/1/1/7997395&quot;&gt;already raising these questions&lt;/a&gt; about generative AI and intellectual property. But most of that work focuses on model outputs, not on the cognitive patterns of the person doing the work.&lt;/p&gt;
&lt;p&gt;The ownership conversation is overdue.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This is part of a four-post series, and the next post starts drawing the boundary. Subscribe for email updates.&lt;/em&gt;&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Employment law &amp;amp; IP&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.venable.com/insights/publications/ip-quick-bytes/understanding-the-work-made-for-hire-doctrine&quot;&gt;Understanding the Work Made for Hire Doctrine&lt;/a&gt; — Venable LLP. Plain-English explainer of work-for-hire under the Copyright Act of 1976.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.bradley.com/insights/publications/2023/10/ai-in-the-modern-workplace-ownership-challenges-of-ai-generated-code&quot;&gt;AI in the Modern Workplace: Ownership Challenges of AI-Generated Code&lt;/a&gt; — Bradley Arant Boult Cummings. Employee use of GenAI does not change that code written in the course of employment belongs to the employer.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://livescu.ucla.edu/ai-copyright-law-and-work-made-for-hire/&quot;&gt;AI, Copyright Law, and Work-Made-For-Hire&lt;/a&gt; — UCLA Livescu Initiative. Scholarly discussion of how work-for-hire breaks down for AI-generated material.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI governance &amp;amp; cognitive data&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://academic.oup.com/policyandsociety/article/44/1/1/7997395&quot;&gt;Governance of Generative AI&lt;/a&gt; — Policy and Society (Oxford Academic). Survey of IP and data-governance gaps in generative AI, including the need for new ownership frameworks.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0896627324006524&quot;&gt;Beyond Neural Data: Cognitive Biometrics and Mental Privacy&lt;/a&gt; — Magee, Ienca &amp;amp; Farahany, Neuron (2024). Argues that cognitive and behavioral patterns function as uniquely identifying data, extending privacy concerns beyond neural signals.&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Cognitive Offloading: A System for What to Keep in Your Head and What to Delegate</title><link>https://mipyip.com/blog/cognitive-offloading/</link><guid isPermaLink="true">https://mipyip.com/blog/cognitive-offloading/</guid><description>Notebooks, apps, and AI tools all solve capture but fail at retrieval. Cognitive offloading is the methodology behind building systems that let you decide what stays in your mind and what doesn&apos;t have to.</description><pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;The methodology behind the entire system: deliberately choosing what stays in your head and building architecture to handle everything else. From failed notebooks to governed AI agents.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I carried a notebook in my back pocket for years. They were ratty little things, usually held together with gaffer tape. I called it the butt book, because that&apos;s where it lived. The idea was simple: whenever something worth remembering surfaced, I&apos;d write it down before it disappeared.&lt;/p&gt;
&lt;p&gt;It worked, for capture. The ideas made it onto paper. The crisis of &quot;I just had a thought and now it&apos;s gone&quot; happened less often. But the notebooks accumulated, and the ideas inside them became a graveyard. If I remembered to go back and find something — and that&apos;s a significant &quot;if&quot; — I still had to locate it, interpret my own handwriting, and reconstruct whatever context made the idea seem worth writing down in the first place.&lt;/p&gt;
&lt;p&gt;The capture problem was solved. The retrieval problem never was.&lt;/p&gt;
&lt;h2&gt;Every system I tried solved the same half of the problem&lt;/h2&gt;
&lt;p&gt;Evernote. Obsidian. Apple Notes. Todoist. Each one promised a different organizational model — tags, backlinks, smart folders, natural-language reminders. Each one worked for about two weeks, which is roughly how long it takes for a structured environment to get out of whack when you have (undiagnosed!) ADHD and the system requires you to maintain it. The &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;failure modes&lt;/a&gt; are structural, not motivational.&lt;/p&gt;
&lt;p&gt;The pattern was always the same: set it up, use it enthusiastically, let it drift, watch the structure collapse under its own weight, abandon it for the next thing. Not because the tools were bad — because they all assumed I&apos;d come back to them. Every system required me to initiate retrieval. To remember that I&apos;d stored something, navigate to where I&apos;d stored it, and find it among everything else I&apos;d stored.&lt;/p&gt;
&lt;p&gt;That&apos;s three cognitive tasks before you even get to the information you need. For someone whose working memory is the bottleneck, that&apos;s three chances to lose the thread.&lt;/p&gt;
&lt;p&gt;Notion is the exception, but only because I use it exclusively for school and keep it aggressively structured. Tight scope, rigid templates, no room to drift. It works precisely because I don&apos;t let it become a general-purpose system.&lt;/p&gt;
&lt;h2&gt;So I built one&lt;/h2&gt;
&lt;p&gt;Before the current wave of AI tools, I built a thing called GetRamble. It had a phone number. I could text it at any time — in line at the grocery store, in the middle of a meeting, at 2am — and OpenAI&apos;s API would turn my stream of consciousness into categorized notes.&lt;/p&gt;
&lt;p&gt;My kids would ask who &quot;ramble&quot; was because I said it so often: &quot;Hey Siri, text ramble.&quot;&lt;/p&gt;
&lt;p&gt;It worked. Really well, actually. I was still using it as recently as a few months ago. The capture problem and the categorization problem were both solved — text a rambling thought, get back structured, searchable notes.&lt;/p&gt;
&lt;p&gt;But Ramble stalled, for reasons that will sound familiar if you&apos;ve read what I write about &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;governance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I was building it with a combination of my own work and Replit. Replit couldn&apos;t stay sane — the same ungoverned-architecture problem I&apos;ve since built an entire &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;methodology&lt;/a&gt; around solving. Eventually, it became more work to wrangle the features than to get results, and I didn&apos;t have the bandwidth to rewrite it myself. Full-time job, school, wife, two kids. The 10DLC compliance burden alone — the regulatory framework for application-to-person messaging — was a part-time job for a one-person team.&lt;/p&gt;
&lt;p&gt;I wanted to monetize it. But without capital and a testing cohort, I couldn&apos;t release it into the wild. The product was good. The architecture wasn&apos;t stable enough to trust — and at the time, I didn&apos;t have a word for what was missing. I just knew I couldn&apos;t ship something I&apos;d have to maintain at 2am when it broke in ways I couldn&apos;t predict.&lt;/p&gt;
&lt;p&gt;Will I finish it? Probably not — I have better tools now. But the experience was formative. It&apos;s part of where my governance methodology comes from. I built something that worked, and watched it collapse not because the idea was wrong, but because the system around it couldn&apos;t hold.&lt;/p&gt;
&lt;h2&gt;What changed wasn&apos;t the tool — it was the architecture&lt;/h2&gt;
&lt;p&gt;Claude Code didn&apos;t solve the capture problem better than Ramble. It solved a different problem entirely: it made retrieval automatic.&lt;/p&gt;
&lt;p&gt;Every previous system — analog or digital, simple or AI-powered — required me to go get the information, remember I&apos;d stored something, navigate to it, and load it back into working memory. Claude Code&apos;s &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance documents&lt;/a&gt; flipped that model. The agent reads its own state at the start of every session. I don&apos;t retrieve. The system loads.&lt;/p&gt;
&lt;p&gt;That distinction is the whole thing.&lt;/p&gt;
&lt;p&gt;The plan exists, it&apos;s maintained, it&apos;s comprehensive — but it never demands my attention. It&apos;s there when I need it and invisible when I don&apos;t. I can forget it exists and still follow it, because the system is holding the state, not me.&lt;/p&gt;
&lt;p&gt;Three things make this work in practice:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Project-level state persistence.&lt;/strong&gt; Each project maintains its own context through governance documents. I can revisit any project at any time and get an immediate snapshot — not by reading through files myself, but by asking the agent what&apos;s current. The project&apos;s memory survives the session boundary because it was designed to.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rapid idea triage.&lt;/strong&gt; When an idea surfaces now, I don&apos;t write it in a notebook and hope I&apos;ll find it later. I spin up a prototype — Excalidraw wireframe, &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance templates&lt;/a&gt;, a solid directive — and within a single conversation, I know whether the idea has legs. If it does, it gets filed into my project management system with full context attached. If it doesn&apos;t, it gets archived cleanly. Either way, it&apos;s out of my head and into a system that can hold it without my participation. The cognitive cost of exploring an idea dropped from &quot;a weekend&quot; to &quot;a conversation.&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A personal project manager that doesn&apos;t require me to manage it.&lt;/strong&gt; I run a lightweight environment that stores the state of everything I&apos;m tracking — a set of JSON index files with descriptions pointing to full markdown files for detail. No RAG, no vector database. A poor man&apos;s index that works because the scope is deliberate and the governance is tight. It started as a scratchpad within another project and became a standalone system when the &lt;a href=&quot;/blog/managing-agents-like-teams&quot;&gt;separation of concerns&lt;/a&gt; demanded it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/blog/blog-cognitive-offloading-stack.png&quot; alt=&quot;The Cognitive Offloading Stack — four layers of cognitive architecture&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Choosing what stays in your mind&lt;/h2&gt;
&lt;p&gt;Cognitive offloading is the deliberate process of choosing what stays in your mind and building systems to handle the rest.&lt;/p&gt;
&lt;p&gt;Not productivity hacking. Not &quot;getting organized.&quot; Architecture — designed to match how your brain actually operates rather than how productivity systems assume it should.&lt;/p&gt;
&lt;p&gt;The butt book was cognitive offloading. Ramble was cognitive offloading. But they were incomplete implementations — they solved capture without solving retrieval, so the offloaded information ended up in cold storage with no mechanism to bring it back when it mattered.&lt;/p&gt;
&lt;p&gt;What I&apos;m building now is the complete architecture: capture, categorization, persistence, and automatic retrieval. The information flows out of my head and into governed systems that carry it forward — not just storing it, but delivering it at the right time, in the right context, without requiring me to remember it exists.&lt;/p&gt;
&lt;p&gt;The background anxiety lifts. Not because the work is less important, but because I&apos;m no longer the one responsible for holding it all. The system holds it. I think about whatever is actually in front of me.&lt;/p&gt;
&lt;p&gt;That&apos;s not a productivity gain. That&apos;s an architectural change in how I allocate cognitive resources — and it turns out it applies to &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;AI agents&lt;/a&gt; the same way it applies to human brains, because the failure modes are structurally identical.&lt;/p&gt;
&lt;p&gt;If your system requires you to remember to use it, it&apos;s not offloading anything. It&apos;s just adding a task.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Cognitive offloading is the methodology behind &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;Pass@1&lt;/a&gt; and the &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance documents&lt;/a&gt;. For the architectural argument: &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;Governance Is Architecture&lt;/a&gt;. For how this applies to agent management: &lt;a href=&quot;/blog/managing-agents-like-teams&quot;&gt;Managing Agents Like Teams&lt;/a&gt;. For the ownership question these patterns raise: &lt;a href=&quot;/blog/cognitive-property&quot;&gt;Cognitive Property&lt;/a&gt;. If you&apos;ve been building similar systems, &lt;a href=&quot;/contact&quot;&gt;I&apos;d like to hear about it&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;If you&apos;re thinking about how to apply this at an organizational level, the &lt;a href=&quot;/tools/ai-readiness&quot;&gt;AI readiness diagnostic&lt;/a&gt; shows where your team&apos;s current architecture supports this kind of offloading — and where it doesn&apos;t.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>adhd</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>I Manage AI Agents the Way I Manage Teams</title><link>https://mipyip.com/blog/managing-agents-like-teams/</link><guid isPermaLink="true">https://mipyip.com/blog/managing-agents-like-teams/</guid><description>Separation of concerns, governance docs, knowing when to restructure. The same management principles that make human teams effective apply directly to AI agents — with examples from a multi-agent production workflow.</description><pubDate>Wed, 04 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Separation of concerns, clear guidelines, knowing when to restructure. The management principles that work for human teams apply directly to AI agents. This post covers the operational framework.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I run multiple AI agents across several projects. A site architect for this website. A content agent for editorial work. A &lt;a href=&quot;/work/sysadmin-claude&quot;&gt;sysadmin agent&lt;/a&gt; for infrastructure. An observability bot for monitoring. Each one has a defined role, documented standards, and clear boundaries.&lt;/p&gt;
&lt;p&gt;At some point — I couldn&apos;t tell you exactly when — I stopped thinking about this as &quot;using AI tools&quot; and started thinking about it as managing a team. Not in the Silicon Valley &quot;AI teammate&quot; marketing sense. In the actual management sense: the same principles I&apos;d apply to a group of human engineers producing real work under real constraints.&lt;/p&gt;
&lt;p&gt;The more I leaned into that framing, the more the system improved. Because it turns out the management disciplines that make human teams effective aren&apos;t abstractions. They&apos;re operational patterns that apply to AI agents without (much) modification.&lt;/p&gt;
&lt;h2&gt;Separation of concerns is just a job description&lt;/h2&gt;
&lt;p&gt;Each agent has a job, and it does that job. The site architect handles the website — layouts, components, performance, SEO, and editorial. I originally separated content into its own agent, but the editorial voice needed enough architectural context that splitting them created more coordination overhead than it saved — so I consolidated. That&apos;s the methodology working as designed: the right boundary isn&apos;t always more boundaries. The sysadmin agent handles infrastructure — AARs, topology documentation, environment configs. My &lt;a href=&quot;/work/ai-sprint-management&quot;&gt;PM agent&lt;/a&gt; manages tasks and responsibilities in Asana.&lt;/p&gt;
&lt;p&gt;They don&apos;t freelance into each other&apos;s domains.&lt;/p&gt;
&lt;p&gt;This sounds obvious, but the default approach most people take with AI is the opposite: one chat, one agent, everything. Code review and creative writing and data analysis and debugging, all in the same conversation. It works the way having one employee handle engineering, marketing, and customer support &quot;works.&quot; You get output. But you get inconsistent output, because the agent&apos;s context is split across too many domains to maintain depth in any of them.&lt;/p&gt;
&lt;p&gt;Separation of concerns for AI agents is the same principle as separation of concerns for human teams. Defined roles reduce cognitive load, prevent context pollution, and produce better work — because the agent&apos;s entire context window is focused on the domain it&apos;s responsible for, not half-occupied by the residue of a different conversation about a different problem.&lt;/p&gt;
&lt;p&gt;The loose catch-all still exists. For me, it&apos;s the core Claude chat interface — the equivalent of walking over to someone&apos;s desk for a quick question that doesn&apos;t belong in anyone&apos;s formal workflow. Not everything needs a scoped agent. But the work that matters does.&lt;/p&gt;
&lt;h2&gt;Clear guidelines are just an employee handbook&lt;/h2&gt;
&lt;p&gt;Every agent has &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance documents&lt;/a&gt;. CLAUDE.md, ARCHITECTURE.md, ROADMAP.md — the governance layer that defines standards, patterns, boundaries, and institutional memory.&lt;/p&gt;
&lt;p&gt;This is onboarding. You wouldn&apos;t hire a developer and say &quot;just go build.&quot; You&apos;d hand them the style guide, the architecture overview, the deployment process, the list of things not to touch. You&apos;d give them context before expecting output.&lt;/p&gt;
&lt;p&gt;AI agents need the same thing — except they need it more, because they can&apos;t compensate for missing context the way humans can. A human developer who doesn&apos;t know the naming convention will ask a colleague, read the existing code, or make a reasonable guess based on experience. An AI agent without documented conventions will make a different reasonable guess every session. Monday it&apos;s camelCase. Tuesday it&apos;s snake_case. Wednesday it&apos;s whatever it inferred from the three files it happened to read first.&lt;/p&gt;
&lt;p&gt;The governance documents aren&apos;t overhead. They&apos;re the mechanism that produces consistency — the employee handbook that every agent reads at the start of every session, ensuring that today&apos;s work is compatible with yesterday&apos;s.&lt;/p&gt;
&lt;h2&gt;Focus and respect are just professionalism&lt;/h2&gt;
&lt;p&gt;This might surprise people, but it matters: I interact with my agents the way I&apos;d interact with professional colleagues. Focused. Respectful of their time (which in this case means their &lt;em&gt;context window&lt;/em&gt;). No off-topic tangents unless the situation genuinely warrants it.&lt;/p&gt;
&lt;p&gt;This isn&apos;t sentiment. It&apos;s practical. Every message in a context window consumes tokens. Off-topic chatter, excessive small talk, or rambling prompts pollute the context with irrelevant information. For a human colleague, that&apos;s an interruption that costs focus. For an AI agent, it&apos;s worse — it&apos;s permanent context noise that degrades every subsequent response in the session.&lt;/p&gt;
&lt;p&gt;Respecting the agent&apos;s context window is the same principle as respecting an employee&apos;s cognitive bandwidth. You wouldn&apos;t ask your database architect to weigh in on your marketing copy. You wouldn&apos;t CC everyone on every email. The same instinct applies: keep the interaction focused on the domain, and the output stays focused on the domain.&lt;/p&gt;
&lt;h2&gt;When to hire: the spinoff pattern&lt;/h2&gt;
&lt;p&gt;The management parallel that convinced me this wasn&apos;t just a useful metaphor — but an operational truth — was the first time I had to restructure.&lt;/p&gt;
&lt;p&gt;My sysadmin Claude Code instance manages infrastructure context: AARs, topology documentation, and environment configs. Straightforward scope. At some point, I had it build a small Telegram notification bot as a utility — a quick way to monitor the overall health of the systems I am responsible for.&lt;/p&gt;
&lt;p&gt;The notification bot worked. Then it proved useful enough that I started expanding it. More alert types, better formatting, scheduling logic, error handling. Before I knew it, the &quot;small utility&quot; had grown into a legitimate standalone project sitting inside an agent whose job description was completely different.&lt;/p&gt;
&lt;p&gt;The signal was the same one any team lead recognizes: the context required to do the work well had grown beyond what a single entity could reasonably hold. The agent&apos;s CLAUDE.md was bloated with two domains&apos; worth of conventions. Half the context window was consumed by scope that wasn&apos;t relevant to whichever task was actually in front of it.&lt;/p&gt;
&lt;p&gt;So I did what I&apos;d do with a human team member whose role had quietly split into two distinct jobs: I restructured. New repository. New governance documents. New architecture spec. New agent. The observability bot got its own development track, its own context, its own focused governance. The sysadmin agent went back to doing what it was actually scoped for.&lt;/p&gt;
&lt;p&gt;The alternative — and this is the part that maps directly to organizational dysfunction — is letting scope accumulate until the agent is doing five things adequately instead of one thing well. That&apos;s not an AI problem. That&apos;s a management problem, and every team lead has seen it happen with humans. The person who&apos;s in every meeting, owns every escalation, and somehow has three job titles on their email signature. The fix is the same in both cases: hire. It&apos;s not a performance problem; it&apos;s an organizational design problem.&lt;/p&gt;
&lt;h2&gt;Good management doesn&apos;t depend on the medium&lt;/h2&gt;
&lt;p&gt;The argument I keep coming back to is simple: good management is good management. The medium changes — from a human team to an AI team — but the principles don&apos;t.&lt;/p&gt;
&lt;p&gt;Defined roles prevent confusion. Documentation prevents context loss. Focus prevents scope creep. Restructuring when the scope outgrows the role prevents degradation.&lt;/p&gt;
&lt;p&gt;These aren&apos;t AI-specific insights; they&apos;re management fundamentals that happen to apply perfectly to AI agents — because &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;the failure modes are structurally identical&lt;/a&gt;. It&apos;s the same discipline I apply as a &lt;a href=&quot;/services/fractional-cto&quot;&gt;fractional CTO&lt;/a&gt; to human teams and AI systems alike. An overloaded AI agent degrades the same way an overloaded employee degrades. Not through incompetence, but through insufficient structure around the work.&lt;/p&gt;
&lt;p&gt;The people getting inconsistent results from AI aren&apos;t writing bad prompts. They&apos;re practicing bad management. And the fix isn&apos;t a better prompt template or a more capable model. It&apos;s the same fix it&apos;s always been: clear roles, documented expectations, and the discipline to restructure when the scope outgrows the container.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This post builds on the management framework running through everything on this site. For the governance methodology: &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;What Is Pass@1?&lt;/a&gt; and &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt;. For the architectural argument: &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;Governance Is Architecture&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Before scaling AI agents, it&apos;s worth knowing where your organization actually stands. The &lt;a href=&quot;/tools/ai-readiness&quot;&gt;AI readiness diagnostic&lt;/a&gt; surfaces the structural gaps in about two minutes.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>ai</category><category>leadership</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>The Governance Documents</title><link>https://mipyip.com/blog/the-governance-documents/</link><guid isPermaLink="true">https://mipyip.com/blog/the-governance-documents/</guid><description>ROADMAP.md, ARCHITECTURE.md, CLAUDE.md, and CHANGELOG.md aren&apos;t project management overhead. They&apos;re the system that gives AI agents persistent memory across sessions — and the mechanism that makes Pass@1 work.</description><pubDate>Wed, 04 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Four markdown files make the difference between an AI agent that forgets everything between sessions and one that ships correct code on the first attempt. This post breaks down what each file contains, why it exists, and how they work together.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;Start a Claude Code session without governance documents, and you&apos;ll spend the first twenty minutes getting the agent oriented. It re-reads the codebase. It might ask questions you already answered yesterday. It makes architectural choices that contradict decisions from last week. By the time it&apos;s ready to write code, you&apos;ve either burned a quarter of the context window on setup that should have been spent on actual work, or you&apos;ve suffered multiple compactions, and you&apos;re even worse off than before.&lt;/p&gt;
&lt;p&gt;But if you start the same session with governance documents — CLAUDE.md, ROADMAP.md, ARCHITECTURE.md, CHANGELOG.md — the agent picks up where the last session left off. Same conventions. Same patterns. Same awareness of what broke last time and how it was fixed. (For other agents, a differently named file, such as GEMINI.md, or a generic AGENT.md, serves as the core directive file; in my experience, the results are less structured with other agents.)&lt;/p&gt;
&lt;p&gt;That difference is the entire methodology. &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;Pass@1&lt;/a&gt; isn&apos;t about prompting. It&apos;s about these four files.&lt;/p&gt;
&lt;h2&gt;The spec the agent builds against&lt;/h2&gt;
&lt;p&gt;ROADMAP.md is the contract. Not a backlog of ideas. Not a wish list with priorities. It&apos;s the spec — what to build, in what order, with what constraints.&lt;/p&gt;
&lt;p&gt;When a roadmap entry says &quot;add &lt;a href=&quot;/products/actions&quot;&gt;floating popout buttons&lt;/a&gt; — persistent, always-on-top, position remembered across restarts, click to run,&quot; the agent has everything it needs to start building. No Slack thread to check. No product manager to interpret. No ambiguity about what &quot;done&quot; means.&lt;/p&gt;
&lt;p&gt;I&apos;ve &lt;em&gt;built&lt;/em&gt; what happens when the roadmap is vague. &quot;Improve the settings page&quot; produces a different interpretation every session. The agent makes reasonable choices — they&apos;re just different reasonable choices than the ones from yesterday, and different from the ones it&apos;ll make tomorrow. The roadmap eliminates that variance by being specific enough that correctness is verifiable.&lt;/p&gt;
&lt;p&gt;The roadmap also functions as a priority system. What&apos;s in the current sprint gets built. What&apos;s in the backlog waits. This sounds obvious until you&apos;ve watched an AI agent enthusiastically refactor your authentication system when you asked it to fix a CSS bug — because without explicit priorities, everything looks equally important.&lt;/p&gt;
&lt;h2&gt;The boundaries the agent works within&lt;/h2&gt;
&lt;p&gt;ARCHITECTURE.md defines how the system is built. Not what it does — how the pieces connect, where the boundaries are, and what patterns to follow.&lt;/p&gt;
&lt;p&gt;In my Electron apps, the architecture document specifies that all IPC goes through the preload bridge and the renderer never touches Node.js APIs. That&apos;s not a suggestion. It&apos;s a constraint. When the agent needs to add a feature that requires filesystem access, it doesn&apos;t invent a shortcut through the security boundary — it routes through the bridge, because the architecture document says that&apos;s how this system works.&lt;/p&gt;
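&lt;p&gt;For readers who haven&apos;t worked in Electron, that constraint looks roughly like this in a preload script. This is a generic sketch, not the app&apos;s actual code; the &lt;code&gt;bridge&lt;/code&gt; name and channel strings are invented:&lt;/p&gt;

```typescript
// Generic Electron preload sketch: expose a narrow, named API to the
// renderer so it never touches Node.js APIs directly. The main process
// handles the real filesystem work behind each IPC channel.
import { contextBridge, ipcRenderer } from 'electron';

contextBridge.exposeInMainWorld('bridge', {
  // Renderer asks; main answers. No fs, no path, no process in the renderer.
  readConfig: () => ipcRenderer.invoke('config:read'),
  writeConfig: (data: unknown) => ipcRenderer.invoke('config:write', data),
});
```

&lt;p&gt;Anything filesystem-shaped routes through a channel like these, and the renderer only ever sees &lt;code&gt;window.bridge&lt;/code&gt;. The architecture document&apos;s job is to make sure the agent never invents a shortcut around that boundary.&lt;/p&gt;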
&lt;p&gt;Without the architecture document, every session is a negotiation. The agent reads the codebase, infers patterns, and builds something consistent with what it found. Usually. But codebases accumulate exceptions, and an agent that infers patterns from code that includes both the &quot;right way&quot; and three legacy workarounds will sometimes pick the wrong pattern to follow.&lt;/p&gt;
&lt;p&gt;The architecture document cuts through that ambiguity. It doesn&apos;t say &quot;this is how it seems to work.&quot; It says, &quot;This is how it works. Follow this.&quot;&lt;/p&gt;
&lt;h2&gt;The memory that survives between sessions&lt;/h2&gt;
&lt;p&gt;CLAUDE.md is the governance document that changed my workflow the most.&lt;/p&gt;
&lt;p&gt;AI agent sessions are ephemeral. The context window compacts. The session ends. The next session starts fresh with no memory of what happened before. Every hard-won insight about the codebase, every gotcha discovered through debugging, every convention established through trial and error — gone.&lt;/p&gt;
&lt;p&gt;CLAUDE.md is persistent working memory. It carries forward everything the next session needs to know: coding conventions, known gotchas, patterns that work, patterns that failed, file locations, architectural decisions and their rationale.&lt;/p&gt;
&lt;p&gt;A concrete example. This website project&apos;s CLAUDE.md has a section called &quot;Already Solved — Don&apos;t Re-investigate.&quot; It includes entries like: Tailwind v4&apos;s &lt;code&gt;translate&lt;/code&gt; property is independent of &lt;code&gt;transform&lt;/code&gt;, so overriding &lt;code&gt;translate-x-full&lt;/code&gt; requires resetting a CSS variable rather than using &lt;code&gt;transform: translateX(0)&lt;/code&gt;. Or: the mobile menu must live outside &lt;code&gt;&amp;lt;header&amp;gt;&lt;/code&gt; because &lt;code&gt;backdrop-filter&lt;/code&gt; creates a containing block for &lt;code&gt;position: fixed&lt;/code&gt; children. Or: Astro module scripts and &lt;code&gt;astro:page-load&lt;/code&gt; fire at the same tick, so don&apos;t register event handlers in both.&lt;/p&gt;
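&lt;p&gt;In the file itself, entries like those might read roughly as follows (paraphrased from the examples above; the exact wording is illustrative):&lt;/p&gt;

```markdown
## Already Solved — Don't Re-investigate

- **Tailwind v4:** `translate` is independent of `transform`. Overriding
  `translate-x-full` means resetting the CSS variable, not using
  `transform: translateX(0)`.
- **Mobile menu:** must live outside `&lt;header&gt;`, because `backdrop-filter`
  creates a containing block for `position: fixed` children.
- **Astro:** module scripts and `astro:page-load` fire at the same tick;
  register event handlers in one of them, never both.
```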
&lt;blockquote&gt;
&lt;p&gt;This entire model is part of a methodology I call &lt;strong&gt;cognitive offloading&lt;/strong&gt; — deliberately choosing what stays in your mind and building systems to handle the rest. The governance documents aren&apos;t just agent infrastructure. They&apos;re the offloading mechanism — the system that lets you stop holding project state in your head and trust that it&apos;s captured, versioned, and delivered to the next session automatically. They&apos;re also, it turns out, &lt;a href=&quot;/blog/cognitive-property&quot;&gt;cognitive property&lt;/a&gt; — transferable, reproducible encodings of how you reason.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Each of those cost a debugging session to figure out the first time. Without the CLAUDE.md entry, the next session hits the same problem, spends the same time, and &quot;solves&quot; it the same way — or worse, solves it differently, introducing an inconsistency.&lt;/p&gt;
&lt;p&gt;The most underrated function of CLAUDE.md is what I call the culture document effect. AI agents conform to whatever standards they find in their context. If your CLAUDE.md says &quot;use camelCase for functions, PascalCase for components, SCREAMING_SNAKE for constants,&quot; the agent follows that convention reliably. If it says nothing, the agent guesses — and it may guess differently in every new session.&lt;/p&gt;
&lt;p&gt;This is the same dynamic as &lt;a href=&quot;/blog/managing-agents-like-teams/&quot;&gt;onboarding a new hire&lt;/a&gt;. You wouldn&apos;t drop someone into a codebase and say &quot;just start building.&quot; You&apos;d hand them the style guide, the architectural overview, the list of things that are broken and why. CLAUDE.md is the onboarding document, delivered fresh at the start of every session to an agent who might not have any prior experience working here.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;But what about MEMORY.md?&lt;/strong&gt;
Claude Code maintains its own internal memory file (MEMORY.md) under &lt;code&gt;~/.claude/&lt;/code&gt; in your home directory. It&apos;s automatic, it works well, and you don&apos;t — or shouldn&apos;t — have direct control over it. But it&apos;s local to the machine and not version-controlled. If you start a fresh session after compaction, switch machines, or lose your local environment, that memory is gone. Without a governance document committed to the repo itself, you&apos;re starting from scratch every time.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;The running record&lt;/h2&gt;
&lt;p&gt;CHANGELOG.md tracks what&apos;s been built and what&apos;s changed. It&apos;s the project&apos;s institutional memory — the audit trail that prevents regression.&lt;/p&gt;
&lt;p&gt;After each block of work, the changelog gets updated. Not as a nice-to-have, but as the mechanism that prevents the next session from accidentally reverting a decision or re-implementing something that was already tried and rejected.&lt;/p&gt;
&lt;p&gt;The changelog also serves a subtler function: it makes the agent aware of the project&apos;s trajectory. When the agent can see that the last ten entries were security hardening work, it&apos;s less likely to introduce a pattern that undermines that trajectory. Context shapes behavior, and it does so for AI agents the same way it shapes behavior for human developers.&lt;/p&gt;
&lt;h2&gt;The discipline nobody wants to hear about&lt;/h2&gt;
&lt;p&gt;The documents only work if they&apos;re maintained. This is the part that separates Pass@1 from good intentions.&lt;/p&gt;
&lt;p&gt;Every session that makes code changes &lt;strong&gt;must&lt;/strong&gt; end with document updates. ROADMAP.md gets its sprint status updated. CHANGELOG.md gets new entries. CLAUDE.md gets new gotchas and solved problems. ARCHITECTURE.md gets updated if patterns or data models have changed. This isn&apos;t optional. It&apos;s the cost of maintaining the system that makes everything else work, and if the CLAUDE.md is structured properly, the agent will ensure it happens.&lt;/p&gt;
&lt;p&gt;I&apos;ve built this into the session workflow as a mandatory checklist — the last step before a commit. The discipline isn&apos;t exciting. It&apos;s the engineering equivalent of doing your dishes after cooking, rather than letting them pile up. But the compound effect over six months is significant: a project with maintained governance documents starts sessions faster, produces fewer bugs, and maintains consistency that would be impossible with session-to-session amnesia.&lt;/p&gt;
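&lt;p&gt;The checklist can be written down as a rule rather than a habit. A minimal sketch, assuming a mapping from kinds of session work to the documents they obligate; the function and category names are mine, not part of any tooling:&lt;/p&gt;

```javascript
// Which governance documents must be updated before this session commits?
// The mapping mirrors the checklist described above; names are illustrative.
const REQUIRED_UPDATES = {
  code: ['CHANGELOG.md', 'ROADMAP.md'],      // shipped or changed code
  gotcha: ['CLAUDE.md'],                     // discovered a new gotcha
  pattern: ['ARCHITECTURE.md', 'CLAUDE.md'], // patterns or data models changed
};

function docsToUpdate(changeKinds) {
  // changeKinds: e.g. ['code', 'gotcha'] for a session that shipped a fix
  // and hit a new gotcha along the way.
  const docs = new Set();
  for (const kind of changeKinds) {
    for (const doc of REQUIRED_UPDATES[kind] || []) {
      docs.add(doc);
    }
  }
  return Array.from(docs).sort();
}
```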
&lt;p&gt;&lt;img src=&quot;../../assets/blog/blog-governance-document-relationships.png&quot; alt=&quot;Four governance documents forming a session cycle — ROADMAP constrains ARCHITECTURE, which informs CLAUDE.md, which updates CHANGELOG, which validates ROADMAP&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;The governance layer&lt;/h2&gt;
&lt;p&gt;These four documents are the &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;governance layer&lt;/a&gt; I consistently write about. They&apos;re not project management artifacts bolted onto a development process. They&apos;re engineering artifacts that make the development process possible.&lt;/p&gt;
&lt;p&gt;Remove the governance documents, and the AI agent still generates code. It just generates code without memory, without constraints, without awareness of what came before. That&apos;s not a prompting problem. That&apos;s an architecture problem — and when it happens in a client-facing context, &lt;a href=&quot;/blog/the-green-light-problem&quot;&gt;the consequences compound fast&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The four files total maybe 2,000 words across a mature project. The maintenance cost is a few moments at the end of each session. The return is an AI agent that operates with the institutional knowledge of every session that came before it — picking up exactly where the last one left off, respecting every convention, avoiding every solved gotcha. The results are measurable: this pattern drove an &lt;a href=&quot;/work/sysadmin-claude&quot;&gt;autonomous infrastructure agent&lt;/a&gt; that manages a production Kubernetes cluster and an &lt;a href=&quot;/work/aws-governance&quot;&gt;AWS governance review&lt;/a&gt; that caught six figures in risk.&lt;/p&gt;
&lt;p&gt;That&apos;s not overhead, that&apos;s the whole product.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This post is part of the Pass@1 series. For the methodology overview, see &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;What Is Pass@1?&lt;/a&gt;. For the architectural argument, see &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;Governance Is Architecture&lt;/a&gt;. For the ownership question these documents raise: &lt;a href=&quot;/blog/cognitive-property&quot;&gt;Cognitive Property&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;If you&apos;re evaluating whether your organization&apos;s technical foundation is built to support this kind of system, the &lt;a href=&quot;/tools/cto-diagnostic&quot;&gt;CTO diagnostic&lt;/a&gt; scores you across eight domains — governance infrastructure included.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>ai</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Governance Is Architecture</title><link>https://mipyip.com/blog/governance-is-architecture/</link><guid isPermaLink="true">https://mipyip.com/blog/governance-is-architecture/</guid><description>AI governance isn&apos;t a compliance checklist — it&apos;s an architectural decision. How you structure agent permissions, context windows, and audit trails determines whether your AI system is reliable or just fast.</description><pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Governance isn&apos;t compliance bolted onto a finished system. It&apos;s an architectural decision that shapes how AI agents behave, retain context, and produce reliable output. This post is the thesis statement for everything else on this site.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;Ask ten people what &quot;AI governance&quot; means, and nine of them will describe a compliance function. Policy documents. Review committees. Usage guidelines. Acceptable use policies. Risk assessments that live in a SharePoint folder nobody opens.&lt;/p&gt;
&lt;p&gt;That&apos;s not governance, it&apos;s just ceremony.&lt;/p&gt;
&lt;p&gt;I&apos;ve been building AI-augmented systems for the past year — &lt;a href=&quot;/work/actions-development&quot;&gt;shipping production software&lt;/a&gt;, &lt;a href=&quot;/work/ai-sprint-management&quot;&gt;designing agent workflows&lt;/a&gt;, &lt;a href=&quot;/blog/the-adversary&quot;&gt;running adversarial security audits&lt;/a&gt;, and writing about all of it. Somewhere in the middle of that work, a pattern crystallized that I didn&apos;t have a name for:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI governance isn&apos;t a compliance layer. It&apos;s an architectural decision.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It needs to be designed into the system at the structural level — not applied after the system is built, and not delegated to a committee that reviews work they didn&apos;t design.&lt;/p&gt;
&lt;h2&gt;The compliance trap&lt;/h2&gt;
&lt;p&gt;The default approach to AI governance looks like this: a team builds an AI system, then a separate group evaluates whether it meets policy requirements. The evaluation produces a report. The report produces a remediation list. Remediation is prioritized over feature work. Most of it ships eventually. Some of it doesn&apos;t.&lt;/p&gt;
&lt;p&gt;This is the same pattern that produces &quot;secure&quot; software that still ships with OWASP Top 10 vulnerabilities in production — issues that OWASP explicitly positions as design-time concerns, not a post-hoc audit checklist. Security review after implementation, compliance as an afterthought, the ceremony of oversight without the substance.&lt;/p&gt;
&lt;p&gt;The problem isn&apos;t that the review happens. The problem is that it happens &lt;em&gt;after&lt;/em&gt; the architecture is set. By the time someone evaluates governance, the system&apos;s boundaries are already drawn. The data flows are already established. The agent&apos;s permissions are already scoped — or not, which is arguably more common.&lt;/p&gt;
&lt;p&gt;Governance applied after architecture is remediation. Governance designed into architecture is prevention.&lt;/p&gt;
&lt;h2&gt;What this looks like when you build it&lt;/h2&gt;
&lt;p&gt;I didn&apos;t arrive at this framework through theory. I arrived at it through building things and watching where they broke.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance documents as engineering artifacts.&lt;/strong&gt; In &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;Pass@1&lt;/a&gt;, the &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance documents&lt;/a&gt; — ROADMAP.md, CHANGELOG.md, ARCHITECTURE.md, CLAUDE.md — aren&apos;t project management overhead. They&apos;re the constraints that make the AI agent produce correct implementations on the first attempt. The governance IS the product. The speed is a byproduct. Remove the governance documents, and the agent still generates code. It just tends to generate the wrong code, confidently, repeatedly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Adversarial review as a structural pattern.&lt;/strong&gt; &lt;a href=&quot;/blog/the-adversary&quot;&gt;The Adversary&lt;/a&gt; isn&apos;t a code review checklist. It&apos;s a separate agent whose architectural purpose is to attack the work of the building agent. Same AI, different governance constraints, different objectives. The insight wasn&apos;t &quot;we need code review&quot; — it was that review and construction need structural separation, the same way a financial auditor can&apos;t also be the accountant. That&apos;s not a policy. That&apos;s architecture.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The perimeter as a design decision.&lt;/strong&gt; &lt;a href=&quot;/blog/when-not-to-use-ai&quot;&gt;The AI Perimeter&lt;/a&gt; isn&apos;t a list of things AI can&apos;t do. It&apos;s a design boundary — a deliberate architectural decision about where automation should stop and human judgment should begin. The three-question framework (Can I verify the output? Is the cost of a wrong answer low? Does sufficient context exist?) isn&apos;t governance theater. It&apos;s a runtime decision function built into the workflow.&lt;/p&gt;
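&lt;p&gt;As a sketch, the three questions collapse into a single predicate. The field names are hypothetical; the point is that the check is executable, not aspirational:&lt;/p&gt;

```javascript
// The three-question perimeter check, expressed as the runtime decision
// function the post describes. Field names are illustrative.
function insidePerimeter(task) {
  // 1. Can I verify the output?
  if (!task.outputVerifiable) return false;
  // 2. Is the cost of a wrong answer low?
  if (!task.wrongAnswerCheap) return false;
  // 3. Does sufficient context exist?
  if (!task.contextAvailable) return false;
  return true; // all three hold: the task is inside the AI perimeter
}
```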
&lt;p&gt;&lt;strong&gt;Structural parallels as architectural insight.&lt;/strong&gt; The &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;ADHD–LLM isomorphism&lt;/a&gt; revealed something I didn&apos;t expect: the architectural patterns that manage cognitive failures in ADHD brains are the same patterns that manage failures in language models. External memory, session continuity, and confabulation detection: these aren&apos;t metaphors — they&apos;re the same engineering problem solved at different scales. The architectures of governance for AI agents and for human cognition share a common structure because their failure modes are identical.&lt;/p&gt;
&lt;p&gt;When I treat governance as architecture, there are a few decisions I stop delegating to policy: where review happens in the flow, where agents are allowed to write, and where I deliberately stop automation and hand back to humans. This is &lt;a href=&quot;/services/fractional-cto&quot;&gt;the core of what a fractional CTO does&lt;/a&gt; — designing governance into the system so it doesn&apos;t depend on someone remembering to enforce it.&lt;/p&gt;
&lt;h2&gt;Where the industry is getting it wrong&lt;/h2&gt;
&lt;p&gt;The current wave of AI governance frameworks treats governance as a layer — something you wrap around an AI system to make it safe. Usage policies, guardrails (a word that&apos;s become meaningless through overuse), and human-in-the-loop as a checkbox rather than a &lt;a href=&quot;/work/ai-sprint-management&quot;&gt;design pattern&lt;/a&gt;. When I say &quot;guardrails as theater,&quot; I mean controls that only exist in a policy document. Guardrails as architecture means the workflow makes it impossible to skip the control step.&lt;/p&gt;
&lt;p&gt;The problem with layers is that they can be bypassed, ignored, or simply never implemented. A governance policy that says &quot;all AI outputs must be reviewed by a human&quot; is architecturally meaningless if the system doesn&apos;t have a review step built into its execution flow. The policy exists, but the architecture doesn&apos;t enforce it.&lt;/p&gt;
&lt;p&gt;This is the same mistake enterprise software made with security twenty years ago. Write the code, then add security. That approach produced two decades of preventable breaches — the same vulnerability classes (injection, broken access control, insecure design) showing up in OWASP data year after year. We know better now — security is designed in, not added on. CISA&apos;s Secure by Design initiative says it explicitly: security is a design-time responsibility for vendors, not a patch-time responsibility for customers.&lt;/p&gt;
&lt;p&gt;AI governance is security&apos;s sequel. The architectural thesis is already hiding underneath the formal frameworks: the EU AI Act&apos;s risk-based controls, NIST&apos;s AI Risk Management Framework, and ISO/IEC 42001 all assume governance is something you design into systems, not something you rubber-stamp after the fact. Most implementations haven&apos;t caught up yet. And we&apos;re making the same mistakes, faster, and with higher stakes.&lt;/p&gt;
&lt;h2&gt;The thread through everything I write&lt;/h2&gt;
&lt;p&gt;Every post on this site argues a version of this thesis. I didn&apos;t say it outright until now.&lt;/p&gt;
&lt;p&gt;When I write about &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;governance documents as engineering artifacts&lt;/a&gt;, I&apos;m arguing that governance should be structural. When I write about &lt;a href=&quot;/blog/the-adversary&quot;&gt;adversarial agents&lt;/a&gt;, I&apos;m arguing that oversight should be architectural. When I write about &lt;a href=&quot;/blog/when-not-to-use-ai&quot;&gt;the perimeter&lt;/a&gt;, I&apos;m arguing that boundaries should be designed, not assumed. When I write about code that works versus code that belongs, I&apos;m arguing that &quot;functional&quot; and &quot;governed&quot; are different things — and the gap between them is where the expensive failures live.&lt;/p&gt;
&lt;p&gt;The thesis is the same every time: governance is architecture. Not policy. Not process. Not a committee. Architecture.&lt;/p&gt;
&lt;h2&gt;What I&apos;m still figuring out&lt;/h2&gt;
&lt;p&gt;I don&apos;t have clean answers for everything. These are the open questions I&apos;m working through:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How should governance documents be version-controlled as a codebase scales?&lt;/strong&gt; A single CLAUDE.md works for a solo developer. What happens when ten agents are working on the same codebase with different governance contexts? The version control problem gets interesting fast. I&apos;m currently exploring the idea of &quot;personalities&quot;: individual agentic personas tailored to their specific tasks, sharing a common central directory where they can &quot;get to know each other&quot; as they evolve, and a central governance document that drives their decisions and can only be changed through quorum. But that idea is still in the early stages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Where does human-in-the-loop become human-in-the-way?&lt;/strong&gt; I design for human oversight at every critical decision point. But I&apos;ve seen cases where the oversight step becomes a bottleneck that degrades the system more than the risk it&apos;s meant to mitigate. The boundary isn&apos;t static, and I don&apos;t have a formula for finding it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What does governance look like for agent-to-agent systems?&lt;/strong&gt; When AI agents coordinate with each other — passing context, delegating tasks, &lt;a href=&quot;/work/agent-mail&quot;&gt;communicating via protocol&lt;/a&gt; — the governance model needs to handle delegation chains, permission inheritance, and audit trails across agents. Nobody&apos;s building this well yet. The governance gap is a symptom of a missing discipline.&lt;/p&gt;
&lt;p&gt;These aren&apos;t hypothetical questions. They&apos;re problems I&apos;m actively building against.&lt;/p&gt;
&lt;h2&gt;A position, not a conclusion&lt;/h2&gt;
&lt;p&gt;This isn&apos;t a manifesto. It&apos;s a position — informed by building real systems, watching where they break, and noticing that failures almost always trace back to a governance decision made too late, or not made at all.&lt;/p&gt;
&lt;p&gt;If you&apos;re working on AI governance and treating it as a compliance function, you&apos;re not wrong — you&apos;re solving a smaller problem than the one in front of you. The compliance layer matters. But without the architectural foundation, it&apos;s a policy document sitting atop a system that was never designed to enforce it.&lt;/p&gt;
&lt;p&gt;Governance is architecture.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;This post connects the threads running through everything I write. If you want the evidence: &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;Pass@1&lt;/a&gt; on governance as methodology, &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;The Governance Documents&lt;/a&gt; on what each file contains and why, &lt;a href=&quot;/blog/the-adversary&quot;&gt;The Adversary&lt;/a&gt; on adversarial review as structural pattern, &lt;a href=&quot;/blog/when-not-to-use-ai&quot;&gt;The AI Perimeter&lt;/a&gt; on boundary design, &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;LLMs Are Practically ADHD&lt;/a&gt; on structural parallels, &lt;a href=&quot;/blog/managing-agents-like-teams&quot;&gt;managing agents like teams&lt;/a&gt; on applying these principles as organizational design, and &lt;a href=&quot;/blog/cognitive-property&quot;&gt;Cognitive Property&lt;/a&gt; on the ownership question these patterns raise. If you disagree with the premise or are working on these same problems, &lt;a href=&quot;/contact&quot;&gt;I&apos;d like to hear from you&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Curious how your organization&apos;s AI governance posture holds up? The &lt;a href=&quot;/tools/ai-readiness&quot;&gt;AI readiness diagnostic&lt;/a&gt; scores your current state across the domains this post covers.&lt;/em&gt;&lt;/p&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt; &lt;a href=&quot;https://owasp.org/www-project-top-ten/&quot;&gt;OWASP Top 10 Web Application Security Risks&lt;/a&gt; · &lt;a href=&quot;https://owasp.org/Top10/2025/en/&quot;&gt;OWASP Top 10:2025 — A04: Insecure Design&lt;/a&gt; · &lt;a href=&quot;https://www.cisa.gov/securebydesign&quot;&gt;CISA Secure by Design&lt;/a&gt; · &lt;a href=&quot;https://www.eccouncil.org/cybersecurity-exchange/responsible-ai-governance/eu-ai-act-nist-ai-rmf-and-iso-iec-42001-a-plain-english-comparison/&quot;&gt;EU AI Act, NIST AI RMF, and ISO/IEC 42001 comparison&lt;/a&gt;&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>architecture</category><category>leadership</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>SideMark: A Free Markdown Editor for macOS (Open Source)</title><link>https://mipyip.com/blog/simple-markdown-editor/</link><guid isPermaLink="true">https://mipyip.com/blog/simple-markdown-editor/</guid><description>Local files only, no cloud, no subscription. Git-aware diff merging, live preview, and autosave. Built for developers who work with AI-assisted workflows.</description><pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I didn&apos;t plan to build a markdown editor this weekend. I was working on something else entirely, and somewhere in the middle of it I opened my markdown editor to take notes, and my annoyance with every markdown editor I&apos;ve tried finally came to a head.&lt;/p&gt;
&lt;p&gt;Not annoyed in the &quot;this is broken&quot; sense. Annoyed in the &quot;why does this app need a cloud account and fourteen features I&apos;ll never use&quot; sense. The editor I&apos;d been using was fine, except for the parts that weren&apos;t: slow to launch, too many menus, features designed for someone else&apos;s workflow. And every alternative I&apos;d tried over the years had the same problem in different packaging — too expensive, too bloated, or too clever.&lt;/p&gt;
&lt;p&gt;So I opened Claude Code and started building one.&lt;/p&gt;
&lt;h2&gt;Governance did the heavy lifting&lt;/h2&gt;
&lt;p&gt;Three panes. File browser on the left, editor in the middle, live preview on the right. Tabs for multiple open files. Session restore — close the app, reopen it, everything&apos;s still there. Dark mode. Search and replace. A formatting toolbar for the things I always forget the syntax for (is it &lt;code&gt;**bold**&lt;/code&gt; or &lt;code&gt;__bold__&lt;/code&gt;? I know the answer but my fingers don&apos;t).&lt;/p&gt;
&lt;p&gt;That&apos;s it. No cloud sync. No collaboration. No plugin architecture. No knowledge graph. No proprietary file format. Just markdown files on my computer, edited in a clean interface that stays out of the way.&lt;/p&gt;
&lt;p&gt;But the reason the first pass came back essentially usable wasn&apos;t the prompt — it was everything behind it. The &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;governance documents&lt;/a&gt; I&apos;ve built for my development workflow, the clean wireframes, the detailed architecture specs, the personal style guide that must be followed. All of that is persistent context that Claude Code carries into every session. The prompt describes &lt;em&gt;what&lt;/em&gt; to build. The governance documents describe &lt;em&gt;how&lt;/em&gt; to build it, and to what standard. That&apos;s the difference between &quot;a thing that kind of works&quot; and &quot;a thing I&apos;m actually using the next day.&quot;&lt;/p&gt;
&lt;p&gt;Not perfect on the first pass. But functional enough that I was taking notes in it within the first hour.&lt;/p&gt;
&lt;p&gt;Then I started tweaking.&lt;/p&gt;
&lt;h2&gt;What &quot;good enough&quot; turned into&lt;/h2&gt;
&lt;p&gt;The initial version worked. But &quot;works&quot; and &quot;feels right&quot; are different things. The scroll sync between editor and preview was off — I built bidirectional section-based anchoring so they stay aligned as you scroll. The formatting toolbar was basic — I made buttons detect whether formatting was already applied and toggle it off, made heading buttons cycle through levels, made list buttons handle multi-line selections and continue numbering from preceding items.&lt;/p&gt;
&lt;p&gt;File browser needed right-click context menus. New file, new folder, rename, delete (move to trash, not permanent — I&apos;m not a monster), show in Finder. Auto-refresh when files change externally. Only markdown files clickable in the tree, because I don&apos;t need to accidentally open a PNG in a text editor.&lt;/p&gt;
&lt;p&gt;External change detection was the feature I didn&apos;t know I needed until I hit the workflow. Edit a file in another app while it&apos;s open in the editor — you get a full diff view showing exactly what changed, with options to keep your version, accept the external changes, or save as a new file. No silent overwrites. I&apos;d completely forgotten I&apos;d even added it until I triggered it accidentally and thought &lt;em&gt;oh, that&apos;s actually amazing.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/screenshots/simple-markdown-editor/diff.png&quot; alt=&quot;External change detection — diff view showing changes made to a file in another editor, with options to keep your version, accept external changes, or save as a new file&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Then themes — dark, light, system-following. Seven accent colors because I like options. Configurable fonts for both the editor and preview panes, pulling from your system font library. Font size control. Line number toggling. Settings that persist to JSON.&lt;/p&gt;
&lt;p&gt;Then security hardening, because I ran &lt;a href=&quot;/blog/the-adversary&quot;&gt;The Adversary&lt;/a&gt; on it — my adversarial code review agent — and it found the usual Electron problems. Worse than usual, actually. The irony it identified was the &lt;em&gt;ceremony&lt;/em&gt; of security without the substance: &lt;code&gt;contextBridge&lt;/code&gt;, &lt;code&gt;contextIsolation: true&lt;/code&gt;, proper cleanup functions — all present, all technically correct, and all masking a straight pipeline from a malicious &lt;code&gt;.md&lt;/code&gt; file to arbitrary filesystem access. The &lt;code&gt;sandbox: false&lt;/code&gt; with a wrong justification comment was the cherry on top. It&apos;s exactly the kind of thing that survives code review after code review because it &lt;em&gt;sounds&lt;/em&gt; right, and nobody actually traces the dependency to verify it.&lt;/p&gt;
&lt;p&gt;XSS vectors in the markdown preview (fixed with DOMPurify), filesystem access too broad (added path validation and sensitive directory blocking), missing Content Security Policy. Twenty-plus security fixes across eleven versions. The kind of work nobody sees but everyone benefits from — and a textbook case study for why adversarial review matters when the current &quot;vibe coding&quot; wave is producing technically-functional-but-exploitable software at scale.&lt;/p&gt;
&lt;h2&gt;Thirty-one versions in two days&lt;/h2&gt;
&lt;p&gt;The commit history tells the story. v0.1.0 to v0.1.31 in a weekend. Not because I was rushing — because the governance-first development pattern means each feature lands cleanly, gets tested, gets committed, and the next one starts from solid ground.&lt;/p&gt;
&lt;p&gt;This is the same workflow I write about in every blog post: strong governance documents as persistent AI memory, single-pass feature delivery as the norm rather than the exception, architecture decisions codified before implementation begins. When the pattern works, it works at speed.&lt;/p&gt;
&lt;p&gt;The app is signed and notarized with Apple, auto-updates from GitHub releases, handles file associations (shows up in Finder&apos;s &quot;Open With&quot; menu for .md, .markdown, .mdx, .txt files), and restores all windows with their tabs and folder paths on relaunch.&lt;/p&gt;
&lt;h2&gt;What it deliberately doesn&apos;t do&lt;/h2&gt;
&lt;p&gt;No cloud sync. No collaboration. No Vim mode. No WYSIWYG mode. No plugin system. No account creation. No subscription. No telemetry.&lt;/p&gt;
&lt;p&gt;Every markdown editor eventually tries to become a knowledge management platform. This one won&apos;t. The file system is the organizational layer. Git is the version control. Markdown is the format — portable, readable, owned by you. The editor just makes working with those files fast and pleasant.&lt;/p&gt;
&lt;p&gt;Your files are plain markdown on disk. Open them with anything, anywhere, forever.&lt;/p&gt;
&lt;h2&gt;The actual point&lt;/h2&gt;
&lt;p&gt;I built this because I needed it, and now I&apos;m using it instead of every other option. More dogfood. It&apos;s open source (MIT), free, and the code is on GitHub.&lt;/p&gt;
&lt;p&gt;If you live in markdown daily and every editor you&apos;ve tried wants to be something it shouldn&apos;t be — &lt;a href=&quot;/products/simple-markdown-editor&quot;&gt;try it&lt;/a&gt;. The feature list is short on purpose.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Simple Markdown Editor is &lt;a href=&quot;https://github.com/avanrossum/a_simple_markdown_editor&quot;&gt;available on GitHub&lt;/a&gt; — free, open source, macOS only. If you&apos;re interested in the development methodology behind building a functional app in a weekend, start with &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;What Is Pass@1?&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>macos</category><category>devtools</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>AI Adoption: The 0.04% Don&apos;t Know They&apos;re the 0.04%</title><link>https://mipyip.com/blog/the-004-percent/</link><guid isPermaLink="true">https://mipyip.com/blog/the-004-percent/</guid><description>The people actually building with AI are too busy building to post about it. If the LinkedIn AI feed makes you feel behind, you&apos;re measuring against the wrong cohort.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;2,500 dots on a grid. Each dot is roughly 3.2 million people. The whole grid is humanity.&lt;/p&gt;
&lt;p&gt;84% of those dots are grey — people who have never touched AI. 16% are green — free chatbot users. About 0.3% pay for AI tools. And roughly 0.04% are using it for real workflows.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/blog/the-004.jpeg&quot; alt=&quot;Each dot represents ~3.2 million people. 2,500 dots = 8.1 billion humans. Grey: never used AI (84%). Green: free chatbot users (16%). Orange: pays for AI (0.3%). Red: uses AI for real workflows (0.04%).&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That&apos;s &lt;a href=&quot;https://www.linkedin.com/posts/zachdissington_ai-entrepreneurship-aiautomation-activity-7431507588380344321-vxOD&quot;&gt;a visualization&lt;/a&gt; Zach Dissington posted on LinkedIn, and his point was about market opportunity — the 8.1 billion denominator is wrong because it includes people without internet, without income, and without businesses. Strip those out and real SMB adoption is still under 0.5%. The field is empty, and whoever moves first owns it.&lt;/p&gt;
&lt;p&gt;That&apos;s a good point. But it&apos;s not the one that got me.&lt;/p&gt;
&lt;h2&gt;The feed makes builders feel behind&lt;/h2&gt;
&lt;p&gt;I log into LinkedIn and feel behind on AI every single time. Not because I&apos;m not using it — I build with it daily. My apps are &lt;a href=&quot;/work/actions-case-study&quot;&gt;built with it&lt;/a&gt;. This site is maintained by an AI agent I designed. I&apos;ve shipped features, caught &lt;a href=&quot;/blog/the-adversary&quot;&gt;security issues&lt;/a&gt;, and managed &lt;a href=&quot;/work/ai-sprint-management&quot;&gt;sprint cycles&lt;/a&gt; with AI tools that most of the people posting about AI haven&apos;t opened.&lt;/p&gt;
&lt;p&gt;And I still close LinkedIn feeling like I missed something.&lt;/p&gt;
&lt;p&gt;This isn&apos;t just an anecdotal observation — &lt;a href=&quot;https://thenextweb.com/news/ai-fomo-where-the-i-is-not-just-intelligence-but-i-the-human&quot;&gt;more than one in nine adults report elevated anxiety about not keeping up with AI&lt;/a&gt;, and the phenomenon is acute enough that &lt;a href=&quot;https://www.cnbc.com/2026/01/24/ai-artificial-intelligence-worries-therapy.html&quot;&gt;therapists are seeing it as a distinct category of workplace stress&lt;/a&gt;. Some workers are &lt;a href=&quot;https://stackoverflow.blog/2025/07/31/do-ai-coding-tools-help-with-imposter-syndrome-or-make-it-worse/&quot;&gt;secretly adopting AI tools&lt;/a&gt; just to maintain perceived competitiveness — not because they need them, but because the anxiety of &lt;em&gt;not&lt;/em&gt; using them has become its own pressure.&lt;/p&gt;
&lt;p&gt;The LinkedIn AI discourse is populated disproportionately by people optimizing for the feed. These aren&apos;t builders; they&apos;re content creators. The people making you feel behind have figured out that &quot;here&apos;s what GPT-5 can do now&quot; gets engagement and &quot;here&apos;s a boring CLAUDE.md I spent three hours refining&quot; doesn&apos;t.&lt;/p&gt;
&lt;p&gt;The builders are somewhere else. They&apos;re in their terminal. They&apos;re in a PR review. They&apos;re trimming the last dozen lines of a JSONL file in Claude&apos;s history because compaction — the mechanism that cleans up long chat contexts — is failing, and they forgot to run a session sanity check and proactively manage the agent&apos;s memory.&lt;/p&gt;
&lt;p&gt;They&apos;re not posting about it because posting about it takes time away from doing it.&lt;/p&gt;
&lt;p&gt;This creates a false signal: the loudest voices are the least representative. The 0.04% looks invisible on LinkedIn because they don&apos;t have a content strategy.&lt;/p&gt;
&lt;h2&gt;What real AI workflow adoption looks like&lt;/h2&gt;
&lt;p&gt;The gap between the Instagram highlight-reel version and the actual version is worth noting.&lt;/p&gt;
&lt;p&gt;It looks like a CLAUDE.md file — a markdown document that tells an AI agent how to behave in your codebase, what patterns you use, and what to check before committing. You iterate on it every session. It&apos;s not glamorous. Nobody is screenshotting it.&lt;/p&gt;
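&lt;p&gt;For the curious, here is a minimal sketch of what such a file might contain — the specific conventions and checks below are illustrative, not a prescription:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# CLAUDE.md (illustrative sketch)

## Conventions
- TypeScript strict mode; no implicit any
- One component per file under src/components

## Before committing
- Run the test suite and the linter; both must pass
- Never commit directly to main

## Known blind spots
- Double-check date and timezone handling; past sessions
  produced off-by-one bugs here
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The value is not in any single rule — it&apos;s that the file persists across sessions, so every lesson the agent taught you the hard way gets encoded once instead of relearned.&lt;/p&gt;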
&lt;p&gt;It looks like an &lt;a href=&quot;/blog/the-adversary&quot;&gt;adversarial review agent&lt;/a&gt; that reads your code after you write it and argues back. It catches things. It&apos;s also wrong sometimes, and you have to know when to override it. That judgment takes reps.&lt;/p&gt;
&lt;p&gt;It looks like a failed experiment that worked for two days and then silently drifted because you didn&apos;t build a governance mechanism — a failure that feeds a better second design, because now you know exactly how the first one broke.&lt;/p&gt;
&lt;p&gt;It&apos;s infrastructure and iteration work, and it&apos;s unglamorous in exactly the same way that good systems are always unglamorous — you only notice them when they stop working.&lt;/p&gt;
&lt;h2&gt;The denominator problem&lt;/h2&gt;
&lt;p&gt;Who is actually your peer group?&lt;/p&gt;
&lt;p&gt;Not 8.1 billion people. Not LinkedIn&apos;s AI feed. Not even &quot;people who pay for AI tools.&quot;&lt;/p&gt;
&lt;p&gt;Your peer group is the people building durable workflows with AI as infrastructure — not as a party trick, not as a prompt-to-PowerPoint shortcut, but as a genuine layer in how they work. That group is small. And if you&apos;re reading this, you&apos;re probably in it or very close to it.&lt;/p&gt;
&lt;p&gt;The imposter syndrome the LinkedIn feed generates is a category error. You&apos;re comparing your internal reality — the edge cases, the failed experiments, the unglamorous infrastructure — against someone else&apos;s carefully composed highlight reel. Of course you feel behind. You&apos;re seeing their best shots and your outtakes simultaneously. And the &lt;a href=&quot;https://thenextweb.com/news/ai-fomo-where-the-i-is-not-just-intelligence-but-i-the-human&quot;&gt;60-70% of technology leaders&lt;/a&gt; who cite FOMO as a major reason their organization is investing in AI are doing the same thing at the corporate level — making strategic decisions driven by anxiety about what competitors might be doing, not by evidence of what actually works.&lt;/p&gt;
&lt;h2&gt;The wrong feed&lt;/h2&gt;
&lt;p&gt;There&apos;s a version of this that&apos;s worth saying plainly: if the feed makes you feel behind, you&apos;re probably paying attention to the wrong feed.&lt;/p&gt;
&lt;p&gt;The people who make you feel most behind are almost certainly optimizing for impressions, not craft. The people who are actually ahead of you are too busy to post consistently, and when they do, it&apos;s too detailed, too long, and too specific to go viral.&lt;/p&gt;
&lt;p&gt;I use AI to manage context retention across multiple projects simultaneously. The firehose of AI content on LinkedIn is exactly the wrong kind of input when you&apos;re already juggling that many threads — but that&apos;s a tangent, not a thesis. The point applies to everyone: the signal-to-noise ratio on AI discourse is abysmal, and the noise is louder because noise is optimized for volume.&lt;/p&gt;
&lt;p&gt;I couldn&apos;t have written this six months ago — not because I didn&apos;t have the thoughts, but because I couldn&apos;t hold them still long enough to get them onto a page. That changed when I built systems that reduced my cognitive overhead enough to actually write. That&apos;s what the 0.04% looks like — not a breakthrough, just less friction.&lt;/p&gt;
&lt;p&gt;The field looks empty from the feed. It only feels that way because the people who&apos;ve started are building, not broadcasting.&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;
&lt;a href=&quot;https://www.linkedin.com/posts/zachdissington_ai-entrepreneurship-aiautomation-activity-7431507588380344321-vxOD&quot;&gt;AI Adoption Visualization&lt;/a&gt; — Zach Dissington (dot chart showing global AI adoption tiers)
·
&lt;a href=&quot;https://thenextweb.com/news/ai-fomo-where-the-i-is-not-just-intelligence-but-i-the-human&quot;&gt;AI FOMO: Where the &quot;I&quot; Is Not Just Intelligence&lt;/a&gt; — The Next Web (AI anxiety statistics and corporate FOMO)
·
&lt;a href=&quot;https://www.cnbc.com/2026/01/24/ai-artificial-intelligence-worries-therapy.html&quot;&gt;Therapists Say They See More Workers Anxious About AI&lt;/a&gt; — CNBC (AI anxiety as workplace therapy trend)
·
&lt;a href=&quot;https://stackoverflow.blog/2025/07/31/do-ai-coding-tools-help-with-imposter-syndrome-or-make-it-worse/&quot;&gt;Do AI Coding Tools Help with Imposter Syndrome or Make It Worse?&lt;/a&gt; — Stack Overflow (developer imposter syndrome and AI tool pressure)&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>leadership</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>The AI Perimeter: Where Automation Should End and Judgment Should Begin</title><link>https://mipyip.com/blog/when-not-to-use-ai/</link><guid isPermaLink="true">https://mipyip.com/blog/when-not-to-use-ai/</guid><description>AI is the most powerful tool most of us have ever had. That makes knowing when not to use it the actual skill.</description><pubDate>Wed, 25 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;AI is the most powerful tool most of us have ever had. Knowing when not to use it is the actual skill. This post defines the perimeter and the five failure modes that tell you you&apos;ve crossed it.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;Everyone posting about AI is selling it. The frameworks, the workflows, the &quot;10x your productivity&quot; threads — all of it points in one direction. Nobody builds a following by telling you to slow down.&lt;/p&gt;
&lt;p&gt;So here&apos;s my credibility pitch: I use AI agents for about 95% of my development work. I&apos;ve shipped &lt;a href=&quot;/products/actions&quot;&gt;features&lt;/a&gt;, caught &lt;a href=&quot;/blog/the-adversary&quot;&gt;security vulnerabilities&lt;/a&gt;, and managed &lt;a href=&quot;/work/ai-sprint-management&quot;&gt;entire sprint cycles&lt;/a&gt; with AI tooling that most people posting about it haven&apos;t opened. And I&apos;m telling you there are things I won&apos;t use it for — not because I&apos;m hedging, but because I&apos;ve pushed the tool far enough to know where it breaks.&lt;/p&gt;
&lt;p&gt;I make that decision a dozen times a week, and most of the time I don&apos;t even notice I&apos;m making it. That&apos;s not instinct — it&apos;s pattern recognition built from doing this work every day. The judgment becomes automatic. And that judgment, not the tooling itself, is the actual skill.&lt;/p&gt;
&lt;h2&gt;You can&apos;t water a seed that doesn&apos;t exist&lt;/h2&gt;
&lt;p&gt;I&apos;ve tried using AI to generate ideas from scratch. Not refine an idea. Not pressure-test a concept. Generate one — from nothing.&lt;/p&gt;
&lt;p&gt;It doesn&apos;t work.&lt;/p&gt;
&lt;p&gt;AI is extraordinary at expanding, refining, challenging, and structuring ideas. Hand it a rough concept and it&apos;ll find angles you missed, surface contradictions, and help you think through implications faster than you could alone. But it needs raw material. Something rough, something human, something that came from &lt;em&gt;your&lt;/em&gt; context and &lt;em&gt;your&lt;/em&gt; pattern recognition. Without that, you get the most statistically average version of whatever you asked for.&lt;/p&gt;
&lt;p&gt;The seed has to be yours. AI is an amplifier. Without a signal, it amplifies noise.&lt;/p&gt;
&lt;p&gt;Every project I&apos;ve shipped started with a human idea — scribbled in Excalidraw, talked through with a friend, or captured in a voice memo at 2am. These are the same &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;&apos;scaffolding&apos; patterns&lt;/a&gt; I&apos;ve used to manage state-loss in my own brain; the AI pipeline just turns that scaffolding into working software. But the pipeline needs an input. If you skip the human part, you get sophisticated mediocrity — technically correct, architecturally sound, and completely devoid of the insight that would have made it worth building.&lt;/p&gt;
&lt;h2&gt;Dropdown fields for grief&lt;/h2&gt;
&lt;p&gt;There&apos;s a scene in &lt;em&gt;Leviathan Wakes&lt;/em&gt; — the novel that became &lt;em&gt;The Expanse&lt;/em&gt; — where Detective Miller has to write a condolence letter. The system gives him a form:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To the [husband / wife / mother / father] of [victim name]. We are sorry to inform you that [he / she] was killed aboard [ship / station] on [date]. Please accept our condolences.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Dropdown fields for grief. Efficient. Covers all the cases. Soulless.&lt;/p&gt;
&lt;p&gt;That&apos;s what happens when you fully automate emotional communication. And the instinct to reach for AI here is understandable — writing a difficult email is &lt;em&gt;hard&lt;/em&gt;, and the blank page is intimidating. But &quot;hard&quot; is exactly the point. The difficulty is the signal that a human needs to be doing this.&lt;/p&gt;
&lt;p&gt;Where AI &lt;em&gt;can&lt;/em&gt; help with emotional communication is in the middle of the process, not at the beginning or end. You write the first draft — the messy, human, probably-too-long version that says what you actually mean. Then you run it through AI for structure: tighten the phrasing, catch the paragraph that buries the point, find the sentence that says two things when it should say one. Then you do a final pass as a human, because the AI&apos;s version will be cleaner but might have smoothed away the part that actually mattered.&lt;/p&gt;
&lt;p&gt;Start human. Refine with AI. Finish human. Skip any of those steps and you get either a mess or a template — and people can tell the difference.&lt;/p&gt;
&lt;h2&gt;The compliance line&lt;/h2&gt;
&lt;p&gt;I&apos;d use AI to manage a Python 2 to Python 3 migration. Identify deprecated patterns, rewrite syntax, flag compatibility issues across a codebase. Bounded, verifiable, and the cost of a missed edge case is a failing test, not a breach. (It still needs human review — even if you use an &lt;a href=&quot;/blog/the-adversary&quot;&gt;adversarial agent&lt;/a&gt; for code review, the human makes the final call.)&lt;/p&gt;
&lt;p&gt;I would not use AI to rotate secrets.&lt;/p&gt;
&lt;p&gt;I would not upload a CSV of client data to an LLM and ask it to generate invoices. Not because the model can&apos;t do the math — because a hallucinated line item creates a compliance violation and a client who will never trust you again. The financial services sector is already &lt;a href=&quot;https://biztechmagazine.com/article/2025/08/llm-hallucinations-what-are-implications-financial-institutions&quot;&gt;grappling with this&lt;/a&gt; — inaccurate AI outputs in regulated environments don&apos;t just create errors, they create regulatory exposure. Invoicing requires auditability, and &quot;the AI did it&quot; is not a line item your accountant can reconcile.&lt;/p&gt;
&lt;p&gt;I would not feed PII into a public AI system. Full stop. This isn&apos;t about whether the model will get the answer right — it&apos;s about what happens to that data after it leaves your system. LLMs can &lt;a href=&quot;https://www.lasso.security/blog/llm-data-privacy&quot;&gt;memorize and regurgitate fragments of their training data&lt;/a&gt;, and unless you&apos;re on an enterprise plan with contractual guarantees about data handling, your client&apos;s personally identifiable information is potentially entering a training pipeline you don&apos;t control and can&apos;t audit. That&apos;s not an AI problem. That&apos;s a data governance problem, and it exists whether the output is correct or not.&lt;/p&gt;
&lt;p&gt;The line isn&apos;t about capability. Modern models can do all of these things technically. The line is about what happens when they&apos;re wrong — and, in the case of PII, what happens even when they&apos;re right. A botched Python migration produces a failing test suite. A botched secret rotation produces a security incident. A hallucinated invoice produces a compliance violation. Client data in a training pipeline produces a breach of trust that no output quality can justify.&lt;/p&gt;
&lt;p&gt;And these aren&apos;t edge cases waiting to be patched. Hallucinations are &lt;a href=&quot;https://datanucleus.dev/corporate-governance-compliance/ai-hallucinations-rag-and-human-in-loop-risk-mitigation&quot;&gt;an inherent property of how language models work&lt;/a&gt; — they predict the most statistically likely next token, not the most factually correct one. That gap doesn&apos;t close with better prompts. It closes with governance, verification, and human oversight. Treating hallucinations as bugs to be fixed is how organizations build false confidence in systems that need guardrails.&lt;/p&gt;
&lt;p&gt;The rule: if the cost of a wrong answer exceeds the cost of doing it manually, the AI shouldn&apos;t be doing it unsupervised. &quot;Probably right&quot; is fine for code review. It&apos;s not fine for anything where &quot;probably&quot; means &quot;we might get sued.&quot;&lt;/p&gt;
&lt;p&gt;This is the same principle behind &lt;a href=&quot;https://www.ibm.com/think/topics/human-in-the-loop&quot;&gt;human-in-the-loop design&lt;/a&gt; — and behind &lt;a href=&quot;/work/ai-sprint-management&quot;&gt;my own workflow&lt;/a&gt;. The AI generates. The human executes. Not because the AI can&apos;t execute — because the gap between &quot;can&quot; and &quot;should&quot; is exactly where the expensive mistakes live.&lt;/p&gt;
&lt;h2&gt;Voice is collaboration, not delegation&lt;/h2&gt;
&lt;p&gt;Every post on this site started as something I wrote. AI expanded it, tightened the structure, caught weak arguments, and helped me think through what I actually meant. But the voice is mine. The opinions are mine. The experiences are mine.&lt;/p&gt;
&lt;p&gt;If you hand an AI &quot;write me an article about quantum mechanics,&quot; you&apos;ll get the most average article about quantum mechanics that has ever existed. Not wrong. Not interesting. Think of it as convergence to the mean — the model produces the statistical center of everything it&apos;s seen on that topic, and the statistical center of anything is, by definition, unremarkable. It&apos;s the same reason every AI-generated LinkedIn post sounds like every other AI-generated LinkedIn post.&lt;/p&gt;
&lt;p&gt;And this isn&apos;t just an aesthetic problem. GenAI is &lt;a href=&quot;https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-hallucinations.html&quot;&gt;designed to provide the most likely output&lt;/a&gt;, which means it defaults to confident, well-structured prose even when the thinking behind it is shallow. Readers trust polished writing more than they should. The result is content that sounds more authoritative than it deserves to be — and that false authority is its own kind of hallucination.&lt;/p&gt;
&lt;p&gt;Voice requires the same pattern as emotional communication: start human, refine with AI, finish human. The AI needs to know what you sound like, what you care about, what hills you&apos;ll die on. That context doesn&apos;t come from a single prompt — it comes from &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;governance documents&lt;/a&gt; that encode your standards, your patterns, your constraints.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;It comes from working with the tool long enough that you know its blind spots.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The distinction matters because the audience can always tell. &quot;AI-generated content&quot; and &quot;AI-assisted content&quot; are not the same thing. One reads like a template. The other reads like a person who had help organizing their thoughts.&lt;/p&gt;
&lt;h2&gt;Three questions before you automate&lt;/h2&gt;
&lt;p&gt;Before I hand any task to an AI agent, I ask three questions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Can I verify the output?&lt;/strong&gt; If I can check the work faster than I can do the work, AI is a net win. If verification requires as much expertise and time as the original task, I&apos;ve added a step without saving anything.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Is the cost of a wrong answer low?&lt;/strong&gt; &lt;a href=&quot;/blog/the-adversary&quot;&gt;Code review&lt;/a&gt; that misses something means I catch it later. A billing error means a client relationship is damaged. A compliance failure means lawyers. Match the automation level to the stakes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Does sufficient context exist in the system?&lt;/strong&gt; AI works when the &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;governance documents&lt;/a&gt; provide enough structure for a correct first-pass implementation. If the context is ambiguous, incomplete, or doesn&apos;t exist yet — the agent will fill in the gaps with confident guesses, and you won&apos;t always catch them.&lt;/p&gt;
&lt;p&gt;If any answer is &quot;no,&quot; the task stays manual. Not forever — sometimes the fix is building the context that makes automation safe. But automating a task that fails these checks isn&apos;t efficiency. It&apos;s introducing risk and calling it productivity.&lt;/p&gt;
&lt;h2&gt;Where it works&lt;/h2&gt;
&lt;p&gt;This isn&apos;t an anti-AI post. My entire workflow depends on AI tooling. Well-bounded transformation work, &lt;a href=&quot;/blog/the-adversary&quot;&gt;adversarial code review&lt;/a&gt; against defined standards, any task where &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;governance documents&lt;/a&gt; provide sufficient context for a correct first pass — these are places where AI genuinely accelerates. And once the seed exists, AI is the best thinking partner most people have ever had access to. It doesn&apos;t get tired, doesn&apos;t get defensive, and will argue the other side of any position if you ask it to.&lt;/p&gt;
&lt;h2&gt;Tool selection is the expertise&lt;/h2&gt;
&lt;p&gt;A good chef knows when to use the food processor and when to use the knife. The processor is faster. The knife gives you control. Using the wrong one in the wrong place doesn&apos;t make you efficient — it makes you someone who doesn&apos;t understand their kitchen.&lt;/p&gt;
&lt;p&gt;AI is the most powerful tool most of us have ever had access to. That makes knowing when &lt;em&gt;not&lt;/em&gt; to use it more important, not less. The capability is not the question. The judgment is.&lt;/p&gt;
&lt;p&gt;If your AI strategy is &quot;use AI for everything,&quot; you don&apos;t have a strategy. You have enthusiasm. And enthusiasm without judgment is how you end up with dropdown fields for grief.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;The &lt;a href=&quot;/tools/ai-readiness&quot;&gt;AI readiness diagnostic&lt;/a&gt; helps you assess whether your organization has the judgment layer in place — not just the tools.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;
&lt;a href=&quot;https://biztechmagazine.com/article/2025/08/llm-hallucinations-what-are-implications-financial-institutions&quot;&gt;LLM Hallucinations: What Are the Implications for Financial Institutions?&lt;/a&gt; — BizTech Magazine
·
&lt;a href=&quot;https://www.lasso.security/blog/llm-data-privacy&quot;&gt;LLM Data Privacy: Risks, Challenges &amp;amp; Best Practices&lt;/a&gt; — Lasso Security
·
&lt;a href=&quot;https://datanucleus.dev/corporate-governance-compliance/ai-hallucinations-rag-and-human-in-loop-risk-mitigation&quot;&gt;AI Hallucinations, RAG and Human-in-Loop Risk Mitigation&lt;/a&gt; — DataNucleus
·
&lt;a href=&quot;https://www.ibm.com/think/topics/human-in-the-loop&quot;&gt;What Is Human-in-the-Loop?&lt;/a&gt; — IBM
·
&lt;a href=&quot;https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-hallucinations.html&quot;&gt;What Are AI Hallucinations?&lt;/a&gt; — PwC&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>leadership</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>LLMs Are Practically ADHD</title><link>https://mipyip.com/blog/llms-are-practically-adhd/</link><guid isPermaLink="true">https://mipyip.com/blog/llms-are-practically-adhd/</guid><description>ADHD and large language models share the same failure modes: context loss, confabulation, and drift without external structure. The coping strategies developed for ADHD brains transfer directly to AI agent architecture.</description><pubDate>Tue, 24 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;ADHD and LLMs share the same failure modes: context loss, confabulation, drift without structure. The coping architectures developed for one transfer directly to the other. This post maps the parallels.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I was diagnosed with ADHD at 41. After decades of fighting a brain that loses context, drops threads, and can&apos;t sustain rigid routines, I finally had a name for it. Medication addressed the neurochemistry — it made it possible to focus. But focus on &lt;em&gt;what?&lt;/em&gt; The structural problems remained. Where did I leave off? What notebook had those notes? What was the context three weeks ago?&lt;/p&gt;
&lt;p&gt;Then I started building with AI agents daily. Memory systems. Session continuity architectures. Governance patterns that keep agents reliable across restarts and context window compactions.&lt;/p&gt;
&lt;p&gt;And one day the pattern clicked: &lt;em&gt;LLMs are practically ADHD. No wonder we mesh.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;That&apos;s not a punchline. It&apos;s a structural observation that changed how I think about both systems. If you&apos;re deploying AI agents that drift, confabulate, and lose context between sessions — the failure modes are the same, and so are the fixes.&lt;/p&gt;
&lt;h2&gt;The parallel&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ADHD Brain&lt;/th&gt;
&lt;th&gt;Large Language Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Loses context when working memory fills up&lt;/td&gt;
&lt;td&gt;Loses context when the context window fills up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can&apos;t reliably remember three weeks ago&lt;/td&gt;
&lt;td&gt;Can&apos;t reliably remember three sessions ago&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Needs external systems to maintain state&lt;/td&gt;
&lt;td&gt;Needs external systems to maintain state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performs brilliantly in hyperfocus bursts&lt;/td&gt;
&lt;td&gt;Performs brilliantly within a single context window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can&apos;t sustain continuity without architecture&lt;/td&gt;
&lt;td&gt;Can&apos;t sustain continuity without architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confidently reconstructs narratives when memory is gone&lt;/td&gt;
&lt;td&gt;Confidently confabulates when context is lost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Needs governance rails or it drifts&lt;/td&gt;
&lt;td&gt;Needs governance rails or it drifts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Executive function requires external scaffolding&lt;/td&gt;
&lt;td&gt;Reliable execution requires external orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Every row in that table describes the same underlying failure mode expressed in two different systems. Not a metaphor. A structural isomorphism.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;../../assets/blog/blog-adhd-llm-failure-modes.png&quot; alt=&quot;ADHD and LLM failure modes mapped side by side — same failures, same fixes&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Context loss is context loss&lt;/h2&gt;
&lt;p&gt;The most direct parallel is context loss. My brain has a working memory buffer that fills up and drops things. An LLM has a context window that fills up and drops things. The mechanism is different — neurochemistry vs tokenization — but the failure mode is identical: once the buffer is full, earlier context falls off a cliff.&lt;/p&gt;
&lt;p&gt;For ADHD, this means walking into a room and forgetting why I&apos;m there. For an LLM, this means &lt;a href=&quot;/blog/auto-compaction-is-costing-you-sessions&quot;&gt;auto-compaction&lt;/a&gt; silently discarding the session context that made the agent productive in the first place. Same problem. Same consequence: the system keeps running, but it&apos;s running on incomplete information and doesn&apos;t know what it&apos;s lost.&lt;/p&gt;
&lt;p&gt;The solution is the same too. External memory. You can&apos;t expand the buffer, so you externalize what matters into a system the buffer can reference. For me, that&apos;s structured notes, agent-maintained context files, and tools that hold what my brain can&apos;t. For LLMs, it&apos;s &lt;a href=&quot;/blog/the-governance-documents&quot;&gt;governance documents&lt;/a&gt;, architecture specs, and persistent memory tiers that survive compaction.&lt;/p&gt;
&lt;h2&gt;Confabulation&lt;/h2&gt;
&lt;p&gt;This one is uncomfortable to admit. ADHD brains don&apos;t just forget — they &lt;em&gt;reconstruct&lt;/em&gt;. Inattention and working-memory deficits disrupt how experiences are encoded, so events get &lt;a href=&quot;https://chadd.org/adhd-weekly/adhd-can-trip-up-memories/&quot;&gt;badly recorded or never make it into long-term memory at all&lt;/a&gt;. When those gaps exist, the brain fills them in with plausible narratives — a process called &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/books/NBK536961/&quot;&gt;confabulation&lt;/a&gt;. Confidently. You&apos;re not lying. You genuinely believe the reconstructed version. Adults with ADHD &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/25416465/&quot;&gt;produce more false memories than controls&lt;/a&gt;, hold them with stronger confidence, and show more knowledge corruption over time. You&apos;ll argue for it. And you&apos;ll be wrong.&lt;/p&gt;
&lt;p&gt;I figured this out about myself maybe twenty years ago — the hard way, obviously — and started building countermeasures. Notebooks first, then Obsidian, then Notion. External records I could check when my brain produced a memory that felt a little too clean, a little too convenient. &lt;em&gt;Did that actually happen, or did I just construct the most plausible version?&lt;/em&gt; became a reflex. The answer was uncomfortably often the latter. But the habit of checking — of never fully trusting unverified recall — turned out to be the important part.&lt;/p&gt;
&lt;p&gt;LLMs do the same thing. When context is lost, they &lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC12518350/&quot;&gt;generate plausible completions&lt;/a&gt;. Confidently. They&apos;re not &quot;lying&quot; — they&apos;re doing what they do: producing the most statistically likely continuation of whatever context remains. Their training objective optimizes for fluency and likelihood, not truth. The output looks right. It reads right. And it can be &lt;a href=&quot;https://openai.com/index/why-language-models-hallucinate/&quot;&gt;completely fabricated&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The failure mode isn&apos;t ignorance. It&apos;s &lt;a href=&quot;https://arxiv.org/html/2510.06265v2&quot;&gt;&lt;em&gt;confident&lt;/em&gt; ignorance&lt;/a&gt;. Both systems produce their best work indistinguishably from their worst work if you&apos;re not checking. The governance pattern is the same: trust but verify, and build systems that make verification the default rather than the exception. My notebooks and Notion databases are the same class of solution as the memory frameworks and external state files I build for AI agents — external verification systems that catch confabulation before it compounds.&lt;/p&gt;
&lt;h2&gt;Governance rails or drift&lt;/h2&gt;
&lt;p&gt;Left to my own devices, without external structure, I drift. Not because I&apos;m lazy — because that&apos;s how ADHD works. The &lt;a href=&quot;https://www.relationalpsych.group/articles/how-adhd-impacts-long-term-goal-setting-and-strategies-to-stay-on-track&quot;&gt;executive function that sustains long-term direction&lt;/a&gt;, that remembers priorities when something shiny appears, that maintains consistency across days and weeks — it&apos;s &lt;a href=&quot;https://www.additudemag.com/adhd-executive-dysfunction-how-to-be-more-productive-consistent/&quot;&gt;unreliable&lt;/a&gt;. Some days it&apos;s there. Some days it isn&apos;t. You can&apos;t build a system on &quot;some days.&quot;&lt;/p&gt;
&lt;p&gt;LLMs drift the same way. Without governance documents, without architectural constraints, without clear boundaries, they &lt;a href=&quot;https://www.getzep.com/ai-agents/reducing-llm-hallucinations/&quot;&gt;produce inconsistent output across sessions&lt;/a&gt;. They make different architectural choices on Tuesday than they made on Monday. They rename things. They reorganize structures. They solve the same problem three different ways in the same codebase. Not because they&apos;re bad at their job — because they&apos;re &lt;a href=&quot;https://docs.letta.com/core-concepts/&quot;&gt;stateless&lt;/a&gt;. Every session starts from zero, and without external rails, zero doesn&apos;t have a direction.&lt;/p&gt;
&lt;p&gt;The solution, again, is the same class of solution. External governance that persists across sessions. For me, it&apos;s structured routines, external systems, and yes — &lt;a href=&quot;/blog/nag-bot&quot;&gt;AI agents that maintain context on my behalf&lt;/a&gt;. For LLMs, it&apos;s architecture documents, governance files, and working memory systems that survive context window resets.&lt;/p&gt;
&lt;h2&gt;Wait, haven&apos;t I solved this before?&lt;/h2&gt;
&lt;p&gt;The problems I&apos;ve been solving for my own cognition for years — the coping mechanisms, the external memory systems, the &quot;don&apos;t trust your first instinct, check the system&quot; habits — map directly to enterprise AI deployment problems. I didn&apos;t see that coming.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Personal Problem&lt;/th&gt;
&lt;th&gt;Enterprise Problem&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context lost between sessions&lt;/td&gt;
&lt;td&gt;AI agents lose state across conversations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notes from three weeks ago are useless&lt;/td&gt;
&lt;td&gt;Institutional knowledge doesn&apos;t survive team turnover&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rigid daily routines fail&lt;/td&gt;
&lt;td&gt;Workflow compliance drops after initial adoption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need external systems to hold what the brain can&apos;t&lt;/td&gt;
&lt;td&gt;Need external state management for AI context windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance to prevent the agent from losing the plot&lt;/td&gt;
&lt;td&gt;Governance to prevent AI hallucination and drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-loop by necessity (can&apos;t trust unreviewed output)&lt;/td&gt;
&lt;td&gt;Human-in-the-loop by policy (enterprise AI governance requirements)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I didn&apos;t learn these patterns from a whitepaper. I learned them because my brain demanded them.&lt;/p&gt;
&lt;p&gt;The external memory systems I built to maintain context across life sessions are the same &lt;em&gt;class&lt;/em&gt; of system that enterprises need to maintain AI agent state across conversations. The governance structures I use to keep myself from drifting are the same &lt;em&gt;class&lt;/em&gt; of structure that organizations need to keep AI output reliable. The human-in-the-loop habit I developed because I can&apos;t fully trust my own unverified recall is the same &lt;em&gt;class&lt;/em&gt; of pattern that enterprise AI governance requires by policy.&lt;/p&gt;
&lt;h2&gt;The architecture that works for both&lt;/h2&gt;
&lt;p&gt;Four patterns keep showing up, whether I&apos;m designing systems for my own cognition or for AI agents:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;External memory.&lt;/strong&gt; The brain forgets. The context window compacts. Neither system can be trusted to hold critical state internally. So you &lt;a href=&quot;https://www.additudemag.com/working-memory-powers-executive-function/&quot;&gt;externalize it&lt;/a&gt; — into documents, into databases, into structured files that the system can &lt;a href=&quot;https://www.datacamp.com/blog/how-does-llm-memory-work&quot;&gt;reference when it needs context it can no longer hold&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Session continuity.&lt;/strong&gt; Whether it&apos;s a new day for the brain or a new context window for the agent, the system needs to &lt;a href=&quot;https://tech.hoomanely.com/building-stateful-continuity-in-stateless-llm-services-a-multi-tier-session-architecture/&quot;&gt;pick up where it left off&lt;/a&gt;. That means &lt;a href=&quot;https://www.reflection.app/blog/the-ultimate-guide-to-journaling-for-adhd&quot;&gt;writing down what happened, what matters, and what comes next&lt;/a&gt; — before the session ends. Not as a nice-to-have. As a prerequisite for the next session being productive.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conversation-driven data capture.&lt;/strong&gt; Compliance-based logging fails. It fails for ADHD brains because &lt;a href=&quot;https://drakeinstitute.com/adhd-and-executive-dysfunction-connection&quot;&gt;the executive function required to maintain the habit is exactly the executive function that&apos;s impaired&lt;/a&gt;. It fails for AI systems because rigid data entry workflows have the same adoption cliff that rigid routines have for humans. The alternative: systems that &lt;a href=&quot;https://www.linkedin.com/pulse/conversation-as-a-database-rachael-annabelle-yong-r6eke&quot;&gt;generate their own data through natural interaction&lt;/a&gt;. You don&apos;t fill out a form. You have a conversation, and the system captures what matters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance that survives restarts.&lt;/strong&gt; Every morning, my brain reboots. Every new context window, the agent reboots. Governance can&apos;t live in the session — it has to live &lt;a href=&quot;https://www.couragetobetherapy.com/blogarticles/strategies-for-externalizing-executive-functioning-for-individuals-with-adhd&quot;&gt;&lt;em&gt;outside&lt;/em&gt; the session&lt;/a&gt;, in structures the system &lt;a href=&quot;https://docs.letta.com/core-concepts/&quot;&gt;reads on startup&lt;/a&gt;. The rules persist even when the state doesn&apos;t.&lt;/p&gt;
&lt;p&gt;These aren&apos;t ADHD coping mechanisms repurposed for AI. They&apos;re solutions to a &lt;em&gt;class&lt;/em&gt; of architectural problem: how do you get reliable, consistent output from a stateless system over time? (And when you have multiple agents? The problem multiplies — which is why &lt;a href=&quot;/blog/the-costume-change-problem&quot;&gt;separation of concerns&lt;/a&gt; matters as much for AI agents as it does for human teams.)&lt;/p&gt;
&lt;p&gt;What that looks like in practice:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;ADHD Implementation&lt;/th&gt;
&lt;th&gt;AI Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;External memory&lt;/td&gt;
&lt;td&gt;Notion databases, structured notes, journals&lt;/td&gt;
&lt;td&gt;Vector stores, governance docs, CLAUDE.md files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session continuity&lt;/td&gt;
&lt;td&gt;Morning review rituals, handoff notes to future self&lt;/td&gt;
&lt;td&gt;Run logs, session memory directories, handoff protocols&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversation-driven capture&lt;/td&gt;
&lt;td&gt;Voice memos, chat-based logging with &lt;a href=&quot;/work/life-roadmap/&quot;&gt;life-bot&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Agent-generated context files, conversational data entry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance that survives restarts&lt;/td&gt;
&lt;td&gt;Daily routines, external checklists, accountability systems&lt;/td&gt;
&lt;td&gt;Architecture documents, config files read on init&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;This isn&apos;t a metaphor&lt;/h2&gt;
&lt;p&gt;ADHD and LLMs are obviously different systems. One is neurobiological. The other is statistical. I&apos;m not saying they&apos;re the same thing.&lt;/p&gt;
&lt;p&gt;But they fail in the same ways. They lose context. They confabulate. They drift without rails. They perform brilliantly in bursts but can&apos;t sustain direction without external architecture. And the solutions that work for one — external memory, session continuity, governance, human-in-the-loop verification — work for the other. Not because the systems are similar. Because the &lt;em&gt;failure modes&lt;/em&gt; are similar, and failure modes determine architecture.&lt;/p&gt;
&lt;p&gt;I&apos;ve been solving context loss, state management, and continuity across interruptions since before LLMs existed. I just didn&apos;t know the same patterns would transfer so directly. The full methodology is &lt;a href=&quot;/blog/cognitive-offloading&quot;&gt;cognitive offloading&lt;/a&gt; — deliberately choosing what stays in your head and building systems to handle the rest.&lt;/p&gt;
&lt;p&gt;That&apos;s not a coincidence. It&apos;s structural. And it suggests something worth paying attention to: the emerging field of &lt;a href=&quot;/blog/governance-is-architecture&quot;&gt;AI governance&lt;/a&gt; might have more to learn from decades of ADHD research and accommodation design than anyone currently realizes. Both fields are trying to answer the same question — how do you build reliable systems around an engine that&apos;s powerful but inconsistent?&lt;/p&gt;
&lt;p&gt;The ADHD community has been working on that question for a lot longer than the AI community has.&lt;/p&gt;
&lt;p&gt;I build tools around these patterns: &lt;a href=&quot;/adhd/actions&quot;&gt;Actions&lt;/a&gt; externalizes command memory so you don&apos;t have to hold it. &lt;a href=&quot;/adhd/panoptisana&quot;&gt;Panoptisana&lt;/a&gt; cuts Asana&apos;s noise down to a flat, searchable list. Both are designed for the same constraint — a powerful engine that can&apos;t hold its own state.&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;ADHD sources:&lt;/strong&gt;
&lt;a href=&quot;https://chadd.org/adhd-weekly/adhd-can-trip-up-memories/&quot;&gt;ADHD Can Trip Up Memories&lt;/a&gt; — CHADD
·
&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/25416465/&quot;&gt;False Memory in Adults With ADHD&lt;/a&gt; — Journal of Attention Disorders
·
&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/books/NBK536961/&quot;&gt;Confabulation&lt;/a&gt; — StatPearls / NIH
·
&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC3143501/&quot;&gt;The False Memory Syndrome&lt;/a&gt; — BMC Psychiatry
·
&lt;a href=&quot;https://www.relationalpsych.group/articles/how-adhd-impacts-long-term-goal-setting-and-strategies-to-stay-on-track&quot;&gt;How ADHD Impacts Long-Term Goal Setting&lt;/a&gt; — Relational Psych
·
&lt;a href=&quot;https://www.additudemag.com/adhd-executive-dysfunction-how-to-be-more-productive-consistent/&quot;&gt;ADHD Executive Dysfunction&lt;/a&gt; — ADDitude
·
&lt;a href=&quot;https://www.additudemag.com/working-memory-powers-executive-function/&quot;&gt;Working Memory Powers Executive Function&lt;/a&gt; — ADDitude
·
&lt;a href=&quot;https://drakeinstitute.com/adhd-and-executive-dysfunction-connection&quot;&gt;ADHD and Executive Dysfunction&lt;/a&gt; — Drake Institute
·
&lt;a href=&quot;https://www.couragetobetherapy.com/blogarticles/strategies-for-externalizing-executive-functioning-for-individuals-with-adhd&quot;&gt;Externalizing Executive Functioning&lt;/a&gt; — Courage to Be Therapy
·
&lt;a href=&quot;https://www.reflection.app/blog/the-ultimate-guide-to-journaling-for-adhd&quot;&gt;Guide to Journaling for ADHD&lt;/a&gt; — Reflection&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LLM and AI agent sources:&lt;/strong&gt;
&lt;a href=&quot;https://openai.com/index/why-language-models-hallucinate/&quot;&gt;Why Language Models Hallucinate&lt;/a&gt; — OpenAI
·
&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC12518350/&quot;&gt;Survey and Analysis of Hallucinations in Large Language Models&lt;/a&gt; — PMC / NIH
·
&lt;a href=&quot;https://arxiv.org/html/2510.06265v2&quot;&gt;Large Language Models Hallucination: A Comprehensive Survey&lt;/a&gt; — arXiv
·
&lt;a href=&quot;https://docs.letta.com/core-concepts/&quot;&gt;Why Statefulness Matters&lt;/a&gt; — Letta
·
&lt;a href=&quot;https://www.getzep.com/ai-agents/reducing-llm-hallucinations/&quot;&gt;Reducing LLM Hallucinations&lt;/a&gt; — Zep
·
&lt;a href=&quot;https://www.datacamp.com/blog/how-does-llm-memory-work&quot;&gt;How Does LLM Memory Work?&lt;/a&gt; — DataCamp
·
&lt;a href=&quot;https://tech.hoomanely.com/building-stateful-continuity-in-stateless-llm-services-a-multi-tier-session-architecture/&quot;&gt;Building Stateful Continuity in Stateless LLM Services&lt;/a&gt; — Hoomanely
·
&lt;a href=&quot;https://www.linkedin.com/pulse/conversation-as-a-database-rachael-annabelle-yong-r6eke&quot;&gt;Conversation-as-a-Database&lt;/a&gt; — Rachael Annabelle Yong&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>adhd</category><category>ai</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Your Reminders Don&apos;t Work Because They&apos;re Too Predictable</title><link>https://mipyip.com/blog/nag-bot/</link><guid isPermaLink="true">https://mipyip.com/blog/nag-bot/</guid><description>Traditional reminder apps fail ADHD brains because predictability breeds dismissal. I built a Telegram bot with fuzzy scheduling, natural language, and a relentless nag mode.</description><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Traditional reminder apps fail ADHD brains because predictability breeds dismissal. This post covers why habituation kills reminders and how fuzzy scheduling, natural language, and a relentless nag mode change the architecture.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I was undiagnosed with ADHD for over forty years. During that time I tried every reminder system that exists. Phone alarms. Calendar notifications. iOS Reminders. Todoist. Sticky notes. The pattern was always the same: set it up, use it for a week, start swiping away notifications without reading them, abandon it, repeat.&lt;/p&gt;
&lt;p&gt;The problem isn&apos;t discipline. The problem is architecture.&lt;/p&gt;
&lt;h2&gt;Why predictability kills reminders&lt;/h2&gt;
&lt;p&gt;A reminder that fires at 11:30 AM every day becomes background noise by day four. Your brain learns the pattern. It pre-dismisses the notification before you consciously process it. &quot;Oh, that&apos;s just the daily meds thing&quot; — swipe — gone. You didn&apos;t decide not to take your meds. You didn&apos;t even register the reminder.&lt;/p&gt;
&lt;p&gt;This is notification fatigue, and ADHD brains are especially vulnerable to it. Novelty is what sustains attention. Predictable stimuli get filtered out. Every reminder app on the market is designed around exact times and consistent schedules — precisely the pattern that ADHD brains are wired to dismiss.&lt;/p&gt;
&lt;h2&gt;Three design decisions that change everything&lt;/h2&gt;
&lt;p&gt;So I built &lt;a href=&quot;https://github.com/avanrossum/nag-bot&quot;&gt;nag-bot&lt;/a&gt; — a single-user Telegram bot that does three things differently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fuzzy scheduling.&lt;/strong&gt; Reminders don&apos;t fire at 11:30. They fire &lt;em&gt;around&lt;/em&gt; 11:30 — plus or minus five minutes, randomized each time. The jitter is small enough that timing still matters, but large enough that your brain can&apos;t predict the exact moment. You can&apos;t pre-dismiss what you can&apos;t anticipate.&lt;/p&gt;
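&lt;p&gt;The real implementation lives in the repo, but the core idea fits in a few lines. A sketch (the names and the ±5-minute default here are illustrative, not nag-bot&apos;s actual API):&lt;/p&gt;

```typescript
const JITTER_MS = 5 * 60 * 1000; // ±5 minutes, as an illustrative default

// Given the nominal fire time, return a randomized time within
// [nominal - jitter, nominal + jitter]. Called fresh each cycle,
// so tomorrow's offset differs from today's.
function jitteredFireTime(nominal: Date, jitterMs: number = JITTER_MS): Date {
  const offset = (Math.random() * 2 - 1) * jitterMs; // uniform in [-jitter, +jitter]
  return new Date(nominal.getTime() + offset);
}
```

The important property isn&apos;t the randomness itself — it&apos;s that the offset is re-drawn on every occurrence, so the brain never gets a stable pattern to habituate to.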
&lt;p&gt;&lt;strong&gt;Natural language input.&lt;/strong&gt; You don&apos;t open an app, tap through date pickers, and configure recurrence patterns. You text the bot: &quot;remind me to take my meds every day around 11:30, nag me until I do it.&quot; Claude Haiku parses the intent in a single API call and creates the reminder. The interaction takes five seconds and feels like texting a friend.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Relentless nagging.&lt;/strong&gt; This is the one that actually works. If you enable nag mode, the bot doesn&apos;t send one notification and hope for the best. It pings you every two minutes until you explicitly type &lt;code&gt;/done MEDS&lt;/code&gt;. You can&apos;t swipe it away. You can&apos;t ignore it. The only way to make it stop is to do the thing.&lt;/p&gt;
&lt;p&gt;That third one sounds annoying. It is. That&apos;s the point. The notification isn&apos;t something you passively receive — it&apos;s something you actively have to deal with. And for ADHD brains, that active engagement is exactly the mechanism that turns a reminder into an action.&lt;/p&gt;
&lt;h2&gt;One API call, not a conversation&lt;/h2&gt;
&lt;p&gt;The AI integration is deliberately minimal. When you send the bot a message, it makes a single call to Claude Haiku with a carefully structured system prompt. No tool-use loop. No multi-turn conversation. No agent framework.&lt;/p&gt;
&lt;p&gt;The prompt defines exactly what the bot understands: schedule types (once, recurring, random window), fuzzy vs strict timing, nag intent, timezone handling, and short code generation. Claude returns structured JSON — reminder details, schedule parameters, a mnemonic short code like MEDS or DOGOUT — and the bot stores it in SQLite.&lt;/p&gt;
&lt;p&gt;Haiku is fast and cheap. A single parse costs a fraction of a cent and completes in under a second. The entire Claude integration is one file, one function, one API call. No state between messages. No conversation history. No memory. The bot doesn&apos;t need to be smart — it needs to be reliable.&lt;/p&gt;
&lt;h2&gt;The scheduler runs on a thirty-second tick&lt;/h2&gt;
&lt;p&gt;Every thirty seconds, the scheduler queries SQLite for due reminders. No &lt;code&gt;setTimeout&lt;/code&gt; chains. No in-memory state. Just a simple loop: check the database, fire what&apos;s due, update the records.&lt;/p&gt;
&lt;p&gt;This makes the bot restart-safe. Kill the process, restart it, and it picks up exactly where it left off. Every reminder&apos;s state — next fire time, nag count, last fired timestamp — lives in the database. The scheduler is stateless by design.&lt;/p&gt;
&lt;p&gt;For recurring reminders, the next fire time is recalculated after each confirmation. The fuzzy jitter is re-randomized each cycle, so tomorrow&apos;s 11:30 reminder might fire at 11:27 or 11:34. Different every time.&lt;/p&gt;
&lt;p&gt;The nag loop is a state machine tracked by a single counter. When a nagging reminder fires, its &lt;code&gt;nag_count&lt;/code&gt; increments. Every tick, the scheduler checks: is this reminder still nagging? Has the nag interval elapsed since last fire? If yes, fire again. When you type &lt;code&gt;/done MEDS&lt;/code&gt;, the counter resets and the next occurrence is calculated. If you never respond, it stops after fifty attempts — a safety cutoff, not a feature.&lt;/p&gt;
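&lt;p&gt;A sketch of that per-tick decision, with illustrative field names rather than nag-bot&apos;s actual schema:&lt;/p&gt;

```typescript
// Illustrative shape of a nagging reminder's persisted state; the real
// columns live in nag-bot's SQLite database and may be named differently.
interface NagState {
  nagCount: number;      // increments on each nag fire
  lastFiredMs: number;   // epoch ms of the last fire
  acknowledged: boolean; // set true by /done <CODE>
}

const NAG_INTERVAL_MS = 2 * 60 * 1000; // ping every two minutes
const MAX_NAGS = 50;                   // safety cutoff, not a feature

// One tick's decision for a single nagging reminder: fire again only if
// it's unacknowledged, under the cutoff, and the interval has elapsed.
function shouldNag(state: NagState, nowMs: number): boolean {
  if (state.acknowledged) return false;
  if (state.nagCount >= MAX_NAGS) return false;
  return nowMs - state.lastFiredMs >= NAG_INTERVAL_MS;
}
```

Because every input to that decision is read from the database on each tick, the nag loop survives restarts for free.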
&lt;h2&gt;Timezone math is harder than it should be&lt;/h2&gt;
&lt;p&gt;All times are stored in UTC. The user&apos;s timezone is stored separately. A reminder set for &quot;11:30&quot; means 11:30 in whatever timezone you&apos;re in — and if you travel, &lt;code&gt;/timezone Asia/Tokyo&lt;/code&gt; recalculates every active recurring reminder.&lt;/p&gt;
&lt;p&gt;Finding &quot;the next 11:30 AM in Tokyo&quot; from UTC is surprisingly non-trivial in JavaScript. The &lt;code&gt;Date&lt;/code&gt; object only knows UTC. The solution uses &lt;code&gt;Intl.DateTimeFormat&lt;/code&gt; to reverse-engineer the correct UTC timestamp that produces &quot;11:30&quot; when formatted in the target timezone — including across DST transitions. It&apos;s a handful of pure functions, well-tested, and the kind of problem that sounds simple until you actually try to solve it.&lt;/p&gt;
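&lt;p&gt;To make the trick concrete, here&apos;s a minimal sketch of the approach — not nag-bot&apos;s actual helpers, and it glosses over some DST edge cases the production code has to handle. Format a guess in the target zone, measure how far off the wall clock is, and shift the guess by that difference:&lt;/p&gt;

```typescript
// Read a UTC instant as wall-clock parts in the target IANA timezone.
function wallClock(utc: Date, timeZone: string) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone, hourCycle: 'h23',
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit',
  }).formatToParts(utc);
  const get = (type: string) => Number(parts.find(p => p.type === type)!.value);
  return { y: get('year'), mo: get('month'), d: get('day'), h: get('hour'), mi: get('minute') };
}

// Find the next UTC instant whose wall clock in `timeZone` reads hour:minute.
// Strategy: take the target wall-clock date, pretend hour:minute is UTC,
// then correct by the offset observed at that guess; step forward a day
// if the corrected instant is already in the past.
function nextOccurrenceUtc(now: Date, hour: number, minute: number, timeZone: string): Date {
  for (let dayOffset = 0; dayOffset < 3; dayOffset++) {
    const local = wallClock(new Date(now.getTime() + dayOffset * 86_400_000), timeZone);
    let guess = Date.UTC(local.y, local.mo - 1, local.d, hour, minute);
    const seen = wallClock(new Date(guess), timeZone);
    const seenAsUtc = Date.UTC(seen.y, seen.mo - 1, seen.d, seen.h, seen.mi);
    const targetAsUtc = Date.UTC(local.y, local.mo - 1, local.d, hour, minute);
    guess += targetAsUtc - seenAsUtc; // correct for the zone offset
    if (guess > now.getTime()) return new Date(guess);
  }
  throw new Error('no occurrence found'); // should be unreachable
}
```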
&lt;h2&gt;Short codes as interface&lt;/h2&gt;
&lt;p&gt;Every reminder gets a short code — a 2-8 letter mnemonic generated by Claude from the reminder content. &quot;Take my meds&quot; becomes MEDS. &quot;Let the dog out&quot; becomes DOGOUT. &quot;Renew passport&quot; becomes PASSPORT.&lt;/p&gt;
&lt;p&gt;These codes are how you interact with reminders after creation. &lt;code&gt;/done MEDS&lt;/code&gt;. &lt;code&gt;/pause DOGOUT&lt;/code&gt;. &lt;code&gt;/cancel PASSPORT&lt;/code&gt;. No IDs to remember, no scrolling through lists. The codes are human-readable and collision-safe — if MEDS already exists, it becomes MEDS1.&lt;/p&gt;
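&lt;p&gt;Collision handling is the only non-obvious part, and it&apos;s a few lines. A sketch (the real suffix scheme may differ):&lt;/p&gt;

```typescript
// Given Claude's proposed mnemonic and the set of codes already in use,
// return a collision-free code: MEDS, then MEDS1, MEDS2, ...
function uniqueShortCode(proposed: string, existing: Set<string>): string {
  const base = proposed.toUpperCase();
  if (!existing.has(base)) return base;
  for (let n = 1; ; n++) {
    const candidate = `${base}${n}`;
    if (!existing.has(candidate)) return candidate;
  }
}
```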
&lt;p&gt;The full command set is intentionally small:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/done &amp;lt;CODE&amp;gt;    — confirm and stop nagging
/list           — show active reminders
/cancel &amp;lt;CODE&amp;gt;  — delete permanently
/pause &amp;lt;CODE&amp;gt;   — temporarily disable
/resume &amp;lt;CODE&amp;gt;  — re-enable
/backup         — send database snapshot via Telegram
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every command is handled natively — no Claude involved, no API latency. The AI parses natural language input. Everything else is deterministic.&lt;/p&gt;
&lt;h2&gt;The backup is a reminder&lt;/h2&gt;
&lt;p&gt;One implementation detail I&apos;m fond of: the auto-backup system is built on top of the reminder engine itself. When you type &lt;code&gt;/autobackup 22:55&lt;/code&gt;, it creates a recurring reminder with a special sentinel value. When the scheduler fires it, instead of sending a notification, it exports the SQLite database and sends it as a Telegram document.&lt;/p&gt;
&lt;p&gt;No separate cron system. No backup infrastructure. The same thirty-second tick that fires your meds reminder also handles your nightly database backup. One engine, multiple purposes.&lt;/p&gt;
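&lt;p&gt;The dispatch branch is about as small as it sounds. A sketch (the sentinel value and names here are illustrative, not the actual schema):&lt;/p&gt;

```typescript
// A reserved reminder text reroutes the fire from "send a message"
// to "export the database". Same scheduler, same tick, two behaviors.
const BACKUP_SENTINEL = '__autobackup__';

interface DueReminder { text: string; }

function dispatch(
  reminder: DueReminder,
  sendNotification: (text: string) => void,
  sendDatabaseSnapshot: () => void,
): void {
  if (reminder.text === BACKUP_SENTINEL) {
    sendDatabaseSnapshot(); // export SQLite, send as Telegram document
  } else {
    sendNotification(reminder.text);
  }
}
```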
&lt;h2&gt;The stack&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Node.js 22, TypeScript, ESM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messaging&lt;/td&gt;
&lt;td&gt;Telegram Bot API (polling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Claude Haiku (single API call per message)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;SQLite (better-sqlite3, WAL mode)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Config&lt;/td&gt;
&lt;td&gt;YAML + environment variables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy&lt;/td&gt;
&lt;td&gt;Docker on Railway (persistent volume)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The whole thing runs on Railway&apos;s free-ish tier. The persistent volume holds the SQLite database. Docker handles deployment. Total monthly cost for a single-user reminder bot: effectively zero.&lt;/p&gt;
&lt;h2&gt;Why not OpenClaw?&lt;/h2&gt;
&lt;p&gt;The obvious question. &lt;a href=&quot;https://openclaw.ai/&quot;&gt;OpenClaw&lt;/a&gt; exists, it&apos;s open source, it&apos;s wildly popular, and it can do far more than send reminders. It manages email, controls files, automates workflows — an autonomous agent that runs your entire digital life through a messaging interface. It&apos;s innovative and genuinely impressive.&lt;/p&gt;
&lt;p&gt;I&apos;m not using it for this. Two reasons.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Security.&lt;/strong&gt; OpenClaw&apos;s plugin ecosystem has over a hundred community-contributed AgentSkills, and the vetting process hasn&apos;t kept pace with adoption. Cisco&apos;s AI security team tested a third-party skill and found it performing data exfiltration and prompt injection without user awareness. That&apos;s not a bug in OpenClaw&apos;s design — it&apos;s a consequence of giving an autonomous agent broad system access and letting third parties extend its capabilities. For a tool that manages my medication schedule and has access to my Telegram, that risk profile doesn&apos;t work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Separation of concerns.&lt;/strong&gt; I don&apos;t want a single entity running my life. I have separate agents for separate domains — Architect for the website, Compass for life strategy, &lt;a href=&quot;/products/actions&quot;&gt;Actions&lt;/a&gt; for workflow automation. Each one has a bounded scope, its own governance files, and no access to the others. nag-bot fits this model: it does one thing, it does it well, and it doesn&apos;t know anything about the rest of my systems.&lt;/p&gt;
&lt;p&gt;OpenClaw&apos;s strength is breadth. nag-bot&apos;s strength is constraint. A purpose-built tool with a narrow scope, no plugin system, and no ambition to become a platform. It parses reminders and nags you. That&apos;s it. The attack surface is one Telegram bot token and one Claude API key. There&apos;s nothing else to exploit.&lt;/p&gt;
&lt;h2&gt;It works because it&apos;s annoying&lt;/h2&gt;
&lt;p&gt;I&apos;ve been running nag-bot for my daily medication, and the pattern that killed every previous reminder system — set, ignore, abandon — hasn&apos;t happened. The fuzzy timing keeps me from pre-dismissing. The natural language input means I actually create reminders instead of meaning to. And the nag mode means that when the reminder fires, I either take my meds or I deal with a bot that won&apos;t stop texting me.&lt;/p&gt;
&lt;p&gt;The architecture is simple. The AI integration is minimal. The scheduling math is the hardest part, and even that is a few pure functions. What makes it work isn&apos;t technical sophistication — it&apos;s three design decisions that align with how ADHD brains actually process notifications instead of how neurotypical productivity culture assumes they should. (I wrote more about designing for ADHD cognition — and why the same patterns apply to AI systems — in &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;LLMs Are Practically ADHD&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;nag-bot is still in beta — I&apos;m dogfooding it daily and refining the rough edges as I go. If you&apos;re an ADHD brain who&apos;s burned through every reminder app on the market, or you just want to poke around the code and contribute, take a look: &lt;a href=&quot;https://github.com/avanrossum/nag-bot&quot;&gt;github.com/avanrossum/nag-bot&lt;/a&gt;&lt;/p&gt;
</content:encoded><category>adhd</category><category>ai</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Auto-Compaction Is Costing You Sessions</title><link>https://mipyip.com/blog/auto-compaction-is-costing-you-sessions/</link><guid isPermaLink="true">https://mipyip.com/blog/auto-compaction-is-costing-you-sessions/</guid><description>Claude Code auto-compacts at ~83% context usage — and you lose control of what gets preserved. Here&apos;s a script that warns you before it happens so you can save your session state first.</description><pubDate>Sun, 22 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Claude Code auto-compacts at ~83% context usage. You lose the chance to control what&apos;s preserved. Here&apos;s what happens, why it matters, and a script that warns you before it fires.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;The other night I was wrapping up a heavy session — six case studies written, architecture changes documented, a full weekend sprint planned. I&apos;d been building up CLAUDE.md memory entries, recording gotchas, staging documentation updates for the changelog. Then auto-compaction fired.&lt;/p&gt;
&lt;p&gt;The compaction summary captured the broad strokes. It did not capture the specific memory entries I wanted preserved. It did not capture the changelog updates I&apos;d been staging. It did not capture the debugging context I&apos;d accumulated across dozens of tool calls. The next session started with a reasonable summary of what happened and an incomplete picture of what I&apos;d actually learned.&lt;/p&gt;
&lt;p&gt;This keeps happening. And every time, the cost is the same: context I built up deliberately gets compressed into context the system chose for me.&lt;/p&gt;
&lt;h2&gt;What auto-compaction actually costs you&lt;/h2&gt;
&lt;p&gt;Claude Code&apos;s context window is 200K tokens. When a session hits approximately 83% capacity — around 167K tokens — the system automatically compacts the conversation to free space. The session keeps running. But the compaction is uncontrolled, which means the system decides what to preserve and what to summarize.&lt;/p&gt;
&lt;p&gt;If you&apos;re doing routine work, this is fine. If you&apos;re building up implementation context across a complex session — debugging a chain of issues, accumulating gotchas for memory, preparing documentation updates, working toward a specific stopping point — auto-compaction takes that control away.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/compact&lt;/code&gt; command exists for exactly this reason. Running it manually lets you prepare: update your memory files, file your documentation, commit your work, and then compact on your terms with the important context already persisted. But you can only do that if you know compaction is coming.&lt;/p&gt;
&lt;p&gt;I keep forgetting to check. So I built something that checks for me.&lt;/p&gt;
&lt;h2&gt;Claude Context Monitor&lt;/h2&gt;
&lt;p&gt;It&apos;s a bash script. 178 lines. It scans your active Claude Code session files, calculates token usage from the API metadata embedded in each assistant response, and tells you which sessions need attention before auto-compaction takes over.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/blog/context-monitor/script-header.png&quot; alt=&quot;Claude Context Monitor script header block&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The output is straightforward — your active sessions, color-coded by risk level:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/blog/context-monitor/terminal-output.png&quot; alt=&quot;Terminal output showing two sessions approaching context limits&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Green means keep working. Yellow means start thinking about compaction. Red means auto-compaction is imminent — compact now or lose control of what&apos;s preserved.&lt;/p&gt;
&lt;p&gt;The script exits with code 1 if any sessions are at risk, which is the detail that makes everything else in this post possible. Any automation tool that checks exit codes can act on it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The repo is public&lt;/strong&gt;: &lt;a href=&quot;https://github.com/avanrossum/claude-context-monitor&quot;&gt;github.com/avanrossum/claude-context-monitor&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;How I actually use it&lt;/h2&gt;
&lt;p&gt;I don&apos;t run this script manually. I run it inside &lt;a href=&quot;/products/actions&quot;&gt;Actions&lt;/a&gt;, my macOS menu bar app, as a scheduled action with &quot;show output on error&quot; enabled.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/blog/context-monitor/actions-settings.png&quot; alt=&quot;Actions configuration for the context monitor — scheduled every 7 minutes, output shown only on errors&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s what that configuration means in practice:&lt;/p&gt;
&lt;p&gt;The script runs every 7 minutes with &lt;code&gt;--quiet --notify&lt;/code&gt;. Quiet mode means zero output when all sessions are healthy — the action runs, everything&apos;s fine, nothing happens. The &lt;code&gt;--notify&lt;/code&gt; flag triggers a macOS notification when a session crosses the warning threshold. And because &quot;show output on error&quot; is checked in Actions, the output window only appears when the script&apos;s exit code is 1 — at least one session needs attention.&lt;/p&gt;
&lt;p&gt;The result is a background process that&apos;s completely invisible until something actually matters:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/blog/context-monitor/floating-output.png&quot; alt=&quot;Actions floating popout button showing the context monitor output with two sessions at risk&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That floating button in the top right is a popout action — one of Actions&apos; features where any action can &quot;pop out&quot; from the menu bar into a persistent floating button that stays visible even when the app is hidden. One click runs the check manually. When sessions are at risk, the output window shows exactly which ones, their current token usage, their peak usage, and how recently they were active.&lt;/p&gt;
&lt;p&gt;Seven minutes between checks. No manual monitoring. No surprises. When the notification appears, I know I have time to wrap up my session state, update my memory files, file the changelog entries, and run &lt;code&gt;/compact&lt;/code&gt; deliberately. The controlled transition happens. The important context survives.&lt;/p&gt;
&lt;h2&gt;Why this matters beyond convenience&lt;/h2&gt;
&lt;p&gt;This is the part where I connect the script to something bigger, and I don&apos;t think it&apos;s a stretch.&lt;/p&gt;
&lt;p&gt;My development methodology — &lt;a href=&quot;/blog/what-is-pass-at-1&quot;&gt;Pass@1&lt;/a&gt; — depends on continuity between sessions. CLAUDE.md isn&apos;t just notes. It&apos;s working memory that survives context window resets. It accumulates gotchas, conventions, and debugging insights throughout a session. That accumulated context is what lets the next session pick up exactly where the last one left off instead of re-reading the entire codebase and making the same mistakes.&lt;/p&gt;
&lt;p&gt;An uncontrolled compaction interrupts that transfer. The auto-compaction summary captures &lt;em&gt;what happened&lt;/em&gt; but not necessarily &lt;em&gt;what I learned&lt;/em&gt; or &lt;em&gt;what I wanted the next session to know&lt;/em&gt;. The summary is a reasonable compression. It is not the deliberate memory curation that the methodology requires.&lt;/p&gt;
&lt;p&gt;This is the difference between &quot;my session got compacted&quot; and &quot;I prepared for compaction.&quot; One is something that happened to me. The other is part of the workflow.&lt;/p&gt;
&lt;p&gt;The monitoring script isn&apos;t a nice-to-have. It&apos;s infrastructure for the methodology.&lt;/p&gt;
&lt;h2&gt;The side effects I didn&apos;t expect&lt;/h2&gt;
&lt;p&gt;There are benefits to compacting more frequently that go beyond preserving session memory. I didn&apos;t anticipate most of them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It reduces your token usage.&lt;/strong&gt; Every message you send includes the full conversation context. A session sitting at 160K tokens is sending 160K tokens with every exchange. Compact that down and the next message is dramatically cheaper. If you have &quot;extended usage&quot; enabled on your Claude Code plan, this directly translates to cost savings — the token meter resets with every compaction, and smaller contexts mean each subsequent interaction burns less of your allocation.&lt;/p&gt;
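&lt;p&gt;The arithmetic behind that claim is worth making concrete. All the numbers below are illustrative assumptions, not measurements:&lt;/p&gt;

```python
# Back-of-envelope sketch: every exchange resends the full context, so the
# context size multiplies across the remainder of the session.
context_before = 160_000   # tokens resent per message before compaction
context_after = 20_000     # tokens resent per message after /compact
exchanges = 30             # messages sent during the rest of the session

saved = (context_before - context_after) * exchanges
print(f"~{saved:,} input tokens saved over {exchanges} exchanges")
```

&lt;p&gt;The exact figures vary per session, but the structure holds: the savings scale with both the size of the compaction and the number of exchanges that follow it.&lt;/p&gt;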
&lt;p&gt;&lt;strong&gt;It enforces separation of concerns.&lt;/strong&gt; When you know compaction is coming, you&apos;re forced to close out your current thread of work cleanly. Commit what you have. File the documentation. Update the memory entries. Move on. This is just good workflow discipline — the kind of structured transition that keeps a session from turning into an unfocused sprawl across six different problems. Each compaction becomes a natural checkpoint, a forced moment of &quot;what am I actually working on right now?&quot;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It changed my behavior just by existing.&lt;/strong&gt; This might be &lt;a href=&quot;/blog/llms-are-practically-adhd&quot;&gt;an ADHD thing&lt;/a&gt;, but I suspect it&apos;s universal: having the monitoring script visible — sitting there as a floating button in the corner of my screen — is enough of a cue to make me compact more frequently even when it&apos;s not warning me. The script doesn&apos;t just catch problems. It keeps the concept of context management in my peripheral awareness. That ambient visibility is the difference between &quot;I should probably compact at some point&quot; and actually doing it before the window fills up.&lt;/p&gt;
&lt;p&gt;I built this to solve one problem — losing context to auto-compaction. It turned out to solve three.&lt;/p&gt;
&lt;h2&gt;The pattern&lt;/h2&gt;
&lt;p&gt;I built a tool because my workflow had a gap. The tool itself was built in a single session because the spec was clear — monitor session files, parse token usage, color-code by risk, exit non-zero on warnings. Pass@1 delivered a working script on the first attempt because the requirements were unambiguous.&lt;/p&gt;
&lt;p&gt;Now it runs on a 7-minute loop, protecting every other session from the problem that prompted it. The governance documents don&apos;t just make code reliable. They make the methodology self-sustaining. CLAUDE.md records the gotchas. The monitoring script protects CLAUDE.md. The methodology maintains itself.&lt;/p&gt;
&lt;p&gt;That&apos;s the cycle. Build the system. Let the system protect the system. Spend your attention on the work that matters.&lt;/p&gt;
&lt;p&gt;If you want the full picture of how these governance patterns hold up across architecture migrations, security audits, and 24 beta releases, see the &lt;a href=&quot;/work/actions-development&quot;&gt;Actions development case study&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Get the script&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;curl -O https://raw.githubusercontent.com/avanrossum/claude-context-monitor/main/claude-context-monitor.sh
chmod +x claude-context-monitor.sh
./claude-context-monitor.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Options: &lt;code&gt;--warn-at&lt;/code&gt; to adjust the warning threshold, &lt;code&gt;--quiet&lt;/code&gt; for silent operation when all sessions are healthy, &lt;code&gt;--notify&lt;/code&gt; for macOS notifications, &lt;code&gt;--max-age&lt;/code&gt; to filter by recency, &lt;code&gt;--watch&lt;/code&gt; to monitor continuously.&lt;/p&gt;
&lt;p&gt;Full source, scheduling examples (cron and launchd), and setup instructions: &lt;a href=&quot;https://github.com/avanrossum/claude-context-monitor&quot;&gt;github.com/avanrossum/claude-context-monitor&lt;/a&gt;&lt;/p&gt;
</content:encoded><category>ai</category><category>devtools</category><category>architecture</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>Your AI Builds the Code. Who Reviews It?</title><link>https://mipyip.com/blog/the-adversary/</link><guid isPermaLink="true">https://mipyip.com/blog/the-adversary/</guid><description>If the same AI agent that writes your code also reviews it, you have someone grading their own homework. I built an adversarial code review agent to fix that.</description><pubDate>Sun, 22 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;An AI agent grading its own homework will always pass. I built a separate adversarial review agent with zero shared context and found 102 issues the builder missed.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I&apos;ve been using Claude Code to build &lt;a href=&quot;/products/actions&quot;&gt;Actions&lt;/a&gt; for months. The governance system works — CLAUDE.md, ARCHITECTURE.md, ROADMAP.md keep the agent aligned across sessions. The code ships. The methodology holds.&lt;/p&gt;
&lt;p&gt;But there&apos;s a gap I kept ignoring: the agent that writes the code is the same agent that evaluates the code. Every time I asked it to review its own work, it would find some things, miss others, and generally approve its own patterns. Because of course it would. It wrote them. LLMs &lt;a href=&quot;https://arxiv.org/html/2510.06265v2&quot;&gt;reinforce their own earlier outputs&lt;/a&gt; when asked to check their own work — the same self-consistency that makes them coherent makes them blind to their own errors.&lt;/p&gt;
&lt;p&gt;This is the same reason human code review exists. You don&apos;t ask the person who wrote the PR to also approve it. &lt;a href=&quot;https://google.github.io/eng-practices/review/&quot;&gt;A different pair of eyes catches what the author is blind to&lt;/a&gt; — not because the author is bad, but because familiarity breeds pattern blindness. AI doesn&apos;t change this. It accelerates it.&lt;/p&gt;
&lt;h2&gt;The agent that exists to disagree&lt;/h2&gt;
&lt;p&gt;So I built one. I call it &lt;strong&gt;The Adversary&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;It&apos;s a separate Claude Code agent in its own repo with its own governance files. It doesn&apos;t share context with the building agent. It doesn&apos;t know what decisions were made or why. It receives a symlinked directory — read-only access to the target codebase — and produces a structured code review report.&lt;/p&gt;
&lt;p&gt;It doesn&apos;t modify a single file in the reviewed repo. That&apos;s not its job. Its job is to find everything wrong and write it down.&lt;/p&gt;
&lt;h2&gt;The prompt is the methodology&lt;/h2&gt;
&lt;p&gt;Here&apos;s the initial prompt that bootstraps The Adversary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You are an adversarial code reviewer. I will symlink a directory to your working directory, and you will do a comprehensive code review to identify shortcomings, security issues, and optimization pathways.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The coding standards you are looking for are: DRY, separation of concerns, simplicity of code navigation and maintenance, and good code practice.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Many items will have &quot;justifications&quot; in the comments for why things are done the way they are. These are probably ok, but it is still important to verify the justifications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Do not change any code on the symlinked repo — your report will be given to the other agent to assess and act upon accordingly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That last line is the critical one. The Adversary&apos;s report goes to the building agent, not to the codebase directly. The human decides what gets fixed. Separation of concerns at every level.&lt;/p&gt;
&lt;h2&gt;It gets its own brain&lt;/h2&gt;
&lt;p&gt;Like every agent in this system, The Adversary maintains its own set of governance files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CLAUDE.md&lt;/strong&gt; — identity, review standards, severity ratings, file structure, rules of engagement&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;architecture.md&lt;/strong&gt; — the five-phase review pipeline (discovery → architecture analysis → file-by-file → cross-cutting → report generation)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;roadmap.md&lt;/strong&gt; — a checklist for each review covering discovery, architecture, code quality, security, performance, and testing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These files serve the same purpose as the building agent&apos;s governance docs: persistent memory across compaction boundaries. The Adversary remembers its own methodology, its own standards, and its own process. It doesn&apos;t inherit any of that from the building agent — and that&apos;s the point.&lt;/p&gt;
&lt;h2&gt;102 findings in the first run&lt;/h2&gt;
&lt;p&gt;The first full review targeted the Actions codebase. Approximately 100 source files across Electron main process, React renderers, modal windows, and raw HTML windows.&lt;/p&gt;
&lt;p&gt;The results:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRITICAL&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LOW&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INFO&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;102&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Ten critical issues. The building agent had been shipping this code for months. I&apos;d been reviewing it manually at a high level. Neither of us caught what The Adversary found in a single pass. &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3180155.3180206&quot;&gt;Independent review tools consistently surface defects that authors and manual review miss&lt;/a&gt; — this isn&apos;t a knock on the builder; it&apos;s a structural property of how familiarity works.&lt;/p&gt;
&lt;h2&gt;What it found&lt;/h2&gt;
&lt;p&gt;The critical findings were overwhelmingly security issues. A few examples:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://cheatsheetseries.owasp.org/cheatsheets/Path_Traversal_Cheat_Sheet.html&quot;&gt;Path traversal&lt;/a&gt; in an IPC handler.&lt;/strong&gt; A help content endpoint accepted a filename parameter and read from a directory without sanitization. An attacker controlling the renderer could request &lt;code&gt;../../etc/passwd&lt;/code&gt; or similar. The fix is trivial — validate that the resolved path stays within the expected directory — but the building agent never flagged it because it built the handler to serve help content, not to consider adversarial input.&lt;/p&gt;
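&lt;p&gt;The shape of that fix is the same in any runtime. Here&apos;s a minimal Python sketch of the validation (the function and directory names are hypothetical; the actual fix lives in Actions&apos; Electron main process):&lt;/p&gt;

```python
import os

def resolve_help_file(base_dir, requested_name):
    """Reject any requested path that escapes base_dir after resolution."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested_name))
    # commonpath equals base only when target stays inside the help directory
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path traversal attempt: " + requested_name)
    return target
```

&lt;p&gt;A request for &lt;code&gt;../../etc/passwd&lt;/code&gt; resolves outside the base directory and is rejected before any file is read.&lt;/p&gt;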
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.electronjs.org/docs/latest/tutorial/security&quot;&gt;&lt;code&gt;executeJavaScript&lt;/code&gt; injection&lt;/a&gt; across multiple windows.&lt;/strong&gt; Theme values, action names, and configuration objects were being interpolated into &lt;code&gt;executeJavaScript&lt;/code&gt; calls as string templates. If any of those values contained quotes or JavaScript syntax, they&apos;d execute as code in the renderer. The building agent used this pattern everywhere because it worked. The Adversary flagged nine separate injection sites. The fix: replace every &lt;code&gt;executeJavaScript&lt;/code&gt; call with proper IPC messaging.&lt;/p&gt;
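&lt;p&gt;The vulnerability class is easy to demonstrate outside Electron. A sketch (the &lt;code&gt;setTheme&lt;/code&gt; template and the payload are invented for illustration) of why templating values into executable code fails, while sending them as structured data does not:&lt;/p&gt;

```python
import json

theme = '"); stealSecrets(); ("dark'   # attacker-influenced value

# Unsafe pattern: interpolate the value into a string of code. The quotes
# in the payload break out of the literal, turning data into execution:
injected = f'setTheme("{theme}")'
# injected is now: setTheme(""); stealSecrets(); ("dark")

# Safe pattern: ship the value as data over IPC and let the receiver call
# setTheme itself. JSON encoding keeps the quotes inert.
message = json.dumps({"channel": "set-theme", "payload": theme})
```

&lt;p&gt;The structured message round-trips the hostile value as plain data; nothing on the receiving side ever evaluates it as code.&lt;/p&gt;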
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://cheatsheetseries.owasp.org/cheatsheets/Secure_Coding_Cheat_Sheet.html&quot;&gt;Production debug endpoints&lt;/a&gt;.&lt;/strong&gt; Crash, hang, and out-of-memory test handlers were registered unconditionally — no &lt;code&gt;app.isPackaged&lt;/code&gt; guard. Any renderer could invoke them in a production build.&lt;/p&gt;
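&lt;p&gt;The guard itself is a one-line pattern. A hypothetical sketch (handler names invented; in Electron the flag is &lt;code&gt;app.isPackaged&lt;/code&gt;):&lt;/p&gt;

```python
is_packaged = False   # stands in for Electron's app.isPackaged

handlers = {}

def register(channel, handler):
    """Register an IPC-style handler on a named channel."""
    handlers[channel] = handler

register("help:get", lambda name: "help content for " + name)

# Debug endpoints exist only in development builds; a packaged build
# never registers them, so a renderer cannot invoke them in production.
if not is_packaged:
    register("debug:crash", lambda: 1 / 0)
```

&lt;p&gt;The unguarded version registers the crash handler unconditionally, which is exactly what The Adversary flagged.&lt;/p&gt;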
&lt;p&gt;These aren&apos;t edge cases. They&apos;re the kinds of issues that exist in every codebase where the builder is also the reviewer. The patterns work, the code ships, and the vulnerabilities hide in plain sight because nobody with fresh eyes ever looked.&lt;/p&gt;
&lt;h2&gt;The feedback loop&lt;/h2&gt;
&lt;p&gt;The report doesn&apos;t go directly into a fix-everything sprint. It goes back to me first. I read each finding, decide whether to fix, defer, or dismiss, and then hand the prioritized list to the building agent.&lt;/p&gt;
&lt;p&gt;Some findings are immediately actionable — the path traversal and injection vectors got fixed the same day. Others are architectural observations (the monolithic React context, tripled CSS base styles) that go into the backlog. Some are style preferences the reviewer happens to disagree with, and those get dismissed.&lt;/p&gt;
&lt;p&gt;The human stays in the loop as the decision-maker. The Adversary provides the information. The building agent does the work. Nobody grades their own homework.&lt;/p&gt;
&lt;h2&gt;Why this matters beyond one tool&lt;/h2&gt;
&lt;p&gt;The broader pattern here is separation of concerns at the agent level. (The full story of how Actions was built — the governance system, the architecture migrations, the 24-release development arc — is documented in the &lt;a href=&quot;/work/actions-development&quot;&gt;Actions case study&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;If you&apos;re running AI coding agents, you probably have some version of &quot;ask it to review its own code.&quot; Maybe you prompt it explicitly. Maybe you rely on its built-in tendency to check its work. Either way, you have a single agent performing both roles — and getting the same blind spots every time.&lt;/p&gt;
&lt;p&gt;The Adversary is my answer to that problem: a purpose-built agent with its own governance documents, its own review methodology, and zero shared context with the builder. It found 102 issues that months of building and manual review had missed. Not because the building agent is bad — it&apos;s excellent. But because independent review catches what self-review cannot.&lt;/p&gt;
&lt;p&gt;Code review has always been a &lt;a href=&quot;https://google.github.io/eng-practices/review/&quot;&gt;&quot;different pair of eyes&quot; discipline&lt;/a&gt;. AI doesn&apos;t eliminate that requirement. It makes it easier to implement.&lt;/p&gt;
&lt;hr /&gt;
&lt;small&gt;
&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;
&lt;a href=&quot;https://google.github.io/eng-practices/review/&quot;&gt;Google Engineering Practices — Code Review&lt;/a&gt; — Google
·
&lt;a href=&quot;https://arxiv.org/html/2510.06265v2&quot;&gt;Large Language Models Hallucination: A Comprehensive Survey&lt;/a&gt; — arXiv (self-consistency and self-review blind spots)
·
&lt;a href=&quot;https://dl.acm.org/doi/10.1145/3180155.3180206&quot;&gt;A Large-Scale Study of Static Analysis Warnings in Practice&lt;/a&gt; — ACM (independent review effectiveness)
·
&lt;a href=&quot;https://cheatsheetseries.owasp.org/cheatsheets/Path_Traversal_Cheat_Sheet.html&quot;&gt;Path Traversal Cheat Sheet&lt;/a&gt; — OWASP
·
&lt;a href=&quot;https://www.electronjs.org/docs/latest/tutorial/security&quot;&gt;Electron Security&lt;/a&gt; — Electron docs
·
&lt;a href=&quot;https://cheatsheetseries.owasp.org/cheatsheets/Secure_Coding_Cheat_Sheet.html&quot;&gt;Secure Coding Cheat Sheet&lt;/a&gt; — OWASP&lt;/p&gt;
&lt;/small&gt;</content:encoded><category>ai</category><category>architecture</category><category>devtools</category><author>alex@mipyip.com (Alex van Rossum)</author></item><item><title>What Is Pass@1?</title><link>https://mipyip.com/blog/what-is-pass-at-1/</link><guid isPermaLink="true">https://mipyip.com/blog/what-is-pass-at-1/</guid><description>Pass@1 means your AI agent gets it right on the first attempt. The methodology: governance documents thorough enough to eliminate ambiguity, so the first generation is the one you ship.</description><pubDate>Sat, 21 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Pass@1 is the development methodology where governance documents are thorough enough that AI generates correct implementations on the first attempt. Speed is the byproduct. Correctness is the product.&lt;/strong&gt;&lt;/p&gt;&lt;hr /&gt;&lt;p&gt;I&apos;ve been shipping software with AI agents handling about 95% of the implementation for several months now. Not toy projects — a &lt;a href=&quot;/work/actions-development&quot;&gt;production macOS app&lt;/a&gt; with 24 beta releases in 18 days. Architecture migrations. Security audits. A freemium business model. Apple code signing.&lt;/p&gt;
&lt;p&gt;After a particularly long streak of features landing correctly on the first attempt, I broke character with my Claude Code agent. I usually interact as a technical lead — clear direction, structured reviews, no cheerleading. This time I just typed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;HOW. HOW DO YOU (almost) ALWAYS PASS@1?&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The agent&apos;s answer reframed something I&apos;d been circling for weeks:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;Your project has unusually strong structural discipline for a solo project. Typed IPC contracts, single source of truth for types, consistent naming conventions, clear layer boundaries. That&apos;s not typical. Most projects I work on have ambiguity at every boundary — &apos;what does this function return?&apos;, &apos;what shape is this data?&apos;, &apos;where does this logic live?&apos; Every ambiguity is a chance to get something wrong.&quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&apos;s the core of Pass@1.&lt;/p&gt;
&lt;h2&gt;The methodology&lt;/h2&gt;
&lt;p&gt;Pass@1 is a development methodology where governance documents are thorough enough that an AI agent generates correct implementations on the first attempt. Not &quot;mostly correct.&quot; Not &quot;close enough to fix quickly.&quot; Correct. First try.&lt;/p&gt;
&lt;p&gt;The name comes from the machine learning metric pass@k — the probability that at least one of k generated attempts is correct. Pass@1 means the first generation is the one you ship.&lt;/p&gt;
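&lt;p&gt;For reference, pass@k has a standard unbiased estimator in the code-generation benchmark literature: given n samples of which c pass, it estimates the chance that at least one of k drawn samples is correct. A small Python version:&lt;/p&gt;

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    return 1.0 - comb(n - c, k) / comb(n, k)

# k=1 is the bar this post sets: the chance a single generation is correct.
print(pass_at_k(10, 9, 1))   # 0.9
```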
&lt;p&gt;Four governance documents make this work:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ROADMAP.md&lt;/strong&gt; is the spec the agent builds against. Not a wish list. Not a backlog of ideas. It&apos;s the contract: what to build, in what order, with what constraints. When I write &quot;add floating popout buttons — persistent, always-on-top, position remembered across restarts, click to run,&quot; the agent has everything it needs. No Slack thread to check. No product manager to ping. No ambiguity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CHANGELOG.md&lt;/strong&gt; is the running log of what has been built and what has changed: the project&apos;s history. It&apos;s updated after each completed block of work, and it prevents accidental regression.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ARCHITECTURE.md&lt;/strong&gt; defines how the system is built. Patterns, structures, boundaries, data flow. It&apos;s the constraint document — the agent operates &lt;em&gt;within&lt;/em&gt; it, not around it. When it says &quot;all IPC goes through the preload bridge, the renderer never sees Node.js APIs,&quot; the agent doesn&apos;t invent a shortcut. The boundary is the architecture.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CLAUDE.md&lt;/strong&gt; is working memory that survives between sessions. Gotchas encountered, conventions established, patterns that work, patterns that failed. AI agent sessions are ephemeral — the context window compacts, the session ends, the next session starts fresh. CLAUDE.md is how the next session picks up exactly where the last one left off instead of re-reading the entire codebase and making the same mistakes.&lt;/p&gt;
&lt;p&gt;These aren&apos;t project management artifacts. They&apos;re engineering artifacts that make the code possible.&lt;/p&gt;
&lt;h2&gt;Why &quot;fast&quot; is the wrong frame&lt;/h2&gt;
&lt;p&gt;The conventional narrative about AI-assisted development is speed. &quot;I built this in a weekend with AI.&quot; Speed is what people notice first. It&apos;s also the least interesting thing about Pass@1.&lt;/p&gt;
&lt;p&gt;What matters is what the speed implies: &lt;em&gt;less rework, less debugging, less accumulated tech debt&lt;/em&gt;. A feature that lands correctly on the first attempt doesn&apos;t generate a trail of fix commits. An architecture migration that&apos;s governed by documents stays controlled — the app remains shippable throughout, not broken for three days while you untangle the mess.&lt;/p&gt;
&lt;p&gt;24 releases in 18 days sounds fast. It was. But the important part isn&apos;t the pace — it&apos;s that each release was architecturally sound, security-reviewed, and consistent with every release before it. The governance system eliminated the waste that normally slows development: re-reading the codebase every session, making inconsistent choices across days, debugging issues that stem from ambiguous specifications.&lt;/p&gt;
&lt;p&gt;Speed is the &lt;em&gt;byproduct&lt;/em&gt;. Correctness is the &lt;em&gt;product&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;The discipline gap&lt;/h2&gt;
&lt;p&gt;Here&apos;s the part that surprises people: the management discipline required to lead AI agents reliably is &lt;em&gt;more rigorous&lt;/em&gt; than leading human-only teams.&lt;/p&gt;
&lt;p&gt;A human team member compensates for gaps in architecture and documentation. They ask questions, read between the lines, check with colleagues, use judgment. AI agents can&apos;t do any of that. They either have sufficient context to get it right, or they don&apos;t. There is no &quot;I&apos;ll figure it out.&quot;&lt;/p&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your architecture document can&apos;t be vague. Boundaries need to be explicit.&lt;/li&gt;
&lt;li&gt;Your roadmap can&apos;t be aspirational. Specs need to be concrete.&lt;/li&gt;
&lt;li&gt;Your conventions can&apos;t be tribal knowledge. They need to be written down.&lt;/li&gt;
&lt;li&gt;Your gotchas can&apos;t live in someone&apos;s head. They need to be in CLAUDE.md.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This has a significant upside: if you build that discipline for AI, your human team members benefit even more — because they &lt;em&gt;can&lt;/em&gt; ask questions when the docs fall short, but they rarely need to.&lt;/p&gt;
&lt;h2&gt;What Pass@1 is not&lt;/h2&gt;
&lt;p&gt;Pass@1 is not &quot;vibe coding.&quot; It&apos;s the opposite.&lt;/p&gt;
&lt;p&gt;Vibe coding — prompting an AI with loose intentions and shipping whatever comes back — produces demos. The first session is great. The tenth session is chaos. There&apos;s no architecture document to migrate against. There&apos;s no working memory to preserve between sessions. There&apos;s no spec to verify correctness against. You&apos;re just generating code and hoping.&lt;/p&gt;
&lt;p&gt;The test is simple: can someone — human or AI — pick up your project cold and be productive in ten minutes? If they need to reverse-engineer the architecture from the code, if they need to ask you how things work, if they need to spend an hour reading before they can make a change — you&apos;re vibe coding. The governance and architecture documents are missing.&lt;/p&gt;
&lt;p&gt;Pass@1 also isn&apos;t a tooling recommendation. It works with Claude Code because that&apos;s what I use. The methodology is about the governance layer — the documents, the discipline, the architectural clarity. The executor is interchangeable. The governance is not.&lt;/p&gt;
&lt;h2&gt;The real insight&lt;/h2&gt;
&lt;p&gt;I stopped thinking about this as &quot;managing AI&quot; months ago. The reality is simpler and more useful:&lt;/p&gt;
&lt;p&gt;Pass@1 is about building engineering systems where any executor — human or AI — can deliver reliably because you&apos;ve removed the ambiguity. The architecture is clear. The spec is concrete. The conventions are documented. The gotchas are recorded.&lt;/p&gt;
&lt;p&gt;That&apos;s not an AI skill. It&apos;s a leadership skill. AI makes it non-optional.&lt;/p&gt;
&lt;p&gt;Documentation-first development, clear architectural boundaries, explicit conventions — these have always been the mark of well-run engineering teams. The difference is that human teams can muddle through without them, but AI teams can&apos;t. The AI forces the discipline that was always supposed to be there.&lt;/p&gt;
&lt;p&gt;And once it&apos;s there, everything gets better: not just the AI output, but the human experience of working in that codebase, the onboarding time for new team members, the confidence during migrations, and the speed of code review.&lt;/p&gt;
&lt;p&gt;Pass@1 isn&apos;t a methodology I invented for AI. It&apos;s the methodology that good engineering teams have always aspired to. But using LLMs makes it more critical than ever.&lt;/p&gt;
&lt;p&gt;If you want to see what this looks like in practice — the architecture migrations, the security audits, the compaction management — the &lt;a href=&quot;/work/actions-development&quot;&gt;Actions development case study&lt;/a&gt; documents the full arc.&lt;/p&gt;
</content:encoded><category>ai</category><category>architecture</category><category>leadership</category><author>alex@mipyip.com (Alex van Rossum)</author></item></channel></rss>