Case Study

Designing an AI Sprint Management System

A human-in-the-loop agent that orchestrates sprint workflows, translates between technical and client-facing language, and maintains continuity across sessions.

~3 hrs → <1 hr

Daily PM time reduced

0 dropped

Tasks forgotten since deployment

5-tier

Persistent memory architecture

The Short Version

Three hours a day on project management. With ADHD, that's not just overhead — it's a cascade risk.

I was tracking tasks across multiple clients, writing status updates, logging time, making sure nothing fell through the cracks. Context loss wasn't a theoretical risk — it was a daily one. So I designed a system to handle it.

An AI agent that orchestrates my sprint workflow — it holds the state of every active task, generates the operations I need to execute, translates between technical and client-facing language, and maintains continuity across sessions even when its own context window resets.

The key design decision: the agent can think, plan, and generate. It cannot execute. Every operation goes through me.

The Problem

Project management was eating my day

Not because the tasks were hard, but because the volume and the context-switching cost were high. As the sole technologist at a digital agency, I managed infrastructure, development, and operations across multiple concurrent projects.

Constant Context Switching

Multiple clients, each with their own task queues, priorities, and communication expectations.

Sprint Lifecycle

Intake, prioritization, capacity planning, mid-sprint adjustments, time tracking — all manual.

Communication Translation

Status updates needed to be technically accurate internally but client-appropriate externally.

Time Tracking Overhead

Everhour isn't native to Asana — logging time meant extra steps, and one more thing to remember every time work was logged.

The ADHD Tax

When you're managing a dozen workstreams over capacity, the cost of forgetting one task is a cascade of trust damage and rework.

"What Am I Forgetting?"

The constant low-grade anxiety. I needed a system that held state better than my brain could under load.

Design Constraints

Human-in-the-loop by design, not by necessity

Before building anything, I made a decision that shaped every architectural choice that followed: the AI agent would not be allowed to execute operations directly. I could have given it API access. I chose not to.

This constraint — human-in-the-loop by design — produced better architecture than full automation would have. Every pattern described below exists because the agent generates and the human executes.

Operational Safety

Time logging isn't idempotent — duplicates mean incorrect billing. Task comments reach clients through intermediaries. These are operations where a mistake has a cost.

Trust Calibration

The review-and-run loop let me calibrate trust incrementally. As trust was established in specific areas, I selectively granted direct execution — but only where the risk profile justified it.

Auditability

Every operation is visible, reviewable, and logged. There is no "the AI did something and I'm not sure what." The execution trail is explicit by design.

Architecture

Three-Tier Integration

The system integrates with Asana through three channels, each serving a different purpose. Simple reads go through MCP. Single operations go through the bridge. Complex sequences get scripted for review and execution as a unit.

Eyes

MCP Tools

Direct Asana API — read operations, task lookups, searches. The agent can see everything.

Hands

HTTP Bridge

A custom proxy for create, comment, and move operations. The hands are mine, not the agent's.

Batch

Generated Scripts

Complex multi-step operations — the agent writes them, I review and run them.

State Management

Multi-layered caching with eventual consistency

Asana is the source of truth — even though it's not great for agile. API calls are expensive. The solution is a caching architecture where each layer serves a distinct purpose.

Asana API (source of truth)
    ↓
sprint-snapshot.sh (hourly sync + on-demand)
    ↓
snapshots/*.json (timestamped state captures)
    ↓
task-cache/active/*.json (field-level change tracking)
    ↓
CHANGELOG.md (human-readable activity log)
    ↓
CLAUDE.md (working memory — rules, identifiers, patterns)

Snapshots

Enable change detection. Compare two snapshots and you know what moved, what was added, what someone else modified.

Task Cache

Field-level detail without hitting the API. Current section, assignee, story points, comment history — all from local state.

CHANGELOG

Human-readable audit trail. What happened today. What's pending. What decisions were made and why.

CLAUDE.md

The agent's working memory. Rules, identifiers, patterns. When a context window compacts, this file is how the next session reconstructs operational awareness.

The system accepts eventual consistency. Local cache may diverge briefly between snapshots — but before any write operation, a quick MCP call checks current state. The agent never writes based on stale data.
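Change detection between snapshots can be sketched with standard Unix tools. This is a hypothetical simplification, assuming each JSON capture has first been flattened to sorted "task_id<TAB>section" lines — the real snapshots carry full task objects:

```shell
# Hypothetical diff helper: compare two flattened snapshot listings.
# comm(1) requires sorted input; each line is "task_id<TAB>section".
snapshot_diff() {
  prev=$1; curr=$2
  comm -13 "$prev" "$curr" | sed 's/^/added\/moved: /'    # lines only in the newer capture
  comm -23 "$prev" "$curr" | sed 's/^/removed\/moved: /'  # lines only in the older capture
}
```

A task that changed sections appears once under each label; a genuinely new task appears only under "added/moved:".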

Signature Pattern

Idempotent self-modifying scripts

This is the pattern I'm proudest of — and it exists entirely because of the human-in-the-loop constraint. When the agent generates a batch script, it structures operations in isolated sections:

# ===BEGIN_SECTION: update-task-12345-section ===
http_code=$(curl -s -o /dev/null -w "%{http_code}" \
  -X POST "https://app.asana.com/api/..." \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"data": {"task": "12345"}}')
if [ "$http_code" = "200" ]; then
  comment_out_section "update-task-12345-section"
fi
# ===END_SECTION: update-task-12345-section ===

On success, the section comments itself out. On failure, it stays active. This means:

Safe to re-run

If a script partially fails, completed sections are already commented out. Only pending work executes on the next run.

Self-documenting

Commented sections = completed. Active sections = pending. The script is simultaneously an operation plan and a status report.

Non-idempotent ops protected

Time logging would create duplicates on re-run. Auto-commenting ensures each entry is logged exactly once.

Clean archive trail

Completed scripts are preserved for history. New scripts start clean. A running record of every operation the system has performed.
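The helper behind the pattern can be sketched in a few lines of AWK. This is a hypothetical reconstruction, not the production script: it takes a section name and a target file (defaulting to the running script itself) and prefixes every uncommented line between the markers with "# ":

```shell
# Sketch of the self-modification helper, assuming the BEGIN/END markers shown above.
comment_out_section() {
  name=$1; file=${2:-$0}   # default target: the running script itself
  awk -v name="$name" '
    index($0, "===BEGIN_SECTION: " name) { inside = 1; print; next }
    index($0, "===END_SECTION: " name)   { inside = 0; print; next }
    inside && substr($0, 1, 1) != "#"    { print "# " $0; next }  # comment out active lines
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Already-commented lines are left untouched, which is what makes re-running the script safe.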

Memory Architecture

Five tiers of persistent memory

AI agents have a fundamental problem: they forget. Context windows compact. Sessions end. The system uses five tiers of persistent memory to solve this.

Tier  Location            Purpose
1     CLAUDE.md           Working memory — rules, identifiers, patterns
2     CHANGELOG.md        Activity log — what happened, what's pending
3     snapshots/*.json    Sprint state at points in time
4     task-cache/*.json   Full task details with change history
5     sprint-mgmt.sh      Commented sections = completed work

A "sleep cycle" procedure syncs all five tiers before a session ends. When a new session starts, the agent reads the top tiers to reconstruct operational awareness, then validates against the latest snapshot. Session continuity without relying on the context window.
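The sleep cycle itself can be sketched as a short shell routine. File names match the tiers above, but the snapshot script's behavior and the "# STALE:" convention for expired rules are assumptions, not the production implementation:

```shell
# Hypothetical end-of-session sync across the memory tiers.
sleep_cycle() {
  ./sprint-snapshot.sh                                   # tier 3: fresh state capture
  printf '%s  session closed, pending items flagged\n' \
    "$(date +%F)" >> CHANGELOG.md                        # tier 2: activity log
  # tier 1: memory hygiene — drop rules marked stale during the session
  grep -v '^# STALE:' CLAUDE.md > CLAUDE.md.tmp && mv CLAUDE.md.tmp CLAUDE.md
}
```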

Daily Operations

What the system does every day

~10 min

Morning Sync

Pull latest snapshot, diff against previous, identify overnight changes. Generate a summary of what shifted and what needs attention.

~5 min

Intake Processing

Hourly scan for new tasks. Strip HTML formatting from forwarded emails, generate executive summaries, recommend subtask splits. Agent generates the script — I run it.
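The HTML-stripping step can be approximated with a single sed filter. A naive sketch — the real pipeline would need to handle more entities and malformed markup:

```shell
# Strip tags and decode a few common entities from a forwarded email body.
strip_html() {
  sed -e 's/<[^>]*>//g' \
      -e 's/&amp;/\&/g' -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&nbsp;/ /g'
}
```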

~15-30 min

Sprint Operations

The bulk — moving tasks through lifecycle stages, updating story points, managing retainer capacity, logging time with decimal precision, posting status comments.

As needed

Communication Translation

Client-facing comments drafted by the agent. "Divi Blog Module cache dependency invisibility" becomes "Custom blog pages required explicit cache refresh configuration." Direct, clear, CTO-level.

~5 min

End of Day

Sleep cycle. Sync all state layers, update CHANGELOG, flag anything pending for tomorrow.

Retrospective

What I'd do differently

This system has been iterated on continuously. Some of the things I'd change, I've already changed. That's the nature of building a tool you use every day.

Simplify the HTTP bridge

It started as a quick solution and accumulated complexity. Rebuilding today, I'd consolidate more operations through MCP tools.

Higher snapshot frequency from day one

Started at 2-3x daily — too wide a consistency window during heavy sprints. Now runs hourly. Design for the busy case, not the average.

Automate memory hygiene earlier

CLAUDE.md accumulated stale rules over time. I've since built cleanup logic into the sleep cycle. Early investment would have saved messy sessions.

Accept platform limitations gracefully

Sprint transition is half-automated, half-manual — and the manual parts are Asana's fault. Some operations simply aren't exposed via API.

The Design Insight

The best architecture isn't the one with the fewest constraints. It's the one that turns its constraints into structural advantages.

The agent's job is to think — hold state, reason about priorities, generate operations. The human's job is to act — review, execute, and feed results back. This separation produces properties that full automation doesn't.

Every operation is auditable

Because a human reviewed and ran it.

Failures are contained

Because the human saw the error in real time.

Trust is earned incrementally

Because the human watches the agent's reasoning before extending its scope.

Architecture is forced toward resilience

Because scripts must be safe to re-run, operations idempotent, state persistent.

Technical Stack

AI              Claude (Anthropic) via Claude Code + API
Project Mgmt    Asana (API + MCP)
Time Tracking   Everhour (API integration)
State           JSON snapshots, file-based cache, markdown memory
Scripting       Bash with AWK-based self-modification
Integration     Custom HTTP bridge (Python/FastAPI), MCP tools

More case studies

See how I approach different kinds of problems — from crisis infrastructure rescue to AI-governed development.