The Agent That Kept Breaking Things
Last year I watched a Claude Code agent implement a feature across 14 files. It passed tests. The PR looked clean. We merged it.
Two days later, three other features broke.
The agent hadn't done anything wrong, exactly. It just hadn't thought about the blast radius. It didn't know which files other teams depended on. It didn't check whether its changes leaked internal details across module boundaries. It coded like a contractor who finishes the bathroom remodel without checking if the plumbing still works in the kitchen.
This is the dirty secret of AI-assisted development in 2026: agents can write code, but they can't think about systems. Not yet.
So I built something to fix that.
What Are Dev-Skills and TAP-Skills?
They're two Claude Code plugins that work together as an operating system for human+agent development teams:
- Dev-skills handles the product-to-code pipeline — discovery, shaping work, planning, implementation, QA
- TAP-skills handles the meta-layer — repo readiness, blast radius analysis, system health, retrospectives
Together, they give your AI agents the same structured thinking that good senior engineers have. Not by making the AI smarter — but by giving it a process that catches the mistakes raw intelligence misses.
Both plugins are built on a foundation that might surprise you: a software design textbook.
Why A Philosophy of Software Design Changes Everything for AI Agents
John Ousterhout's A Philosophy of Software Design is a book about how humans should write software. But its principles turn out to be even more important when agents write software.
Here's why.
An AI agent is, in Ousterhout's terms, the ultimate tactical tornado — it codes fast, produces working features, and leaves a trail of complexity in its wake. Without guardrails, agents create exactly the problems the book warns about:
- Shallow modules — agents create lots of small wrappers that add nothing
- Information leakage — agents spread the same knowledge across multiple files
- Pass-through methods — agents add layers that just forward calls
- Change amplification — one conceptual change requires touching dozens of files
The dev-skills and tap-skills plugins encode Ousterhout's principles as agent workflow checks. Not as optional guidelines — as structural constraints on how work gets shaped, planned, and reviewed.
Deep Modules, Not Shallow Wrappers
┌─────────────────────┐ ┌─────────────────────┐
│ Shallow Module ✗ │ │ Deep Module ✓ │
│ │ │ │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ Interface │ │ │ │ Interface │ │
│ │ (many params, │ │ │ │ (simple, few │ │
│ │ complex types) │ │ │ │ parameters) │ │
│ ├─────────────────┤ │ │ ├─────────────────┤ │
│ │ Implementation │ │ │ │ │ │
│ │ (thin, just │ │ │ │ Implementation │ │
│ │ forwards) │ │ │ │ (significant │ │
│ └─────────────────┘ │ │ │ functionality │ │
│ │ │ │ hidden here) │ │
└─────────────────────┘ │ │ │ │
│ └─────────────────┘ │
└─────────────────────┘
Every feature shaped through dev-skills gets checked: does this primitive have a simple interface hiding significant functionality? Or is the interface as complex as what's behind it?
When an agent proposes splitting a system into 12 tiny services, the skill flags it: "These primitives are shallow — the interface is as complex as the implementation. Consider consolidating into 3 deep modules."
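In code, the contrast looks something like this minimal Python sketch. The names (`shallow_get`, `ConfigStore`) are illustrative only, not taken from either plugin:

```python
# Shallow module: the interface restates the implementation.
# Callers must still understand every detail, so nothing is hidden.
def shallow_get(store: dict, key: str, default, log_misses: bool, log_fn):
    if key not in store and log_misses:
        log_fn(f"miss: {key}")
    return store.get(key, default)

# Deep module: a small interface hiding real functionality
# (key normalization, miss tracking, defaulting) behind one method.
class ConfigStore:
    def __init__(self, values: dict):
        self._values = {k.lower(): v for k, v in values.items()}
        self._misses = []  # hidden bookkeeping callers never see

    def get(self, key: str, default=None):
        key = key.lower()
        if key not in self._values:
            self._misses.append(key)
            return default
        return self._values[key]

store = ConfigStore({"TIMEOUT": 30})
print(store.get("timeout"))     # → 30
print(store.get("retries", 3))  # → 3
```

The deep version gives callers less to think about while doing more; the shallow one pushes every decision back to the caller through its parameter list.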
Information Hiding at Every Layer
When dev-skills shapes work into implementation plans, it checks: does each module encapsulate a design decision? Can the internals change without rippling outward?
When tap-skills reviews a PR with blast-radius analysis, it traces: which modules depend on the changed interfaces? What shared state was touched? Are internal details leaking?
Define Errors Out of Existence
When acceptance criteria are written, the skill pushes toward designs where error cases can't happen. Deleting something that doesn't exist? That's a no-op, not an error. Missing optional field? Default, don't crash.
This isn't just good design — it's what makes agent-written code actually reliable in production.
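Both patterns mentioned above are one-liners in practice. A minimal Python sketch, illustrative rather than plugin code:

```python
# "Define errors out of existence": design the interface so the
# error case simply cannot occur.

def remove_tag(tags: set, tag: str) -> None:
    # Deleting something that doesn't exist is a no-op, not an error:
    # set.discard never raises, unlike set.remove.
    tags.discard(tag)

def parse_settings(raw: dict) -> dict:
    # Missing optional fields get defaults instead of crashing callers.
    return {
        "theme": raw.get("theme", "light"),
        "page_size": raw.get("page_size", 50),
    }

tags = {"urgent"}
remove_tag(tags, "stale")  # fine even though "stale" was never there
print(parse_settings({}))  # → {'theme': 'light', 'page_size': 50}
```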
The Dev-Skills Pipeline: From Idea to Verified Feature
Dev-skills provides six skills that form a complete development pipeline:
Product Thinking → Discovery → Shape → Plan → Implement → QA
1. Product Primitives — Break Systems Into Building Blocks
Before writing any code, break the system into deep, composable primitives. Each primitive encapsulates a design decision and hides its internals.
/dev-skills:product-primitives
> Break down our notification system into primitives

The skill identifies building blocks, their interfaces, and how they compose. It checks for Ousterhout's red flags: shallow primitives, information leakage, and temporal decomposition (splitting by execution order instead of knowledge domain).
This is the step most teams skip — and the reason most agent-built features create tech debt.
2. Product Discovery — Validate Before Building
Not every idea deserves code. Discovery tests four risks: value, usability, feasibility, viability.
/dev-skills:product-discovery
> Should we build real-time collaboration for our editor?

It designs cheap experiments (interviews, prototypes, fake doors) and sets evidence gates: proceed if X, pivot if Y, stop if Z. Every experiment should cost a tenth of what building the feature would.
3. Shaping Work — Define What to Build
Takes messy inputs — Slack threads, PRDs, customer complaints — and produces clear work definitions with acceptance criteria. No implementation details, no jargon, just what needs to be true when this is done.
/dev-skills:shaping-work
> Shape this customer request into a work definition:
> "Users keep asking for a way to export their data as CSV"

The output: user stories, acceptance criteria, risks, unknowns. Ready for an agent to plan against.
4. Implementation Planning — Design the Technical Approach
This is where research meets architecture. The skill spawns three parallel sub-agents:
- Locate — finds relevant files without reading them
- Patterns — finds similar implementations to model after
- Analyze — traces how a related feature works end-to-end
Then it produces a phased plan where each phase is independently verifiable. If you've used the Research → Plan → Implement framework, this is the planning phase on steroids.
5. Implement Change — Execute Phase by Phase
Takes the plan and builds it. One phase at a time. Verifies after each phase. Fixes issues before proceeding. Adapts when reality doesn't match the plan.
The key difference from raw agent coding: it reads all relevant files before writing anything. It follows existing patterns. It doesn't make unrelated "improvements." It tracks progress with checkboxes in the plan file.
6. QA Test — Verify in a Real Browser
After implementation, the skill verifies acceptance criteria in an actual browser using Chrome DevTools MCP:
/dev-skills:qa-test
> Verify the CSV export feature works

It gathers criteria from the PR or work definition, opens Chrome, exercises each criterion, and reports pass/fail with evidence. Console errors, failed network requests, DOM state — all checked automatically.
The TAP-Skills Meta-Layer: Making the System Smarter Over Time
Dev-skills handles individual features. TAP-skills handles the system — is the team healthy? Are agents getting more autonomous? Are we catching problems before production?
Tap Audit — Assess Repo Readiness
Before an agent starts working in a new repo, tap-audit scores how ready the codebase is for autonomous work:
/tap-skills:tap-audit

It evaluates documentation, MCP servers, CLI tools, permissions, test infrastructure, and design complexity. The output: a readiness level (FULL / PARTIAL / MINIMAL) and 3-5 leverage points — the cheapest fixes that would most improve agent autonomy.
It also generates a compressed architecture file (.tap/architecture.md, ~50 lines) that agents read before starting work. Think of it as a CLAUDE.md specifically optimized for agent consumption.
Blast Radius — Know What Your PR Actually Affects
This is the skill that would have saved me from that 14-file incident.
/tap-skills:blast-radius
> Review PR #247

It traces impact outward: direct changes → dependents → their dependents. It checks shared state (database schema, API contracts, config, global state, styles). It assigns a risk level and generates a verification checklist.
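The outward trace is, at its core, a breadth-first walk of a reverse dependency graph. A minimal Python sketch, assuming you already have a module-to-dependents map; the graph below is hypothetical, and this is not the plugin's actual implementation:

```python
from collections import deque

def blast_radius(dependents: dict, changed: list) -> dict:
    """Walk the reverse dependency graph outward from changed modules.
    Returns each affected module with its distance (1 = direct dependent)."""
    radius, queue = {}, deque((m, 0) for m in changed)
    while queue:
        module, depth = queue.popleft()
        for dep in dependents.get(module, []):
            if dep not in radius and dep not in changed:
                radius[dep] = depth + 1
                queue.append((dep, depth + 1))
    return radius

# Hypothetical graph: billing and email both import the changed auth module.
deps = {"auth": ["billing", "email"], "billing": ["invoices"]}
print(blast_radius(deps, ["auth"]))  # → {'billing': 1, 'email': 1, 'invoices': 2}
```

The distance ranking is what makes the review checklist useful: direct dependents get scrutiny first, second-order dependents get a lighter pass.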
The critical design decision: agents will not approve PRs. That's the human's job. Blast-radius tells the human where to focus their attention so they're not rubber-stamping 500-line diffs.
Systems Health — Diagnose What's Slowing You Down
Measures the development system using stocks, flows, and feedback loops from systems thinking:
/tap-skills:systems-health

It pulls data from git, GitHub, and CI. Are PRs backing up? Is cycle time increasing? Are bugs accumulating faster than they're resolved? Are tests being disabled instead of fixed?
It also measures complexity signals inspired by Ousterhout:
- Change amplification — are commits touching more files over time?
- Cognitive load — are large files accumulating more churn?
- Shotgun surgery — what percentage of commits touch 5+ files across 3+ directories?
Every claim backed by data. No vibes.
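The shotgun-surgery signal, for example, reduces to a simple computation over per-commit file lists (the kind of data `git log --name-only` produces). A hedged Python sketch, not the plugin's actual implementation:

```python
from pathlib import PurePosixPath

def shotgun_surgery_pct(commits: list) -> float:
    """Percentage of commits touching 5+ files spread across 3+ top-level
    directories. `commits` is a list of per-commit changed-file lists;
    the directory is taken as the first path component."""
    if not commits:
        return 0.0

    def is_shotgun(files):
        dirs = {PurePosixPath(f).parts[0] for f in files}
        return len(files) >= 5 and len(dirs) >= 3

    return 100.0 * sum(map(is_shotgun, commits)) / len(commits)

history = [
    ["api/a.py", "api/b.py"],                                     # focused
    ["api/a.py", "ui/x.ts", "db/m.sql", "ui/y.ts", "docs/r.md"],  # shotgun
]
print(shotgun_surgery_pct(history))  # → 50.0
```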
Retrospective — Learn From Every Event
After a feature ships, an incident resolves, or an agent pattern fails, run a retrospective:
/tap-skills:retrospective

The core question: "What happened that an agent couldn't handle autonomously, and what's the cheapest fix so it can next time?"
It classifies root causes into five gap types:
| Gap type | Meaning | Typical fix |
|---|---|---|
| Context | Missing CLAUDE.md guidance | Add patterns/conventions to docs |
| Harness | Missing MCP server or CLI tool | Install the tool, configure access |
| Feedback | No tests for the affected area | Add test coverage |
| Design | High coupling, inconsistent patterns | Refactor module boundaries |
| Scope | Work definition too ambiguous | Re-shape with clearer criteria |
Learnings append to .tap/learnings.md. Agents read this file before starting work. The system genuinely improves over time — each failure becomes a structural fix.
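The article doesn't specify the entry format, but conceptually each learning pairs a classified gap with its cheapest fix. A hypothetical entry (date, incident, and fix all invented for illustration) might look like:

```markdown
## 2026-01-12: CSV export incident (gap: Feedback)
- Couldn't handle autonomously: export truncated silently; no test covered large datasets
- Cheapest fix: add a large-fixture test for the export path
- Guidance for agents: verify pagination limits before streaming exports
```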
How It All Fits Together
Agent enters repo
│
▼
tap-audit ──► assess readiness, identify gaps
│
▼
dev-skills pipeline
(primitives → shape → plan → implement → QA)
│
▼
Agent opens PR
│
▼
blast-radius ──► human reviews impact, merges
│
▼
systems-health ──► periodic health check
│
▼
retrospective ──► capture learnings, file improvements
│
▼
(improvements ship, agents need less help next time)
The .tap/ directory is project memory:
- tap-audit.md — readiness assessment (cached, refreshed when repo changes)
- architecture.md — compressed architectural decisions
- system-health.md — latest health metrics
- learnings.md — append-only retrospective insights
Every cycle through this loop makes the next cycle faster. That's not marketing — it's systems thinking applied to development process.
The Design Philosophy Advantage
Most AI agent workflows are built by prompt engineers. These plugins are built by software designers.
That difference shows up everywhere:
| Problem | Typical approach | This approach |
|---|---|---|
| Code quality | "Tell the agent to write good code" | Encode what "good code" means structurally — deep modules, information hiding, minimal interfaces — and check at every phase |
| PR review | Review the agent's PR manually | Automatically trace blast radius, surface blind spots (event emitters, dynamic dispatch, env-var behavior), generate verification checklists ranked by risk |
| Learning from failures | Hope the agent doesn't repeat mistakes | Classify failures by root cause, file improvement tickets, append to a learnings file that agents read before starting |
Ousterhout wrote that the reward for being a good designer is that you can automate more of the coding. These plugins are proof of that claim.
Getting Started
Both plugins are available as Claude Code plugins:
- Install dev-skills — provides the product-to-code pipeline
- Install tap-skills — provides the meta-layer for team health
Start with tap-audit on your repo to see where you stand. Then use the dev-skills pipeline on your next feature. The ROI shows up on the first PR that doesn't break anything else.
If you're already using the Research → Plan → Implement framework, dev-skills is the natural evolution — same philosophy, deeper integration, team-scale thinking.
Running multiple agent sessions? Claude Peek lets you monitor and approve permissions across all sessions from your Mac's notch — essential when you have agents working in parallel.
Frequently Asked Questions
Do I need both plugins or can I use them separately?
They work independently. Dev-skills handles the build pipeline, tap-skills handles the meta-layer. But they compound — tap-skills retrospectives identify gaps that dev-skills workflows prevent, and dev-skills implementation quality reduces tap-skills blast radius findings. Start with whichever matches your biggest pain point.
How is this different from Claude Code's built-in agent teams?
Claude Code's agent teams handle parallel task execution — multiple agents working on subtasks simultaneously. Dev-skills and tap-skills handle the workflow wrapping that execution: what to build (shaping), how to build it (planning), whether it's safe (blast radius), and what to improve (retrospectives). They're complementary, not competing.
What does "grounded in A Philosophy of Software Design" actually mean in practice?
Every skill checks for specific design quality signals. When shaping work, it flags designs where the same knowledge appears in multiple places (information leakage). When planning, it checks that each module has a simple interface hiding significant functionality (deep modules). When reviewing PRs, it measures change amplification and shotgun surgery. These aren't abstract principles — they're concrete checks built into the workflow.
How does the system get smarter over time?
Through the .tap/learnings.md file. Every retrospective appends classified findings. Agents read this file before starting work, which means they avoid known pitfalls automatically. After 10 retrospectives, you'll notice agents making fewer of the same mistakes. After 20, the system is noticeably more autonomous.
Can I customize the skills for my team's conventions?
Yes. Skills are markdown files — edit them to match your team's patterns, add project-specific checks, or adjust the workflow stages. The design philosophy principles are baked in, but the implementation details are yours to tune.
What size team benefits most from this?
Teams of 2-10 developers who ship regularly and use Claude Code as part of their workflow. Solo developers benefit from the dev-skills pipeline. The tap-skills meta-layer shines when multiple people (or agents) contribute to the same codebase and need coordination.
These plugins are built by Team Brilliant, where we build tools for human+agent development teams. Questions? alex@teambrilliant.ai