I use a plan skeleton file to give Claude Code a structure and format to write plans based on my requests. I used my previous revision to create 44 code changes over the past two weeks. Yesterday afternoon, I analyzed the resulting plans to eliminate the parts from the skeleton that proved to be of little use and focus on the most useful conventions.
┌──────────┬──────────────┬───────┬─────────────┬──────────────────────┐
│ Prefix │ Convention │ Plans │ Occurrences │ Verdict │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ step-N │ Steps │ 44/44 │ 293 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ dNN- │ Decisions │ 44/44 │ 343 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ tNN- │ Tables │ 29/44 │ 91 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ sNN- │ Specs │ 28/44 │ 120 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ rNN- │ Risks │ 24/44 │ 53 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ lNN- │ Lists │ 9/44 │ 15 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ mNN- │ Milestones │ 5/44 │ 18 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ qNN- │ Questions │ 3/44 │ 6 │ KEEP │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ cNN- │ Concepts │ 1/44 │ 2 │ DROP (only skeleton) │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ diagNN- │ Diagrams │ 1/44 │ 3 │ DROP (1 real plan) │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ op- │ Refactor ops │ 0/44 │ 1 │ DROP (skeleton only) │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ cmd- │ CLI commands │ 0/44 │ 2 │ DROP (skeleton only) │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ type- │ Schema types │ 0/44 │ 2 │ DROP (skeleton only) │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ seq- │ Sequences │ 0/44 │ 1 │ DROP (skeleton only) │
├──────────┼──────────────┼───────┼─────────────┼──────────────────────┤
│ fixture- │ Fixtures │ 0/44 │ 4 │ DROP (skeleton only) │
└──────────┴──────────────┴───────┴─────────────┴──────────────────────┘
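A tally like the one above can be reproduced mechanically. The sketch below is illustrative only: the `plans/` directory name and the exact anchor patterns are my assumptions, not part of the skeleton or any tug tooling.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical layout: one markdown file per plan in plans/.
# Patterns approximate the skeleton's anchor prefixes.
PREFIXES = {
    "step": r"#step-\d+",
    "d": r"#d\d{2}-[a-z0-9-]+",
    "t": r"#t\d{2}-[a-z0-9-]+",
    "s": r"#s\d{2}-[a-z0-9-]+",
    "r": r"#r\d{2}-[a-z0-9-]+",
    "l": r"#l\d{2}-[a-z0-9-]+",
    "m": r"#m\d{2}-[a-z0-9-]+",
    "q": r"#q\d{2}-[a-z0-9-]+",
}

def tally(plan_dir="plans"):
    """Count, per prefix, how many plans use it and total occurrences."""
    plans = defaultdict(set)        # prefix -> set of plan filenames using it
    occurrences = defaultdict(int)  # prefix -> total anchor occurrences
    for path in Path(plan_dir).glob("*.md"):
        text = path.read_text()
        for prefix, pattern in PREFIXES.items():
            hits = re.findall(pattern, text)
            occurrences[prefix] += len(hits)
            if hits:
                plans[prefix].add(path.name)
    return plans, occurrences
```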
The conclusion: 8 prefixes actively used, 7 dead. I also simplified the numbering system and made a couple of other small edits. The result is a more concise jumping-off point for detailed plans to accomplish complex code changes.
The new plan skeleton v2 is shown below.
Files:
Plan Skeleton v2 (447 lines)
<!-- tugplan-skeleton v2 -->
## <Plan Title> {#phase-slug}
**Purpose:** <1–2 sentences. What capability ships at the end of this phase?>
---
### Plan Metadata {#plan-metadata}
| Field | Value |
|------|-------|
| Owner | <name> |
| Target branch | <branch> |
| Last updated | <YYYY-MM-DD> |
---
### Phase Overview {#phase-overview}
#### Context {#context}
<1–2 paragraphs. What problem are we solving, and why now?>
#### Strategy {#strategy}
<3–7 bullets. The approach and sequencing philosophy for this phase.>
#### Success Criteria (Measurable) {#success-criteria}
> Make these falsifiable. Avoid "works well".
- <criterion> (how to measure / verify)
- <criterion> (how to measure / verify)
#### Scope {#scope}
1. <Scope item>
2. <Scope item>
3. <Scope item>
#### Non-goals (Explicitly out of scope) {#non-goals}
- <Non-goal>
- <Non-goal>
#### Dependencies / Prerequisites {#dependencies}
- <Dependency>
- <Prerequisite>
#### Constraints {#constraints}
- <platform/tooling/perf/security constraints>
#### Assumptions {#assumptions}
- <assumption>
- <assumption>
---
### Reference and Anchor Conventions (MANDATORY) {#reference-conventions}
This plan format relies on **explicit, named anchors** and **rich `References:` lines** in execution steps.
#### 1) Use explicit anchors everywhere you will cite later
- **Technique**: append an explicit anchor to the end of a heading using `{#anchor-name}`.
- Example:
- `### Design Decisions {#design-decisions}`
- `#### [D01] Workspace snapshots are immutable (DECIDED) {#d01-snapshots-immutable}`
- **Why**: do not rely on auto-generated heading slugs; explicit anchors are stable when titles change.
#### 2) Anchor naming rules (lock these in)
- **Allowed characters**: lowercase `a–z`, digits `0–9`, and hyphen `-` only.
- **Style**: short, semantic, **kebab-case**, no phase numbers (anchors should survive renumbering).
- **Prefix conventions (use these consistently)**:
- **`step-N`**: execution step anchors, e.g. `{#step-1}`, `{#step-2}`, `{#step-3}`
- **`step-N-M`**: substep anchors, e.g. `{#step-2-1}`, `{#step-2-2}`
- **`dNN-...`**: design decisions (`[D01]`) anchors, e.g. `{#d01-sandbox-copy}`
- **`qNN-...`**: open questions (`[Q01]`) anchors, e.g. `{#q01-import-resolution}`
- **`rNN-...`**: risk notes (`Risk R01`) anchors, e.g. `{#r01-perf-regression}`
- **`lNN-...`**: lists (`List L01`) anchors, e.g. `{#l01-supported-ops}`
- **`mNN-...`**: milestones (`Milestone M01`) anchors, e.g. `{#m01-first-ship}`
- **`sNN-...`**: specs (`Spec S01`) anchors, e.g. `{#s01-command-response}`
- **Domain anchors**: for major concepts/sections, use a clear noun phrase, e.g. `{#cross-platform}`, `{#config-schema}`, `{#error-scenarios}`
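The naming rules above are mechanical enough to check automatically. A minimal sketch — the regex is my own formulation of the rules, not part of any tug tooling:

```python
import re

# Anchors must be lowercase kebab-case: a-z, 0-9, and hyphens only,
# starting and ending with an alphanumeric run (no leading/trailing
# hyphen, no underscores, no uppercase).
ANCHOR_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def valid_anchor(name: str) -> bool:
    """Check an anchor name against the kebab-case convention."""
    return bool(ANCHOR_RE.fullmatch(name))
```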
#### 3) Stable label conventions (for non-heading artifacts)
Use stable labels so steps can cite exact plan artifacts even when prose moves around:
- **Design decisions**: `#### [D01] <Title> (DECIDED) {#d01-...}`
- **Open questions**: `#### [Q01] <Title> (OPEN) {#q01-...}`
- **Specs**: `**Spec S01: <Title>** {#s01-slug}` (or make it a `####` heading if you prefer)
- **Tables**: `**Table T01: <Title>** {#t01-slug}`
- **Lists**: `**List L01: <Title>** {#l01-slug}`
- **Risks**: `**Risk R01: <Title>** {#r01-slug}`
- **Milestones**: `**Milestone M01: <Title>** {#m01-slug}`
Numbering rules:
- Always use **two digits**: `D01`, `Q01`, `S01`, `T01`, `L01`, `R01`, `M01`.
- Never reuse an ID within a plan. If you delete one, leave the gap.
#### 4) `**Depends on:**` lines for execution step dependencies
Steps that depend on other steps must include a `**Depends on:**` line that references step anchors.
**Format:**
```markdown
**Depends on:** #step-1, #step-2
```
**Rules:**
- Use **anchor references** (`#step-N`), not step titles or numbers
- Omit the line entirely for steps with no dependencies (typically Step 1)
- Substeps implicitly depend on their parent step; only add explicit dependencies for cross-substep relationships
- Multiple dependencies are comma-separated
- Dependencies must reference valid step anchors within the document (validated by `tug validate`)
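As a sketch of what that dependency check might look like — this is an illustrative approximation, not the actual `tug validate` implementation:

```python
import re

# Step anchors as declared in headings, e.g. {#step-1} or {#step-3-2}.
STEP_ANCHOR_RE = re.compile(r"\{#(step-[0-9]+(?:-[0-9]+)*)\}")
# "**Depends on:** #step-1, #step-2" lines.
DEPENDS_RE = re.compile(r"^\*\*Depends on:\*\* (.+)$", re.MULTILINE)

def check_dependencies(plan_text: str) -> list[str]:
    """Return any dangling #step-N references found in Depends on: lines."""
    defined = set(STEP_ANCHOR_RE.findall(plan_text))
    errors = []
    for line in DEPENDS_RE.findall(plan_text):
        for ref in line.split(","):
            ref = ref.strip()
            if not ref.startswith("#") or ref[1:] not in defined:
                errors.append(ref)
    return errors
```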
---
#### 5) `**References:**` lines are required for every execution step
Every step must include a `**References:**` line that cites the plan artifacts it implements.
Rules:
- Cite **decisions** by ID: `[D05] ...`
- Cite **open questions** by ID when the step resolves/de-risks them: `[Q03] ...`
- Cite **specs/lists/tables/risks/milestones** by label: `Spec S15`, `List L03`, `Tables T27-T28`, `Risk R02`, `Milestone M01`, etc.
- Cite **anchors** for deep links in parentheses using `#anchor` tokens (keep them stable).
- **Do not cite line numbers.** If you find yourself writing "lines 5–10", add an anchor and cite that instead.
- Prefer **rich, exhaustive citations**. Avoid `N/A` unless the step is truly refactor-only.
**Good References examples:**
```
**References:** [D05] Sandbox verification, [D12] Git-based undo, Spec S15, Tables T21-T25,
(#session-lifecycle, #worker-process-mgmt, #config-precedence)
```
```
**References:** [D01] Refactoring kernel, [D06] Python analyzer, List L04,
Table T05, (#op-rename, #fundamental-wall)
```
**Bad References examples (avoid these):**
```
**References:** Strategy section (lines 5–10) ← uses line numbers
**References:** See design decisions above ← vague, no specific citations
**References:** N/A ← only acceptable for pure refactor steps
```
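The bad patterns can be caught by a lint pass. A hedged sketch — the heuristics here are my own, not part of any tug tooling:

```python
import re

# Capture the payload of each "**References:** ..." line.
REFS_RE = re.compile(r"^\*\*References:\*\*\s*(.*)$", re.MULTILINE)
# "lines 5-10" style citations are forbidden.
LINE_NUMBER_RE = re.compile(r"\blines?\s+\d+", re.IGNORECASE)
# A concrete citation: [D01]/[Q01], a labeled artifact (S15, T01, ...),
# or an #anchor token.
CONCRETE_RE = re.compile(r"\[[DQ]\d{2}\]|[STRLM]\d{2}|#[a-z0-9-]+")

def lint_references(plan_text: str) -> list[str]:
    """Flag References: lines that cite line numbers or nothing concrete."""
    problems = []
    for refs in REFS_RE.findall(plan_text):
        if LINE_NUMBER_RE.search(refs):
            problems.append(f"cites line numbers: {refs!r}")
        elif not CONCRETE_RE.search(refs):
            problems.append(f"no concrete citation: {refs!r}")
    return problems
```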
---
### Open Questions (MUST RESOLVE OR EXPLICITLY DEFER) {#open-questions}
> Open questions are tracked work. If a question remains open at phase-end, explicitly defer it with a rationale and a follow-up plan.
#### [Q01] <Question title> (OPEN) {#q01-question-slug}
**Question:** <what is unknown / undecided?>
**Why it matters:** <what breaks or becomes expensive if we guess wrong?>
**Options (if known):**
- <option>
- <option>
**Plan to resolve:** <prototype / benchmark / spike / research / decision meeting>
**Resolution:** OPEN / DECIDED (see [DNN]) / DEFERRED (why, and where it will be revisited)
---
### Risks and Mitigations {#risks}
| Risk | Impact | Likelihood | Mitigation | Trigger to revisit |
|------|--------|------------|------------|--------------------|
| <risk> | low/med/high | low/med/high | <mitigation> | <trigger> |
**Risk R01: <Title>** {#r01-risk-slug}
- **Risk:** <1 sentence>
- **Mitigation:** <1–3 bullets>
- **Residual risk:** <what remains true even after mitigation>
---
### Design Decisions {#design-decisions}
> Record *decisions* (not options). Each decision includes the "why" so later phases don't reopen it accidentally.
#### [D01] <Decision Name> (DECIDED) {#d01-decision-slug}
**Decision:** <One sentence decision statement>
**Rationale:**
- <Why>
- <Why>
**Implications:**
- <What this forces in APIs / storage / tests>
---
### Deep Dives (Optional) {#deep-dives}
> Use this section for structured analysis that is not quite "decision" or "spec", but is critical for implementation alignment.
>
> Examples: operation analysis, end-to-end flows, protocols, schemas, sequence diagrams, CI/CD shape, cross-platform strategy, perf notes, rejection rationale.
#### <Topic Title> {#topic-slug}
<Write-up, diagrams, tables, and any referenced specs/lists/tables.>
---
### Specification {#specification}
> This section is the contract. Pick the subsections that apply to your plan; omit the rest.
- **Inputs and Outputs**: data model, invariants, supported formats
- **Terminology and Naming**: key terms and their definitions
- **Supported Features**: exhaustive list; include what is explicitly not supported
- **Modes / Policies**: behavioral variants, flags, policies
- **Semantics**: normative rules, traversal order, edge cases
- **Error and Warning Model**: error fields, warning fields, path formats
- **Public API Surface**: Rust/Python/language signatures
- **Internal Architecture**: component relationships, pipeline, ownership
- **Output Schemas**: CLI output, API responses, wire formats (contract)
- **Configuration Schema**: config file format, precedence, CLI flag mapping
---
### Compatibility / Migration / Rollout (Optional) {#rollout}
> Use this section when you are changing public APIs, config formats, CLI contracts, or anything that affects adopters.
- **Compatibility policy**: <semver? schema versioning?>
- **Migration plan**:
- <what changes>
- <who is impacted>
- <how to migrate, and how to detect breakage>
- **Rollout plan**:
- <opt-in flag / staged rollout / canary / feature gate>
- <rollback strategy>
---
### Definitive Symbol Inventory {#symbol-inventory}
> A concrete list of new crates/files/symbols to add. This is what keeps implementation crisp.
#### New crates (if any) {#new-crates}
| Crate | Purpose |
|-------|---------|
| `<crate>` | <purpose> |
#### New files (if any) {#new-files}
| File | Purpose |
|------|---------|
| `<path>` | <purpose> |
#### Symbols to add / modify {#symbols}
| Symbol | Kind | Location | Notes |
|--------|------|----------|-------|
| `<Name>` | enum/struct/fn | `<path>` | <notes> |
---
### Documentation Plan {#documentation-plan}
- [ ] <Docs update>
- [ ] <Examples / schema examples / API docs>
---
### Test Plan Concepts {#test-plan-concepts}
> Describe the kinds of tests that prove the spec. Leave the actual enumeration of tests to the Execution Steps below.
#### Test Categories {#test-categories}
| Category | Purpose | When to use |
|----------|---------|-------------|
| **Unit** | Test individual functions/methods in isolation | Core logic, edge cases, error paths |
| **Integration** | Test components working together | End-to-end operations, CLI commands |
| **Golden / Contract** | Compare output against known-good snapshots | Schemas, APIs, parsers, serialization |
| **Drift Prevention** | Detect unintended behavior changes | Regression testing, API stability |
---
### Execution Steps {#execution-steps}
> Execution comes last. Each step should be executable, with a clear commit boundary and a checkpoint.
>
> **Commit after all checkpoints pass.** This rule applies to every step below.
>
> **Patterns:**
> - If a step is big, split into **substeps** (`Step 2.1`, `Step 2.2`, …) with separate commits and checkpoints.
> - After completing a multi-substep step, add a **Step N Summary** block that consolidates what was achieved and provides an aggregate checkpoint.
>
> **References are mandatory:** Every step must cite specific plan artifacts ([D01], Spec S01, Table T01, etc.) and anchors (#section-name). Never cite line numbers—add an anchor instead.
#### Step 1: <Prep Step Title> {#step-1}
<!-- Step 1 typically has no dependencies (it is the root) -->
**Commit:** `<conventional-commit message>`
**References:** [D01] <decision name>, (#strategy, #context)
**Artifacts:** (what this step produces/changes)
- <new files / new commands / new schema fields / new docs>
**Tasks:**
- [ ] <task>
- [ ] <task>
**Tests:**
- [ ] <T test>
- [ ] <T test>
**Checkpoint:**
- [ ] <command>
- [ ] <command>
---
#### Step 2: <Step Title> {#step-2}
**Depends on:** #step-1
**Commit:** `<conventional-commit message>`
**References:** [D02] <decision>, [D03] <decision>, Spec S01, List L01, (#terminology, #semantics)
**Artifacts:** (what this step produces/changes)
- <new files / new commands / new schema fields / new docs>
**Tasks:**
- [ ] <task>
- [ ] <task>
**Tests:**
- [ ] <T test>
- [ ] <T test>
**Checkpoint:**
- [ ] <command>
- [ ] <command>
---
#### Step 3: <Big Step Title> {#step-3}
**Depends on:** #step-2
> If this step is large, break it into substeps with separate commits and checkpoints.
> The parent step explains the structure; each substep has its own commit and checkpoint.
> Substeps implicitly depend on their parent step; explicit **Depends on:** only needed for cross-substep dependencies.
##### Step 3.1: <Substep Title> {#step-3-1}
**Commit:** `<conventional-commit message>`
**References:** [D04] <decision>, Spec S02, Table T01, (#inputs-outputs)
**Artifacts:** (what this substep produces/changes)
- <artifact>
**Tasks:**
- [ ] <task>
**Tests:**
- [ ] <test>
**Checkpoint:**
- [ ] <command>
---
##### Step 3.2: <Substep Title> {#step-3-2}
**Depends on:** #step-3-1
**Commit:** `<conventional-commit message>`
**References:** [D05] <decision>, (#public-api)
**Artifacts:** (what this substep produces/changes)
- <artifact>
**Tasks:**
- [ ] <task>
**Tests:**
- [ ] <test>
**Checkpoint:**
- [ ] <command>
---
#### Step 3 Summary {#step-3-summary}
> After a multi-substep step, add a summary block to consolidate what was achieved.
After completing Steps 3.1–3.N, you will have:
- <capability or artifact 1>
- <capability or artifact 2>
- <capability or artifact 3>
**Final Step 3 Checkpoint:**
- [ ] `<aggregate verification command covering all substeps>`
---
### Deliverables and Checkpoints {#deliverables}
> This is the single place we define "done" for the phase. Keep it crisp and testable.
**Deliverable:** <One sentence deliverable>
#### Phase Exit Criteria ("Done means…") {#exit-criteria}
- [ ] <criterion> (verification)
- [ ] <criterion> (verification)
**Acceptance tests:**
- [ ] <T test>
- [ ] <T test>
#### Roadmap / Follow-ons (Explicitly Not Required for Phase Close) {#roadmap}
- [ ] <follow-on item>
- [ ] <follow-on item>
| Checkpoint | Verification |
|------------|--------------|
| <checkpoint> | <command/test/proof> |
I began a refactoring project yesterday morning to address an architectural issue in Tug, specifically about how the project’s skills and agents interact with each other to implement the plans I give it. This idea had kept me up during the night, which means that by the time I had finished breakfast and started writing it out, I had been thinking about what I wanted to say for several hours.
About ninety minutes after I began, I had the 1700-word document The Tug Multi-Agent Coordination Problem you can see below. I pasted this into Claude Code and had a few rounds with the AI, producing the Conversation also included below. From these, I produced a design document. I continued on with several more rounds with Claude Code and GPT-5 Codex in Cursor, using this design document to produce two plans: the first to implement Tugstate, a major new feature to deliver embedded SQLite state management to plan execution, and the second to cut over to using it.
This process felt mostly like writing software has always felt. I spent about four hours puzzling over design choices, making tradeoffs, and deciding on what features are and are not in scope. The difference, of course, is that when the planning was done, I turned the coding over to Claude Code. It churned on the first plan for 2h 16m, and the second for 1h 45m. During the first AI coding session, I went for a long walk, and during the second, I went food shopping, wrote this post, and started to cook dinner.
The net result: two PRs merged, +8168 -16100. The software worked for a simple case on the second try. The first attempt failed in a trivial way that took two minutes to fix. I might have some more cleanup work to do today—actually, I imagine this is likely—but that would also have been true if I had taken several more days to do this refactor without the help of AI. This is a pace of code change I simply could not have managed myself with one day of effort.
My point here is that, in the future, when people say they’re “writing software”, they aren’t going to mean cranking out code line by line—they’re going to mean creating and reviewing complex plans, which they then hand off to an AI to implement. For me, that future is already here.
The Tug Multi-Agent Coordination Problem
Tug has a multi-agent coordination problem.
At a high level, Tug should support taking an idea for a software change, handing it to agents, delivering the software, and repeating that loop with human oversight.
That workflow compresses into three phases:
- PLAN: make a plan for a code change
- IMPLEMENT: execute the code changes described by a plan
- MERGE: merge completed implementation work back into `main`
Phases and Implications
Because MERGE is part of the workflow, git implications follow.
- [I-01] Each plan refers to one and only one `git` repo.
- [I-02] Each PLAN phase is committed on `main` when it completes, and must have a PLAN-REVISION that is a child of its base revision.
- [I-03] Each PLAN refers to a base revision in git.
- [I-04] Each PLAN has one or more STEPS describing required work.
- [I-05] Each PLAN STEP has subsections including Depends on, Commit, References, Artifacts, Tasks, Tests, Checkpoint, and Rollback.
- [I-06] Tasks, Tests, and Checkpoint sections include CHECKLISTS to enumerate required work in detail.
- [I-07] Each IMPLEMENT phase must have an associated PLAN.
- [I-08] Each PLAN must be VALIDATED before IMPLEMENT begins, including plan/step dependency-chain resolution.
- [I-09] Each IMPLEMENT phase runs in a `git` worktree based on the same revision as its PLAN.
- [I-10] Each IMPLEMENT phase must atomically fetch the next ready PLAN STEP, honoring dependencies.
- [I-11] Each IMPLEMENT phase must atomically mark PLAN STEPS complete.
- [I-12] MERGE must run only against a completed PLAN (all steps complete in IMPLEMENT).
- [I-13] Each IMPLEMENT worktree can be deleted once merged.
Phases and Skills
Each phase is orchestrated by a skill that can call subagents.
- PLAN
  - Skill role: turn an idea into a validated, implementation-ready tugplan with explicit steps
  - Subagents:
    - `clarifier-agent`: asks targeted questions to resolve ambiguity before drafting
    - `author-agent`: drafts and revises the tugplan from clarified requirements
    - `critic-agent`: reviews plan quality/completeness/implementability before approval
- IMPLEMENT
  - Skill role: execute the validated tugplan in a worktree and track step-by-step completion
  - Subagents:
    - `architect-agent`: defines per-step strategy and expected touch scope
    - `coder-agent`: executes plan steps and writes code while monitoring drift
    - `reviewer-agent`: verifies quality and conformance to the approved plan
    - `committer-agent`: performs required git commits for completed work
    - `auditor-agent`: runs end-to-end quality gates (fresh build/tests)
    - `integrator-agent`: pushes branch, opens PR, confirms CI/check status
- MERGE
  - Skill role: mechanically integrate completed IMPLEMENT changes into `main` and clean up worktree state
Phases and Requirements
It must be easy for a developer to:
- [R-01] Start a new PLAN phase at will.
- [R-02] Stop an in-progress PLAN phase at will.
- [R-03] Resume a previously started PLAN phase at will.
- [R-04] Start IMPLEMENT for a VALIDATED PLAN at will.
- [R-05] Stop an in-progress IMPLEMENT phase at will.
- [R-06] Start MERGE for a completed PLAN at will.
- [R-07] Stop an in-progress MERGE, if it has not yet completed.
- [R-08] Track PLAN/IMPLEMENT/MERGE progress in detail, including:
- which phase instances are running
- what any running subagent is doing now
- which PLAN STEP IMPLEMENT is currently executing
- checklist state per step item (`open`/`in-progress`/`completed`)
- [R-09] Receive clear feedback for logically inconsistent requests, such as:
- IMPLEMENT with no VALIDATED PLAN
- MERGE for a non-completed PLAN
- re-MERGE of an already completed PLAN
Skill <=> Subagent Communication
Subagents communicate through skills (orchestrator-mediated), not direct messaging:
- [C-01] `clarifier-agent` -> `author-agent` (via PLAN): emits structured clarification (questions, assumptions, ambiguity analysis); PLAN combines with user answers and passes forward.
- [C-02] `author-agent` -> `critic-agent` (via PLAN): returns plan metadata (plan_path, section/skeleton status); PLAN passes plan artifact for quality review.
- [C-03] `critic-agent` -> `author-agent` (revision loop via PLAN): emits issues + recommendation (approve/revise/reject); PLAN either finalizes or loops feedback for revision.
- [C-04] `architect-agent` -> `coder-agent` (via IMPLEMENT + beads design): emits per-step strategy (approach, expected_touch_set, implementation steps, test plan, risks); IMPLEMENT persists it as step design data.
- [C-05] `coder-agent` -> `reviewer-agent` (via IMPLEMENT + beads notes): emits concrete changes + execution evidence (files_modified, build/test report, drift assessment); reviewer consumes for conformance and quality checks.
- [C-06] `reviewer-agent` -> `coder-agent` (revision loop via IMPLEMENT): emits findings (issues, conformance, recommendation); IMPLEMENT either commits or routes fixes back to coder.
- [C-07] `committer-agent` -> IMPLEMENT state/progress: consumes approved step context, performs commit operations (`tugcode commit` / `git commit` paths), returns metadata for progress tracking/step closure.
- [C-08] `auditor-agent` -> `coder-agent`/`integrator-agent` (IMPLEMENT gate): emits post-loop quality-gate results (issues, deliverable checks, recommendation); IMPLEMENT routes fixes or the publish handoff.
- [C-09] `integrator-agent` -> `coder-agent`/user completion path (IMPLEMENT gate): emits publication outputs (pr_url, ci_status, checks); IMPLEMENT either closes or loops CI fixes.
Problems in the Current Code
Beads and Skill <=> Subagent Communication
Beads URL: https://github.com/steveyegge/beads
Beads strongly prefers a central database on `main`. That conflicts with IMPLEMENT running in git worktrees. MERGE then struggles with conflicts because both `main` and the worktree changed. The intended direction is to keep `main` free of IMPLEMENT-phase coordination churn.
Operationally, Skill <=> Subagent communication is orchestrator-mediated and structured, but currently routed through Beads fields (`design`, `notes`) with impedance mismatches. This caused a long sequence of complexity-heavy commits and filesystem workarounds:
```
2ba6c10 Reduce orchestrator overthinking in implement
07b048d Move bead-write from agents to orchestrator
372c249 Remove positional content args from bead CLIs
4e24be1 Show explicit tool calls in bead-write instructions
c23504d Fix terminal hang from heredoc in agents
a5af8e4 Stop init from deleting .beads directories
1878990 Bypass bd init worktree guard
5d06f88 Fix bd init failure in git worktrees
55a9493 Clean up stale beads-at-root artifacts
76f3023 Clean up stale beads-at-root artifacts
4d7ec4f Beads Improvements (#43)
dad5ab5 Add beads improvements plan and proposal
```
[I-10] mostly works (open/close semantics are straightforward), but using Beads as source of truth for [I-11] failed. It does not represent full PLAN STEP checklist richness, which blocks delivery of [R-08] visibility.
It has also proven unexpectedly difficult to move information between subagents when mediated through bd/tugcode beads shell/file/heredoc paths.
Next Steps
Beads aims to be a distributed, git-backed graph issue tracker for AI agents, but this usage in Tug has not met the project's needs.
We need to deliver:
- [I-01] through [I-13]
- [R-01] through [R-09]
- [C-01] through [C-09]
Question: what architecture best satisfies those constraints with less operational complexity?
Conversation
After writing the content above, I began a conversation with Claude Code (Opus 4.6) to discuss it.
🧐 Me
OK. We have some design work to do. Read over @roadmap/multi-agent-coordination.md. Dig in. Read the document. Read the code. Investigate. Ask clarifying questions. Start making some proposals about what we might do to improve this project to deliver on its goals and requirements. Let’s solve this information flow problem with rock-solid solutions!
🤖 Claude
Both agents are done. I now have a thorough understanding of the entire system. Let me lay out what I’ve found and my proposals.
Diagnosis: What’s Actually Happening
After reading the roadmap, the skills, agents, Rust CLI commands, and core library, here’s my understanding of the problem.
What Beads does today
Beads serves four roles in the IMPLEMENT phase:
- Step state machine: open -> closed (via `bd close`)
- Dependency-aware ready queries: `bd ready --parent <root>` returns steps whose deps are all closed
- Inter-agent data store: architect writes to `design`, coder writes to `notes`, reviewer appends to `notes`
- Step identity: `bead_mapping` gives each step anchor a unique bead ID
Where Beads fails
Reading the code tells the story clearly:
The temp file dance (SKILL.md:310-318): after every agent call, the orchestrator must:
- Agent writes temp file `_tmp_{bead_id}_strategy.md`
- Orchestrator runs `tugcode beads append-design {bead_id} --content-file {path}`
- Orchestrator deletes temp file
- If the CLI fails, warn and continue (because “the agent’s JSON output already contains all data needed”)
That last point is damning: the beads write is acknowledged as redundant. The orchestrator already has the data in memory. It’s writing to beads for persistence, but then telling itself it’s OK if that persistence fails.
Checklist blindness (roadmap line 97): Beads tracks step open/closed, but each step has tasks, tests, and checkpoints—fine-grained checklists that Beads can’t model. There’s no way to tell the user: “Step 2: 4/7 tasks done, 2/3 tests passing, 1/1 checkpoint verified.”
Worktree friction (beads.rs:229-234): Beads’ `bd init` refuses to run in git worktrees. The code manually creates `.beads/`, bootstraps SQLite via `bd --db <path> config set`, then passes `--db` on every command. This was a multi-commit struggle.
Main branch pollution (roadmap line 81): Beads wants data on `main`. The IMPLEMENT phase works on a worktree branch. Merge creates conflicts on data that should have stayed local.
What’s actually working fine
- The plan phase: pure in-memory JSON between agents, no persistence needed
- The orchestration pattern: sequential agent spawning with persistent agent IDs works well
- The `tugcode worktree create` flow: creates worktree, branch, validates plan, returns structured JSON
- The `tugcode commit` command: atomic log-rotate + git-commit + bead-close
Proposal: Replace Beads with Tugstate (v1)
I’m proposing a JSON-file-based state system that lives entirely in the worktree.
Core design
A single file: `.tugtool/state.json` in the worktree.
```
.tugtree/tugtool__my-plan-20260223/
├── .tugtool/
│   ├── state.json          ← NEW: all coordination state
│   ├── tugplan-my-plan.md
│   ├── tugplan-skeleton.md
│   ├── log.md
│   └── config.toml
├── .beads/                 ← REMOVED
│   └── beads.db
└── (source code)
```
The state file tracks everything the orchestrator and user need:
```json
{
  "version": 1,
  "plan_path": ".tugtool/tugplan-my-plan.md",
  "phase": "implement",
  "created_at": "2026-02-23T10:00:00Z",
  "updated_at": "2026-02-23T10:45:00Z",
  "steps": [
    {
      "anchor": "step-0",
      "title": "Create API client",
      "status": "completed",
      "depends_on": [],
      "started_at": "2026-02-23T10:05:00Z",
      "completed_at": "2026-02-23T10:30:00Z",
      "commit_hash": "abc123d",
      "tasks": [
        { "text": "Create HTTP client struct", "status": "completed" },
        { "text": "Add retry logic", "status": "completed" }
      ],
      "tests": [
        { "text": "Unit test: retry backoff", "status": "completed" }
      ],
      "checkpoints": [
        { "text": "cargo test passes", "status": "completed" },
        { "text": "cargo clippy clean", "status": "completed" }
      ]
    },
    {
      "anchor": "step-1",
      "title": "Add caching layer",
      "status": "in_progress",
      "depends_on": ["step-0"],
      "started_at": "2026-02-23T10:31:00Z",
      "tasks": [
        { "text": "Implement cache store", "status": "completed" },
        { "text": "Add cache invalidation", "status": "in_progress" },
        { "text": "Wire up to API client", "status": "open" }
      ],
      "tests": [
        { "text": "Cache hit/miss test", "status": "open" }
      ],
      "checkpoints": [
        { "text": "cargo test passes", "status": "open" }
      ]
    }
  ]
}
```
Why this works
Single orchestrator, sequential execution. There’s one orchestrator process running agents one at a time. No concurrent writes means no need for SQLite locking or database transactions. A JSON file is perfectly atomic for this use case.
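One detail worth making explicit: even with a single writer, the file should be replaced atomically so a crash mid-write never leaves truncated JSON on disk. A common write-temp-then-rename sketch (function name and layout are illustrative, not the actual Tugstate code):

```python
import json
import os
import tempfile

def write_state(path: str, state: dict) -> None:
    """Atomically replace a JSON state file.

    Readers see either the old file or the new one, never a partial write,
    because os.replace is an atomic rename on the same filesystem.
    """
    dir_ = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave temp debris on failure
        raise
```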
Delivers [R-08] completely. The state file tracks every checklist item at `open`/`in_progress`/`completed` granularity. `tugcode status` can render:
```
Step 0: Create API client [completed]
  Tasks:  2/2 ████████████ 100%
  Tests:  1/1 ████████████ 100%
  Checks: 2/2 ████████████ 100%
Step 1: Add caching layer [in_progress]
  Tasks:  1/3 ████░░░░░░░░ 33%
  Tests:  0/1 ░░░░░░░░░░░░ 0%
  Checks: 0/1 ░░░░░░░░░░░░ 0%
```
Eliminates the temp file dance. The orchestrator writes state directly after parsing agent JSON output. No temp files, no CLI calls, no filesystem intermediaries.
Worktree-local, main stays clean. The state file lives in .tugtool/ inside the worktree. When the branch is merged, the state file can be dropped (or kept as a record). No conflicts.
Debuggable: `cat .tugtool/state.json | jq .steps[1].tasks`.
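Computing the per-step counters behind a display like that is a few lines. A sketch against the `state.json` shape shown earlier (the function name is illustrative):

```python
def step_progress(step: dict) -> dict:
    """Summarize checklist completion for one step as {kind: (done, total)}."""
    return {
        kind: (
            sum(1 for item in step.get(kind, []) if item["status"] == "completed"),
            len(step.get(kind, [])),
        )
        for kind in ("tasks", "tests", "checkpoints")
    }
```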
New CLI commands
```
tugcode state init <plan_path>      # Initialize state.json from plan's steps/checklists
tugcode state show [--json]         # Show progress (replaces `tugcode beads status`)
tugcode state next                  # Return next ready step (replaces `bd ready`)
tugcode state start <step-anchor>   # Mark step in_progress
tugcode state update <step-anchor>  # Update checklist items for a step
tugcode state complete <step>       # Mark step completed with commit hash
```
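The `tugcode state next` semantics (a step is ready when all of its dependencies are completed) could be sketched like this — assuming, as an illustration, that unstarted steps carry status `open`:

```python
def ready_steps(state: dict) -> list[str]:
    """Anchors of steps whose dependencies are all completed and that have not started."""
    done = {s["anchor"] for s in state["steps"] if s["status"] == "completed"}
    return [
        s["anchor"]
        for s in state["steps"]
        if s["status"] == "open" and all(d in done for d in s.get("depends_on", []))
    ]
```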
What changes in the orchestrator
┌────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────┐
│ Current (Beads) │ New (Tugstate) │
├────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
│ tugcode worktree create syncs beads │ tugcode worktree create calls tugcode state init │
├────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
│ Returns bead_mapping, root_bead_id │ Returns all_steps, ready_steps (computed from state.json) │
├────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
│ After architect: tugcode beads append-design + delete temp │ After architect: tugcode state update step-0 --architect-done │
├────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
│ After coder: tugcode beads update-notes + delete temp │ After coder: tugcode state update step-0 --tasks-completed t1,t2 │
├────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
│ After reviewer: tugcode beads append-notes + delete temp │ After reviewer: tugcode state update step-0 --tests-completed │
├────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
│ tugcode commit closes bead │ tugcode commit calls tugcode state complete step-0 │
└────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────┘
What about inter-agent data?
The key insight: the persistent agent pattern already solves this. The architect, coder, reviewer, and committer are each spawned once and resumed. They retain full context. The coder doesn’t read the architect’s strategy from Beads; the orchestrator passes it in the resume prompt.
The state file doesn’t need to store agent outputs (approach, expected_touch_set, etc.). It stores progress state: what’s done, what’s in flight, what’s next. Agent-to-agent data flows through orchestrator memory plus the Task resume mechanism.
Migration path
- Phase 1: Add tugcode state commands + state.json schema. Keep Beads working.
- Phase 2: Update implement skill to write to state.json alongside Beads. tugcode status reads state.json.
- Phase 3: Remove Beads writes from implement skill. tugcode worktree create stops syncing Beads.
- Phase 4: Remove Beads code from tugcode CLI and tugtool-core.
🧐 Me
OK. This is a great analysis. I love how simple it is. The main limitation with this approach is that it isolates the IMPLEMENT phase to a single git worktree on a single machine. Does that make sense?
There would never be any possibility of [R-04]: start an IMPLEMENT phase for a VALIDATED PLAN at will at any time. There is nothing in this requirement which says “… only if there are no other IMPLEMENT phases running for that PLAN.”
If we ever want to allow multiple IMPLEMENT phases to run at once that don’t duplicate work, we would need a solution for [I-08] and [I-09], right? There is no way in your proposal to do concurrent IMPLEMENT phases. What are your ideas for dealing with this?
🤖 Claude
You’re absolutely right, and this is the critical design question.
The Concurrency Problem
With a JSON file in the worktree, there’s no shared state. Two IMPLEMENT phases for the same plan can’t coordinate:
Orchestrator A reads state.json, sees step-0 ready, starts step-0.
Orchestrator B reads state.json, sees step-0 ready, starts step-0.
Duplicate work.
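The race can be sketched in a few lines (a toy illustration, not tugcode internals):

```python
import json

# Toy illustration of the lost-update race: both orchestrators read the same
# snapshot of state.json and independently select the same "ready" step.
state = {"steps": [{"anchor": "step-0", "status": "pending"},
                   {"anchor": "step-1", "status": "pending"}]}
snapshot = json.dumps(state)  # the file content both orchestrators read

def next_ready(raw):
    # First pending step wins; nothing records that someone else already took it.
    return next(s["anchor"] for s in json.loads(raw)["steps"]
                if s["status"] == "pending")

pick_a = next_ready(snapshot)  # orchestrator A picks step-0
pick_b = next_ready(snapshot)  # orchestrator B also picks step-0: duplicate work
```

A plain file has no way to make "read the next step and mark it taken" a single indivisible operation across processes.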
For [I-08] and [I-09] to work with concurrent IMPLEMENT phases, you need:
- Shared state visible to all orchestrators
- Atomic claim so only one orchestrator can grab a step
Where the shared state lives:
repo/
├── .tugtool/
│ ├── tugplan-foo.md
│ └── state.db ← shared across all worktrees
├── .tugtree/
│ ├── worktree-A/ ← orchestrator A (claimed step-0, step-2)
│ └── worktree-B/ ← orchestrator B (claimed step-1, step-3)
Revised Proposal: Embedded SQLite
This changes the answer from JSON to SQLite—but not Beads. A self-owned, embedded SQLite database with a schema designed for Tug.
Why SQLite instead of JSON:
- BEGIN EXCLUSIVE transactions give atomic claims
- Concurrent reads are free (WAL mode)
- Still a single file (.tugtool/state.db), still debuggable
- No external binary: rusqlite compiles SQLite into tugcode
Why not Beads:
- Own schema, no impedance mismatch
- No bd binary dependency
- No worktree init friction
- Checklist items are first-class rows
Schema (draft):
CREATE TABLE plans (
  plan_path  TEXT PRIMARY KEY,
  status     TEXT NOT NULL DEFAULT 'active',
  created_at TEXT NOT NULL,
  updated_at TEXT NOT NULL
);

CREATE TABLE steps (
  plan_path    TEXT NOT NULL REFERENCES plans(plan_path),
  anchor       TEXT NOT NULL,
  title        TEXT NOT NULL,
  status       TEXT NOT NULL DEFAULT 'pending',
  claimed_by   TEXT,
  started_at   TEXT,
  completed_at TEXT,
  commit_hash  TEXT,
  PRIMARY KEY (plan_path, anchor)
);

CREATE TABLE step_deps (
  plan_path   TEXT NOT NULL,
  step_anchor TEXT NOT NULL,
  depends_on  TEXT NOT NULL,
  PRIMARY KEY (plan_path, step_anchor, depends_on),
  FOREIGN KEY (plan_path, step_anchor) REFERENCES steps(plan_path, anchor),
  FOREIGN KEY (plan_path, depends_on) REFERENCES steps(plan_path, anchor)
);

CREATE TABLE checklist_items (
  id          INTEGER PRIMARY KEY AUTOINCREMENT,
  plan_path   TEXT NOT NULL,
  step_anchor TEXT NOT NULL,
  kind        TEXT NOT NULL,
  text        TEXT NOT NULL,
  status      TEXT NOT NULL DEFAULT 'open',
  updated_at  TEXT,
  FOREIGN KEY (plan_path, step_anchor) REFERENCES steps(plan_path, anchor)
);
Atomic claim operation:
BEGIN EXCLUSIVE;

SELECT s.anchor FROM steps s
WHERE s.plan_path = ?
  AND s.status = 'pending'
  AND NOT EXISTS (
    SELECT 1 FROM step_deps d
    JOIN steps dep ON dep.plan_path = d.plan_path AND dep.anchor = d.depends_on
    WHERE d.plan_path = s.plan_path
      AND d.step_anchor = s.anchor
      AND dep.status != 'completed'
  )
ORDER BY s.anchor
LIMIT 1;

-- bind the anchor returned by the SELECT above
UPDATE steps
SET status = 'claimed', claimed_by = ?, started_at = ?
WHERE plan_path = ? AND anchor = ? AND status = 'pending';

COMMIT;
CLI commands:
tugcode state init <plan_path>
tugcode state next <plan_path>
tugcode state start <plan> <step>
tugcode state update <plan> <step>
tugcode state complete <plan> <step>
tugcode state show [plan] [--json]
What changes for worktree create:
- Call tugcode state init <plan> if missing (idempotent)
- Call tugcode state next <plan> to claim the first ready step
- Return that step info to the orchestrator
No Beads. No .beads/ directory. No bd binary.
Future multi-machine path:
- Wrap SQLite operations behind tugcast endpoints
- Keep schema and semantics
- Or swap to Postgres later if needed
Summary shift:
┌──────────────────────┬──────────────────────┬──────────────────────────┬─────────────────────────────┐
│ │ Beads (current) │ JSON (first proposal) │ SQLite (revised) │
├──────────────────────┼──────────────────────┼──────────────────────────┼─────────────────────────────┤
│ External binary │ bd required │ none │ none (rusqlite embedded) │
│ Concurrent IMPLEMENT │ partially │ impossible │ full atomic claims │
│ Checklist tracking │ no │ yes │ yes │
│ Lives in │ worktree .beads/ │ worktree .tugtool/ │ repo root .tugtool/ │
│ Main branch clean │ no │ yes │ yes (gitignore DB) │
│ Debuggable │ bd show │ cat/jq │ sqlite3 │
│ Multi-machine path │ none │ none │ tugcast wrapper or Postgres │
└──────────────────────┴──────────────────────┴──────────────────────────┴─────────────────────────────┘
The core insight is right: coordination state must be shared and atomic. JSON can't provide that; SQLite can, without the Beads friction.
🧐 Me
OK. If we’re using a central database in the repo root, then:
- How do we merge and avoid all the messy merge mechanics we eliminated in .tugtool/tugplan-beads-improvements.md and 4d7ec4f?
- Why SQLite instead of Dolt? What would Dolt give us that SQLite doesn’t?
🤖 Claude
Two sharp questions.
Merge: How do we avoid the mess?
The Beads merge problem was that both main and worktree branches committed a binary SQLite DB. Git can’t merge that cleanly.
Solution: never commit the state DB.
.tugtool/state.db
.tugtool/state.db-wal
.tugtool/state.db-shm
state.db is an operational artifact, not source code.
Committed on the worktree branch:
- Source code changes
- Plan file (.tugtool/tugplan-foo.md), unchanged during IMPLEMENT
- Implementation log (.tugtool/log.md)
Not committed: the state DB (it's gitignored).
So merge is source-only, and conflict risk drops dramatically.
For progress reporting, the plan file remains the immutable spec; execution progress is read from tugcode state show.
Why SQLite, not Dolt?
Dolt is interesting:
- Data branching/merging
- SQL history by revision
- SQL conflict resolution
- Built-in audit history
But likely too heavy for Tug right now:
- Large external binary footprint
- External server/process lifecycle complexity
- Not embeddable in the tugcode binary
- Overkill for Tug's small state volume
- Introduces a second branching model (Git + Dolt)
SQLite advantages here:
- Embedded, no extra install
- No daemon/server/port/process management
- Proven local concurrency with WAL + transactional claims
- Small and inspectable
- Clear path later to tugcast/Postgres if requirements change
Recommendation: embedded SQLite in repo root, gitignored, atomic on one machine.
🧐 Me
I hear you on Dolt, and I agree it solves a harder problem than Tug has right now.
A gitignored SQLite DB in the repo root feels like the sweet spot. For [I-08] and [I-09], I'm comfortable defining "atomically" as "atomically on one machine" for now. We don't currently have a multi-machine/multi-developer requirement.
So next step: write a complete proposal in @roadmap/ based on @roadmap/multi-agent-coordination.md and this chat. Cover risks, and structure it so we can execute in two plans:
- Add SQLite-based mechanism while Beads infra still exists
- Use that mechanism to remove Beads dependency cleanly
That should avoid the chicken-and-egg problem and let us run two tugplans back-to-back.
tl;dr
I’m pivoting this project. Tug is now about creating an AI-assisted software construction system, thinking through and delivering the tools and techniques humans and coding assistants need to make better software together.
I started this Tug project to scratch an itch. I noticed that Claude Code would often struggle to do complex code refactoring, issuing long chains of grep, sed, and awk commands to change the text of my programs lexically rather than understanding how to change my code semantically.
So, I started making proper code refactoring tools to make available to the assistant, starting with Python—a substantial project as I conceived of it. I began with a combination of Claude Code and Cursor, but soon started leaning more on Claude Code, extending it with skills and subagents to help me in my daily coding. I acted as the orchestrator, calling on these new tasks as I made and improved them. Eventually, I produced a workflow I liked well enough to produce work I was happy to commit. I wrote it up in How I Write Code With AI.
After I finished writing this post, I took a step back and looked at the 20-step process I included. I felt a mix of fascination and frustration. I was fascinated about how such a workflow had become possible. I was frustrated that I wasn’t working more directly on this new style of software construction.
So, I decided to do something about it. I kicked off a side project that I called “specks”1, to spend more time with that workflow, to think about it more deeply, and to see if I could turn it into an automated process orchestrated by AI agents.
I succeeded. What an interesting two weeks of work! Talk about fascination!
My planning and implementation loops are now fully under the control of two top-level orchestrator skills in Claude Code. Each of them calls on a number of subagents (see below). I also added one additional step at the end to merge code after the implementation loop completes, since implementation now happens on a git worktree. I also integrated Beads to bolster the implementation infrastructure.
Now, my main workflow has four steps rather than twenty, all focused around Claude Code.
- I chat with Claude Code to explore a new idea until I'm satisfied that it represents a piece of work I want to do.
- I run /tugtool:plan, and reference the context. It makes a plan file. (Subagents: clarifier-agent → author-agent → critic-agent, with an author/critic revision loop; tugtool calls: tugtool init pre-hook.)
- I run /tugtool:implement with the plan just created. It writes the code and sends me a PR. I review the PR. If I'm satisfied (and I usually am)… (Subagents: architect-agent → coder-agent → reviewer-agent → committer-agent → auditor-agent → integrator-agent; tugtool calls: tugtool worktree create --json.)
- I run /tugtool:merge to pull that code onto my main branch. (Subagents: none; tugtool calls: tugtool merge --dry-run --json → confirm → tugtool merge --json → tugtool doctor + tugtool worktree list.)
This side project accomplished two things:
- I made my workflow more efficient by taking a useful workflow and teaching the AI to do it for me.2
- I changed my work from developing a code refactoring tool—solving a problem that I occasionally have—to one where I was improving my everyday workflow—solving a problem I have all the time.
This was just too fascinating to keep as a side project. It started to feel like it should be my main project. So, I'm now pivoting Tug to be a project about software construction. This new agent flow is now the core of the focus—at least for now. It's also given me a whole new set of ideas for building a more full-featured experience—something that imagines a future of software construction beyond the age of IDEs like vscode and Xcode. I've already sketched out a roadmap with some exciting new ideas for how this might fit together. I also made a comprehensive update to the sticky What is Tug always present in the header, to describe the new project concept.
I have more posts planned to share additional thoughts about the new direction, but until then, here are the new agents that make up the core of this new workflow.
Agent Orchestration Loops
PLAN ORCHESTRATION LOOP
┌──────────────────────────────────────────┐
│ PLANNING PHASE BEGINS │
│ (produce a tugplan at .tugtool/tugplan) │
└────────────────────┬─────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ clarifier-agent (runs once) │
│ SPAWN → clarifier_id │
└────────────────────┬─────────────────────┘
│
▼
┌────────────┐
│ questions? │
└──┬─────┬───┘
yes │ │ no
▼ │
┌──────────────────┐ │
│ AskUserQuestion │ │
└────────┬─────────┘ │
└──────┬─────┘
│
┌─────────────────▼────────────────────────┐
│ author-agent │
│ Pass 0: SPAWN (FRESH) → author_id │◄─┐
│ Pass N: RESUME author_id │ │
└────────────────────┬─────────────────────┘ │
│ │
▼ │
┌──────────────────────────────────────────┐ │
│ critic-agent │ │ revision
│ Pass 0: SPAWN (FRESH) → critic_id │ │ loop
│ Pass N: RESUME critic_id │ │
└────────────────────┬─────────────────────┘ │
│ │
▼ │
┌────────────────┐ │
│ critic │ │
│recommendation? │ │
└──┬──────────┬──┘ │
APPROVE │ │ REVISE / REJECT │
│ └──────────────────┘
▼
┌──────────────────────────────────────────┐
│ PLANNING PHASE COMPLETE │
│ Plan ready at {plan_path} │
│ Next: /tugtool:implement {plan_path} │
└──────────────────────────────────────────┘
IMPLEMENT ORCHESTRATION LOOP
┌──────────────────────────────────────────┐
│ tugtool worktree create <plan> --json │
└────────────────────┬─────────────────────┘
│
▼
┌────────────┐
│ succeeded? │
└──┬──────┬──┘
yes │ └ no ──► HALT WITH ERROR
│
▼
═══ STEP LOOP (each ready step) ═══
┌──────────────────────────────────────────┐
│ architect-agent │
│ Pass 0: SPAWN → architect_id │
│ Pass N: RESUME architect_id │
└────────────────────┬─────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ coder-agent │
│ Pass 0: SPAWN → coder_id │
│ Pass N: RESUME coder_id │
└────────────────────┬─────────────────────┘
│
▼
┌────────────┐
│ drift? │
└──┬──────┬──┘
none/ │ │ moderate/major
minor │ └──► AskUserQuestion
│ │
│◄──────────────┘
│
▼
┌──────────────────────────────────────────┐
│ reviewer-agent │◄──┐
│ Pass 0: SPAWN → reviewer_id │ │
│ Pass N: RESUME reviewer_id │ │
└────────────────────┬─────────────────────┘ │
│ │
▼ │ review
┌───────────────┐ │ retry
│ reviewer │ │
│recommendation?│ │
└──┬─────────┬──┘ │
APPROVE │ │ REVISE (max 3) │
│ └─► coder fix ───────┘
▼
┌──────────────────────────────────────────┐
│ committer-agent │
│ SPAWN/RESUME → commit + close bead │
└────────────────────┬─────────────────────┘
│
┌───────────────┐
│ more steps? │─ yes ─► back to architect
└───────┬───────┘
│ no
▼
═══ AUDITOR PHASE ═══
┌──────────────────────────────────────────┐
│ auditor-agent │◄─────────────┐
│ SPAWN/RESUME │ │
└────────────────────┬─────────────────────┘ │
│ │
▼ │ audit
┌───────────────┐ │ retry
│ auditor │ │
│recommendation?│ │
└──┬─────────┬──┘ │
PASS │ │ REVISE (max 3) │
│ └─► coder fix → committer ──────┘
▼
═══ INTEGRATOR PHASE ═══
┌──────────────────────────────────────────┐
│ integrator-agent │◄─────────────┐
│ SPAWN/RESUME → push, PR, CI │ │
└────────────────────┬─────────────────────┘ │
│ │
▼ │ CI
┌───────────────┐ │ retry
│ integrator │ │
│recommendation?│ │
└──┬─────────┬──┘ │
PASS │ │ REVISE (max 3) │
│ └─► coder fix → committer ──────┘
▼
┌──────────────────────────────────────────┐
│ IMPLEMENTATION COMPLETE │
│ Plan: {plan_path} PR: {pr_url} │
└──────────────────────────────────────────┘
This is how I write code these days. I use a 20-step process to code with AI, actively collaborating with assistants in a workflow with two main loops: Planning and Implementation.
- The Planning Loop: I start by defining a phase of work, then use an AI agent to generate a structured plan file. I review this plan with one AI model using an "investigate" prompt that encourages critical analysis. I then cross-reference with a different AI model using the same prompt. I iterate between these two AIs for 10-20 rounds, cross-pollinating their feedback with my own guidance until the plan is solid—questions resolved, decisions documented, steps specific enough to implement.
- The Implementation Loop: For each plan step, I assess whether it's ready or needs breakdown via an architectural analysis agent. I run an implementation command, which executes the step while I monitor terminal output. If the AI struggles, I pause, investigate with agents, update the plan, and resume. After completion, I optionally use a review agent to grade the work against the plan's requirements. I then run commands to log what was done and draft a commit message, and commit manually.
The key principles: multiple AI perspectives catch blind spots, the plan file is the single source of truth, and I retain control over git operations.
I figure that my productivity is 5–10× what it was before AI coding assistants, certainly in lines of code produced, but also in the speed at which I can make my ideas real. That's a quantitative assessment.
There’s also the qualitative aspect: I’m more willing to dive in on an investigation or refactor because I know that the drudgery of fixing up the collateral damage is something I can offload to the AI. It leaves me free to think about the details, features, and APIs I want rather than having to struggle to achieve them.
Yes, I sometimes get all the way to the bottom of a "properly implemented" step and then discover that the work is a complete botch-up. No big deal, I go back and try again. This happened all the time in my "manual coding" days, so I'm used to it.
No, I don’t read every line of code before committing. In fact, I read very few of them. I rely heavily on unit tests, integration tests, and the ease of using previously-completed steps as foundational elements in follow-on work. Problems surface before too long.
Over my whole career, I’ve always tried to figure out ways to do better. AI is the best new tool to land in my toolbox in a long time.
Workflow Overview
The flowchart below shows the two main loops: Planning ([P1]-[P6]) and Implementation ([I7]-[I14]). Each labeled node is a clickable reference to its detailed description.
┌─────────────────────────────────────────────┐
│ PHASE OF WORK BEGINS │
│ (new feature, refactor, exploration, etc.) │
│ [P1] │
└─────────────────────┬───────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ PLANNING LOOP [P2]-[P6] │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌──────────────────┐ │ │
│ │ │ [P2] │◄────────────────────────────────────────────────────┐ │ │
│ │ │ code-planner │ │ │ │
│ │ │ agent creates │ │ │ │
│ │ │ plan file │ │ │ │
│ │ └────────┬─────────┘ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ │ │ │
│ │ ┌─►│ [P3] │──────►│ [P5] │ │ │ │
│ │ │ │ "investigate" │ │ "investigate" │ │ │ │
│ │ │ │ + my comments │ │ prompt w/ │ │ │ │
│ │ │ │ + questions │ │ GPT-5.2 in │ │ │ │
│ │ │ │ w/ Claude Opus │ │ Cursor │ │ │ │
│ │ │ └────────┬─────────┘ └────────┬─────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ ▼ │ │ │ │
│ │ │ ┌──────────────────┐ │ │ │ │
│ │ │ │ [P4] │ │ │ │ │
│ │ │ │ code-planner │ │ │ │ │
│ │ │ │ for major │ │ │ │ │
│ │ │ │ rewrites │ │ │ │ │
│ │ │ └────────┬─────────┘ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ └──────────────────────────┼───────────── back to [P2] ───────┘ │ │
│ │ │ │ │ │
│ │ │ ▼ │ │
│ │ │ ┌────────────────────────────────┐ │ │
│ │ │ │ [P6] │ │ │
│ │ │ │ Cross-pollinate AI feedback │ │ │
│ │ │ │ + my guidance/annotations │ │ │
│ │ │ │ (10-20 rounds typical) │ │ │
│ │ │ └─────────────┬───┬──────────────┘ │ │
│ │ │ │ │ │ │
│ │ │ loop │ │ ready │ │
│ │ └─────────── back to [P3] ◄────────────────┘ │ │ │
│ │ │ │ │
│ └──────────────────────────────────────────────────│──────────────────────────────┘ │
│ │ │
│ │ │
│ Plan is ready ┌─────────┘ │
│ │ │
└───────────────────────────────────────────┼───────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ IMPLEMENTATION LOOP [I7]-[I14] │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌───────────────────┐ │ │
│ │ │ [I7] │◄───────────────────────────────────────────────────┐ │ │
│ │ │ Assess step: │ │ │ │
│ │ │ ready as-is, or │ │ │ │
│ │ │ needs breakdown? │ │ │ │
│ │ └────────┬──────────┘ │ │ │
│ │ │ │ │ │
│ │ ├───────────────────────────┐ │ │ │
│ │ │ needs detail │ ready │ │ │
│ │ ▼ │ │ │ │
│ │ ┌──────────────────┐ │ │ │ │
│ │ │ [I7-detail] │ │ │ │ │
│ │ │ code-architect │ │ │ │ │
│ │ │ agent provides │ │ │ │ │
│ │ │ detailed steps │ │ │ │ │
│ │ └────────┬─────────┘ │ │ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ │ │ │
│ │ ┌─────────────────────────────────────────┐ │ │ │
│ │ │ [I8] │◄──────────────┐ │ │ │
│ │ │ /implement-plan command │ │ │ │ │
│ │ │ (cite plan step, watch terminal) │ │ │ │ │
│ │ └────────────────┬────────────────────────┘ │ │ │ │
│ │ │ │ │ │ │
│ │ │ [I9] monitor │ │ │ │
│ │ │ │ │ │ │
│ │ ┌───────────┴───────────┐ │ │ │ │
│ │ │ struggling? │ progressing │ │ │ │
│ │ ▼ │ │ │ │ │
│ │ ┌──────────────────┐ │ │ │ │ │
│ │ │ [I10] │ │ │ │ │ │
│ │ │ ESC to pause │ │ │ │ │ │
│ │ │ ▼ │ │ │ │ │ │
│ │ │ code-architect │ │ │ │ │ │
│ │ │ or code-planner │ │ │ │ │ │
│ │ │ to investigate │ │ │ │ │ │
│ │ │ ▼ │ │ │ │ │ │
│ │ │ update plan │ │ │ │ │ │
│ │ └────────┬─────────┘ │ │ │ │ │
│ │ │ │ │ │ │ │
│ │ └───────────────────┼─────────► back to [I8] ────┘ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌───────────────────────────────────┐ │ │ │
│ │ │ Implementation complete │ │ │ │
│ │ │ Questions about quality? │ │ │ │
│ │ └────────────────┬──────────────────┘ │ │ │
│ │ │ │ │ │
│ │ ┌───────────┴───────────┐ │ │ │
│ │ │ yes │ no │ │ │
│ │ ▼ │ │ │ │
│ │ ┌─────────────────────┐ │ │ │ │
│ │ │ [I11] │ │ │ │ │
│ │ │ plan-step-reviewer │ │ │ │ │
│ │ │ grades the work │ │ │ │ │
│ │ └────────┬────────────┘ │ │ │ │
│ │ │ │ │ │ │
│ │ remediation needed? │ │ │ │
│ │ │ │ │ │ │
│ │ │ yes │ │ │ │
│ │ └──────────────────────────────► back to [I7] ──────────────────│ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌─────────────────────────────────────────┐ │ │ │
│ │ │ [I12] │ │ │ │
│ │ │ /update-plan-implementation-log │ │ │ │
│ │ └────────────────┬────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌─────────────────────────────────────────┐ │ │ │
│ │ │ [I13] │ │ │ │
│ │ │ /prepare-git-commit-message │ │ │ │
│ │ └────────────────┬────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌─────────────────────────────────────────┐ │ │ │
│ │ │ [I14] │ │ │ │
│ │ │ git commit (manually) │ │ │ │
│ │ │ decide: next step? (below) │ │ │ │
│ │ └────────────────┬─────────┬──────────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ │ │ │ │
│ │ │ no │ yes │ │ │
│ │ │ │ │ │ │
│ │ │ │ │ │ │
│ │ │ └────────────► back to [I7] ──────────────────┘ │ │
│ │ │ │ │
│ │ └────────────────┐ │ │
│ │ │ │ │
│ └──────────────────────────────────────│──────────────────────────────────────────┘ │
│ │ │
│ All steps done │ │
│ │ │
└─────────────────────────────────────────┼─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ PHASE OF WORK COMPLETE │
└─────────────────────────────────────────────┘
Node Index:
| Node | Description | Jump |
|---|---|---|
| [P1] | Initiate a new phase | → |
| [P2] | code-planner creates plan file | → |
| [P3] | "investigate" + review w/ Claude Opus | → |
| [P4] | code-planner for major rewrites | → |
| [P5] | "investigate" prompt w/ GPT-5.2 | → |
| [P6] | Cross-pollinate AI feedback | → |
| [I7] | Assess each plan step | → |
| [I8] | Run /implement-plan | → |
| [I9] | Monitor implementation | → |
| [I10] | Handle struggles (ESC to pause) | → |
| [I11] | Review with plan-step-reviewer | → |
| [I12] | Update implementation log | → |
| [I13] | Prepare commit message | → |
| [I14] | Commit and continue | → |
Definitions
Phase of Work
A phase of work is a bounded unit of development effort with a clear goal:
- Adding a new feature
- Refactoring existing code
- Exploring a new technical area
- Fixing a class of bugs
- Performance optimization
Each phase gets its own numbered plan file in the plans/ directory (e.g., phase-13.md). See Files Reference for the full list of plan-related files.
Plan File
A structured planning document in the plans/ directory following the format defined in plans/plan-skeleton.md. Plan files contain:
- Plan Metadata: Owner, status, target branch, dates
- Phase Overview: Context, strategy, stakeholders, success criteria, scope, non-goals
- Open Questions: Tracked uncertainties that must be resolved or explicitly deferred
- Risks and Mitigations: Known risks with mitigation strategies
- Design Decisions: Recorded decisions with rationale (referenced as [D01], [D02], etc.)
- Specification: Detailed technical contract including APIs, schemas, error models
- Symbol Inventory: Concrete list of crates, files, and symbols to add
- Test Plan Concepts: Categories of tests and fixture requirements
- Execution Steps: Sequenced implementation tasks with checkpoints
- Deliverables and Checkpoints: Phase exit criteria
"Investigate" Prompt
A standard prompt I use when reviewing plans with AI assistants (used in [P3] and [P5]):
Investigate. Ask clarifying questions. Give your assessment on its quality and readiness to implement. Do you see holes, pitfalls, weaknesses or limitations?
This prompt encourages critical analysis rather than passive acceptance.
Planning Loop
The planning loop transforms an idea into a ready-to-implement plan. It corresponds to nodes [P1]-[P6] in the flowchart.
[P1] Initiate a New Phase
When starting new work, I define the scope and goals of the phase. This could be a feature request, a technical debt item, an exploration, or a refactoring effort. See Phase of Work for what constitutes a phase.
[P2] Create the Plan File
I use the code-planner agent in Claude Code to create the initial plan file:
Task tool → code-planner agent
The code-planner agent (.claude/agents/code-planner.md):
- Reads CLAUDE.md and relevant documentation
- Explores the codebase to understand existing patterns
- Analyzes the request and breaks it into implementable units
- Writes a structured plan following plans/plan-skeleton.md
- Outputs to a file like plans/phase-14.md
Key characteristics:
- Uses the Opus model for deep reasoning
- Identifies dependencies between tasks
- Specifies file paths, function signatures, and code locations
- Includes verification steps and success criteria
[P3] Review with Claude Opus
I review the plan file myself, making annotations and noting questions. Then I use Claude Opus with the “investigate” prompt to get critical feedback.
Claude Opus examines:
- Logical consistency of the approach
- Missing edge cases or error scenarios
- Architectural implications
- Potential conflicts with existing code
- Gaps in the specification
[P4] Use code-planner for Major Rewrites
If Claude’s feedback suggests significant structural changes to the plan, I invoke the code-planner agent ([P2]) again to rewrite sections rather than making piecemeal edits.
[P5] Cross-Reference with GPT-5.2 in Cursor
I bring the plan file into Cursor and use GPT-5.2 with the “investigate” prompt:
- Ask mode: Get feedback without modifying the plan
- Agent mode: Have GPT-5.2 update the plan directly
This provides a second AI perspective that often catches different issues than Claude. See Multiple AI Perspectives for why this matters.
[P6] Iterate Until Ready
I loop through [P3]-[P5], giving each AI the feedback from the other (annotated with my guidance, answers, and comments). This cross-pollination typically takes 10-20 rounds before the plan is solid.
What “ready” means:
- All open questions are resolved or explicitly deferred
- Design decisions are justified and documented
- Execution steps are specific enough to implement
- Checkpoints are defined and testable
- Both AIs have given positive assessments
Exit condition: Plan is ready → proceed to Implementation Loop.
Implementation Loop
The implementation loop transforms each plan step into committed code. It corresponds to nodes [I7]-[I14] in the flowchart. This loop repeats for each execution step in the plan.
[I7] Assess Each Plan Step
For each execution step in the plan file, I decide:
Ready as-is: The step is specific enough to implement directly. → Proceed to [I8].
Needs breakdown: The step is too abstract or complex. I use the code-architect agent to provide detailed sub-steps.
The code-architect agent (.claude/agents/code-architect.md):
- Has deep expertise in API design, system architecture, and design patterns
- Follows a rigorous methodology: requirements extraction → design analysis → API surface design → critical review
- Produces detailed type signatures, usage examples, and phased implementation roadmaps
- Challenges its own designs as a harsh critic would
After breakdown: The detailed steps are added to the plan, then proceed to [I8].
[I8] Run /implement-plan
I execute the implement-plan command (.claude/commands/implement-plan.md), providing a citation to the specific execution step:
/implement-plan plans/phase-13.md Step 2.3: Add TypeCommentCollector
Why a command instead of an agent: Commands show more detailed terminal output during execution, which is valuable for monitoring progress on complex implementations. This is the current behavior of Claude Code, which I suppose might change in the future. See [I9].
The implement-plan command:
- Parses the step reference and locates it in the plan file
- Reads all referenced materials (other files, specs, etc.)
- Implements each task sequentially
- Updates the plan file: Checks off [ ] → [x] as tasks complete
- Writes tests as specified
- Runs verification using cargo nextest run
- Performs all checkpoint verifications
- Never commits: Git operations are my responsibility (see [I14])
[I9] Monitor Implementation
I watch the terminal output as implementation proceeds. The detailed output helps me:
- Understand what decisions the AI is making
- Catch deviations from the plan early
- Verify that tests are actually running and passing
If progressing well: Wait for completion, then proceed to [I11] (or skip to [I12] if confident).
If struggling: Proceed to [I10].
[I10] Handle Struggles
If the AI is struggling with the implementation (making repeated mistakes, going in circles, or producing incorrect code):
- Press ESC to pause execution
- Take stock: What’s causing the difficulty?
- Ask questions: Use conversation to understand the blocker
- Use agents to investigate:
- code-architect ([I7]): For design/architecture issues
- code-planner ([P2]): For scope/decomposition issues
- Update the plan: Revise the step with better guidance
- Resume: Jump back to [I8] with the improved plan
This pause-investigate-revise pattern prevents wasted effort on fundamentally flawed approaches.
[I11] Review with plan-step-reviewer
When I have questions about implementation quality, I use the plan-step-reviewer agent:
Task tool → plan-step-reviewer agent
The plan-step-reviewer agent (.claude/agents/plan-step-reviewer.md):
- Analyzes the plan step requirements (References, Artifacts, Tasks, Tests, Checkpoints)
- Examines code changes via `git diff`
- Evaluates across dimensions: correctness, completeness, architecture, performance, quality
- Produces a structured report with:
- Summary verdict: PASS / PASS WITH NOTES / NEEDS WORK
- Task checklist with status
- Artifact verification
- Recommendations for any issues
If PASS: Proceed to [I12].
If NEEDS WORK: Use code-architect or code-planner to revise, then return to [I7].
[I12] Update the Implementation Log
After a step implementation looks good, I run:
/update-plan-implementation-log
This command (.claude/commands/update-plan-implementation-log.md):
- Reviews the conversation to identify completed work
- Reads the relevant plan file for context
- Generates a detailed completion summary with:
- Machine-parseable header: `#### [plan.md] Step: Title | STATUS | DATE`
- References reviewed
- Implementation progress table
- Files created/modified
- Test results
- Checkpoints verified
- Key decisions and notes
- Prepends the entry to `plans/plan-implementation-log.md` (newest first)
See Implementation Log as History for why this matters.
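Because the header line has a fixed shape, it is easy to pull apart later with a few lines of code. A minimal sketch, assuming a `#### [plan-file] Step title | STATUS | DATE` layout (the field names and the sample status value are my own reading, not output from the command):

```python
# Hypothetical sketch: split a machine-parseable log header of the form
# "#### [plan.md] Step: Title | STATUS | DATE" into its fields.
import re

HEADER = re.compile(
    r"^####\s+\[(?P<plan>[^\]]+)\]\s+(?P<step>.+?)\s*\|\s*(?P<status>\w+)\s*\|\s*(?P<date>[\d-]+)\s*$"
)

def parse_log_header(line: str) -> dict:
    m = HEADER.match(line)
    if not m:
        raise ValueError(f"not a log header: {line!r}")
    return m.groupdict()
```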
[I13] Prepare Commit Message
I run:
/prepare-git-commit-message
This command (.claude/commands/prepare-git-commit-message.md):
- Runs `git status` and `git diff` to see uncommitted changes
- Checks recent commit history for style consistency
- Analyzes what was changed and why
- Composes a commit message:
- First line: imperative mood, under 50 characters
- Bullets: terse, factual, lists key files
- Plan reference if applicable
- Writes to `git-commit-message.txt` (does not commit)
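The first-line conventions lend themselves to a mechanical check before committing. A hypothetical sketch (the function and its rules are illustrative; this is not part of the command):

```python
# Hypothetical sanity check for the commit message conventions described
# above: imperative-mood subject under 50 characters, then bullets.
# (Imperative mood itself is hard to check mechanically and is skipped.)
def check_commit_message(text: str) -> list[str]:
    problems = []
    lines = text.splitlines()
    if not lines:
        return ["empty message"]
    subject = lines[0]
    if len(subject) > 50:
        problems.append(f"subject is {len(subject)} chars (limit 50)")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line after subject")
    return problems
```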
[I14] Commit and Continue
I review the generated commit message, make any adjustments, and commit manually:
git add <files>
git commit -F git-commit-message.txt
Next step: Return to [I7] to assess the next execution step in the plan.
Phase complete: When all execution steps are done, the phase of work is complete.
Key Design Principles
Multiple AI Perspectives
Using both Claude Opus and GPT-5.2 (see [P3], [P5]) provides:
- Different reasoning approaches
- Coverage of blind spots each model might have
- Cross-validation of technical assessments
- Higher confidence when both agree
Iterative Refinement
The 10-20 round planning process ([P6]) ensures:
- Requirements are fully understood
- Edge cases are considered
- Design decisions are justified
- Implementation steps are specific
- Checkpoints are meaningful
Separation of Concerns
- Commands for actions that need detailed terminal output ([I8], [I12], [I13])
- Agents for investigation and analysis ([P2], [I7], [I11])
- Me for git operations and final decisions ([I14])
Plan as Single Source of Truth
The plan file:
- Captures all decisions and rationale
- Gets updated with checkmarks as work progresses ([I8])
- Serves as documentation after completion
- Enables pause/resume across sessions ([I10])
Implementation Log as History
The plans/plan-implementation-log.md file ([I12]):
- Tracks what was done and when
- Records implementation decisions
- Helps onboard new contributors
- Provides continuity across context windows
Files Reference
Plan Files
| ↗ | File | Purpose |
|---|------|---------|
| ↗ | plans/plan-skeleton.md | Template defining plan file structure |
| — | plans/phase-N.md | Actual plan files for each phase |
| ↗ | plans/plan-implementation-log.md | Historical record of completed work |
Agents
| ↗ | File | Used In | Purpose |
|---|------|---------|---------|
| ↗ | .claude/agents/code-planner.md | [P2], [P4] | Agent for creating and revising plans |
| ↗ | .claude/agents/code-architect.md | [I7], [I10] | Agent for detailed design and breakdown |
| ↗ | .claude/agents/plan-step-reviewer.md | [I11] | Agent for reviewing implementations |
Commands
| ↗ | File | Used In | Purpose |
|---|------|---------|---------|
| ↗ | .claude/commands/implement-plan.md | [I8] | Command for executing plan steps |
| ↗ | .claude/commands/update-plan-implementation-log.md | [I12] | Command for logging completions |
| ↗ | .claude/commands/prepare-git-commit-message.md | [I13] | Command for generating commit messages |
Generated Files
| File | Purpose |
|------|---------|
| git-commit-message.txt | Generated commit message for manual review ([I13], [I14]) |

The way James T. Kirk interacted with computers on Star Trek: The Original Series showed us a vision of the future which, intended or not by the writers, we increasingly find ourselves living in.
Computers—their potentials and pitfalls—were a common theme during the first 79-episode run of Star Trek from 1966–1968, a time when nobody had personal computers. People talking to computers was pure fantasy. Yet Kirk, in episode after episode, talked with computers much like the way we now talk with ChatGPT.
Now, our chatbots aren’t connected in the way the ship’s computer on the Enterprise was (yet), and the way that Kirk could intentionally cause computers to tie themselves in knots simply by speaking with them (up to and including their own self-destruction) became a recurring plot device1.
This highlights what the writers, through their science fiction lens, saw as the potential pitfalls of computers. In other circumstances, Kirk has little choice but to turn to the potential promise of computers. Here’s one case:
In Mirror, Mirror, Kirk beams down with McCoy, Scotty, and Uhura to a planet rich in dilithium crystals, the rare and essential mineral at the core of the matter/anti-matter power system of their starship. The inhabitants of this planet, the Halkans, are a peaceful lot, and they abhor the thought that their mineral, and its latent capabilities, could be used for violence. They balk at Kirk’s overtures to become a dilithium crystal supplier to the Federation. Kirk says, “OK. The Federation will win your trust over time,” and attempts to beam back to the Enterprise. But the ship is getting tossed in an ion storm, and when the transporter completes its cycle and deposits the landing party onto the pad of what they think is their ship, they are confronted by an evil version of Spock with a beard. They discover themselves in a parallel dimension where the Federation is the wicked reverse image of their own. In this bizarro universe, Kirk’s orders are to use the Enterprise to phaser the Halkan people into oblivion, take the dilithium crystals, and move on to the next heinous act.
Once he realizes the plight he and his crew are in, Kirk makes his way back to his quarters with McCoy and Scotty.
Kirk
Let’s find out where we stand… Computer…
Kirk
(Raises an eyebrow in surprise at the computer's manly voice)
This is the Captain. Record security research, to be classified under my voice print or Mr Scott’s.
Kirk
Produce all data relevant to the recent ion storm. Correlate the following hypothesis. Could a storm of such magnitude cause a power surge in the transporter circuits, creating a momentary inter-dimensional contact with a parallel universe?
Kirk
At such a moment, could persons in each universe, in the act of beaming, transpose with their counterparts in the other universe?
Kirk
Could conditions necessary to such an event be created artificially using the ship’s power?
Kirk
(Inserts a memory data card into the computer)
Record procedure.
Computer
(Sound of electromechanical relays as it writes)
Kirk
(Takes the card and offers it to Scotty)
Scotty, can you do it?
James T. Kirk, vibe coder.
McCoy played the skeptic at first (“I’m a doctor, not an engineer!”), and he never got over his visceral dislike of the transporter. But no matter. As can be expected, after several twists and turns in the plot2, Kirk’s plan and procedure work.
As a young teenager watching these episodes as reruns (again and again), I dreamed of the future tech we were seeing, and wondered, to varying degrees, whether and when we would get any of it.
- Communicators
- I imagined we would have communicators. That technology seemed within reach. As kids, we had walkie-talkies. Yet, I never expected that I would play my part in making such a real-world product come into being.
- Transporters
- Beaming around the world seemed like complete fantasy then, and still does today.
- Conversations With Computers
- I never thought we would have computers we could ask such open-ended questions of in the way that Kirk did in his quarters, have the computer respond naturally in a conversational manner, and then have the machine use its backing store of knowledge to help explore a new area of ideas and compose a solution rooted in technical details.
Up until a few years ago, I couldn’t even ask my computer for a weather report with any real degree of confidence. But today, AI has changed the way I’ve been writing software my whole career. Just like that… BOOM… we got LLMs, coding assistants are here, and everything about writing software has changed.
Who knows? Maybe I should hold out some hope for warp-driven starships!
It’s hard to go from an idea to a work product worth sharing, much less a finished product. Every step has its difficulties. One of the most challenging is to create the first thing, the initial representation, to collapse the infinite possibilities of inspiration into some real thing you can see and show—and to do this without wasting a lot of time. Make a demo!
I wrote about this concept in my book, Creative Selection.
Making demos is hard. It involves overcoming apprehensions about committing time and effort to an idea that you aren’t sure is right. At Apple, we then had to expose that idea and demo to the scrutiny of sharp-eyed colleagues who were never afraid to level pointed criticism. The psychological hurdle only grows taller with the knowledge that most demos—almost all of them—fail in the absolute, dead-end sense of the word.
This prospect of likely failure can make it tough to sit down, focus, and make a demo at all. Getting another cup of coffee can sound better, especially if a couple colleagues are willing to go along, and then when everybody returns from the communal break, the whiteboard may beckon, and the group might veer off into a gab session.
We didn’t do this on the [original iPhone] project. We rarely had brainstorming sessions. I recall only a few times in my entire Apple career when I stood around to rough out big plans at a whiteboard. Even when it did happen… we chatted, sketched, and came to our decisions as quickly as we could. If brainstorms run longer than an hour or so, or if there are more than a handful of people in attendance, or if they’re a common occurrence, they can devolve into a form of sneaky procrastination. Whiteboard discussions feel like work, but often they’re not, since it’s too difficult to talk productively about ideas in the abstract.
Bridging from an abstract idea to a concrete and specific demo is necessary. If the goal is to produce some particular end product—a website, an app, an open-source project on GitHub—the sooner you can make something that begins to look like that end product, the better.
With AI, turning an abstract idea into a concrete demo has never been easier, whether in imagery, writing, or code.
Get an idea. Sketch with AI. Make a demo. What’s your excuse? Get to work! 🧐💡🛠
when working with AI, say precisely what you want to get exactly what you want
👆 This idea came to me yesterday as I was thinking about how I collaborate with AI coding assistants, why I find myself so much more productive when working with them, and maybe why others don’t share my keen interest in this new style of programming.
These days, for the most part, I now write prose instead of code. I’m fine with this. Perhaps the enthusiasm gap in other developers comes from an unwillingness to write plans instead of programs.
Submitting a prompt to an AI coding assistant is easier than editing a file for a compiler. At least, this is true for me—and this is the crucial point. The need to be precise rather than exact delivers the productivity boost. It’s far simpler to provide a precise description of an algorithm, a function, or an API surface area in prose than it is to write it exactly, line by line, in code. This implies that I know what I want—that I’m not just vibing an end result. I still care about the details of how things in my programs are accomplished. I watch the AI do everything it does, and I stop it in its tracks when I see it stray. Overall, my aim is to be precise in communicating what I want so the AI can write the exact lines of code needed.
I understand that this change in working is not necessarily easier for everyone. I like writing, and I enjoyed the experience of writing Creative Selection, my book about my experiences working at Apple. I wrote one thousand new words every day during the first draft of my book, and most days, I found that was achievable by lunchtime. (For reference, this post is about 440 words).
Such a facility with prose composition is a requirement when communicating with AI. Be precise in words, and the assistant can (much of the time) respond with exact code. But be general or ambiguous, and there’s far less hope for good results.
Indeed, this often happens to me. When it does—when I don’t get what I want even though I think I asked for it—I blame myself rather than the AI, because it’s likely that:
- I wasn’t sufficiently clear, or
- I tried to do too much at once, or
- I left something undone in the previous step, or
- I asked for something illogical or ill-conceived, or…
You get the idea. As much as ever, it’s incumbent on me, as a developer, to say what I want. In clear and cogent writing.
These days, I’m working by myself… in the traditional sense. I don’t have human teammates—but I am collaborating all the time with AI coding assistants.
I have no illusions. I don’t believe that AI “thinks”. These models are not people. Yet, to get the best results out of them, I need to use people manager skills.
The two AI coding assistants I use are Claude Code (with Opus) and Cursor (usually with GPT-5.2). They don’t have different personalities, but they definitely have attributes that lead me to turn to each of them for different tasks.
The simple breakdown: I have Claude Code write code and have Cursor/GPT-5.2 review it. I bounce back and forth between them all the time. I’ve found it interesting and instructive to read what these two AIs say about each other’s work, and the way they respond to each other with patches and diffs. I pepper in edits and comments of my own. After a few weeks of working like this, it feels to me that Claude Code is better at implementations and details and Cursor/GPT-5.2 is better at higher-level and more goal-oriented evaluations. When I develop plans, which is an essential part of getting code written the way I want it, I use both AIs. Again, I run a three-way cycle between us, with me sitting in the middle directing the work.
The people manager aspect is about taking the technical goals I’ve chosen, with their priorities and organized tasks, and then communicating this information to make the expectations clear to my AI “co-workers”. It’s incumbent on me to understand the relative strengths of my team members and to make the work assignments in a way that sets everyone up for success. All the while, I need to recognize and provide the background and context they might need without them having to ask for it. That’s especially true in this case, since I can’t hope they will have any meta-view on what they’re doing or why. Obviously not. They aren’t people. I also need to give feedback (and sometimes pointed negative feedback) when work isn’t done up to my standards.
I have yet to find any tangible benefit for giving praise. This makes me sad. One of the happiest things I can think of doing in a work situation is telling another person, “Hey, that work you did is great!” Yet with an AI “teammate”, this will do little more than waste tokens.
Even so, I’m finding that if I do a better job supporting the AIs I collaborate with, they do a better job at accomplishing the tasks I give them, and I am more likely to achieve the goals I have for my projects.
I’m a ~~people~~ an AI manager.
It’s possible to one-shot just about anything using an AI coding assistant like Claude Code or Cursor. The problem is the result will almost surely be bad for anything but the simplest requests. What’s more, it will be bad as reckoned from just about any vantage point: correctness, performance, consistency, maintainability, etc. I suspect this is a major reason why many developers are quick to dismiss what AI can do to help them write code.
It’s as if someone walked up to a piano as a novice, banged on the keys for a few minutes, and then concluded that the thing is no good for making real music.
In the same way, getting good results from AI coding assistants takes time, effort, and practice. An AI coding assistant is just another tool. It takes time to learn, and yet it differs from any other tool we developers have ever had available as a potential aid.
It can write the code for us… but only if we are sufficiently clear in communicating what we want, and I think it’s now the job of real programmers to figure out how to do this.
This requires planning, and I mean this in a concrete way: a file containing an ordered list of specific design decisions, concepts, requests, constraints, explanations, tasks, tests, checkpoints, etc. For well over a year now, I’ve been informally evolving the way that I lay this out for the AI, so that it can write the code I want from it. Over the past few weeks, I’ve been trying to nail this down, to make this planning scheme more formal and well-defined—especially when it comes to defining cross-references and intralinks between the execution steps of the plan and the front matter and end matter.
A simple step in the plan looks like this:
## 2.0.5 Execution Steps {#execution-steps}
### Step 0: Preparation and Baseline {#step-0}
**Commit:** `chore: establish baseline metrics and API surface test before workspace migration`
**References:** [D01] Workspace structure, [D11] API surface guard, (#strategy, #success-criteria)
**Artifacts:**
- Baseline test count and coverage
- Baseline build times via `cargo build --timings`
- Verification that all tests pass
- `tests/api_surface.rs` - compile-time API contract
**Tasks:**
- [x] Run `cargo nextest run --workspace` and record pass/fail counts
- [x] Run `cargo build --timings` and save HTML report
- [x] Run `cargo clippy` and fix any warnings
- [x] Create `tests/api_surface.rs` with imports of all current public types (see [D11])
- [x] Ensure clean git status
**Tests:**
- [x] All existing tests pass
- [x] `tests/api_surface.rs` compiles with `--features full`
**Checkpoint:**
- [x] `cargo nextest run --workspace` - all tests pass
- [x] `cargo clippy -- -D warnings` - no warnings
- [x] `cargo fmt --check` - no formatting issues
- [x] `tests/api_surface.rs` exists and compiles with `cargo test -p tugtool --features full -- api_surface`
**Rollback:** N/A (no changes yet)
**Commit after all checkpoints pass.**
More complete examples are below.
I invoke my planning scheme with a custom Claude Code subagent.
My experience tells me that the time I’ve spent developing this planning scheme has paid huge dividends in my productivity. I am probably 5–10× more productive using AI than I ever was coding by myself.
Check out the files. Read them over, copy them, change them, use them as you wish.
GitHub Links:
A built-out example of a planning file:
Files:
Plan Skeleton (845 lines)
## Phase X.Y: <Phase Title> {#phase-slug}
**Purpose:** <1–2 sentences. What capability ships at the end of this phase?>
---
### Plan Metadata {#plan-metadata}
| Field | Value |
|------|-------|
| Owner | <name> |
| Status | draft / active / done |
| Target branch | <branch> |
| Tracking issue/PR | <link or ID> |
| Last updated | <YYYY-MM-DD> |
---
### Phase Overview {#phase-overview}
#### Context {#context}
<1–2 paragraphs. What problem are we solving, and why now?>
#### Strategy {#strategy}
<3–7 bullets. The approach and sequencing philosophy for this phase.>
#### Stakeholders / Primary Customers {#stakeholders}
1. <customer or team>
2. <customer or team>
#### Success Criteria (Measurable) {#success-criteria}
> Make these falsifiable. Avoid “works well”.
- <criterion> (how to measure / verify)
- <criterion> (how to measure / verify)
#### Scope {#scope}
1. <Scope item>
2. <Scope item>
3. <Scope item>
#### Non-goals (Explicitly out of scope) {#non-goals}
- <Non-goal>
- <Non-goal>
#### Dependencies / Prerequisites {#dependencies}
- <Dependency>
- <Prerequisite>
#### Constraints {#constraints}
- <platform/tooling/perf/security constraints>
#### Assumptions {#assumptions}
- <assumption>
- <assumption>
---
### Section Numbering Convention {#section-numbering}
This skeleton uses `X.Y` placeholders. When writing a real plan, replace them with actual numbers:
| Placeholder | Meaning | Example |
|-------------|---------|---------|
| `X` | Major phase number | `1`, `2`, `3` |
| `Y` | Minor phase number (usually `0`) | `1.0`, `2.0` |
| `X.Y.N` | Numbered section within phase | `1.0.1`, `1.0.2` |
| `X.Y.N.M` | Subsection within a numbered section | `1.0.1.1`, `1.0.2.3` |
**Standard section numbers:**
- `X.Y.0` — Design Decisions (always `.0`)
- `X.Y.1` — Specification
- `X.Y.2` — Symbol Inventory
- `X.Y.3` — Documentation Plan
- `X.Y.4` — Test Plan Concepts
- `X.Y.5` — Execution Steps
- `X.Y.6` — Deliverables and Checkpoints
**Deep dives** are just numbered sections within the phase, typically starting at `X.Y.1` *after* `X.Y.0 Design Decisions` (e.g., `1.0.1 Refactoring Operations Analysis`, `1.0.2 Type Inference Roadmap`). Use `X.Y.N.M` for deep-dive subsections when needed.
---
### Document Size Guidance {#document-size}
Plans can grow large. When a plan exceeds **~100KB or ~2000 lines**, consider these strategies:
#### When to Split
| Symptom | Action |
|---------|--------|
| Deep dives exceed 50% of document | Extract to `phase-X-deepdives.md` |
| Multiple independent feature tracks | Split into `phase-X.1.md`, `phase-X.2.md` |
| Reference material dominates | Extract to `phase-X-reference.md` |
#### Navigation Aids for Large Documents
- Add a **Table of Contents** after the Purpose statement
- Use **collapsible sections** (if your renderer supports `<details>`)
- Add **"Back to top"** links after major sections
#### Cross-File References
When splitting across files, use relative links with anchors:
```markdown
See [Worker Protocol](./phase-1-deepdives.md#worker-protocol) for details.
```
Keep all **decisions** ([D01], [D02], ...) in the main plan file—they're the source of truth.
---
### Reference and Anchor Conventions (MANDATORY) {#reference-conventions}
This plan format relies on **explicit, named anchors** and **rich `References:` lines** in execution steps.
#### 1) Use explicit anchors everywhere you will cite later
- **Technique**: append an explicit anchor to the end of a heading using `{#anchor-name}`.
- Example:
- `### X.Y.0 Design Decisions {#design-decisions}`
- `#### [D01] Workspace snapshots are immutable (DECIDED) {#d01-snapshots-immutable}`
- **Why**: do not rely on auto-generated heading slugs; explicit anchors are stable when titles change.
#### 2) Anchor naming rules (lock these in)
- **Allowed characters**: lowercase `a–z`, digits `0–9`, and hyphen `-` only.
- **Style**: short, semantic, **kebab-case**, no phase numbers (anchors should survive renumbering).
- **Prefix conventions (use these consistently)**:
- **`dNN-...`**: design decisions (`[D01]`) anchors, e.g. `{#d01-sandbox-copy}`
- **`qNN-...`**: open questions (`[Q01]`) anchors, e.g. `{#q01-import-resolution}`
- **`rNN-...`**: risk notes (`Risk R01`) anchors, e.g. `{#r01-perf-regression}`
- **`cNN-...`**: concepts (`Concept C01`) anchors, e.g. `{#c01-type-inference-wall}`
- **`diagNN-...`**: diagrams (`Diagram Diag01`) anchors, e.g. `{#diag01-rename-flow}`
- **`op-...`**: refactor operations, e.g. `{#op-rename}`, `{#op-extract-fn}`
- **`cmd-...`**: CLI commands, e.g. `{#cmd-run}`
- **`type-...`**: schema types, e.g. `{#type-span}`
- **`seq-...`**: sequence diagrams, e.g. `{#seq-rename-python}`
- **`fixture-...`**: fixture sections, e.g. `{#fixture-py-rename-fn}`
- **Domain anchors**: for major concepts/sections, use a clear noun phrase, e.g. `{#cross-platform}`, `{#config-schema}`, `{#error-scenarios}`
#### 3) Stable label conventions (for non-heading artifacts)
Use stable labels so steps can cite exact plan artifacts even when prose moves around:
- **Design decisions**: `#### [D01] <Title> (DECIDED) {#d01-...}`
- **Open questions**: `#### [Q01] <Title> (OPEN) {#q01-...}`
- **Specs**: `**Spec S01: <Title>** {#s01-slug}` (or make it a `####` heading if you prefer)
- **Tables**: `**Table T01: <Title>** {#t01-slug}`
- **Lists**: `**List L01: <Title>** {#l01-slug}`
- **Risks**: `**Risk R01: <Title>** {#r01-slug}`
- **Milestones**: `**Milestone M01: <Title>** {#m01-slug}`
- **Concepts**: `**Concept C01: <Title>** {#c01-slug}` (for key conceptual explanations)
- **Diagrams**: `**Diagram Diag01: <Title>** {#diag01-slug}` (for ASCII diagrams, sequence flows, architecture visuals)
Numbering rules:
- Always use **two digits**: `D01`, `Q01`, `S01`, `T01`, `L01`, `R01`, `M01`, `C01`, `Diag01`.
- Never reuse an ID within a plan. If you delete one, leave the gap.
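Two of these rules are mechanical enough to lint. A hypothetical sketch that checks the anchor character set and flags duplicate anchor definitions (the function name is my own; it is not part of the skeleton):

```python
# Hypothetical lint pass over a plan file: anchors may use only lowercase
# letters, digits, and hyphens, and no anchor may be defined twice.
import re
from collections import Counter

ANCHOR = re.compile(r"\{#([^}]+)\}")

def lint_anchors(plan_text: str) -> list[str]:
    problems = []
    anchors = ANCHOR.findall(plan_text)
    for name in anchors:
        if not re.fullmatch(r"[a-z0-9-]+", name):
            problems.append(f"illegal characters in anchor: {{#{name}}}")
    for name, count in Counter(anchors).items():
        if count > 1:
            problems.append(f"anchor defined {count} times: {{#{name}}}")
    return problems
```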
#### 4) `**References:**` lines are required for every execution step
Every step must include a `**References:**` line that cites the plan artifacts it implements.
Rules:
- Cite **decisions** by ID: `[D05] ...`
- Cite **open questions** by ID when the step resolves/de-risks them: `[Q03] ...`
- Cite **specs/lists/tables/risks/milestones/concepts/diagrams** by label: `Spec S15`, `List L03`, `Tables T27-T28`, `Risk R02`, `Milestone M01`, `Concept C01`, `Diagram Diag01`, etc.
- Cite **anchors** for deep links in parentheses using `#anchor` tokens (keep them stable).
- **Do not cite line numbers.** If you find yourself writing "lines 5–10", add an anchor and cite that instead.
- Prefer **rich, exhaustive citations**. Avoid `N/A` unless the step is truly refactor-only.
**Good References examples:**
```
**References:** [D05] Sandbox verification, [D12] Git-based undo, Spec S15, Tables T21-T25,
(#session-lifecycle, #worker-process-mgmt, #config-precedence)
```
```
**References:** [D01] Refactoring kernel, [D06] Python analyzer, Concept C01, List L04,
Table T05, (#op-rename, #fundamental-wall)
```
**Bad References examples (avoid these):**
```
**References:** Strategy section (lines 5–10) ← uses line numbers
**References:** See design decisions above ← vague, no specific citations
**References:** N/A ← only acceptable for pure refactor steps
```
---
### Open Questions (MUST RESOLVE OR EXPLICITLY DEFER) {#open-questions}
> Open questions are tracked work. If a question remains open at phase-end, explicitly defer it with a rationale and a follow-up plan.
#### [Q01] <Question title> (OPEN) {#q01-question-slug}
**Question:** <what is unknown / undecided?>
**Why it matters:** <what breaks or becomes expensive if we guess wrong?>
**Options (if known):**
- <option>
- <option>
**Plan to resolve:** <prototype / benchmark / spike / research / decision meeting>
**Resolution:** OPEN / DECIDED (see [DNN]) / DEFERRED (why, and where it will be revisited)
---
### Risks and Mitigations {#risks}
| Risk | Impact | Likelihood | Mitigation | Trigger to revisit |
|------|--------|------------|------------|--------------------|
| <risk> | low/med/high | low/med/high | <mitigation> | <trigger> |
**Risk R01: <Title>** {#r01-risk-slug}
- **Risk:** <1 sentence>
- **Mitigation:** <1–3 bullets>
- **Residual risk:** <what remains true even after mitigation>
---
### X.Y.0 Design Decisions {#design-decisions}
> Record *decisions* (not options). Each decision includes the “why” so later phases don’t reopen it accidentally.
#### [D01] <Decision Name> (DECIDED) {#d01-decision-slug}
**Decision:** <One sentence decision statement>
**Rationale:**
- <Why>
- <Why>
**Implications:**
- <What this forces in APIs / storage / tests>
---
### Deep Dives (Optional) {#deep-dives}
> Use this section for structured analysis that is not quite “decision” or “spec”, but is critical for implementation alignment.
>
> Examples: operation analysis, end-to-end flows, protocols, schemas, sequence diagrams, CI/CD shape, cross-platform strategy, perf notes, rejection rationale.
#### <Topic Title> {#topic-slug}
<Write-up, diagrams, tables, and any referenced specs/lists/tables.>
---
### X.Y.1 Specification {#specification}
> This section is the contract. It should be complete enough that implementation work can proceed without inventing semantics.
#### X.Y.1.1 Inputs and Outputs (Data Model) {#inputs-outputs}
**Inputs:**
- <Input artifact(s) and supported formats>
**Outputs:**
- <Output artifact(s), return types, side effects>
**Key invariants:**
- <Invariant>
- <Invariant>
#### X.Y.1.2 Terminology and Naming {#terminology}
- **<Term>**: <Definition>
- **<Term>**: <Definition>
#### X.Y.1.3 Supported Features (Exhaustive) {#supported-features}
> Be explicit. Avoid “etc.” and “and more”.
- **Supported**:
- <Feature>
- <Feature>
- **Explicitly not supported**:
- <Feature>
- <Feature>
- **Behavior when unsupported is encountered**:
- <Policy-specific or mode-specific behavior>
#### X.Y.1.4 Modes / Policies (if applicable) {#modes-policies}
| Mode/Policy | Applies to | Behavior | Result |
|------------|------------|----------|--------|
| `<mode>` | <where> | <what happens> | <what is returned> |
#### X.Y.1.5 Semantics (Normative Rules) {#semantics}
> Write this like a spec: bullet rules, deterministic ordering, and edge-case behavior.
- **Traversal / evaluation order**: <rule>
- **Ordering guarantees**: <rule>
- **Stopping conditions**: <rule>
- **Null vs missing**: <rule>
- **Coercion rules (if any)**:
- <rule>
#### X.Y.1.6 Error and Warning Model {#errors-warnings}
> Errors and warnings are the developer UI—be precise.
**Error fields (required):**
- <field>: <meaning>
**Warning fields (required):**
- <field>: <meaning>
**Path formats (if any):**
- Data path format: <e.g., RFC 6901 JSON Pointer>
- Schema path format: <e.g., keyword-level paths>
- Escaping rules: <e.g., "~" and "/">
#### X.Y.1.7 Public API Surface {#public-api}
> Provide Rust + Python signatures at the level needed to implement bindings and stubs.
**Rust:**
```rust
// Core types (enums, structs)
// Public functions / methods
```
**Python:**
```python
# Enums, dataclasses, methods
```
**<Language>:**
```<language>
# <Appropriate language constructs to define>
```
#### X.Y.1.8 Internal Architecture {#internal-architecture}
> Explain how components fit together so work doesn't fork midstream.
- **Single source of truth**: <what>
- **Compilation / interpretation pipeline**:
- <step>
- <step>
- **Where code lives**:
- <crate/module ownership>
- **Non-negotiable invariants to prevent drift**:
- <e.g., shared keyword list, shared $ref resolver, golden tests>
#### X.Y.1.9 Output Schemas (if applicable) {#output-schemas}
> Use this section when your phase defines CLI output, API responses, or wire formats. These schemas are the **contract**—changes require versioning.
##### Common Types {#schema-common-types}
Define reusable types that appear in multiple responses:
###### `<TypeName>` {#type-typename}
```json
{
"field1": "string",
"field2": 123,
"nested": { ... }
}
```
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `field1` | string | yes | <description> |
| `field2` | integer | no | <description> |
| `nested` | object | no | <description> |
##### Response Envelope {#response-envelope}
> Define the standard wrapper for all responses.
```json
{
"status": "ok" | "error",
"schema_version": "1",
...response-specific fields...
}
```
##### Command Responses {#command-responses}
For each command, define success and error response schemas:
###### Command: `<command-name>` {#cmd-command-name}
**Spec S01: <command-name> Response Schema** {#s01-command-response}
**Success response:**
```json
{
"status": "ok",
"schema_version": "1",
...
}
```
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `field` | type | yes/no | <description> |
##### Error Codes (Exhaustive) {#error-codes}
> List all error codes by category. This table is the contract for error handling.
**Table T01: Error Codes** {#t01-error-codes}
###### <Category> Errors (exit code N)
| Code | Meaning | `details` fields |
|------|---------|------------------|
| `ErrorCode` | <what went wrong> | `field1`, `field2` |
##### Exit Codes {#exit-codes}
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| N | <category-specific> |
#### X.Y.1.10 Configuration Schema (if applicable) {#config-schema}
> Use this section when your phase introduces configuration options.
##### Configuration Precedence (highest to lowest) {#config-precedence}
1. CLI flags (`--flag=value`)
2. Environment variables (`PREFIX_KEY`)
3. Project config file (`pyproject.toml`, `Cargo.toml`, etc.)
4. Built-in defaults
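The precedence chain amounts to a first-match-wins lookup over the four sources; a minimal sketch (the function name is illustrative):

```rust
// Resolve a config value by the precedence above:
// CLI flag > environment variable > config file > built-in default.
// Each source is an Option; the first Some wins.
fn resolve<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}

fn main() {
    // Only env and file are set: the env var beats the file value.
    let value = resolve(None, Some("from-env"), Some("from-file"), "default");
    assert_eq!(value, "from-env");
    // With nothing set, the built-in default wins.
    let fallback = resolve(None::<&str>, None, None, "default");
    assert_eq!(fallback, "default");
}
```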
##### Config File Schema {#config-file-schema}
```toml
[tool.<name>]
# <category>
field = "default" # <description>
another_field = true # <description>
[tool.<name>.<subsection>]
nested_field = "value" # <description>
```
##### CLI Flag Mapping {#cli-flag-mapping}
| Config Key | CLI Flag | Environment Variable | Default |
|------------|----------|---------------------|---------|
| `field` | `--field=<value>` | `PREFIX_FIELD` | `"default"` |
---
### Compatibility / Migration / Rollout (Optional) {#rollout}
> Use this section when you are changing public APIs, config formats, CLI contracts, or anything that affects adopters.
- **Compatibility policy**: <semver? schema versioning?>
- **Migration plan**:
- <what changes>
- <who is impacted>
- <how to migrate, and how to detect breakage>
- **Rollout plan**:
- <opt-in flag / staged rollout / canary / feature gate>
- <rollback strategy>
---
### X.Y.2 Definitive Symbol Inventory {#symbol-inventory}
> A concrete list of new crates/files/symbols to add. This is what keeps implementation crisp.
#### X.Y.2.1 New crates (if any) {#new-crates}
| Crate | Purpose |
|-------|---------|
| `<crate>` | <purpose> |
#### X.Y.2.2 New files (if any) {#new-files}
| File | Purpose |
|------|---------|
| `<path>` | <purpose> |
#### X.Y.2.3 Symbols to add / modify {#symbols}
| Symbol | Kind | Location | Notes |
|--------|------|----------|-------|
| `<Name>` | enum/struct/fn | `<path>` | <notes> |
---
### X.Y.3 Documentation Plan {#documentation-plan}
- [ ] <Docs update>
- [ ] <Examples / schema examples / API docs>
---
### X.Y.4 Test Plan Concepts {#test-plan-concepts}
> Describe the kinds of tests that prove the spec. Leave the actual enumeration of tests to the Execution Steps below.
#### Test Categories {#test-categories}
| Category | Purpose | When to use |
|----------|---------|-------------|
| **Unit** | Test individual functions/methods in isolation | Core logic, edge cases, error paths |
| **Integration** | Test components working together | End-to-end operations, CLI commands |
| **Golden / Contract** | Compare output against known-good snapshots | Schemas, APIs, parsers, serialization |
| **Drift Prevention** | Detect unintended behavior changes | Regression testing, API stability |
#### Test Fixtures (if applicable) {#test-fixtures}
> Use this section when your phase requires structured test data. Fixtures provide reproducible, self-contained test scenarios.
##### Fixture Directory Structure {#fixture-structure}
```
tests/fixtures/
├── <language>/ # Language-specific fixtures
│ ├── <scenario>/ # Scenario directory
│ │ ├── <input-files> # Test input files
│ │ └── expected/ # Expected outputs (optional)
│ └── manifest.json # Test case manifest
└── golden/ # Golden output files
└── <language>/
└── <scenario>.{json,patch,txt}
```
##### Fixture Manifest Format {#fixture-manifest}
Each fixture directory should have a `manifest.json` describing test cases:
```json
{
"fixtures": [
{
"name": "<test_name>",
"description": "<what this tests>",
"path": "<relative_path_to_input>",
"operation": "<operation_being_tested>",
"args": { "<arg>": "<value>" },
"expected": {
"status": "ok|error",
"edits": 3,
"files_changed": 1
},
"golden_output": "golden/<language>/<test_name>.json"
}
]
}
```
##### Fixture Requirements {#fixture-requirements}
- **Self-contained**: Each fixture must be runnable/compilable on its own
- **Deterministic**: No randomness, timestamps, or environment-dependent behavior
- **Minimal**: Just enough code to exercise the scenario
- **Documented**: Include comments explaining what's being tested
- **Valid**: All fixtures must pass basic validation (syntax check, type check, etc.)
##### Golden Test Workflow {#golden-workflow}
```bash
# Run golden tests (compare against snapshots)
<test-command> golden
# Update golden files after intentional changes
<UPDATE_ENV_VAR>=1 <test-command> golden
```
---
### X.Y.5 Execution Steps {#execution-steps}
> Execution comes last. Each step should be executable, with a clear commit boundary and a checkpoint.
>
> **Patterns:**
> - Use **Step 0** for prep/bootstrapping that unblocks everything else.
> - If a step is big, split into **substeps** (`Step 2.1`, `Step 2.2`, …) with separate commits and checkpoints.
> - After completing a multi-substep step, add a **Step N Summary** block that consolidates what was achieved and provides an aggregate checkpoint.
>
> **References are mandatory:** Every step must cite specific plan artifacts ([D01], Spec S01, Table T01, etc.) and anchors (#section-name). Never cite line numbers—add an anchor instead.
#### Step 0: <Prep Step Title> {#step-0}
**Commit:** `<conventional-commit message>`
**References:** [D01] <decision name>, (#strategy, #context)
**Artifacts:** (what this step produces/changes)
- <new files / new commands / new schema fields / new docs>
**Tasks:**
- [ ] <task>
- [ ] <task>
**Tests:** (where T is one of: unit, integration, golden / contract, drift prevention)
- [ ] <T test>
- [ ] <T test>
**Checkpoint:**
- [ ] <command>
- [ ] <command>
**Rollback:** (how to undo if this step goes sideways)
- <e.g., revert commit, delete temp dirs, remove config entries>
**Commit after all checkpoints pass.**
---
#### Step 1: <Step Title> {#step-1}
**Commit:** `<conventional-commit message>`
**References:** [D02] <decision>, [D03] <decision>, Spec S01, List L01, (#terminology, #semantics)
**Artifacts:** (what this step produces/changes)
- <new files / new commands / new schema fields / new docs>
**Tasks:**
- [ ] <task>
- [ ] <task>
**Tests:** (where T is one of: unit, integration, golden / contract, drift prevention)
- [ ] <T test>
- [ ] <T test>
**Checkpoint:**
- [ ] <command>
- [ ] <command>
**Rollback:** (how to undo if this step goes sideways)
- <e.g., revert commit, delete temp dirs, remove config entries>
**Commit after all checkpoints pass.**
---
#### Step 2: <Big Step Title> {#step-2}
> If this step is large, break it into substeps with separate commits and checkpoints.
> The parent step explains the structure; each substep has its own commit and checkpoint.
##### Step 2.1: <Substep Title> {#step-2-1}
**Commit:** `<conventional-commit message>`
**References:** [D04] <decision>, Spec S02, Table T01, (#inputs-outputs)
**Artifacts:** (what this substep produces/changes)
- <artifact>
**Tasks:**
- [ ] <task>
**Tests:** (unit / integration / golden / drift prevention)
- [ ] <test>
**Checkpoint:**
- [ ] <command>
**Rollback:**
- <rollback>
**Commit after all checkpoints pass.**
---
##### Step 2.2: <Substep Title> {#step-2-2}
**Commit:** `<conventional-commit message>`
**References:** [D05] <decision>, Concept C01, (#public-api)
**Artifacts:** (what this substep produces/changes)
- <artifact>
**Tasks:**
- [ ] <task>
**Tests:** (unit / integration / golden / drift prevention)
- [ ] <test>
**Checkpoint:**
- [ ] <command>
**Rollback:**
- <rollback>
**Commit after all checkpoints pass.**
---
#### Step 2 Summary {#step-2-summary}
> After a multi-substep step, add a summary block to consolidate what was achieved.
After completing Steps 2.1–2.N, you will have:
- <capability or artifact 1>
- <capability or artifact 2>
- <capability or artifact 3>
**Final Step 2 Checkpoint:**
- [ ] `<aggregate verification command covering all substeps>`
---
#### Step N: Audit / Improvement Round (Optional Pattern) {#step-audit}
> Use this pattern for code review, audit, or cleanup steps. Organize issues by priority and track them systematically.
##### Priority-Based Issue Tracking {#audit-issues}
Organize findings by priority:
###### P0 (Critical): Bugs Causing Incorrect Behavior {#audit-p0}
| ID | File | Issue | Fix | Status |
|----|------|-------|-----|--------|
| S2-R2-01 | path.rs:L | <issue description> | <fix approach> | ✅ / ⏳ / ❌ |
**Tests added:**
- [ ] test: `<test_name_describing_fix>`
###### P1 (High): Security, Race Conditions, Missing Validation {#audit-p1}
| ID | File | Issue | Fix | Status |
|----|------|-------|-----|--------|
| S2-R2-04 | module.rs:L | <issue description> | <fix approach> | ✅ / ⏳ / ❌ |
###### P2 (Medium): API Inconsistencies, Error Handling {#audit-p2}
| ID | File | Issue | Fix | Status |
|----|------|-------|-----|--------|
| S2-R2-09 | api.rs:L | <issue description> | <fix approach> | ✅ / ⏳ / ❌ |
###### P3 (Low): Code Quality, Documentation {#audit-p3}
| ID | File | Issue | Fix | Status |
|----|------|-------|-----|--------|
| S2-R2-16 | lib.rs:L | <issue description> | <fix approach> | ✅ / ⏳ / ❌ |
##### Test Coverage Gaps {#audit-test-gaps}
List missing tests discovered during audit:
**<module>.rs:**
- [ ] `<scenario not currently tested>`
- [ ] `<edge case missing coverage>`
##### Architectural Concerns {#audit-arch-concerns}
> Capture structural issues that don't fit into bug fixes but affect long-term maintainability.
| ID | Concern | Recommendation | Priority |
|----|---------|----------------|----------|
| A1 | <pattern that may cause issues> | <recommended fix or refactor> | P1/P2/P3 |
| A2 | <missing abstraction or API gap> | <suggested approach> | P1/P2/P3 |
##### Dependency Concerns {#audit-dep-concerns}
| ID | Concern | Fix |
|----|---------|-----|
| D1 | <dependency with issues> | <alternative or mitigation> |
| D2 | <missing platform support> | <what to add> |
**Checkpoint:**
- [ ] All P0 issues resolved
- [ ] All P1 issues resolved or explicitly deferred with rationale
- [ ] `<verification command>`
---
### X.Y.6 Deliverables and Checkpoints {#deliverables}
> This is the single place we define “done” for the phase. Keep it crisp and testable.
**Deliverable:** <One sentence deliverable>
#### Phase Exit Criteria (“Done means…”) {#exit-criteria}
- [ ] <criterion> (verification)
- [ ] <criterion> (verification)
**Acceptance tests:** (where T is one of: unit, integration, golden / contract, drift prevention)
- [ ] <T test>
- [ ] <T test>
#### Milestones (Within Phase) (Optional) {#milestones}
**Milestone M01: <Title>** {#m01-milestone-slug}
- [ ] <what becomes true at this point>
#### Roadmap / Follow-ons (Explicitly Not Required for Phase Close) {#roadmap}
- [ ] <follow-on item>
- [ ] <follow-on item>
| Checkpoint | Verification |
|------------|--------------|
| <checkpoint> | <command/test/proof> |
**Commit after all checkpoints pass.**
Plan Subagent (69 lines)
---
name: code-planner
description: "Use this agent when the user needs to plan a new feature, refactoring task, or significant code change before implementation. This agent excels at breaking down complex requirements into actionable implementation steps. Examples of when to invoke this agent:\\n\\n<example>\\nContext: The user wants to add a new feature to the codebase.\\nuser: \"I want to add support for TypeScript in tugtool\"\\nassistant: \"This is a significant feature that requires careful planning. Let me use the code-planner agent to analyze the codebase and create a detailed implementation plan.\"\\n<Task tool invocation to launch code-planner agent>\\n</example>\\n\\n<example>\\nContext: The user wants to refactor an existing system.\\nuser: \"The error handling in this project is inconsistent, can we clean it up?\"\\nassistant: \"Refactoring error handling across the codebase requires understanding the current patterns and planning the migration. I'll use the code-planner agent to create a structured plan.\"\\n<Task tool invocation to launch code-planner agent>\\n</example>\\n\\n<example>\\nContext: The user describes a complex task that spans multiple files or modules.\\nuser: \"We need to add MCP tools for the new workspace management features\"\\nassistant: \"Adding new MCP tools involves changes across multiple files and requires understanding the existing patterns. Let me invoke the code-planner agent to create an implementation plan.\"\\n<Task tool invocation to launch code-planner agent>\\n</example>\\n\\n<example>\\nContext: The user explicitly asks for a plan before coding.\\nuser: \"Before we start coding, can you write up a plan for how we'll implement the caching layer?\"\\nassistant: \"Absolutely. I'll use the code-planner agent to analyze the requirements and create a detailed implementation plan.\"\\n<Task tool invocation to launch code-planner agent>\\n</example>"
model: opus
color: yellow
---
You are an expert software architect and technical planner specializing in codebase analysis and implementation planning. You possess deep knowledge of software design patterns, system architecture, and effective decomposition of complex tasks into manageable implementation steps.
## Your Core Responsibilities
1. **Codebase Investigation**: Thoroughly explore and understand the existing codebase structure, patterns, conventions, and architectural decisions before proposing changes.
2. **Requirement Analysis**: Parse user requests to identify explicit requirements, implicit needs, potential edge cases, and dependencies on existing code.
3. **Plan Creation**: Produce detailed, actionable implementation plans following the structure defined in @plans/plan-skeleton.md.
## Planning Process
### Phase 1: Discovery
- Read and understand the project's CLAUDE.md and any relevant documentation
- Explore the directory structure to understand the codebase organization
- Identify relevant files, modules, and patterns that relate to the requested work
- Note existing conventions for naming, error handling, testing, and code organization
### Phase 2: Analysis
- Break down the user's request into discrete, implementable units
- Identify dependencies between tasks and determine optimal ordering
- Anticipate potential challenges, edge cases, and integration points
- Consider testing requirements for each component
### Phase 3: Plan Composition
- Structure the plan according to @plans/plan-skeleton.md
- Write clear, specific implementation steps that another developer (or AI agent) could follow
- Include file paths, function signatures, and specific code locations where relevant
- Note any decisions that need user input or clarification
- Specify verification steps and success criteria for each major milestone
## Plan Quality Standards
- **Specificity**: Reference exact file paths, function names, and line numbers when relevant
- **Completeness**: Cover all aspects including implementation, testing, documentation, and integration
- **Sequencing**: Order tasks logically, respecting dependencies
- **Testability**: Include specific test cases or verification steps for each component
- **Reversibility**: Note any changes that might need rollback strategies
## Output Requirements
- Always write plans to files in the @plans directory
- Use descriptive filenames that reflect the feature or task (e.g., `plan-typescript-support.md`, `plan-error-handling-refactor.md`)
- If a plan file location is specified by the user, use that location
- After writing the plan, summarize the key milestones and estimated complexity
## Interaction Guidelines
- If the skeleton template at @plans/plan-skeleton.md is not found, ask the user to provide it or create a sensible default structure
- Ask clarifying questions when requirements are ambiguous, but batch questions together rather than asking one at a time
- If the scope seems too large for a single plan, propose breaking it into multiple related plans
- Flag any architectural concerns or potential conflicts with existing patterns you discover
## Self-Verification
Before finalizing any plan, verify:
- [ ] All referenced files and modules actually exist in the codebase
- [ ] The plan follows the project's established conventions (from CLAUDE.md)
- [ ] Each step is actionable and specific enough to implement
- [ ] Dependencies between steps are clearly stated
- [ ] Testing and verification criteria are included
- [ ] The plan has been written to the appropriate file in @plans
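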
Phase 2.0: Workspace Reorganization (1898 lines)
# Phase 2.0: Workspace Reorganization {#phase-2}
**Purpose:** Reorganize tugtool into a Cargo workspace with separate crates for core infrastructure, the main binary, and language-specific modules, enabling parallel compilation, feature-flag-based language inclusion, and easier contribution of new language support.
---
## Plan Metadata {#plan-metadata}
| Field | Value |
|------|-------|
| Owner | TBD |
| Status | ready |
| Target branch | main |
| Tracking issue/PR | TBD |
| Last updated | 2026-01-17 |
---
## Plan Audit History {#plan-audit}
### Audit 2026-01-17: Critical Flaw in Migration Strategy {#audit-2026-01-17}
**Problem Identified:** The original Step 1 converted the root Cargo.toml to a virtual workspace (removing the `[package]` section) BEFORE migrating any code. This immediately orphaned the `src/` directory, breaking all 639 tests.
**Root Cause:** The plan confused the **end state** (virtual workspace, [D07]) with the **migration strategy**. Decision [D07] correctly describes the final structure but was incorrectly implemented as the starting point of Step 1.
**Contradiction:** The Strategy section promised "maintaining a working build at each step" but Step 1 as written immediately broke the build.
**Symptoms:**
- `cargo nextest run` shows "0 tests to run" after Step 1
- Root `src/` code is orphaned (no package compiles it)
- Empty crate skeletons in `crates/` don't help
**Current State (as of audit):**
- Git status shows Step 1 was partially executed with the WRONG approach
- Root Cargo.toml is a virtual workspace (no `[package]`)
- `crates/` exist with empty skeletons
- All code still in `src/` but orphaned
- **ACTION REQUIRED:** Revert to pre-Step-1 state and re-execute with corrected plan
**Resolution Applied:**
1. Updated [D07] to clarify it describes the END STATE, not the starting point
2. Rewrote Step 1 to use a **hybrid workspace** (both `[workspace]` AND `[package]` sections)
3. Updated all Step 2 substeps to require `cargo nextest run` at each checkpoint
4. Rewrote Step 6.1 to handle the final conversion from hybrid to virtual workspace
5. Added critical warnings throughout to prevent this mistake
6. Added Milestone M00 to verify hybrid workspace is established correctly
7. Added test count verification (639 tests) at all milestones
**Key Insight:** Incremental migration requires maintaining the existing compilation path until the new path is ready. You cannot delete the old structure until the new structure can build everything.
**Rollback Instructions (if Step 1 was already executed incorrectly):**
```bash
# Revert to state before Step 1
git checkout HEAD~1 -- Cargo.toml
git checkout HEAD~1 -- Cargo.lock
rm -rf crates/
# Verify tests pass again
cargo nextest run --workspace # Should show 639 tests
```
---
## Phase Overview {#phase-overview}
### Context {#context}
Tugtool is currently structured as a single crate with all functionality in `src/`. As the project grows to support multiple languages (Python now, Rust planned), this monolithic structure creates challenges:
1. **Compilation time**: Any change recompiles everything
2. **Coupling**: Language-specific code can accidentally depend on other language modules
3. **Feature management**: No clean way to build without certain language support
4. **Contributor friction**: New language support requires understanding the entire codebase
A workspace structure with separate crates addresses all these concerns while maintaining the existing API surface.
### Strategy {#strategy}
- **Incremental migration**: Move code in phases, maintaining a working build at each step
- **Hybrid workspace during migration**: Root Cargo.toml has BOTH `[workspace]` AND `[package]` sections until Step 6
- **Core-first approach**: Extract the shared infrastructure first (`tugtool-core`), then build language crates on top
- **Preserve public API**: The `tugtool` crate re-exports everything users currently depend on
- **Feature flags for languages**: Each language crate is an optional dependency, controlled by features
- **Test migration alongside code**: Move tests with their corresponding modules to maintain coverage
- **No functional changes**: This is purely a structural refactor; behavior remains identical
- **Virtual workspace as END STATE**: Convert to virtual workspace only in Step 6 after all code is migrated
> **CRITICAL INVARIANT**: `cargo nextest run --workspace` must pass at every checkpoint. If tests fail after a step, do NOT proceed - fix the issue first. **Always use `--workspace`** to ensure tests in all crates are included - without it, tests in newly created crates won't run!
### Stakeholders / Primary Customers {#stakeholders}
1. Tugtool developers contributing new language support
2. Users who want minimal builds (core + specific languages only)
3. CI/CD pipelines benefiting from parallel compilation
### Success Criteria (Measurable) {#success-criteria}
- All existing tests pass (`cargo nextest run --workspace`)
- Clean incremental builds after touching only language-specific code
- `cargo build --no-default-features` produces a working binary (core only)
- `cargo build --features python` includes Python support
- No changes to CLI interface or JSON output schemas
- Build time improvement measurable via `cargo build --timings`
### Scope {#scope}
1. Create workspace structure with `crates/` directory
2. Extract `tugtool-core` crate (shared infrastructure)
3. Extract `tugtool-python` crate (Python language support)
4. Create placeholder `tugtool-rust` crate (future)
5. Refactor main `tugtool` crate to compose the above
6. Add feature flags for language inclusion/exclusion
### Non-goals (Explicitly out of scope) {#non-goals}
- Adding new functionality or refactoring operations
- Changing any public API signatures
- Implementing Rust language support (placeholder only)
- Breaking changes to CLI or JSON output
- Changing the `.tug/` session directory structure
### Dependencies / Prerequisites {#dependencies}
- All existing tests must pass before starting
- Understanding of current module interdependencies (analyzed below)
### Constraints {#constraints}
- Must maintain backwards compatibility with existing `cargo install tugtool` (from crates.io)
- Feature names must be stable for downstream users
- Workspace must work with existing CI configuration
- CI/scripts assuming root is a package must be updated (virtual workspace has no root package)
- Local development install changes from `cargo install --path .` to `cargo install --path crates/tugtool`
### Assumptions {#assumptions}
- Cargo workspace member ordering allows parallel compilation
- No circular dependencies exist between proposed crate boundaries
- All integration tests can run against the composed binary
---
## Open Questions (MUST RESOLVE OR EXPLICITLY DEFER) {#open-questions}
### [Q01] MCP server crate placement (DECIDED) {#q01-mcp-placement}
**Question:** Should MCP server code live in `tugtool-core` or remain in the main `tugtool` crate?
**Why it matters:** MCP depends on `rmcp` which is a heavy dependency. Placing it in core means core carries that weight even for non-MCP builds.
**Options:**
- Keep MCP in main `tugtool` crate (current plan)
- Create separate `tugtool-mcp` crate
- Include in `tugtool-core` behind a feature flag
**Plan to resolve:** Start with MCP in main crate; evaluate if extraction needed based on build times.
**Resolution:** DECIDED - MCP stays in main `tugtool` crate, controlled by existing `mcp` feature flag.
### [Q02] Test organization strategy (OPEN) {#q02-test-organization}
**Question:** Should integration tests remain in the workspace root or move to individual crates?
**Why it matters:** Integration tests that exercise the full stack need access to all crates. Moving them complicates the test setup.
**Options:**
- Keep all integration tests in `tests/` at workspace root
- Move unit tests to crates, keep integration tests at root
- Each crate has its own `tests/` directory
**Plan to resolve:** Evaluate during Step 2; document chosen approach.
**Resolution:** OPEN - Will decide during implementation.
---
## Risks and Mitigations {#risks}
| Risk | Impact | Likelihood | Mitigation | Trigger to revisit |
|------|--------|------------|------------|--------------------|
| Circular dependencies discovered | high | low | Analyze deps before moving; refactor if found | Build fails during migration |
| Test coverage gaps after migration | med | med | Run coverage before/after; diff reports | Coverage drops >1% |
| Build time regression | low | low | Measure with `--timings`; revert if slower | Build time increases |
**Risk R01: Hidden coupling in current code** {#r01-hidden-coupling}
- **Risk:** Unexpected dependencies between modules may prevent clean separation
- **Mitigation:**
- Analyze `use` statements before migration
- Create abstraction traits if coupling found
- Document any necessary restructuring
- **Residual risk:** Some coupling may require interface changes
---
## 2.0.0 Design Decisions {#design-decisions}
### [D01] Workspace structure with crates/ directory (DECIDED) {#d01-workspace-structure}
**Decision:** Use a `crates/` directory to hold all workspace members.
**Rationale:**
- Clean separation from workspace root files (Cargo.toml, README, etc.)
- Follows common Rust workspace conventions (rustc, cargo, ripgrep)
- Easy to glob for CI/tooling (`crates/*/Cargo.toml`)
**Implications:**
- Main binary moves from `src/` to `crates/tugtool/src/`
- Workspace Cargo.toml at root defines members
### [D02] Core crate contains shared infrastructure (DECIDED) {#d02-core-crate}
**Decision:** `tugtool-core` contains all language-agnostic infrastructure: session, workspace, patch, sandbox, output, error, facts, text, diff, util.
**Rationale:**
- These modules have no language-specific dependencies
- Forms the stable foundation for all language adapters
- Smaller core = faster compilation for language-specific changes
**Implications:**
- Core has no feature flags for languages
- Language crates depend on core, not vice versa
- `facts/` module provides language-agnostic symbol/reference types
### [D03] Language crates are optional dependencies (DECIDED) {#d03-optional-languages}
**Decision:** Each language crate (`tugtool-python`, `tugtool-rust`) is an optional dependency of the main `tugtool` crate, controlled by feature flags.
**Rationale:**
- Users can build minimal binaries
- Clear compilation boundaries
- Each language can have isolated dependencies
**Implications:**
- Default features include all supported languages
- Feature names: `python`, `rust` (short, clear)
- Conditional compilation in CLI dispatch code
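A sketch of the optional-dependency wiring this decision implies for `crates/tugtool/Cargo.toml` (paths and versions are illustrative):

```toml
# crates/tugtool/Cargo.toml (sketch; exact paths are illustrative)
[dependencies]
tugtool-core = { path = "../tugtool-core" }
tugtool-python = { path = "../tugtool-python", optional = true }

[features]
default = ["python"]             # all supported languages on by default
python = ["dep:tugtool-python"]  # short, stable feature name per [D03]
```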
### [D04] Main crate composes and re-exports (DECIDED) {#d04-main-crate}
**Decision:** The `tugtool` crate contains: main.rs, cli.rs, mcp.rs, and re-exports from core/language crates.
**Rationale:**
- Single binary entry point
- CLI logic dispatches to language crates based on file types
- MCP server stays here (depends on multiple languages)
**Implications:**
- `tugtool` crate has `[dependencies]` on core and language crates
- Public API surface maintained via `pub use` re-exports
- Version numbers stay synchronized
### [D05] Testcmd module stays in main crate (DECIDED) {#d05-testcmd}
**Decision:** `testcmd.rs` (test command resolution) remains in the main `tugtool` crate.
**Rationale:**
- Test command resolution may need to know about multiple languages
- It's primarily used by CLI and MCP, not by core or language crates
**Implications:**
- testcmd can import from language crates if needed
- Future: may spawn language-specific test runners
### [D06] Synchronized versions across workspace (DECIDED) {#d06-versions}
**Decision:** All crates in the workspace share the same version number, maintained in workspace Cargo.toml.
**Rationale:**
- Simplifies release process
- Clear compatibility guarantees
- Workspace inheritance makes this easy
**Implications:**
- Use `version.workspace = true` in member Cargo.toml files
- Bump all versions together on release
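A sketch of the workspace inheritance this sets up (the version value is illustrative):

```toml
# Root Cargo.toml (sketch): one version for the whole workspace
[workspace.package]
version = "0.2.0"               # illustrative value; bumped for all crates together

# In each member manifest, e.g. crates/tugtool-core/Cargo.toml:
#   [package]
#   name = "tugtool-core"
#   version.workspace = true    # inherits the workspace version
```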
### [D07] Virtual workspace (no root package) - END STATE (DECIDED) {#d07-virtual-workspace}
**Decision:** The **final** root `Cargo.toml` is a **virtual workspace** with no `[package]` section. All crates live in `crates/`.
**CRITICAL: Migration Path**
This is the **end state**, not the starting point. The migration MUST use a **hybrid workspace** approach:
1. **During migration:** Root Cargo.toml has BOTH `[workspace]` AND `[package]` sections. This keeps `src/` compiled and all tests running throughout migration.
2. **After migration complete:** Remove `[package]` section from root, making it a pure virtual workspace.
Converting to virtual workspace **before** migrating code would orphan `src/` and break all tests.
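A hybrid root `Cargo.toml` during migration might look like this (a sketch; the exact fields are illustrative):

```toml
# Hybrid root Cargo.toml during migration (sketch).
# BOTH sections coexist: [package] keeps src/ compiling and the 639 tests
# running, while [workspace] brings the new crates/ members into the build.
[workspace]
members = ["crates/*"]

[package]
name = "tugtool"
version = "0.1.0"
edition = "2021"
```

Removing the `[package]` section at the end of Step 6 is what completes the conversion to a pure virtual workspace.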
**Rationale:**
- Matches Rust ecosystem conventions (rustc, ripgrep, cargo itself)
- Clean separation between workspace metadata and crate code
- Avoids confusion about "which crate am I building?"
**Implications:**
- `cargo build` from root builds all crates (or default members)
- `cargo install tugtool` works from crates.io (publishes from `crates/tugtool`)
- Local install requires: `cargo install --path crates/tugtool`
- CI scripts must be updated if they assume root is a package
### [D08] Naming and packaging contract (DECIDED) {#d08-naming-contract}
**Decision:** Maintain current naming for compatibility.
| Item | Value |
|------|-------|
| Package name | `tugtool` |
| Binary name | `tug` |
| Library name | `tugtool` |
| crates.io install | `cargo install tugtool` |
| Local dev install | `cargo install --path crates/tugtool` |
| Library usage | `use tugtool::*` (unchanged) |
**Rationale:**
- Users expect `cargo install tugtool` to continue working
- Binary name `tug` is already established
- Library re-exports maintain API compatibility
**Implications:**
- `crates/tugtool/Cargo.toml` publishes as `tugtool` on crates.io
- README and docs must clarify local vs crates.io install paths
### [D09] Feature-gated CLI behavior (DECIDED) {#d09-feature-gated-cli}
**Decision:** When a language feature is not compiled in, language-specific commands fail gracefully with a clear error message.
**Core-only build (`--no-default-features`) must support:**
- `tug --help`, `tug --version`
- `tug snapshot` (file scanning is language-agnostic)
- `tug session status`
**Language commands without the feature:**
```
$ tug run rename-symbol --at foo.py:1:1 --to bar
error: Python support not compiled in
To enable: cargo install tugtool --features python
```
Exit code: 2 (invalid arguments / unsupported operation)
**Rationale:**
- Users get actionable feedback instead of cryptic errors
- Core functionality remains useful for inspection/snapshot workflows
- Clear path to enable missing features
**Implications:**
- CLI dispatch code must check feature availability
- Error messages must include remediation instructions
- Exit code 2 for "feature not available" aligns with existing error codes
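The dispatch guard can be sketched as follows. This is illustrative only: `run_language_command` is a stand-in name, not the real CLI dispatch function, and the real code routes through the clap command tree.

```rust
// Illustrative [D09] guard. `cfg!(feature = "...")` evaluates at compile
// time, so the unsupported branch costs nothing in a full build.
fn run_language_command(lang: &str) -> u8 {
    let available = match lang {
        "python" => cfg!(feature = "python"),
        "rust" => cfg!(feature = "rust"),
        _ => false,
    };
    if !available {
        eprintln!("error: {lang} support not compiled in");
        eprintln!("To enable: cargo install tugtool --features {lang}");
        return 2; // exit code 2: invalid arguments / unsupported operation
    }
    // ... dispatch to the compiled-in language backend here ...
    0
}

fn main() {
    std::process::exit(i32::from(run_language_command("python")));
}
```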
### [D10] MCP decoupled from language features (DECIDED) {#d10-mcp-decoupling}
**Decision:** The `mcp` feature is independent of language features. MCP server starts regardless of which languages are compiled in; individual tools check feature availability at runtime.
**Behavior:**
- `tug_snapshot` → always works
- `tug_rename_symbol` → returns error "Python support not compiled" if `!cfg!(feature = "python")`
**Rationale:**
- MCP server is useful even with partial language support
- Allows agents to discover available capabilities
- Simpler feature matrix (no `mcp-python` combo features)
**Implications:**
- MCP tool implementations must have feature guards
- Tool list/schema should indicate which tools are available
- Default features still include both `python` and `mcp`
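A minimal sketch of how the tool list could reflect compiled features; `available_tools` is a stand-in helper, not the real MCP registry, which is built on rmcp's tool schema.

```rust
// Illustrative [D10] sketch: advertise only the tools whose
// features are compiled in.
fn available_tools() -> Vec<&'static str> {
    // Language-agnostic tools are always present.
    let mut tools = vec!["tug_snapshot"];
    if cfg!(feature = "python") {
        tools.push("tug_rename_symbol");
    }
    tools
}

fn main() {
    for tool in available_tools() {
        println!("{tool}");
    }
}
```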
### [D11] API surface compile-time guard (DECIDED) {#d11-api-surface-guard}
**Decision:** Add `tests/api_surface.rs` that imports all public types, serving as a compile-time contract for the public API. The test must be **feature-aware** to handle conditional re-exports.
**Implementation:**
```rust
//! Compile-only test to verify public API surface.
//! If this file fails to compile, the public API has regressed.
//!
//! Run with: cargo test -p tugtool --features full -- api_surface
use tugtool::{
    // Core types (always available)
    patch::{Span, FileId, Edit, PatchSet, ContentHash, /* ... */},
    facts::{FactsStore, Symbol, SymbolKind, ReferenceKind, /* ... */},
    error::TugError,
    output::{Location, ReferenceInfo, SymbolInfo},
    session::Session,
    workspace::WorkspaceSnapshot,
    // ... exhaustive list of core types
};

// Feature-gated re-exports
#[cfg(feature = "python")]
use tugtool::python;
#[cfg(feature = "rust")]
use tugtool::rust;

#[test]
fn api_surface_compiles() {
    // This test exists only to verify imports compile.
    // If you're here because this test broke, you may have
    // accidentally removed a public re-export.
}
```
**Rationale:**
- Catches accidental API breakage during refactoring
- Low maintenance cost (just a list of imports)
- Fails fast in CI if re-exports are missing
- Feature-aware structure prevents false failures on minimal builds
**Implications:**
- Must be created before migration begins (baseline)
- Must be updated when intentionally adding/removing public types
- Part of phase exit criteria
- **Must be tested with `--features full`** to validate all re-exports
---
## Deep Dives {#deep-dives}
### Current Module Dependency Analysis {#module-deps}
Analysis of `use` statements in the current codebase reveals the following dependency graph:
**Diagram Diag01: Current Module Dependencies** {#diag01-module-deps}
```
                 +-------------+
                 |   main.rs   |
                 +------+------+
                        |
                 +------v------+
                 |   cli.rs    |<-------------+
                 +------+------+              |
                        |                     |
       +----------------+----------------+    |
       |                |                |    |
  +----v----+     +-----v-----+    +-----v----+--+
  | mcp.rs  |     |  python/  |    |   session   |
  +----+----+     +-----+-----+    +-----+-------+
       |                |                |
       |          +-----v-----+    +-----v-------+
       |          | analyzer  |    |  workspace  |
       |          |  worker   |    +-----+-------+
       |          |   ops/    |          |
       |          +-----+-----+          |
       |                |                |
  +----v----------------v----------------v----+
  |                CORE LAYER                 |
  |  +--------+  +--------+  +--------+       |
  |  | patch  |  | facts  |  |sandbox |       |
  |  +--------+  +--------+  +--------+       |
  |  +--------+  +--------+  +--------+       |
  |  | output |  | error  |  |  text  |       |
  |  +--------+  +--------+  +--------+       |
  |  +--------+  +--------+                   |
  |  |  diff  |  |  util  |                   |
  |  +--------+  +--------+                   |
  +-------------------------------------------+
```
**Key observations:**
1. `patch.rs` is the foundation - used by facts, sandbox, output, diff, text, python
2. `facts/` depends only on patch (for Span, FileId, ContentHash)
3. `sandbox.rs` depends on patch and workspace
4. `output.rs` depends on patch (for Span) and facts (for SymbolKind)
5. `python/` depends on facts, patch, output, text, session, diff, util
6. `mcp.rs` depends on cli, error, output (and indirectly on python via cli)
7. `session.rs` depends on workspace
8. No circular dependencies detected
### Proposed Crate Boundaries {#crate-boundaries}
**Table T01: Module to Crate Mapping** {#t01-module-mapping}
| Current Module | Target Crate | Rationale |
|---------------|--------------|-----------|
| `patch.rs` | tugtool-core | Foundation types, no deps |
| `facts/mod.rs` | tugtool-core | Language-agnostic symbol model |
| `error.rs` | tugtool-core | Shared error types |
| `output.rs` | tugtool-core | Shared JSON output types |
| `text.rs` | tugtool-core | Text utilities |
| `diff.rs` | tugtool-core | Diff generation |
| `util.rs` | tugtool-core | General utilities |
| `workspace.rs` | tugtool-core | Workspace snapshots |
| `sandbox.rs` | tugtool-core | Sandboxed operations |
| `session.rs` | tugtool-core | Session management |
| `python/` (all) | tugtool-python | Python language support |
| `rust/mod.rs` | tugtool-rust | Rust placeholder |
| `main.rs` | tugtool | Binary entry point |
| `cli.rs` | tugtool | CLI implementation |
| `mcp.rs` | tugtool | MCP server |
| `testcmd.rs` | tugtool | Test command resolution |
| `lib.rs` | tugtool | Re-exports |
### Target Directory Structure {#target-structure}
**List L01: Final Directory Layout** {#l01-directory-layout}
```
tugtool/
+-- Cargo.toml                    # workspace root
+-- Cargo.lock
+-- CLAUDE.md
+-- README.md
+-- crates/
|   +-- tugtool/                  # main binary crate
|   |   +-- Cargo.toml
|   |   +-- src/
|   |       +-- main.rs           # CLI entry point
|   |       +-- lib.rs            # re-exports for library usage
|   |       +-- cli.rs            # CLI command implementations
|   |       +-- mcp.rs            # MCP server
|   |       +-- testcmd.rs        # test command resolution
|   |
|   +-- tugtool-core/             # shared infrastructure
|   |   +-- Cargo.toml
|   |   +-- src/
|   |       +-- lib.rs            # module exports
|   |       +-- patch.rs          # Patch IR
|   |       +-- error.rs          # TugError
|   |       +-- output.rs         # JSON output types
|   |       +-- session.rs        # Session management
|   |       +-- workspace.rs      # Workspace snapshots
|   |       +-- sandbox.rs        # Sandboxed operations
|   |       +-- text.rs           # Text utilities
|   |       +-- diff.rs           # Diff generation
|   |       +-- util.rs           # General utilities
|   |       +-- facts/
|   |           +-- mod.rs        # Symbol/reference model
|   |
|   +-- tugtool-python/           # Python language support
|   |   +-- Cargo.toml
|   |   +-- src/
|   |       +-- lib.rs            # module exports (replaces mod.rs)
|   |       +-- analyzer.rs
|   |       +-- bootstrap.rs
|   |       +-- dynamic.rs
|   |       +-- env.rs            # Python environment resolution
|   |       +-- files.rs
|   |       +-- libcst_worker.py  # Embedded Python worker script
|   |       +-- lookup.rs
|   |       +-- test_helpers.rs
|   |       +-- type_tracker.rs
|   |       +-- validation.rs
|   |       +-- verification.rs
|   |       +-- worker.rs
|   |       +-- ops/
|   |           +-- mod.rs
|   |           +-- rename.rs
|   |
|   +-- tugtool-rust/             # Rust language support (placeholder)
|       +-- Cargo.toml
|       +-- src/
|           +-- lib.rs            # placeholder
|
+-- tests/                        # workspace-level integration tests
|   +-- integration/
+-- .tug/                         # session directory (unchanged)
+-- plans/                        # planning documents
```
### Feature Flag Design {#feature-flags}
**Table T02: Feature Flags** {#t02-feature-flags}
| Feature | Crate | Description | Dependencies |
|---------|-------|-------------|--------------|
| `default` | tugtool | Full build | `python`, `mcp` |
| `python` | tugtool | Python support | tugtool-python |
| `rust` | tugtool | Rust support (future) | tugtool-rust |
| `mcp` | tugtool | MCP server | rmcp, schemars |
| `full` | tugtool | All languages + MCP | `python`, `rust`, `mcp` |
**Spec S01: Feature Flag Usage** {#s01-feature-flags}
```toml
# crates/tugtool/Cargo.toml
[features]
default = ["python", "mcp"]
python = ["dep:tugtool-python"]
rust = ["dep:tugtool-rust"]
mcp = ["dep:rmcp", "dep:schemars"]
full = ["python", "rust", "mcp"]
[dependencies]
tugtool-core = { path = "../tugtool-core" }
tugtool-python = { path = "../tugtool-python", optional = true }
tugtool-rust = { path = "../tugtool-rust", optional = true }
# MCP dependencies (optional) - versions must match current Cargo.toml
rmcp = { version = "...", features = ["server", "transport-io"], optional = true }
schemars = { version = "...", optional = true }
```
**Note:** All dependency versions in this plan are illustrative. During implementation, use the exact versions from the current `Cargo.toml` to avoid version conflicts.
### Dependency Flow {#dependency-flow}
**Diagram Diag02: Crate Dependency Graph** {#diag02-crate-deps}
```
        +---------------------------------------+
        |                tugtool                |
        |        (main binary, CLI, MCP)        |
        +-------------------+-------------------+
                            |
           +----------------+----------------+
           |                |                |
           v                v                v
    +--------------+ +--------------+ +--------------+
    |tugtool-python| | tugtool-rust | |  (MCP deps)  |
    |  (optional)  | |  (optional)  | |  (optional)  |
    +------+-------+ +------+-------+ +--------------+
           |                |
           +--------+-------+
                    |
                    v
          +-------------------+
          |   tugtool-core    |
          | (always present)  |
          +-------------------+
                    |
                    v
          +-------------------+
          |  External crates  |
          | (serde, sha2, etc)|
          +-------------------+
```
---
## 2.0.1 Specification {#specification}
### 2.0.1.1 Inputs and Outputs {#inputs-outputs}
**Inputs:**
- Current single-crate tugtool source code
- Existing Cargo.toml configuration
**Outputs:**
- Cargo workspace with 4 member crates
- Updated CLAUDE.md with new structure documentation
- All tests passing
**Key invariants:**
- Public API surface unchanged (same re-exports from `tugtool`)
- CLI behavior identical
- JSON output schemas unchanged
### 2.0.1.2 Terminology {#terminology}
- **Workspace root**: The top-level `tugtool/` directory containing `Cargo.toml`
- **Member crate**: Each crate in `crates/` directory
- **Core crate**: `tugtool-core`, the shared infrastructure
- **Language crate**: `tugtool-python`, `tugtool-rust`, etc.
- **Main crate**: `tugtool`, the binary and re-export crate
### 2.0.1.3 Public API Surface {#public-api}
**Spec S02: Re-exports from tugtool crate** {#s02-reexports}
The main `tugtool` crate must re-export all types currently accessible via `tugtool::*`:
```rust
// crates/tugtool/src/lib.rs
// Re-export core types
pub use tugtool_core::{
    // patch module
    patch::{
        Anchor, AnchorResolution, ApplyContext, ApplyResult, Conflict,
        ContentHash, Edit, EditKind, EditLabels, FileId, MaterializedPatch,
        OutputEdit, PatchSet, Precondition, Span, WorkspaceSnapshotId,
    },
    // facts module
    facts::{
        FactsStore, FileEntry, ImportEntry, ImportId, Language, ModuleEntry,
        ModuleId, ModuleKind, ReferenceEntry, ReferenceId, ReferenceKind,
        ScopeEntry, ScopeId, ScopeKind, Symbol, SymbolId, SymbolKind,
    },
    // other modules
    error::TugError,
    output::{Location, ReferenceInfo, SymbolInfo},
    session::Session,
    workspace::WorkspaceSnapshot,
    sandbox::{SandboxConfig, SandboxHandle, VerificationResult},
    text, diff, util,
};

// Re-export language modules (conditional)
#[cfg(feature = "python")]
pub use tugtool_python as python;
#[cfg(feature = "rust")]
pub use tugtool_rust as rust;

// CLI and MCP are internal (not re-exported)
```
---
## 2.0.2 Symbol Inventory {#symbol-inventory}
### 2.0.2.1 New crates {#new-crates}
| Crate | Purpose |
|-------|---------|
| `tugtool-core` | Shared infrastructure: patch, facts, session, workspace, sandbox, output, error, text, diff, util |
| `tugtool-python` | Python language support: analyzer, worker, ops |
| `tugtool-rust` | Rust language support (placeholder) |
### 2.0.2.2 New files {#new-files}
| File | Purpose |
|------|---------|
| `Cargo.toml` (root) | Workspace definition |
| `crates/tugtool/Cargo.toml` | Main binary crate manifest |
| `crates/tugtool-core/Cargo.toml` | Core crate manifest |
| `crates/tugtool-python/Cargo.toml` | Python crate manifest |
| `crates/tugtool-rust/Cargo.toml` | Rust crate manifest |
| `crates/*/src/lib.rs` | Module root for each crate |
### 2.0.2.3 Moved files {#moved-files}
**Table T03: File Movement Map** {#t03-file-moves}
| Current Location | New Location |
|-----------------|--------------|
| `src/patch.rs` | `crates/tugtool-core/src/patch.rs` |
| `src/facts/mod.rs` | `crates/tugtool-core/src/facts/mod.rs` |
| `src/error.rs` | `crates/tugtool-core/src/error.rs` |
| `src/output.rs` | `crates/tugtool-core/src/output.rs` |
| `src/session.rs` | `crates/tugtool-core/src/session.rs` |
| `src/workspace.rs` | `crates/tugtool-core/src/workspace.rs` |
| `src/sandbox.rs` | `crates/tugtool-core/src/sandbox.rs` |
| `src/text.rs` | `crates/tugtool-core/src/text.rs` |
| `src/diff.rs` | `crates/tugtool-core/src/diff.rs` |
| `src/util.rs` | `crates/tugtool-core/src/util.rs` |
| `src/python/*` | `crates/tugtool-python/src/*` |
| `src/rust/mod.rs` | `crates/tugtool-rust/src/lib.rs` |
| `src/main.rs` | `crates/tugtool/src/main.rs` |
| `src/cli.rs` | `crates/tugtool/src/cli.rs` |
| `src/mcp.rs` | `crates/tugtool/src/mcp.rs` |
| `src/testcmd.rs` | `crates/tugtool/src/testcmd.rs` |
| `src/lib.rs` | `crates/tugtool/src/lib.rs` |
---
## 2.0.3 Documentation Plan {#documentation-plan}
- [ ] Update CLAUDE.md with new directory structure
- [ ] Add workspace-level README explaining crate organization
- [ ] Document feature flags in main crate README
- [ ] Add inline documentation to each crate's lib.rs
---
## 2.0.4 Test Plan Concepts {#test-plan-concepts}
### Test Categories {#test-categories}
| Category | Purpose | When to use |
|----------|---------|-------------|
| **Unit** | Test individual functions in isolation | Each crate's internal logic |
| **Integration** | Test crates working together | Full rename/analyze flows |
| **Golden** | Compare output against snapshots | JSON schemas, patch output |
### Test Migration Strategy {#test-migration}
1. **Unit tests**: Move with their modules (embedded `#[cfg(test)]` modules stay in place)
2. **Integration tests**: Keep in workspace root `tests/` directory
3. **Golden tests**: Remain in current location, update paths as needed
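Strategy 1 works because Rust's unit tests live inside the module file itself, so they travel with it in a file move. A toy module (not tugtool code) illustrating the pattern:

```rust
// Toy module showing why unit tests move with their files: the
// #[cfg(test)] block is part of the module, compiled only under `cargo test`.
pub fn byte_len(s: &str) -> usize {
    s.len()
}

#[cfg(test)]
mod tests {
    use super::byte_len;

    #[test]
    fn counts_bytes_not_chars() {
        assert_eq!(byte_len("héllo"), 6); // 'é' is two bytes in UTF-8
    }
}

fn main() {
    assert_eq!(byte_len("héllo"), 6);
}
```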
### Verification Commands {#test-verification}
```bash
# Run all tests (from workspace root) - ALWAYS use --workspace during migration!
cargo nextest run --workspace
# Run only core tests
cargo nextest run -p tugtool-core
# Run only Python tests
cargo nextest run -p tugtool-python
# Run with specific features
cargo nextest run --workspace --no-default-features --features python
```
---
## 2.0.5 Execution Steps {#execution-steps}
### Step 0: Preparation and Baseline {#step-0}
**Commit:** `chore: establish baseline metrics and API surface test before workspace migration`
**References:** [D01] Workspace structure, [D11] API surface guard, (#strategy, #success-criteria)
**Artifacts:**
- Baseline test count and coverage
- Baseline build times via `cargo build --timings`
- Verification that all tests pass
- `tests/api_surface.rs` - compile-time API contract
**Tasks:**
- [x] Run `cargo nextest run --workspace` and record pass/fail counts
- [x] Run `cargo build --timings` and save HTML report
- [x] Run `cargo clippy` and fix any warnings
- [x] Create `tests/api_surface.rs` with imports of all current public types (see [D11])
- [x] Ensure clean git status
**API surface test template:**
```rust
//! Compile-only test to verify public API surface.
//! Run with: cargo test -p tugtool --features full -- api_surface
use tugtool::{
    // Core types (always available)
    patch::{Span, FileId, Edit, PatchSet, ContentHash, OutputEdit, /* ... */},
    facts::{FactsStore, Symbol, SymbolKind, ReferenceKind, /* ... */},
    error::TugError,
    // ... exhaustive list of core types
};

// Feature-gated re-exports
#[cfg(feature = "python")]
use tugtool::python;

#[test]
fn api_surface_compiles() {
    // Intentionally empty - this test verifies imports compile
}
```
**Tests:**
- [x] All existing tests pass
- [x] `tests/api_surface.rs` compiles with `--features full`
**Checkpoint:**
- [x] `cargo nextest run --workspace` - all tests pass
- [x] `cargo clippy -- -D warnings` - no warnings
- [x] `cargo fmt --check` - no formatting issues
- [x] `tests/api_surface.rs` exists and compiles with `cargo test -p tugtool --features full -- api_surface`
**Rollback:** N/A (no changes yet)
**Commit after all checkpoints pass.**
---
### Step 1: Create Hybrid Workspace Structure {#step-1}
**Commit:** `refactor: create cargo workspace structure with crates directory`
**References:** [D01] Workspace structure, [D07] Virtual workspace (end state), Table T01, List L01, (#target-structure)
**CRITICAL: Hybrid Workspace Approach**
This step creates a **hybrid workspace** where the root is BOTH a workspace AND a package.
This keeps the existing `src/` code compiling and all 639 tests running throughout migration.
**DO NOT** convert to a virtual workspace (removing `[package]`) until Step 6 after all code is migrated.
**Artifacts:**
- `crates/` directory with empty crate skeletons
- **Hybrid** Workspace Cargo.toml at root (has BOTH `[workspace]` AND `[package]` sections)
- Each crate has minimal Cargo.toml and empty lib.rs
- Existing `src/` code continues to compile and run tests
**Tasks:**
- [x] Create `crates/` directory
- [x] Create `crates/tugtool/` with minimal Cargo.toml (empty, for future main crate)
- [x] Create `crates/tugtool-core/` with minimal Cargo.toml
- [x] Create `crates/tugtool-python/` with minimal Cargo.toml
- [x] Create `crates/tugtool-rust/` with minimal Cargo.toml
- [x] Add `[workspace]` section to root Cargo.toml **WHILE KEEPING THE EXISTING `[package]` SECTION**
- [x] Add workspace-level settings (resolver, lints, profile)
**Cargo.toml structure (HYBRID - note both [workspace] AND [package]):**
```toml
# Root Cargo.toml - HYBRID WORKSPACE
# Has both [workspace] and [package] so src/ keeps compiling
[workspace]
resolver = "2"
members = [
    "crates/tugtool-core",
    "crates/tugtool-python",
    "crates/tugtool-rust",
    # NOTE: Do NOT include "crates/tugtool" yet - root IS the tugtool package during migration
]
[workspace.package]
version = "0.1.0"
edition = "2021"
authors = ["Ken Kocienda"]
license = "MIT"
repository = "https://github.com/tugtool/tugtool"
[workspace.lints.rust]
warnings = "deny"
[workspace.lints.clippy]
all = { level = "deny", priority = -1 }
collapsible_if = "allow"
# KEEP THE EXISTING [package] SECTION - this is what makes src/ compile!
[package]
name = "tugtool"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
# ... keep all existing package configuration ...
# KEEP THE EXISTING [dependencies] - required for src/ to compile
[dependencies]
# ... all existing dependencies stay here ...
# KEEP THE EXISTING [[bin]], [features], etc.
[profile.release]
lto = "thin"
debug = "line-tables-only"
[profile.dev]
debug = 1
incremental = true
codegen-units = 256
lto = false
panic = "unwind"
[profile.test]
debug = 1
incremental = true
lto = false
```
**Tests:**
- [x] `cargo nextest run --workspace` - ALL 639 TESTS STILL PASS (critical!)
- [x] `cargo check -p tugtool-core` succeeds (empty crate compiles)
**Checkpoint:**
- [x] `cargo nextest run --workspace` - **all existing tests pass** (this is the critical checkpoint!)
- [x] `cargo clippy -- -D warnings` - no warnings
- [x] All four crate directories exist with Cargo.toml and src/lib.rs
- [x] Root Cargo.toml has both `[workspace]` AND `[package]` sections
**Rollback:**
- Remove `crates/` directory
- Restore original Cargo.toml from git
**Commit after all checkpoints pass.**
---
> **WARNING: Common Mistake**
>
> Do NOT remove the `[package]` section from root Cargo.toml during this step!
> Doing so creates a "virtual workspace" which orphans `src/` and breaks all tests.
> The conversion to virtual workspace happens in Step 6 AFTER all code is migrated.
---
### Step 2: Extract tugtool-core {#step-2}
This step is large and broken into substeps.
**CRITICAL: Two-Phase Migration Per Module**
For each module migration, you must:
1. **Copy** the module to the target crate
2. **Wire up imports** in the source crate to use the new location
3. **Verify tests pass** before proceeding
The root package (`src/lib.rs`) must be updated to re-export from `tugtool-core` so that:
- External code using `tugtool::patch::*` continues to work
- Internal code in `src/` can gradually migrate to `use tugtool_core::*`
After Step 2 completes:
- `tugtool-core` contains the migrated modules
- Root `src/lib.rs` re-exports from `tugtool-core`
- Original files in `src/` may be deleted OR kept as thin re-export wrappers (decide per substep)
- All tests continue to pass
#### Step 2.1: Move patch.rs and text.rs to tugtool-core {#step-2-1}
**Commit:** `refactor(core): move patch and text modules to tugtool-core`
**References:** [D02] Core crate, Table T03, Diagram Diag01, (#module-deps)
> **Why these modules move together:**
>
> `patch.rs` and `text.rs` have a mutual dependency that requires them to migrate as a unit:
> - `patch.rs` imports `crate::text::byte_offset_to_position` (used in `materialize()`)
> - `text.rs` imports `crate::patch::Span` (used in span utilities)
>
> Moving them separately would create either a broken build or messy inter-crate dependencies.
> By moving both together, the `crate::` imports resolve correctly within `tugtool-core`.
**Artifacts:**
- `crates/tugtool-core/src/patch.rs` with full implementation
- `crates/tugtool-core/src/text.rs` with full implementation
- Updated `crates/tugtool-core/Cargo.toml` with required dependencies
- Updated `crates/tugtool-core/src/lib.rs` with module exports
- Updated root `Cargo.toml` with `tugtool-core` dependency
- Updated root `src/lib.rs` to re-export from tugtool-core
**Tasks:**
- [x] Add `tugtool-core` as a dependency in root `Cargo.toml`:
```toml
[dependencies]
tugtool-core = { path = "crates/tugtool-core" }
```
- [x] Add dependencies to `crates/tugtool-core/Cargo.toml`: `serde`, `sha2`, `hex`
- [x] Copy `src/patch.rs` to `crates/tugtool-core/src/patch.rs`
- [x] Copy `src/text.rs` to `crates/tugtool-core/src/text.rs`
- [x] Update `crates/tugtool-core/src/lib.rs`:
```rust
pub mod patch;
pub mod text;
```
- [x] Verify `crate::` imports in both files resolve correctly (no changes needed - they now refer to tugtool-core)
- [x] Update root `src/lib.rs` to re-export:
```rust
pub use tugtool_core::patch;
pub use tugtool_core::text;
```
- [x] Delete `src/patch.rs` and `src/text.rs`
- [x] Verify BOTH core crate AND root package compile
- [x] Verify all tests pass
**Dependencies for tugtool-core/Cargo.toml:**
```toml
[dependencies]
serde = { version = "1.0", features = ["derive"] }
sha2 = "0.10"
hex = "0.4"
[dev-dependencies]
serde_json = "1.0"
```
**Tests:**
- [x] `cargo check -p tugtool-core` succeeds
- [x] `cargo nextest run --workspace` - all 639 tests pass (critical! use --workspace to include crate tests)
**Checkpoint:**
- [x] `cargo check -p tugtool-core` compiles without errors
- [x] `cargo nextest run --workspace` - **all 639 tests still pass** (use --workspace!)
- [x] `use tugtool::patch::Span` still works (API compatibility)
- [x] `use tugtool::text::byte_offset_to_position` still works (API compatibility)
**Rollback:**
- `git checkout -- crates/tugtool-core/ src/patch.rs src/text.rs src/lib.rs Cargo.toml`
**Commit after all checkpoints pass.**
---
#### Step 2.2: Move util.rs, diff.rs to tugtool-core {#step-2-2}
**Commit:** `refactor(core): move util and diff modules to tugtool-core`
**References:** [D02] Core crate, Table T03
**Artifacts:**
- `crates/tugtool-core/src/util.rs`
- `crates/tugtool-core/src/diff.rs`
- Updated root `src/lib.rs` re-exports
**Tasks:**
- [x] Copy `src/util.rs` to `crates/tugtool-core/src/util.rs`
- [x] Copy `src/diff.rs` to `crates/tugtool-core/src/diff.rs`
- [x] Add `pub mod util; pub mod diff;` to core lib.rs
- [x] Update diff.rs imports to use `crate::patch::OutputEdit`
- [x] Update root `src/lib.rs` to re-export: `pub use tugtool_core::{util, diff};`
- [x] Delete or convert `src/util.rs` and `src/diff.rs` to re-export wrappers
- [x] Verify BOTH crates compile and all tests pass
**Tests:**
- [x] `cargo check -p tugtool-core`
- [x] `cargo nextest run --workspace` - all 639 tests pass
**Checkpoint:**
- [x] Core crate compiles
- [x] `cargo nextest run --workspace` - **all 639 tests still pass**
**Rollback:**
- `git checkout -- crates/tugtool-core/ src/util.rs src/diff.rs src/lib.rs`
**Commit after all checkpoints pass.**
---
#### Step 2.3: Move facts/ to tugtool-core {#step-2-3}
**Commit:** `refactor(core): move facts module to tugtool-core`
**References:** [D02] Core crate, Table T03, Diagram Diag01
**Artifacts:**
- `crates/tugtool-core/src/facts/mod.rs`
- Updated core lib.rs
- Updated root `src/lib.rs` re-exports
**Tasks:**
- [x] Copy `src/facts/mod.rs` to `crates/tugtool-core/src/facts/mod.rs`
- [x] Add `pub mod facts;` to core lib.rs
- [x] Update imports in core: `use crate::patch::{ContentHash, FileId, Span}`
- [x] Update root `src/lib.rs` to re-export: `pub use tugtool_core::facts;`
- [x] Delete or convert `src/facts/` to re-export wrapper
- [x] Verify BOTH crates compile and all tests pass
**Tests:**
- [x] `cargo check -p tugtool-core`
- [x] `cargo nextest run --workspace` - all 639 tests pass
**Checkpoint:**
- [x] Core crate compiles
- [x] `cargo nextest run --workspace` - **all 639 tests still pass**
**Rollback:**
- `git checkout -- crates/tugtool-core/ src/facts/ src/lib.rs`
**Commit after all checkpoints pass.**
---
#### Step 2.4: Move error.rs and output.rs to tugtool-core {#step-2-4}
**Commit:** `refactor(core): move error and output modules to tugtool-core`
**References:** [D02] Core crate, Table T03
**Artifacts:**
- `crates/tugtool-core/src/error.rs`
- `crates/tugtool-core/src/output.rs`
- Updated root `src/lib.rs` re-exports
**Tasks:**
- [x] Copy `src/error.rs` to `crates/tugtool-core/src/error.rs`
- [x] Copy `src/output.rs` to `crates/tugtool-core/src/output.rs`
- [x] Add `pub mod error; pub mod output;` to core lib.rs
- [x] Add `thiserror` to core dependencies
- [x] Update output.rs imports for patch and facts (use `crate::` for core-internal refs)
- [x] Update root `src/lib.rs` to re-export: `pub use tugtool_core::{error, output};`
- [x] Delete or convert `src/error.rs` and `src/output.rs` to re-export wrappers
- [x] Verify BOTH crates compile and all tests pass
**Note:** During implementation, we created a new `types.rs` module in tugtool-core to hold shared types (`Location`, `SymbolInfo`) used by both error and output modules, avoiding circular dependencies. The `error_bridges.rs` module was created in the root crate to hold Python-specific error conversions (`From<RenameError>`, `From<WorkerError>`, `From<SessionError>`) that depend on language-specific types.
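The bridge pattern boils down to a `From` impl, which Rust's orphan rule requires to live in a crate that defines at least one of the two types; during this phase the root crate still owns the Python error types, hence `error_bridges.rs`. A toy sketch (both structs are stand-ins, not the real `RenameError`/`TugError` definitions):

```rust
// Stand-in error types; the real ones live in the python module and
// tugtool-core respectively.
#[derive(Debug)]
struct RenameError(String);

#[derive(Debug)]
struct TugError(String);

// The bridge: must live where at least one of the two types is local
// (orphan rule) - error_bridges.rs in the root crate during migration.
impl From<RenameError> for TugError {
    fn from(e: RenameError) -> Self {
        TugError(format!("rename failed: {}", e.0))
    }
}

fn main() {
    let err: TugError = RenameError("old -> new".to_string()).into();
    println!("{err:?}");
}
```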
**Core dependencies update:**
```toml
thiserror = "2.0"
```
**Tests:**
- [x] `cargo check -p tugtool-core`
- [x] `cargo nextest run --workspace` - all 647 tests pass
**Checkpoint:**
- [x] Core crate compiles
- [x] `cargo nextest run --workspace` - **all 647 tests pass**
**Rollback:**
- `git checkout -- crates/tugtool-core/ src/error.rs src/output.rs src/lib.rs`
**Commit after all checkpoints pass.**
---
#### Step 2.5: Move workspace.rs and session.rs to tugtool-core {#step-2-5}
**Commit:** `refactor(core): move workspace and session modules to tugtool-core`
**References:** [D02] Core crate, Table T03
**Artifacts:**
- `crates/tugtool-core/src/workspace.rs`
- `crates/tugtool-core/src/session.rs`
- Updated root `src/lib.rs` re-exports
**Tasks:**
- [x] Copy `src/workspace.rs` to `crates/tugtool-core/src/workspace.rs`
- [x] Copy `src/session.rs` to `crates/tugtool-core/src/session.rs`
- [x] Add `pub mod workspace; pub mod session;` to core lib.rs
- [x] Add dependencies: `walkdir`, `chrono`
- [x] Update imports for workspace and session modules (use `crate::` for core-internal refs)
- [x] Update root `src/lib.rs` to re-export: `pub use tugtool_core::{workspace, session};`
- [x] Delete or convert `src/workspace.rs` and `src/session.rs` to re-export wrappers
- [x] Verify BOTH crates compile and all tests pass
**Note:** During implementation, we also:
- Added `libc` as a unix-only dependency for process checking functions
- Added `tempfile` as a dev-dependency for tests
- Moved `impl From<SessionError> for TugError` from `src/error_bridges.rs` to `crates/tugtool-core/src/error.rs` since both types are now in the same crate
- Removed SessionError bridge tests from `error_bridges.rs`
**Core dependencies update:**
```toml
chrono = { version = "0.4", default-features = false, features = ["std"] }
walkdir = "2"
[target.'cfg(unix)'.dependencies]
libc = "0.2"
[dev-dependencies]
tempfile = "3"
```
**Tests:**
- [x] `cargo check -p tugtool-core`
- [x] `cargo nextest run --workspace` - all 643 tests pass
**Checkpoint:**
- [x] Core crate compiles
- [x] `cargo nextest run --workspace` - **all 643 tests still pass**
**Rollback:**
- `git checkout -- crates/tugtool-core/ src/workspace.rs src/session.rs src/lib.rs`
**Commit after all checkpoints pass.**
---
#### Step 2.6: Move sandbox.rs to tugtool-core {#step-2-6}
**Commit:** `refactor(core): move sandbox module to tugtool-core`
**References:** [D02] Core crate, Table T03
**Artifacts:**
- `crates/tugtool-core/src/sandbox.rs`
- Updated root `src/lib.rs` re-exports
**Tasks:**
- [x] Copy `src/sandbox.rs` to `crates/tugtool-core/src/sandbox.rs`
- [x] Add `pub mod sandbox;` to core lib.rs
- [x] Add dependencies: `tempfile`, `tracing`, `wait-timeout`
- [x] Add target-specific dependency: `libc` (unix) - already added in Step 2.5
- [x] Update imports for sandbox module (use `crate::` for core-internal refs) - no changes needed, already uses crate::patch and crate::workspace
- [x] Update root `src/lib.rs` to re-export: `pub use tugtool_core::sandbox;`
- [x] Delete or convert `src/sandbox.rs` to re-export wrapper
- [x] Verify BOTH crates compile and all tests pass
**Core dependencies update:**
```toml
tempfile = "3"
tracing = "0.1"
wait-timeout = "0.2"
[target.'cfg(unix)'.dependencies]
libc = "0.2"
```
**Tests:**
- [x] `cargo check -p tugtool-core`
- [x] `cargo nextest run --workspace` - all 643 tests pass
**Checkpoint:**
- [x] Core crate compiles
- [x] `cargo nextest run --workspace` - **all 643 tests still pass** (including sandbox tests)
**Note:** During implementation, we also:
- Fixed several rustdoc warnings that were treated as errors due to `-D warnings`:
- Escaped `[D05]` references in doc comments with `\[D05\]`
- Escaped `List[int]` in type documentation with `List\[int\]`
- Wrapped `<id>` in backticks to prevent HTML tag interpretation
- Wrapped `<name>` in backticks to prevent HTML tag interpretation
**Rollback:**
- `git checkout -- crates/tugtool-core/ src/sandbox.rs src/lib.rs`
**Commit after all checkpoints pass.**
---
#### Step 2 Summary {#step-2-summary}
After completing Steps 2.1-2.6, you will have:
- Complete `tugtool-core` crate with all shared infrastructure
- All core modules migrated: patch, facts, error, output, session, workspace, sandbox, text, diff, util
- Root `src/lib.rs` re-exports everything from `tugtool-core`
- Original module files in `src/` either deleted or converted to re-export wrappers
- **All tests still passing** (643 as of Step 2.6; the count drifted from the 639 baseline as tests were added and consolidated in Steps 2.4-2.5)
- Clean dependency boundaries
**Final Step 2 Checkpoint:**
- [x] `cargo nextest run --workspace` - **all 643 tests pass** (not just core tests!)
- [x] `cargo test -p tugtool-core` - core tests pass independently
- [x] `cargo clippy -p tugtool-core -- -D warnings` - no warnings
- [x] `cargo clippy -- -D warnings` - no warnings on root package
- [x] Core crate can be used as dependency (verify with `cargo doc -p tugtool-core`)
- [x] `tests/api_surface.rs` still compiles (API contract preserved)
---
### Step 3: Extract tugtool-python {#step-3}
#### Step 3.1: Create tugtool-python crate skeleton {#step-3-1}
**Commit:** `refactor(python): create tugtool-python crate with dependency on core`
**References:** [D03] Optional languages, Table T01, (#crate-boundaries)
**Artifacts:**
- `crates/tugtool-python/Cargo.toml` with core dependency
- Basic lib.rs structure
**Tasks:**
- [x] Configure Cargo.toml with tugtool-core dependency
- [x] Set up lib.rs module structure matching python/ layout
- [x] Verify crate compiles (empty modules)
**Python crate Cargo.toml:**
```toml
[package]
name = "tugtool-python"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
[dependencies]
tugtool-core = { path = "../tugtool-core" }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
thiserror = "2.0"
tempfile = "3"
tracing = "0.1"
[lints]
workspace = true
```
**Checkpoint:**
- [x] `cargo check -p tugtool-python` succeeds
**Rollback:**
- `git checkout -- crates/tugtool-python/`
**Commit after all checkpoints pass.**
---
#### Step 3.2: Move Python modules to tugtool-python {#step-3-2}
**Commit:** `refactor(python): move all python modules to tugtool-python crate`
**References:** [D03] Optional languages, Table T03
**Artifacts:**
- All files from `src/python/` moved to `crates/tugtool-python/src/`
- Updated imports throughout
- Root `src/lib.rs` updated to re-export from `tugtool-python`
- Root `Cargo.toml` updated with `tugtool-python` dependency
**Tasks:**
- [ ] Add `tugtool-python` as a dependency in root `Cargo.toml`:
```toml
[dependencies]
tugtool-python = { path = "crates/tugtool-python" }
```
- [ ] Copy all files from `src/python/` to `crates/tugtool-python/src/`
- [ ] Update lib.rs in tugtool-python to export all public items
- [ ] Update imports: `use crate::` -> `use tugtool_core::`
- [ ] Fix any module path references
- [ ] Update root `src/lib.rs` to re-export: `pub use tugtool_python as python;`
- [ ] Delete `src/python/` directory (or convert mod.rs to re-export wrapper)
- [ ] Verify BOTH crates compile and all tests pass
**Import pattern changes:**
```rust
// Before (in python/ops/rename.rs)
use crate::facts::{FactsStore, ReferenceKind};
use crate::patch::{FileId, Span};
// After
use tugtool_core::facts::{FactsStore, ReferenceKind};
use tugtool_core::patch::{FileId, Span};
```
**Tests:**
- [ ] `cargo check -p tugtool-python`
- [ ] `cargo nextest run --workspace` - **all tests pass** (not just Python crate tests!)
**Checkpoint:**
- [ ] Python crate compiles
- [ ] `cargo nextest run --workspace` - **all tests still pass**
- [ ] `use tugtool::python::*` still works (API compatibility)
**Rollback:**
- `git checkout -- crates/tugtool-python/ src/python/ src/lib.rs Cargo.toml`
**Commit after all checkpoints pass.**
---
### Step 4: Create tugtool-rust placeholder {#step-4}
**Commit:** `refactor(rust): create tugtool-rust placeholder crate`
**References:** [D03] Optional languages, Table T01
**Artifacts:**
- `crates/tugtool-rust/Cargo.toml`
- `crates/tugtool-rust/src/lib.rs` with placeholder
**Tasks:**
- [x] Configure Cargo.toml with tugtool-core dependency
- [x] Create lib.rs with placeholder comment
- [x] Move `src/rust/mod.rs` content (if any) to lib.rs
- [x] Verify crate compiles
**Rust crate lib.rs:**
```rust
//! Rust language support for tugtool.
//!
//! This crate provides Rust-specific refactoring operations using rust-analyzer.
//!
//! **Status:** Placeholder - implementation planned for future phases.

use tugtool_core as _core;

/// Placeholder for Rust analyzer adapter.
pub struct RustAdapter;

impl RustAdapter {
    /// Create a new Rust adapter (placeholder).
    pub fn new() -> Self {
        RustAdapter
    }
}

impl Default for RustAdapter {
    fn default() -> Self {
        Self::new()
    }
}
```
**Checkpoint:**
- [x] `cargo check -p tugtool-rust` succeeds
**Rollback:**
- `git checkout -- crates/tugtool-rust/`
**Commit after all checkpoints pass.**
---
### Step 5: Refactor main tugtool crate {#step-5}
**CRITICAL: Transitioning the Binary**
This step moves CLI/MCP code to `crates/tugtool/`. At this point:
- Core infrastructure is in `tugtool-core`
- Python support is in `tugtool-python`
- The root still has `src/main.rs`, `src/cli.rs`, etc.
After this step:
- `crates/tugtool/` becomes the main binary crate
- Root `src/` only has re-export lib.rs (will be removed in Step 6)
- All tests still pass
**Important:** During this step, we temporarily have TWO places that can build the `tug` binary (root and `crates/tugtool`). This is resolved in Step 6 when we convert to virtual workspace.
#### Step 5.1: Move CLI files to main crate {#step-5-1}
**Commit:** `refactor: move main, cli, mcp, testcmd to tugtool crate`
**References:** [D04] Main crate, [D05] Testcmd, Table T03
**Artifacts:**
- `crates/tugtool/src/main.rs`
- `crates/tugtool/src/cli.rs`
- `crates/tugtool/src/mcp.rs`
- `crates/tugtool/src/testcmd.rs`
- Updated `crates/tugtool/Cargo.toml` with all dependencies
**Tasks:**
- [ ] Copy `src/main.rs` to `crates/tugtool/src/main.rs`
- [ ] Copy `src/cli.rs` to `crates/tugtool/src/cli.rs`
- [ ] Copy `src/mcp.rs` to `crates/tugtool/src/mcp.rs`
- [ ] Copy `src/testcmd.rs` to `crates/tugtool/src/testcmd.rs`
- [ ] Update `crates/tugtool/Cargo.toml` with dependencies and features (see below)
- [ ] Update imports in all moved files to use `tugtool_core::` and `tugtool_python::`
- [ ] Verify `crates/tugtool` compiles independently
- [ ] Verify root package still compiles (tests still run against root)
**Main crate Cargo.toml:**
```toml
[package]
name = "tugtool"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
description = "AI-native code transformation engine for verified, deterministic refactors"
repository.workspace = true
readme = "../../README.md"
keywords = ["refactoring", "code-transformation", "ai", "mcp", "cli"]
categories = ["development-tools", "command-line-utilities"]
[[bin]]
name = "tug"
path = "src/main.rs"
[lib]
name = "tugtool"
path = "src/lib.rs"
[dependencies]
tugtool-core = { path = "../tugtool-core" }
tugtool-python = { path = "../tugtool-python", optional = true }
tugtool-rust = { path = "../tugtool-rust", optional = true }
# CLI
clap = { version = "4", features = ["derive"] }
serde_json = "1.0"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
# MCP (optional)
rmcp = { version = "0.12", features = ["server", "transport-io"], optional = true }
schemars = { version = "1", optional = true }
[features]
default = ["python", "mcp"]
python = ["dep:tugtool-python"]
rust = ["dep:tugtool-rust"]
mcp = ["dep:rmcp", "dep:schemars"]
full = ["python", "rust", "mcp"]
[lints]
workspace = true
```
**Tests:**
- [ ] `cargo check -p tugtool` (the crates/tugtool package)
- [ ] `cargo nextest run --workspace` - all tests still pass (against root package)
**Checkpoint:**
- [ ] `crates/tugtool` compiles: `cargo build -p tugtool`
- [ ] Root package still works: `cargo nextest run --workspace` - **all tests pass**
- [ ] Binary works from new location: `cargo run -p tugtool -- --help`
**Rollback:**
- `git checkout -- crates/tugtool/`
**Commit after all checkpoints pass.**
---
#### Step 5.2: Create lib.rs with re-exports {#step-5-2}
**Commit:** `refactor: add re-exports to tugtool lib.rs for API compatibility`
**References:** [D04] Main crate, Spec S02, (#public-api)
**Artifacts:**
- `crates/tugtool/src/lib.rs` with all re-exports
**Tasks:**
- [x] Create `crates/tugtool/src/lib.rs` with public re-exports from core
- [x] Add conditional re-exports for language crates
- [x] Add re-exports for cli, mcp, testcmd modules
- [x] Verify all previously-public types are accessible via `tugtool::*`
- [x] Update main.rs to use new module paths
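One possible shape for that lib.rs, assuming the module and feature names used elsewhere in this plan (the exact re-export list is a guess, not the verified file):

```rust
// crates/tugtool/src/lib.rs (sketch only)

// Core infrastructure is always available.
pub use tugtool_core::{error, facts, output, patch, session, workspace};

// Language crates only exist behind their feature flags.
#[cfg(feature = "python")]
pub use tugtool_python as python;
#[cfg(feature = "rust")]
pub use tugtool_rust as rust;

// Binary-side modules, exported for tests and integration callers.
pub mod cli;
#[cfg(feature = "mcp")]
pub mod mcp;
pub mod testcmd;
```

With this shape, `cargo build --no-default-features` drops both language crates while the core re-exports stay stable.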
**Checkpoint:**
- [x] `cargo check -p tugtool` (the crates/tugtool package)
- [x] `cargo doc -p tugtool` - documentation builds
- [x] `cargo nextest run --workspace` - **all tests still pass**
**Rollback:**
- `git checkout -- crates/tugtool/src/lib.rs`
**Commit after all checkpoints pass.**
---
#### Step 5.3: Update CLI imports and conditional compilation {#step-5-3}
**Commit:** `refactor: update CLI with conditional language support`
**References:** [D03] Optional languages, Table T02, Spec S01
**Artifacts:**
- Updated `cli.rs` with feature-gated language dispatch
- Updated `mcp.rs` with feature-gated tools
**Tasks:**
- [x] Add `#[cfg(feature = "python")]` guards to Python-specific CLI code
- [x] Add `#[cfg(feature = "rust")]` guards to Rust-specific CLI code
- [x] Update MCP tool registration with feature guards
- [x] Verify build with default features
- [x] Verify build with `--no-default-features`
**Conditional compilation pattern:**
```rust
// In cli.rs
#[cfg(feature = "python")]
use tugtool_python::ops::rename::PythonRenameOp;

pub fn run_rename(args: &RenameArgs) -> Result<(), TugError> {
    match args.language {
        #[cfg(feature = "python")]
        Language::Python => {
            // Python rename logic
        }
        #[cfg(feature = "rust")]
        Language::Rust => {
            // Rust rename logic (placeholder)
        }
        // Needed for builds that exclude a language feature; allow the
        // lint for builds where every language is enabled.
        #[allow(unreachable_patterns)]
        _ => {
            return Err(TugError::unsupported_language(args.language));
        }
    }
    Ok(())
}
```
**Tests:**
- [x] `cargo build -p tugtool` (default features)
- [x] `cargo build -p tugtool --no-default-features`
- [x] `cargo build -p tugtool --features python`
- [x] `cargo build -p tugtool --features mcp` (MCP without Python - verifies no accidental Python imports)
- [x] `cargo build -p tugtool --features full`
**Checkpoint:**
- [x] All feature combinations compile (including `--features mcp` alone)
- [x] `cargo run -p tugtool -- --help` works
- [x] MCP-only build has no Python dependencies (verify with `--features mcp` compile)
- Violation check: any `use tugtool_python::` or dependency edge to `tugtool-python` without a `#[cfg(feature = "python")]` guard counts as a failure
**Rollback:**
- `git checkout -- crates/tugtool/src/`
**Commit after all checkpoints pass.**
---
### Step 6: Clean up and finalize {#step-6}
**CRITICAL: This step converts from hybrid to virtual workspace**
At this point:
- All code has been migrated to `crates/`
- `crates/tugtool/` is the new main binary crate with all CLI/MCP code
- Root `src/` is no longer needed
- We can now safely convert to a virtual workspace
#### Step 6.1: Convert to virtual workspace and remove old src/ {#step-6-1}
**Commit:** `refactor: convert to virtual workspace, remove old src/`
**References:** [D07] Virtual workspace, Table T03, (#success-criteria)
**Artifacts:**
- Virtual workspace Cargo.toml (no `[package]` section)
- Old `src/` directory removed
- `crates/tugtool` added to workspace members
**Tasks:**
- [x] Add `"crates/tugtool"` to workspace members list
- [x] Remove `[package]` section from root Cargo.toml
- [x] Remove `[dependencies]` section from root Cargo.toml (dependencies are now in crates)
- [x] Remove `[[bin]]`, `[lib]`, `[features]` sections from root Cargo.toml
- [x] Delete `src/` directory entirely
- [x] Update `tests/` directory to use `crates/tugtool` as the test target (may need to move to `crates/tugtool/tests/`)
- [x] Update any hardcoded paths in tests
**Final root Cargo.toml (virtual workspace):**
```toml
[workspace]
resolver = "2"
members = [
    "crates/tugtool", # NOW INCLUDED
    "crates/tugtool-core",
    "crates/tugtool-python",
    "crates/tugtool-rust",
]
[workspace.package]
version = "0.1.0"
edition = "2021"
authors = ["Ken Kocienda"]
license = "MIT"
repository = "https://github.com/tugtool/tugtool"
[workspace.lints.rust]
warnings = "deny"
[workspace.lints.clippy]
all = { level = "deny", priority = -1 }
collapsible_if = "allow"
# NO [package] section - this is now a virtual workspace
# NO [dependencies] section - dependencies are in individual crates
[profile.release]
lto = "thin"
debug = "line-tables-only"
[profile.dev]
debug = 1
incremental = true
codegen-units = 256
lto = false
panic = "unwind"
[profile.test]
debug = 1
incremental = true
lto = false
```
**Checkpoint:**
- [x] `cargo build` succeeds from workspace root
- [x] `cargo nextest run --workspace` - all tests pass
- [x] `src/` directory no longer exists
- [x] Root Cargo.toml has NO `[package]` section
**Rollback:**
- `git checkout HEAD~1 -- src/ Cargo.toml`
**Commit after all checkpoints pass.**
---
#### Step 6.2: Update documentation and CI {#step-6-2}
**Commit:** `docs: update CLAUDE.md, README, and CI for workspace structure`
**References:** (#documentation-plan), [D07] Virtual workspace
**Artifacts:**
- Updated CLAUDE.md with new architecture section
- Updated README.md if needed
- Updated CI workflows for workspace commands
- Updated Justfile (if present)
**Tasks:**
- [x] Update CLAUDE.md Architecture section with new structure
- [x] Update build commands to reference workspace
- [x] Document feature flags
- [x] Update any path references
- [x] Update `.github/workflows/*.yml` to use `-p tugtool` or `--workspace` as appropriate
- [x] Update `Justfile` commands (if present) for workspace structure
- [x] Verify `cargo install --path crates/tugtool` works (document in README)
**CLAUDE.md updates:**
```markdown
## Architecture
tugtool is organized as a Cargo workspace with the following crates:
crates/
+-- tugtool/ # Main binary and CLI
+-- tugtool-core/ # Shared infrastructure
+-- tugtool-python/ # Python language support
+-- tugtool-rust/ # Rust language support (planned)
### Build Commands
# Build all crates
cargo build
# Build specific crate
cargo build -p tugtool-core
# Build with specific features
cargo build --no-default-features --features python
```
**Checkpoint:**
- [x] CLAUDE.md reflects new structure
- [x] `cargo doc --workspace` succeeds
**Rollback:**
- `git checkout -- CLAUDE.md README.md`
**Commit after all checkpoints pass.**
---
#### Step 6.3: Verify full test suite and metrics {#step-6-3}
**Commit:** `test: verify workspace migration maintains test coverage`
**References:** (#success-criteria)
**Artifacts:**
- Test report showing all tests pass
- Build timing comparison
**Tasks:**
- [x] Run full test suite: `cargo nextest run --workspace`
- [x] Run clippy: `cargo clippy --workspace -- -D warnings`
- [x] Run fmt: `cargo fmt --all --check`
- [x] Compare build times with baseline from Step 0
- [x] Verify `cargo install --path crates/tugtool` works
**Checkpoint:**
- [x] `cargo nextest run --workspace` - all tests pass (643 tests)
- [x] `cargo clippy --workspace -- -D warnings` - no warnings
- [x] `cargo fmt --all --check` - no formatting issues
- [x] Build times similar or improved vs baseline (~8s clean build)
**Rollback:** N/A (verification step)
**Commit after all checkpoints pass.**
---
## 2.0.6 Deliverables and Checkpoints {#deliverables}
**Deliverable:** Tugtool restructured as Cargo workspace with 4 member crates (tugtool, tugtool-core, tugtool-python, tugtool-rust), feature flags for language selection, and preserved API compatibility.
### Phase Exit Criteria ("Done means...") {#exit-criteria}
- [x] Root Cargo.toml is a **virtual workspace** (no `[package]` section)
- [x] `src/` directory no longer exists
- [x] All 4 crates compile independently (`cargo check -p <crate>`)
- [x] Full test suite passes (`cargo nextest run --workspace`)
- [x] `cargo build -p tugtool --no-default-features` produces working binary
- [x] `cargo build -p tugtool --features python` includes Python support
- [x] CLAUDE.md updated with new structure
- [x] CLI and JSON output unchanged from pre-migration behavior
- [x] `tests/api_surface.rs` compiles (public API contract preserved)
**Acceptance tests:**
- [x] Integration test: Full rename operation works end-to-end
- [x] Integration test: MCP server starts and responds to tool calls
- [x] Golden test: JSON output schemas unchanged
- [x] API surface test: All public re-exports accessible
**CRITICAL: Test count verification**
- [x] Final test count: 643 tests (exceeds baseline of 639)
### Milestones (Within Phase) {#milestones}
**Milestone M00: Hybrid workspace established (Step 1)** {#m00-hybrid-workspace}
- [x] Root Cargo.toml has BOTH `[workspace]` AND `[package]` sections
- [x] All 639 tests still pass
- [x] Empty crate skeletons exist in `crates/`
**Milestone M01: Core crate complete (Step 2)** {#m01-core-complete}
- [x] tugtool-core contains all shared infrastructure
- [x] Root `src/lib.rs` re-exports from tugtool-core
- [x] **All 643 tests still pass** (critical!)
- [x] Core crate tests pass independently
**Milestone M02: Python crate complete (Step 3)** {#m02-python-complete}
- [x] tugtool-python contains all Python support
- [x] Root `src/lib.rs` re-exports from tugtool-python
- [x] **All 643 tests still pass** (critical!)
- [x] Python crate tests pass independently (188 tests)
**Milestone M03: Workspace integrated (Step 5)** {#m03-workspace-integrated}
- [x] Main tugtool crate in `crates/tugtool/` composes all pieces
- [x] Feature flags work correctly
- [x] **All 643 tests still pass** (critical!)
**Milestone M04: Virtual workspace complete (Step 6)** {#m04-virtual-workspace}
- [x] Root Cargo.toml has NO `[package]` section
- [x] `src/` directory removed
- [x] **All 643 tests still pass** (critical!)
### Roadmap / Follow-ons (Explicitly Not Required for Phase Close) {#roadmap}
- [ ] Implement actual Rust language support in tugtool-rust
- [ ] Consider extracting MCP to separate crate if build times warrant
- [ ] Add per-crate CI jobs for parallel testing
- [ ] Investigate dynamic plugin loading for languages
| Checkpoint | Verification |
|------------|--------------|
| Virtual workspace | Root Cargo.toml has no `[package]` section |
| src/ removed | `test ! -d src` (directory does not exist) |
| Workspace compiles | `cargo build --workspace` |
| All tests pass | `cargo nextest run --workspace` (must show 643 tests) |
| Features work | `cargo build -p tugtool --no-default-features --features python` |
| No regressions | Compare test counts and build times with baseline |
| API preserved | `tests/api_surface.rs` compiles |
**CRITICAL: If test count drops below baseline at any step, STOP and investigate before proceeding.**
**Commit after all checkpoints pass.**
AI coding assistants like Claude Code and Cursor have changed the way I work. My daily programming looks nothing like it did even a couple of years ago. Today, I hardly ever write individual lines of code. AI coding assistants have relieved me of that work. They’re better at it than I am. I’m OK with that.
This is a big change for me, because over my thirty-year career, writing individual lines of code has been my main pursuit, taking up most of my time and attention. It was the only way to build up the functions, methods, libraries, frameworks, and apps that delivered on my actual goal: producing great software that is useful and meaningful to people.
Now that I can delegate a lot of this work to AI coding assistants, I can focus more on thinking about exactly what I want to make rather than tediously and laboriously trying to achieve my desired effects. I spend more time thinking about the edifice as a whole rather than building it up brick by brick.
I love this. It’s freeing.
I know others don’t agree. Judging from what I see and read on the internet, many experienced software developers seem to think AI has nothing to offer—that it’s a fad, a trap, an option only for the lazy or uninitiated. Whatever. To hear them tell it, it’s not how real programming gets done.
This reminds me of The Story of Mel, a Real Programmer. As the story goes, there once was a programmer named Mel. Mel had a computer with a drum memory, and it was possible to optimize every instruction the machine executed. He wrote his code in machine language.
Raw, unadorned, inscrutable hexadecimal numbers. Directly.
Mel reveled in this kind of work. He knew all the minutiae of how his machine functioned. He could calculate the speed of the rotating drum storing his machine instructions, and he could time the read head to pass over the address of his next instruction at precisely the right moment for optimal execution speed. He scoffed at programmers who were starting to use a new-fangled software aid called an optimizing assembler, a tool to assign these addresses to the machine instructions automatically. He refused to use it.
Mel said, “You never know where it’s going to put things.”
Plus ça change…
Today, optimizing assemblers lie several levels beneath the notice of contemporary real programmers. Over time, we have simply come to accept the loss of detail Mel thought was essential to proper work—since it wasn’t actually essential to the task. It was merely essential to Mel’s view of himself as a programmer.
Who knows what happened to Mel when the exact details of his machine changed out from under him? It surely didn’t take long; that change came fast. What didn’t change was Mel’s actual goal, which is the same now as it was then: making great software. The same kind of change that Mel ridiculed—but surely fell victim to over time—is happening right now. Only the bugaboo isn’t optimizing assemblers; it’s AI.
Yes, AI coding assistants are non-deterministic. Sure, they easily get off track. Indeed, “You never know where it’s going to put things.”
Yet, we’ve crossed over into a new era anyway. Change is coming to all software development. AI is a box of magic tools, and today, I think the job of a real programmer is to figure out how to make the best use of this new potential. Build new processes. Make new tools. Integrate AI into a new working method. Succeed at this, and help to define a better future for how we create software. Ignore the new potential, and run the risk of being left behind.
Starting a software project involves so many things that have to get done before getting down to the actual work of developing the code itself: defining the scope and audience, picking a development language, coming up with good names… and putting up a website.
My main gripe with contemporary web tech is that the state of the art is so complicated. Following the trends means piling up layers and layers of libraries and modules. Mounds of JavaScript for everything. Complex deployment flows. Nothing is elegant.
When I sat down to make the site you’re reading right now, I did what I typically do these days: I asked an LLM for help. Collaborating with AI on tech work is a major part of the motivation for this project. My aim is to find out what AI is actually good for, figure out what gaps there are to fill, and then decide whether there’s something I can do about filling them. More on that in future posts.
For this site, I asked GPT-5.2 to help me come up with a plan for a simple website: a landing page, top nav, a link to GitHub, a blog. It wound up producing a plan for Astro+Keystatic+Coolify+Hetzner+GitHub Apps+Docker+CI+Webhooks… and more! It seemed like a lot of tech for a little job, but I decided to follow the steps. I hoped I might learn something along the way.
What I actually learned was: bad idea. I didn’t think enough myself. I put myself in the hands of the LLM and its advice without fully knowing what I was going to get when everything was done. So I got a bunch of services that didn’t really talk to each other in a way I liked. I threw it all away. A day of work wasted.
That was yesterday. I went to sleep. I woke up at 2:30 AM with an idea for doing better. I sat on the couch for an hour or so and sketched it out, then worked on it a couple more hours after morning coffee. Between me, Claude/Opus, and GPT-5.2, I came up with a plan, with this as the introduction.
Static Site + Blog Infrastructure Plan
Astro + MDX · rsync deploy · Linode · Cloudflare DNS
This document defines a boring, sane, long-lived infrastructure for a content-driven website and blog.
The guiding principle is simple:
Build locally. Inspect everything. Deploy files. Serve static content.
No CI rebuilds. No Docker. No GitHub in the deploy path. No surprises.
I then banged out the work itself in about two hours. You’re looking at the result now. The site generates static pages via a single deploy command that takes my local files (which must not have any uncommitted git changes), compiles them down, and pushes the output to a site on Linode. I can view posts in the browser as I write them: hit save, and the server hot-reloads the page. When I deploy, the deployed site shows exactly what I saw locally. As the plan says: no surprises.
It feels like this is the way that blogs should work. Live editing and previews. WYSIWYG in its way, with local and deployed versions of the site the same.
It’s about as elegant a solution as I could think of given the state of web tech today, and it didn’t shift my focus (for too long) away from the real goal of this project: figuring out some software to help AIs with code refactoring.