|
@@ -0,0 +1,331 @@
|
|
|
|
|
+# ROADMAP.md
|
|
|
|
|
+
|
|
|
|
|
+# Clawable Coding Harness Roadmap
|
|
|
|
|
+
|
|
|
|
|
+## Goal
|
|
|
|
|
+
|
|
|
|
|
+Turn claw-code into the most **clawable** coding harness:
|
|
|
|
|
+- no human-first terminal assumptions
|
|
|
|
|
+- no fragile prompt injection timing
|
|
|
|
|
+- no opaque session state
|
|
|
|
|
+- no hidden plugin or MCP failures
|
|
|
|
|
+- no manual babysitting for routine recovery
|
|
|
|
|
+
|
|
|
|
|
+This roadmap assumes the primary users are **claws wired through hooks, plugins, sessions, and channel events**.
|
|
|
|
|
+
|
|
|
|
|
+## Definition of "clawable"
|
|
|
|
|
+
|
|
|
|
|
+A clawable harness is:
|
|
|
|
|
+- deterministic to start
|
|
|
|
|
+- machine-readable in state and failure modes
|
|
|
|
|
+- recoverable without a human watching the terminal
|
|
|
|
|
+- branch/test/worktree aware
|
|
|
|
|
+- plugin/MCP lifecycle aware
|
|
|
|
|
+- event-first, not log-first
|
|
|
|
|
+- capable of autonomous next-step execution
|
|
|
|
|
+
|
|
|
|
|
+## Current Pain Points
|
|
|
|
|
+
|
|
|
|
|
+### 1. Session boot is fragile
|
|
|
|
|
+- trust prompts can block TUI startup
|
|
|
|
|
+- prompts can land in the shell instead of the coding agent
|
|
|
|
|
+- "session exists" does not mean "session is ready"
|
|
|
|
|
+
|
|
|
|
|
+### 2. Truth is split across layers
|
|
|
|
|
+- tmux state
|
|
|
|
|
+- clawhip event stream
|
|
|
|
|
+- git/worktree state
|
|
|
|
|
+- test state
|
|
|
|
|
+- gateway/plugin/MCP runtime state
|
|
|
|
|
+
|
|
|
|
|
+### 3. Events are too log-shaped
|
|
|
|
|
+- claws currently infer too much from noisy text
|
|
|
|
|
+- important states are not normalized into machine-readable events
|
|
|
|
|
+
|
|
|
|
|
+### 4. Recovery loops are too manual
|
|
|
|
|
+- restart worker
|
|
|
|
|
+- accept trust prompt
|
|
|
|
|
+- re-inject prompt
|
|
|
|
|
+- detect stale branch
|
|
|
|
|
+- retry failed startup
|
|
|
|
|
+- classify infra vs code failures manually
|
|
|
|
|
+
|
|
|
|
|
+### 5. Branch freshness is not enforced enough
|
|
|
|
|
+- side branches can miss already-landed main fixes
|
|
|
|
|
+- broad test failures can be stale-branch noise instead of real regressions
|
|
|
|
|
+
|
|
|
|
|
+### 6. Plugin/MCP failures are under-classified
|
|
|
|
|
+- startup failures, handshake failures, config errors, partial startup, and degraded mode are not exposed cleanly enough
|
|
|
|
|
+
|
|
|
|
|
+### 7. Human UX still leaks into claw workflows
|
|
|
|
|
+- too much depends on terminal/TUI behavior instead of explicit agent state transitions and control APIs
|
|
|
|
|
+
|
|
|
|
|
+## Product Principles
|
|
|
|
|
+
|
|
|
|
|
+1. **State machine first** — every worker has explicit lifecycle states.
|
|
|
|
|
+2. **Events over scraped prose** — channel output should be derived from typed events.
|
|
|
|
|
+3. **Recovery before escalation** — known failure modes should auto-heal once before asking for help.
|
|
|
|
|
+4. **Branch freshness before blame** — detect stale branches before treating red tests as new regressions.
|
|
|
|
|
+5. **Partial success is first-class** — e.g. MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting.
|
|
|
|
|
+6. **Terminal is transport, not truth** — tmux/TUI may remain implementation details, but orchestration state must live above them.
|
|
|
|
|
+7. **Policy is executable** — merge, retry, rebase, stale cleanup, and escalation rules should be machine-enforced.
|
|
|
|
|
+
|
|
|
|
|
+## Roadmap
|
|
|
|
|
+
|
|
|
|
|
+## Phase 1 — Reliable Worker Boot
|
|
|
|
|
+
|
|
|
|
|
+### 1. Ready-handshake lifecycle for coding workers
|
|
|
|
|
+Add explicit states:
|
|
|
|
|
+- `spawning`
|
|
|
|
|
+- `trust_required`
|
|
|
|
|
+- `ready_for_prompt`
|
|
|
|
|
+- `prompt_accepted`
|
|
|
|
|
+- `running`
|
|
|
|
|
+- `blocked`
|
|
|
|
|
+- `finished`
|
|
|
|
|
+- `failed`
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- prompts are never sent before `ready_for_prompt`
|
|
|
|
|
+- trust prompt state is detectable and emitted
|
|
|
|
|
+- shell misdelivery becomes detectable as a first-class failure state
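The lifecycle above can be sketched as a small state machine. This is a minimal illustration, not the harness's implementation: the roadmap names the states but not the edges between them, so the `ALLOWED` transition table, the `Worker` class, and its method names are all assumptions.

```python
from enum import Enum


class WorkerState(Enum):
    SPAWNING = "spawning"
    TRUST_REQUIRED = "trust_required"
    READY_FOR_PROMPT = "ready_for_prompt"
    PROMPT_ACCEPTED = "prompt_accepted"
    RUNNING = "running"
    BLOCKED = "blocked"
    FINISHED = "finished"
    FAILED = "failed"


S = WorkerState

# Hypothetical transition table: the edges are an assumption, only the states
# come from the roadmap.
ALLOWED = {
    S.SPAWNING: {S.TRUST_REQUIRED, S.READY_FOR_PROMPT, S.FAILED},
    S.TRUST_REQUIRED: {S.READY_FOR_PROMPT, S.FAILED},
    S.READY_FOR_PROMPT: {S.PROMPT_ACCEPTED, S.FAILED},
    S.PROMPT_ACCEPTED: {S.RUNNING, S.FAILED},
    S.RUNNING: {S.BLOCKED, S.FINISHED, S.FAILED},
    S.BLOCKED: {S.RUNNING, S.FAILED},
    S.FINISHED: set(),
    S.FAILED: set(),
}


class Worker:
    def __init__(self):
        self.state = S.SPAWNING
        self.events = []  # emitted state names, oldest first

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.events.append(new_state.value)

    def send_prompt(self, prompt):
        # Enforces "prompts are never sent before ready_for_prompt".
        if self.state is not S.READY_FOR_PROMPT:
            raise RuntimeError(f"worker not ready: {self.state.value}")
        self.transition(S.PROMPT_ACCEPTED)
```

Rejecting the prompt in `send_prompt` rather than in the caller keeps the guard in one place. Detecting that a prompt landed in the shell would need a separate signal that this sketch does not model.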
|
|
|
|
|
+
|
|
|
|
|
+### 2. Trust prompt resolver
|
|
|
|
|
+Add allowlisted auto-trust behavior for known repos/worktrees.
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- trusted repos auto-clear trust prompts
|
|
|
|
|
+- events emitted for `trust_required` and `trust_resolved`
|
|
|
|
|
+- non-allowlisted repos remain gated
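One way to picture the resolver is a pure function that maps a repo path and an allowlist to the events that should be emitted. The function name, the event dictionary shape, and the rule that worktrees under an allowlisted root inherit trust are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def resolve_trust(repo_path, allowlist):
    """Return the events to emit for a worker that hit a trust prompt."""
    repo = PurePosixPath(repo_path)
    events = [{"type": "trust_required", "repo": str(repo)}]
    # Assumption: a repo is trusted if it equals, or sits under, an
    # allowlisted root. Path comparison avoids string-prefix false positives
    # such as "/srv/repo-evil" matching "/srv/repo".
    trusted = any(
        repo == PurePosixPath(root) or PurePosixPath(root) in repo.parents
        for root in allowlist
    )
    if trusted:
        events.append({"type": "trust_resolved", "repo": str(repo)})
    return events
```

A non-allowlisted repo yields only `trust_required`, so it stays gated until something else resolves it.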
|
|
|
|
|
+
|
|
|
|
|
+### 3. Structured session control API
|
|
|
|
|
+Provide machine control above tmux:
|
|
|
|
|
+- create worker
|
|
|
|
|
+- await ready
|
|
|
|
|
+- send task
|
|
|
|
|
+- fetch state
|
|
|
|
|
+- fetch last error
|
|
|
|
|
+- restart worker
|
|
|
|
|
+- terminate worker
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- a claw can operate a coding worker without raw send-keys as the primary control plane
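The seven operations listed above could be expressed as an abstract interface, with an in-memory double for testing claws without tmux. All names and signatures here are hypothetical. The roadmap lists the operations but does not define an API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class WorkerControl(ABC):
    """Hypothetical control surface; one method per listed operation."""

    @abstractmethod
    def create_worker(self, repo: str) -> str: ...
    @abstractmethod
    def await_ready(self, worker_id: str, timeout_s: float) -> bool: ...
    @abstractmethod
    def send_task(self, worker_id: str, task: str) -> None: ...
    @abstractmethod
    def fetch_state(self, worker_id: str) -> str: ...
    @abstractmethod
    def fetch_last_error(self, worker_id: str) -> Optional[str]: ...
    @abstractmethod
    def restart_worker(self, worker_id: str) -> None: ...
    @abstractmethod
    def terminate_worker(self, worker_id: str) -> None: ...


class InMemoryControl(WorkerControl):
    """Test double: no tmux, every worker is ready as soon as it is created."""

    def __init__(self):
        self._workers = {}

    def create_worker(self, repo):
        worker_id = f"w{len(self._workers) + 1}"
        self._workers[worker_id] = {
            "repo": repo, "state": "ready_for_prompt", "error": None,
        }
        return worker_id

    def await_ready(self, worker_id, timeout_s):
        return self._workers[worker_id]["state"] == "ready_for_prompt"

    def send_task(self, worker_id, task):
        worker = self._workers[worker_id]
        if worker["state"] != "ready_for_prompt":
            worker["error"] = "prompt_delivery"
            raise RuntimeError("worker not ready for a task")
        worker["state"] = "running"

    def fetch_state(self, worker_id):
        return self._workers[worker_id]["state"]

    def fetch_last_error(self, worker_id):
        return self._workers[worker_id]["error"]

    def restart_worker(self, worker_id):
        self._workers[worker_id].update(state="ready_for_prompt", error=None)

    def terminate_worker(self, worker_id):
        # Simplification: the double does not distinguish kill from finish.
        self._workers[worker_id]["state"] = "finished"
```

A real implementation might still drive tmux underneath. The point of the interface is that claws call these methods instead of issuing raw send-keys.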
|
|
|
|
|
+
|
|
|
|
|
+## Phase 2 — Event-Native Clawhip Integration
|
|
|
|
|
+
|
|
|
|
|
+### 4. Canonical lane event schema
|
|
|
|
|
+Define typed events such as:
|
|
|
|
|
+- `lane.started`
|
|
|
|
|
+- `lane.ready`
|
|
|
|
|
+- `lane.prompt_misdelivery`
|
|
|
|
|
+- `lane.blocked`
|
|
|
|
|
+- `lane.red`
|
|
|
|
|
+- `lane.green`
|
|
|
|
|
+- `lane.commit.created`
|
|
|
|
|
+- `lane.pr.opened`
|
|
|
|
|
+- `lane.merge.ready`
|
|
|
|
|
+- `lane.finished`
|
|
|
|
|
+- `lane.failed`
|
|
|
|
|
+- `branch.stale_against_main`
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- clawhip consumes typed lane events
|
|
|
|
|
+- Discord summaries are rendered from structured events instead of pane scraping alone
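A typed event might look like the following sketch. The event type names are taken from the list above. The envelope fields (`lane_id`, `seq`, `data`) and the `render_summary` helper are assumptions, since the roadmap does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass, field

LANE_EVENT_TYPES = frozenset({
    "lane.started", "lane.ready", "lane.prompt_misdelivery", "lane.blocked",
    "lane.red", "lane.green", "lane.commit.created", "lane.pr.opened",
    "lane.merge.ready", "lane.finished", "lane.failed",
    "branch.stale_against_main",
})


@dataclass(frozen=True)
class LaneEvent:
    type: str
    lane_id: str  # hypothetical field
    seq: int      # hypothetical per-lane sequence number
    data: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject anything outside the canonical set so consumers never see
        # free-form event names.
        if self.type not in LANE_EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


def render_summary(event):
    """Render a short channel line from the structured event, not pane text."""
    return f"[{event.lane_id}] {event.type}"
```

Validating the type at construction time means a typo in an emitter fails loudly at the source instead of silently producing an event nobody consumes.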
|
|
|
|
|
+
|
|
|
|
|
+### 5. Failure taxonomy
|
|
|
|
|
+Normalize failure classes:
|
|
|
|
|
+- `prompt_delivery`
|
|
|
|
|
+- `trust_gate`
|
|
|
|
|
+- `branch_divergence`
|
|
|
|
|
+- `compile`
|
|
|
|
|
+- `test`
|
|
|
|
|
+- `plugin_startup`
|
|
|
|
|
+- `mcp_startup`
|
|
|
|
|
+- `mcp_handshake`
|
|
|
|
|
+- `gateway_routing`
|
|
|
|
|
+- `tool_runtime`
|
|
|
|
|
+- `infra`
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- blockers are machine-classified
|
|
|
|
|
+- dashboards and retry policies can branch on failure type
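To show how a retry policy could branch on failure type, here is a minimal sketch. The enum values are the classes listed above. Which classes count as retryable, and the single-attempt limit, are illustrative assumptions rather than decided policy.

```python
from enum import Enum


class FailureClass(Enum):
    PROMPT_DELIVERY = "prompt_delivery"
    TRUST_GATE = "trust_gate"
    BRANCH_DIVERGENCE = "branch_divergence"
    COMPILE = "compile"
    TEST = "test"
    PLUGIN_STARTUP = "plugin_startup"
    MCP_STARTUP = "mcp_startup"
    MCP_HANDSHAKE = "mcp_handshake"
    GATEWAY_ROUTING = "gateway_routing"
    TOOL_RUNTIME = "tool_runtime"
    INFRA = "infra"


# Assumption: startup and infrastructure failures are worth one automatic
# retry, while compile and test failures point at the code and are not.
RETRYABLE = {
    FailureClass.PROMPT_DELIVERY,
    FailureClass.TRUST_GATE,
    FailureClass.PLUGIN_STARTUP,
    FailureClass.MCP_STARTUP,
    FailureClass.MCP_HANDSHAKE,
    FailureClass.INFRA,
}


def next_action(failure, attempts):
    """Return "retry" for a retryable class on its first failure, else "escalate"."""
    if failure in RETRYABLE and attempts < 1:
        return "retry"
    return "escalate"
```

Because the classes are enum members rather than strings parsed from logs, a dashboard can group by `failure.value` with no text matching.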
|
|
|
|
|
+
|
|
|
|
|
+### 6. Actionable summary compression
|
|
|
|
|
+Collapse noisy event streams into:
|
|
|
|
|
+- current phase
|
|
|
|
|
+- last successful checkpoint
|
|
|
|
|
+- current blocker
|
|
|
|
|
+- recommended next recovery action
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- channel status updates stay short and machine-grounded
|
|
|
|
|
+- claws stop inferring state from raw build spam
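Summary compression can be viewed as a fold over the event stream. The sketch below assumes events are dictionaries with a `type` key and an optional `failure_class` key. The checkpoint set, the blocking set, and the recovery hints are hypothetical choices.

```python
# Assumption: these event types count as successful checkpoints.
CHECKPOINTS = {
    "lane.ready", "lane.green", "lane.commit.created",
    "lane.pr.opened", "lane.merge.ready",
}
# Assumption: these event types set the current blocker.
BLOCKING = {"lane.prompt_misdelivery", "lane.blocked", "lane.red", "lane.failed"}
# Hypothetical mapping from failure class to a recovery hint.
RECOVERY_HINTS = {
    "trust_gate": "resolve_trust_prompt",
    "prompt_delivery": "reinject_prompt",
    "branch_divergence": "merge_forward",
}


def compress(events):
    """Fold an ordered list of events into a four-field status summary."""
    summary = {
        "phase": None, "last_checkpoint": None,
        "blocker": None, "next_action": None,
    }
    for event in events:
        kind = event["type"]
        summary["phase"] = kind
        if kind in CHECKPOINTS:
            # A new checkpoint clears any earlier blocker.
            summary["last_checkpoint"] = kind
            summary["blocker"] = None
            summary["next_action"] = None
        elif kind in BLOCKING:
            blocker = event.get("failure_class", kind)
            summary["blocker"] = blocker
            summary["next_action"] = RECOVERY_HINTS.get(blocker, "escalate")
    return summary
```

The output has a fixed shape regardless of how many events went in, which keeps channel updates short.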
|
|
|
|
|
+
|
|
|
|
|
+## Phase 3 — Branch/Test Awareness and Auto-Recovery
|
|
|
|
|
+
|
|
|
|
|
+### 7. Stale-branch detection before broad verification
|
|
|
|
|
+Before broad test runs, compare the current branch to `main` and detect whether known fixes are missing.
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- emit `branch.stale_against_main`
|
|
|
|
|
+- suggest or auto-run rebase/merge-forward according to policy
|
|
|
|
|
+- avoid misclassifying stale-branch failures as new regressions
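A simple approximation of the check counts commits on `main` that the current branch lacks, using `git rev-list --count HEAD..main`. Note the simplification: this flags any missing commit, not specifically "known fixes", which would need extra metadata. The function names and event fields are assumptions.

```python
import subprocess


def commits_behind(repo_dir, base="main"):
    """Count commits reachable from `base` but not from HEAD."""
    result = subprocess.run(
        ["git", "rev-list", "--count", f"HEAD..{base}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def freshness_event(branch, behind):
    """Return a `branch.stale_against_main` event, or None if the branch is current."""
    if behind <= 0:
        return None
    return {
        "type": "branch.stale_against_main",
        "branch": branch,
        "commits_behind": behind,  # hypothetical field name
    }
```

A caller would run `freshness_event(branch, commits_behind(repo_dir))` before a broad test run and let policy decide whether to rebase or merge forward when the result is not `None`.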
|
|
|
|
|
+
|
|
|
|
|
+### 8. Recovery recipes for common failures
|
|
|
|
|
+Encode known automatic recoveries for:
|
|
|
|
|
+- trust prompt unresolved
|
|
|
|
|
+- prompt delivered to shell
|
|
|
|
|
+- stale branch
|
|
|
|
|
+- compile red after cross-crate refactor
|
|
|
|
|
+- MCP startup handshake failure
|
|
|
|
|
+- partial plugin startup
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- one automatic recovery attempt occurs before escalation
|
|
|
|
|
+- the attempted recovery is itself emitted as structured event data
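The "one attempt, then escalate" rule can be sketched as a small runner over a recipe table. The `recovery.*` event names and the convention that a recipe is a zero-argument callable returning `True` on success are assumptions.

```python
def run_with_recovery(failure, recipes, emit):
    """Try the recipe for `failure` exactly once; escalate if it is missing or fails.

    `recipes` maps a failure name to a zero-argument callable returning True
    on success. `emit` receives each structured event as a dict.
    """
    recipe = recipes.get(failure)
    if recipe is None:
        emit({"type": "recovery.escalated", "failure": failure, "reason": "no_recipe"})
        return False
    emit({"type": "recovery.attempted", "failure": failure, "recipe": recipe.__name__})
    if recipe():
        emit({"type": "recovery.succeeded", "failure": failure})
        return True
    emit({"type": "recovery.escalated", "failure": failure, "reason": "recipe_failed"})
    return False
```

Usage, with two hypothetical recipes:

```python
def accept_trust_prompt():
    return True  # stand-in for a real recovery step


log = []
run_with_recovery("trust_gate", {"trust_gate": accept_trust_prompt}, log.append)
```

The attempt itself is emitted before the recipe runs, so a crash inside the recipe still leaves a record of what was tried.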
|
|
|
|
|
+
|
|
|
|
|
+### 9. Green-ness contract
|
|
|
|
|
+Workers should distinguish:
|
|
|
|
|
+- targeted tests green
|
|
|
|
|
+- package green
|
|
|
|
|
+- workspace green
|
|
|
|
|
+- merge-ready green
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- no more ambiguous "tests passed" messaging
|
|
|
|
|
+- merge policy can require the correct green level for the lane type
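If the four levels are treated as strictly ordered, so that each level implies the ones below it, an `IntEnum` gives merge policy a direct comparison. The ordering itself, the `NONE` level, and the lane types in `REQUIRED` are assumptions for illustration.

```python
from enum import IntEnum


class GreenLevel(IntEnum):
    NONE = 0          # hypothetical: nothing verified yet
    TARGETED = 1      # targeted tests green
    PACKAGE = 2       # package green
    WORKSPACE = 3     # workspace green
    MERGE_READY = 4   # merge-ready green


# Hypothetical lane types and the level each must reach before merging.
REQUIRED = {
    "docs": GreenLevel.TARGETED,
    "feature": GreenLevel.WORKSPACE,
    "release": GreenLevel.MERGE_READY,
}


def may_merge(lane_type, achieved):
    """Return True when the lane has reached the level its type requires."""
    return achieved >= REQUIRED[lane_type]
```

Reporting `GreenLevel.PACKAGE.name` instead of "tests passed" removes the ambiguity about how much was actually verified.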
|
|
|
|
|
+
|
|
|
|
|
+## Phase 4 — Claws-First Task Execution
|
|
|
|
|
+
|
|
|
|
|
+### 10. Typed task packet format
|
|
|
|
|
+Define a structured task packet with fields like:
|
|
|
|
|
+- objective
|
|
|
|
|
+- scope
|
|
|
|
|
+- repo/worktree
|
|
|
|
|
+- branch policy
|
|
|
|
|
+- acceptance tests
|
|
|
|
|
+- commit policy
|
|
|
|
|
+- reporting contract
|
|
|
|
|
+- escalation policy
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- claws can dispatch work without relying on long natural-language prompt blobs alone
|
|
|
|
|
+- task packets can be logged, retried, and transformed safely
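A task packet with the fields listed above might be modeled as a dataclass that round-trips through JSON. The field types, and the split of "repo/worktree" into two fields, are assumptions. The roadmap names the fields but not their shapes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TaskPacket:
    objective: str
    scope: list             # assumption: list of paths or modules
    repo: str
    worktree: str
    branch_policy: str
    acceptance_tests: list  # assumption: list of test commands
    commit_policy: str
    reporting_contract: str
    escalation_policy: str

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        # Unknown or missing keys raise TypeError, so malformed packets are
        # rejected at the boundary instead of being half-applied.
        return cls(**json.loads(text))
```

Because the packet is plain data, retrying a task is a matter of re-sending the same JSON, and transforming one (for example narrowing `scope`) does not involve editing prose.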
|
|
|
|
|
+
|
|
|
|
|
+### 11. Policy engine for autonomous coding
|
|
|
|
|
+Encode automation rules such as:
|
|
|
|
|
+- if green + scoped diff + review passed -> merge to dev
|
|
|
|
|
+- if stale branch -> merge-forward before broad tests
|
|
|
|
|
+- if startup blocked -> recover once, then escalate
|
|
|
|
|
+- if lane completed -> emit closeout and clean up the session
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- doctrine moves from chat instructions into executable rules
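The four example rules could be encoded as an ordered table of predicate and action pairs evaluated against a lane snapshot. The snapshot keys, the action names, and the rule precedence (first match wins, stale branch checked first) are all assumptions.

```python
def _is_stale(lane):
    return lane.get("stale_branch", False)


def _startup_blocked(lane):
    return lane.get("startup_blocked", False)


def _completed(lane):
    return lane.get("completed", False)


def _mergeable(lane):
    return all(lane.get(k, False) for k in ("green", "scoped_diff", "review_passed"))


def _recover_or_escalate(lane):
    # "recover once, then escalate"
    return ["recover"] if lane.get("recovery_attempts", 0) < 1 else ["escalate"]


# Ordered rule table; precedence is a hypothetical choice.
RULES = [
    (_is_stale, lambda lane: ["merge_forward"]),
    (_startup_blocked, _recover_or_escalate),
    (_completed, lambda lane: ["emit_closeout", "cleanup_session"]),
    (_mergeable, lambda lane: ["merge_to_dev"]),
]


def decide(lane):
    """Return the actions of the first rule whose predicate matches, else []."""
    for predicate, actions in RULES:
        if predicate(lane):
            return actions(lane)
    return []
```

Keeping the rules in a table means a policy change is a reviewed diff to `RULES` rather than a new instruction in chat.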
|
|
|
|
|
+
|
|
|
|
|
+### 12. Claw-native dashboards / lane board
|
|
|
|
|
+Expose a machine-readable board of:
|
|
|
|
|
+- repos
|
|
|
|
|
+- active claws
|
|
|
|
|
+- worktrees
|
|
|
|
|
+- branch freshness
|
|
|
|
|
+- red/green state
|
|
|
|
|
+- current blocker
|
|
|
|
|
+- merge readiness
|
|
|
|
|
+- last meaningful event
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- claws can query status directly
|
|
|
|
|
+- human-facing views become a rendering layer, not the source of truth
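As a rough sketch of a machine-readable board, the functions below serialize lane records to JSON and answer one query against it. The field names in `BOARD_FIELDS` are hypothetical renderings of the list above, and the lane-id keyed layout is an assumption.

```python
import json

# Hypothetical field names, one per item in the board list.
BOARD_FIELDS = (
    "repo", "claw", "worktree", "branch_fresh",
    "green", "current_blocker", "merge_ready", "last_event",
)


def build_board(lanes):
    """Serialize lane records, keyed by lane id, keeping only the board fields."""
    board = {}
    for lane_id, record in lanes.items():
        missing = [name for name in BOARD_FIELDS if name not in record]
        if missing:
            raise KeyError(f"{lane_id} is missing {missing}")
        board[lane_id] = {name: record[name] for name in BOARD_FIELDS}
    return json.dumps(board, sort_keys=True)


def blocked_lanes(board_json):
    """Return the sorted ids of lanes that currently have a blocker."""
    board = json.loads(board_json)
    return sorted(
        lane_id for lane_id, row in board.items()
        if row["current_blocker"] is not None
    )
```

A human-facing dashboard would then render from the same JSON that claws query, rather than maintaining its own state.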
|
|
|
|
|
+
|
|
|
|
|
+## Phase 5 — Plugin and MCP Lifecycle Maturity
|
|
|
|
|
+
|
|
|
|
|
+### 13. First-class plugin/MCP lifecycle contract
|
|
|
|
|
+Each plugin/MCP integration should expose:
|
|
|
|
|
+- config validation contract
|
|
|
|
|
+- startup healthcheck
|
|
|
|
|
+- discovery result
|
|
|
|
|
+- degraded-mode behavior
|
|
|
|
|
+- shutdown/cleanup contract
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- partial-startup and per-server failures are reported structurally
|
|
|
|
|
+- successful servers remain usable even when one server fails
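Partial startup can be reported structurally by starting each server independently and collecting per-server results. In this sketch a "server" is just a zero-argument callable that returns the tools it exposes or raises. That shape, the `mode` values, and the report layout are assumptions and do not reflect any real MCP client API.

```python
def start_servers(servers):
    """Start each server independently and return a structured startup report.

    `servers` maps a name to a zero-argument callable that returns an
    iterable of tool names, or raises on failure.
    """
    started, failed = {}, {}
    for name, start in servers.items():
        try:
            started[name] = list(start())
        except Exception as exc:
            # One broken server must not take the others down with it.
            failed[name] = {"failure_class": "mcp_startup", "detail": str(exc)}
    if not failed:
        mode = "healthy"
    elif started:
        mode = "degraded"
    else:
        mode = "failed"
    return {"mode": mode, "started": started, "failed": failed}
```

With one healthy and one broken server the report has `mode == "degraded"`, the healthy server's tools under `started`, and the broken server's error under `failed`, so callers can keep using what works.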
|
|
|
|
|
+
|
|
|
|
|
+### 14. MCP end-to-end lifecycle parity
|
|
|
|
|
+Close gaps from:
|
|
|
|
|
+- config load
|
|
|
|
|
+- server registration
|
|
|
|
|
+- spawn/connect
|
|
|
|
|
+- initialize handshake
|
|
|
|
|
+- tool/resource discovery
|
|
|
|
|
+- invocation path
|
|
|
|
|
+- error surfacing
|
|
|
|
|
+- shutdown/cleanup
|
|
|
|
|
+
|
|
|
|
|
+Acceptance:
|
|
|
|
|
+- parity harness and runtime tests cover healthy and degraded startup cases
|
|
|
|
|
+- broken servers are surfaced as structured failures, not opaque warnings
|
|
|
|
|
+
|
|
|
|
|
+## Immediate Backlog (from current real pain)
|
|
|
|
|
+
|
|
|
|
|
+1. Worker readiness handshake + trust resolution
|
|
|
|
|
+2. Prompt misdelivery detection and recovery
|
|
|
|
|
+3. Canonical lane event schema in clawhip
|
|
|
|
|
+4. Failure taxonomy + blocker normalization
|
|
|
|
|
+5. Stale-branch detection before workspace tests
|
|
|
|
|
+6. MCP structured degraded-startup reporting
|
|
|
|
|
+7. Structured task packet format
|
|
|
|
|
+8. Lane board / machine-readable status API
|
|
|
|
|
+
|
|
|
|
|
+## Suggested Session Split
|
|
|
|
|
+
|
|
|
|
|
+### Session A — worker boot protocol
|
|
|
|
|
+Focus:
|
|
|
|
|
+- trust prompt detection
|
|
|
|
|
+- ready-for-prompt handshake
|
|
|
|
|
+- prompt misdelivery detection
|
|
|
|
|
+
|
|
|
|
|
+### Session B — clawhip lane events
|
|
|
|
|
+Focus:
|
|
|
|
|
+- canonical lane event schema
|
|
|
|
|
+- failure taxonomy
|
|
|
|
|
+- summary compression
|
|
|
|
|
+
|
|
|
|
|
+### Session C — branch/test intelligence
|
|
|
|
|
+Focus:
|
|
|
|
|
+- stale-branch detection
|
|
|
|
|
+- green-level contract
|
|
|
|
|
+- recovery recipes
|
|
|
|
|
+
|
|
|
|
|
+### Session D — MCP lifecycle hardening
|
|
|
|
|
+Focus:
|
|
|
|
|
+- startup/handshake reliability
|
|
|
|
|
+- structured failed server reporting
|
|
|
|
|
+- degraded-mode runtime behavior
|
|
|
|
|
+- lifecycle tests/harness coverage
|
|
|
|
|
+
|
|
|
|
|
+### Session E — typed task packets + policy engine
|
|
|
|
|
+Focus:
|
|
|
|
|
+- structured task format
|
|
|
|
|
+- retry/merge/escalation rules
|
|
|
|
|
+- autonomous lane closure behavior
|
|
|
|
|
+
|
|
|
|
|
+## MVP Success Criteria
|
|
|
|
|
+
|
|
|
|
|
+We should consider claw-code materially more clawable when:
|
|
|
|
|
+- a claw can start a worker and know with certainty when it is ready
|
|
|
|
|
+- claws no longer accidentally type tasks into the shell
|
|
|
|
|
+- stale-branch failures are identified before they waste debugging time
|
|
|
|
|
+- clawhip reports machine states, not just tmux prose
|
|
|
|
|
+- MCP/plugin startup failures are classified and surfaced cleanly
|
|
|
|
|
+- a coding lane can self-recover from common startup and branch issues without human babysitting
|
|
|
|
|
+
|
|
|
|
|
+## Short Version
|
|
|
|
|
+
|
|
|
|
|
+claw-code should evolve from:
|
|
|
|
|
+- a CLI a human can also drive
|
|
|
|
|
+
|
|
|
|
|
+to:
|
|
|
|
|
+- a **claw-native execution runtime**
|
|
|
|
|
+- an **event-native orchestration substrate**
|
|
|
|
|
+- a **plugin/hook-first autonomous coding harness**
|