3 months ago · 95aa5ef15c
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -0,0 +1,331 @@
 
				+# ROADMAP.md
			
 
				+
			
 
				+# Clawable Coding Harness Roadmap
			
 
				+
			
 
				+## Goal
			
 
				+
			
 
				+Turn claw-code into the most **clawable** coding harness:
			
 
				+- no human-first terminal assumptions
			
 
				+- no fragile prompt injection timing
			
 
				+- no opaque session state
			
 
				+- no hidden plugin or MCP failures
			
 
				+- no manual babysitting for routine recovery
			
 
				+
			
 
				+This roadmap assumes the primary users are **claws wired through hooks, plugins, sessions, and channel events**.
			
 
				+
			
 
				+## Definition of "clawable"
			
 
				+
			
 
				+A clawable harness is:
			
 
				+- deterministic to start
			
 
				+- machine-readable in state and failure modes
			
 
				+- recoverable without a human watching the terminal
			
 
				+- branch/test/worktree aware
			
 
				+- plugin/MCP lifecycle aware
			
 
				+- event-first, not log-first
			
 
				+- capable of autonomous next-step execution
			
 
				+
			
 
				+## Current Pain Points
			
 
				+
			
 
				+### 1. Session boot is fragile
			
 
				+- trust prompts can block TUI startup
			
 
				+- prompts can land in the shell instead of the coding agent
			
 
				+- "session exists" does not mean "session is ready"
			
 
				+
			
 
				+### 2. Truth is split across layers
			
 
				+- tmux state
			
 
				+- clawhip event stream
			
 
				+- git/worktree state
			
 
				+- test state
			
 
				+- gateway/plugin/MCP runtime state
			
 
				+
			
 
				+### 3. Events are too log-shaped
			
 
				+- claws currently infer too much from noisy text
			
 
				+- important states are not normalized into machine-readable events
			
 
				+
			
 
				+### 4. Recovery loops are too manual
			
 
				+- restart worker
			
 
				+- accept trust prompt
			
 
				+- re-inject prompt
			
 
				+- detect stale branch
			
 
				+- retry failed startup
			
 
				+- classify infra vs code failures manually
			
 
				+
			
 
				+### 5. Branch freshness is not enforced enough
			
 
				+- side branches can miss already-landed main fixes
			
 
				+- broad test failures can be stale-branch noise instead of real regressions
			
 
				+
			
 
				+### 6. Plugin/MCP failures are under-classified
			
 
				+- startup failures, handshake failures, config errors, partial startup, and degraded mode are not exposed cleanly enough
			
 
				+
			
 
				+### 7. Human UX still leaks into claw workflows
			
 
				+- too much depends on terminal/TUI behavior instead of explicit agent state transitions and control APIs
			
 
				+
			
 
				+## Product Principles
			
 
				+
			
 
				+1. **State machine first** — every worker has explicit lifecycle states.
			
 
				+2. **Events over scraped prose** — channel output should be derived from typed events.
			
 
				+3. **Recovery before escalation** — known failure modes should auto-heal once before asking for help.
			
 
				+4. **Branch freshness before blame** — detect stale branches before treating red tests as new regressions.
			
 
				+5. **Partial success is first-class** — e.g. MCP startup can succeed for some servers and fail for others, with structured degraded-mode reporting.
			
 
				+6. **Terminal is transport, not truth** — tmux/TUI may remain implementation details, but orchestration state must live above them.
			
 
				+7. **Policy is executable** — merge, retry, rebase, stale cleanup, and escalation rules should be machine-enforced.
			
 
				+
			
 
				+## Roadmap
			
 
				+
			
 
				+## Phase 1 — Reliable Worker Boot
			
 
				+
			
 
				+### 1. Ready-handshake lifecycle for coding workers
			
 
				+Add explicit states:
			
 
				+- `spawning`
			
 
				+- `trust_required`
			
 
				+- `ready_for_prompt`
			
 
				+- `prompt_accepted`
			
 
				+- `running`
			
 
				+- `blocked`
			
 
				+- `finished`
			
 
				+- `failed`
			
 
				+
			
 
				+Acceptance:
			
 
				+- prompts are never sent before `ready_for_prompt`
			
 
				+- trust prompt state is detectable and emitted
			
 
				+- shell misdelivery becomes detectable as a first-class failure state
			
 
				+
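A minimal sketch of how the lifecycle could be enforced, assuming a hypothetical `Worker` class and a hypothetical transition table. The roadmap names the states but not the allowed edges between them, so `ALLOWED` is an illustration rather than the agreed design.

```python
from enum import Enum


class WorkerState(Enum):
    SPAWNING = "spawning"
    TRUST_REQUIRED = "trust_required"
    READY_FOR_PROMPT = "ready_for_prompt"
    PROMPT_ACCEPTED = "prompt_accepted"
    RUNNING = "running"
    BLOCKED = "blocked"
    FINISHED = "finished"
    FAILED = "failed"


S = WorkerState

# Hypothetical edges: the roadmap lists states only, not transitions.
ALLOWED = {
    S.SPAWNING: {S.TRUST_REQUIRED, S.READY_FOR_PROMPT, S.FAILED},
    S.TRUST_REQUIRED: {S.READY_FOR_PROMPT, S.FAILED},
    S.READY_FOR_PROMPT: {S.PROMPT_ACCEPTED, S.FAILED},
    S.PROMPT_ACCEPTED: {S.RUNNING, S.FAILED},
    S.RUNNING: {S.BLOCKED, S.FINISHED, S.FAILED},
    S.BLOCKED: {S.RUNNING, S.FAILED},
    S.FINISHED: set(),
    S.FAILED: set(),
}


class Worker:
    def __init__(self):
        self.state = S.SPAWNING
        self.history = []  # (from, to) pairs, usable as emitted events

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.history.append((self.state.value, new_state.value))
        self.state = new_state

    def send_prompt(self, prompt):
        # Acceptance rule: prompts are never sent before ready_for_prompt.
        if self.state is not S.READY_FOR_PROMPT:
            raise RuntimeError(f"worker not ready: {self.state.value}")
        self.transition(S.PROMPT_ACCEPTED)
```

With this shape, a prompt sent while the worker is still `spawning` raises instead of silently landing in the shell.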
			
 
				+### 2. Trust prompt resolver
			
 
				+Add allowlisted auto-trust behavior for known repos/worktrees.
			
 
				+
			
 
				+Acceptance:
			
 
				+- trusted repos auto-clear trust prompts
			
 
				+- events are emitted for `trust_required` and `trust_resolved`
			
 
				+- non-allowlisted repos remain gated
			
 
				+
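One possible shape for the resolver, sketched under assumptions: `resolve_trust` and the dict-shaped events are hypothetical, and paths are assumed to be absolute and already normalized. A repo counts as trusted only when it equals an allowlisted root or sits beneath one.

```python
from pathlib import PurePosixPath


def resolve_trust(repo_path, allowlist):
    """Return the events a resolver would emit for one trust prompt."""
    repo = PurePosixPath(repo_path)
    events = [{"type": "trust_required", "repo": str(repo)}]
    roots = [PurePosixPath(root) for root in allowlist]
    # Match whole path components, so "/a/repo-evil" never matches "/a/repo".
    trusted = any(repo == root or root in repo.parents for root in roots)
    if trusted:
        events.append({"type": "trust_resolved", "repo": str(repo)})
    return events
```

A non-allowlisted repo yields only `trust_required`, which leaves the gate in place for a human or a higher-level policy.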
			
 
				+### 3. Structured session control API
			
 
				+Provide machine control above tmux:
			
 
				+- create worker
			
 
				+- await ready
			
 
				+- send task
			
 
				+- fetch state
			
 
				+- fetch last error
			
 
				+- restart worker
			
 
				+- terminate worker
			
 
				+
			
 
				+Acceptance:
			
 
				+- a claw can operate a coding worker without raw send-keys as the primary control plane
			
 
				+
			
 
				+## Phase 2 — Event-Native Clawhip Integration
			
 
				+
			
 
				+### 4. Canonical lane event schema
			
 
				+Define typed events such as:
			
 
				+- `lane.started`
			
 
				+- `lane.ready`
			
 
				+- `lane.prompt_misdelivery`
			
 
				+- `lane.blocked`
			
 
				+- `lane.red`
			
 
				+- `lane.green`
			
 
				+- `lane.commit.created`
			
 
				+- `lane.pr.opened`
			
 
				+- `lane.merge.ready`
			
 
				+- `lane.finished`
			
 
				+- `lane.failed`
			
 
				+- `branch.stale_against_main`
			
 
				+
			
 
				+Acceptance:
			
 
				+- clawhip consumes typed lane events
			
 
				+- Discord summaries are rendered from structured events instead of pane scraping alone
			
 
				+
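A rough sketch of what a typed lane event might look like. The event names come from the list above; the `LaneEvent` class, its `lane_id` and `payload` fields, and `render_summary` are assumptions made for illustration, not clawhip's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field

LANE_EVENT_TYPES = frozenset({
    "lane.started", "lane.ready", "lane.prompt_misdelivery", "lane.blocked",
    "lane.red", "lane.green", "lane.commit.created", "lane.pr.opened",
    "lane.merge.ready", "lane.finished", "lane.failed",
    "branch.stale_against_main",
})


@dataclass(frozen=True)
class LaneEvent:
    type: str
    lane_id: str  # hypothetical field
    payload: dict = field(default_factory=dict)  # hypothetical field

    def __post_init__(self):
        # Reject anything outside the canonical set at construction time.
        if self.type not in LANE_EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


def render_summary(event):
    """Render a channel line from the structured event, not from pane text."""
    return f"[{event.lane_id}] {event.type}"
```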
			
 
				+### 5. Failure taxonomy
			
 
				+Normalize failure classes:
			
 
				+- `prompt_delivery`
			
 
				+- `trust_gate`
			
 
				+- `branch_divergence`
			
 
				+- `compile`
			
 
				+- `test`
			
 
				+- `plugin_startup`
			
 
				+- `mcp_startup`
			
 
				+- `mcp_handshake`
			
 
				+- `gateway_routing`
			
 
				+- `tool_runtime`
			
 
				+- `infra`
			
 
				+
			
 
				+Acceptance:
			
 
				+- blockers are machine-classified
			
 
				+- dashboards and retry policies can branch on failure type
			
 
				+
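To illustrate how a retry policy could branch on failure type, here is a small sketch. The class names are taken from the list above; which classes count as retryable, and the action names returned by `next_action`, are assumptions chosen for the example.

```python
from enum import Enum


class FailureClass(Enum):
    PROMPT_DELIVERY = "prompt_delivery"
    TRUST_GATE = "trust_gate"
    BRANCH_DIVERGENCE = "branch_divergence"
    COMPILE = "compile"
    TEST = "test"
    PLUGIN_STARTUP = "plugin_startup"
    MCP_STARTUP = "mcp_startup"
    MCP_HANDSHAKE = "mcp_handshake"
    GATEWAY_ROUTING = "gateway_routing"
    TOOL_RUNTIME = "tool_runtime"
    INFRA = "infra"


# Hypothetical policy: these classes get one automatic retry.
RETRYABLE = {
    FailureClass.PROMPT_DELIVERY,
    FailureClass.TRUST_GATE,
    FailureClass.PLUGIN_STARTUP,
    FailureClass.MCP_STARTUP,
    FailureClass.MCP_HANDSHAKE,
    FailureClass.INFRA,
}


def next_action(failure_class_name):
    """Map a classified blocker to a (hypothetical) action name."""
    failure = FailureClass(failure_class_name)  # ValueError if unclassified
    if failure is FailureClass.BRANCH_DIVERGENCE:
        return "merge_forward"
    if failure in RETRYABLE:
        return "retry_once"
    return "escalate"
```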
			
 
				+### 6. Actionable summary compression
			
 
				+Collapse noisy event streams into:
			
 
				+- current phase
			
 
				+- last successful checkpoint
			
 
				+- current blocker
			
 
				+- recommended next recovery action
			
 
				+
			
 
				+Acceptance:
			
 
				+- channel status updates stay short and machine-grounded
			
 
				+- claws stop inferring state from raw build spam
			
 
				+
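The compression step can be sketched as a fold over the event stream. This assumes dict-shaped events with a `type` key plus optional `failure_class` and `suggested_recovery` keys, all hypothetical. Treating the latest event type as the "current phase" is also an assumption.

```python
CHECKPOINTS = {
    "lane.ready", "lane.green", "lane.commit.created",
    "lane.pr.opened", "lane.merge.ready", "lane.finished",
}
BLOCKERS = {
    "lane.blocked", "lane.red", "lane.failed",
    "lane.prompt_misdelivery", "branch.stale_against_main",
}


def compress(events):
    """Fold a lane's events into the four summary fields."""
    summary = {
        "phase": None,
        "last_checkpoint": None,
        "blocker": None,
        "next_action": None,
    }
    for event in events:
        kind = event["type"]
        summary["phase"] = kind
        if kind in CHECKPOINTS:
            # Reaching a checkpoint clears any earlier blocker.
            summary["last_checkpoint"] = kind
            summary["blocker"] = None
            summary["next_action"] = None
        elif kind in BLOCKERS:
            summary["blocker"] = event.get("failure_class", kind)
            summary["next_action"] = event.get("suggested_recovery")
    return summary
```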
			
 
				+## Phase 3 — Branch/Test Awareness and Auto-Recovery
			
 
				+
			
 
				+### 7. Stale-branch detection before broad verification
			
 
				+Before broad test runs, compare the current branch to `main` and detect whether known fixes are missing.
			
 
				+
			
 
				+Acceptance:
			
 
				+- emit `branch.stale_against_main`
			
 
				+- suggest or auto-run rebase/merge-forward according to policy
			
 
				+- avoid misclassifying stale-branch failures as new regressions
			
 
				+
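A simple approximation of this check counts the commits on `main` that the branch does not yet contain, using `git rev-list --count`. The sketch below assumes a local `main` ref and hypothetical helper names. It flags any branch that is behind `main`; deciding whether the missing commits are *known fixes* would need extra metadata that the roadmap does not specify.

```python
import subprocess


def commits_behind(branch="HEAD", base="main", run=subprocess.run):
    """Count commits reachable from `base` but not from `branch`."""
    result = run(
        ["git", "rev-list", "--count", f"{branch}..{base}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def stale_event(branch="HEAD", base="main", run=subprocess.run):
    """Return a branch.stale_against_main event, or None if up to date."""
    behind = commits_behind(branch, base, run)
    if behind == 0:
        return None
    return {
        "type": "branch.stale_against_main",
        "branch": branch,
        "commits_behind": behind,  # hypothetical payload field
    }
```

The `run` parameter exists only so the git call can be swapped out in tests.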
			
 
				+### 8. Recovery recipes for common failures
			
 
				+Encode known automatic recoveries for:
			
 
				+- trust prompt unresolved
			
 
				+- prompt delivered to shell
			
 
				+- stale branch
			
 
				+- compile red after cross-crate refactor
			
 
				+- MCP startup handshake failure
			
 
				+- partial plugin startup
			
 
				+
			
 
				+Acceptance:
			
 
				+- one automatic recovery attempt occurs before escalation
			
 
				+- the attempted recovery is itself emitted as structured event data
			
 
				+
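The "one attempt, then escalate" rule could look roughly like this. The failure keys, the `recovery.attempted` and `recovery.escalated` event names, and the recipe callables are all hypothetical.

```python
def recover_once(failure, recipes, attempted):
    """Run the recipe for `failure` at most once and report what happened.

    recipes:   dict mapping a failure key to a zero-argument callable
               that returns True when the recovery worked.
    attempted: set of failure keys already tried; mutated in place.
    """
    recipe = recipes.get(failure)
    if recipe is None or failure in attempted:
        # No known recipe, or the single automatic attempt is used up.
        return {"type": "recovery.escalated", "failure": failure}
    attempted.add(failure)
    succeeded = bool(recipe())
    return {
        "type": "recovery.attempted",
        "failure": failure,
        "recipe": recipe.__name__,
        "succeeded": succeeded,
    }
```

Returning the attempt as data, rather than logging it, keeps the recovery itself visible to whatever consumes the event stream.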
			
 
				+### 9. Green-ness contract
			
 
				+Workers should distinguish:
			
 
				+- targeted tests green
			
 
				+- package green
			
 
				+- workspace green
			
 
				+- merge-ready green
			
 
				+
			
 
				+Acceptance:
			
 
				+- no more ambiguous "tests passed" messaging
			
 
				+- merge policy can require the correct green level for the lane type
			
 
				+
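If the four levels are treated as strictly ordered (an assumption; the roadmap only says they should be distinguished), the contract reduces to a comparison. The lane types in `REQUIRED_GREEN` are invented for the example.

```python
from enum import IntEnum


class GreenLevel(IntEnum):
    NONE = 0
    TARGETED = 1
    PACKAGE = 2
    WORKSPACE = 3
    MERGE_READY = 4


# Hypothetical lane types and thresholds.
REQUIRED_GREEN = {
    "docs": GreenLevel.TARGETED,
    "feature": GreenLevel.WORKSPACE,
    "release": GreenLevel.MERGE_READY,
}


def may_merge(lane_type, achieved):
    """True when the achieved green level meets the lane's requirement."""
    return achieved >= REQUIRED_GREEN[lane_type]
```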
			
 
				+## Phase 4 — Claws-First Task Execution
			
 
				+
			
 
				+### 10. Typed task packet format
			
 
				+Define a structured task packet with fields like:
			
 
				+- objective
			
 
				+- scope
			
 
				+- repo/worktree
			
 
				+- branch policy
			
 
				+- acceptance tests
			
 
				+- commit policy
			
 
				+- reporting contract
			
 
				+- escalation policy
			
 
				+
			
 
				+Acceptance:
			
 
				+- claws can dispatch work without relying on long natural-language prompt blobs alone
			
 
				+- task packets can be logged, retried, and transformed safely
			
 
				+
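As an illustration, a task packet could be a plain dataclass with a JSON round-trip. The field names mirror the list above; the field types and the string-valued policies are assumptions, since the roadmap does not define them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TaskPacket:
    objective: str
    scope: list
    repo: str
    worktree: str
    branch_policy: str
    acceptance_tests: list
    commit_policy: str
    reporting_contract: str
    escalation_policy: str

    def to_json(self):
        """Serialize deterministically so packets can be logged and diffed."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw):
        """Rebuild a packet; missing or unknown fields raise TypeError."""
        return cls(**json.loads(raw))
```

Because the packet survives a round-trip unchanged, a retry can resend the logged JSON instead of reconstructing a natural-language prompt.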
			
 
				+### 11. Policy engine for autonomous coding
			
 
				+Encode automation rules such as:
			
 
				+- if green + scoped diff + review passed -> merge to dev
			
 
				+- if stale branch -> merge-forward before broad tests
			
 
				+- if startup blocked -> recover once, then escalate
			
 
				+- if lane completed -> emit closeout and clean up session
			
 
				+
			
 
				+Acceptance:
			
 
				+- doctrine moves from chat instructions into executable rules
			
 
				+
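The rules above can be sketched as a first-match decision function. The lane snapshot keys, the action names, and the order in which rules are checked are assumptions; the roadmap does not say how conflicting rules should be prioritized.

```python
def decide(lane):
    """Return the actions for a lane snapshot, using the first rule that matches."""
    if lane.get("startup_blocked"):
        # Recover once, then escalate.
        if lane.get("recovery_attempted"):
            return ["escalate"]
        return ["recover"]
    if lane.get("stale_branch"):
        # Merge forward before any broad test run.
        return ["merge_forward"]
    if lane.get("completed"):
        return ["emit_closeout", "cleanup_session"]
    if lane.get("green") and lane.get("scoped_diff") and lane.get("review_passed"):
        return ["merge_to_dev"]
    return []
```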
			
 
				+### 12. Claw-native dashboards / lane board
			
 
				+Expose a machine-readable board of:
			
 
				+- repos
			
 
				+- active claws
			
 
				+- worktrees
			
 
				+- branch freshness
			
 
				+- red/green state
			
 
				+- current blocker
			
 
				+- merge readiness
			
 
				+- last meaningful event
			
 
				+
			
 
				+Acceptance:
			
 
				+- claws can query status directly
			
 
				+- human-facing views become a rendering layer, not the source of truth
			
 
				+
			
 
				+## Phase 5 — Plugin and MCP Lifecycle Maturity
			
 
				+
			
 
				+### 13. First-class plugin/MCP lifecycle contract
			
 
				+Each plugin/MCP integration should expose:
			
 
				+- config validation contract
			
 
				+- startup healthcheck
			
 
				+- discovery result
			
 
				+- degraded-mode behavior
			
 
				+- shutdown/cleanup contract
			
 
				+
			
 
				+Acceptance:
			
 
				+- partial-startup and per-server failures are reported structurally
			
 
				+- successful servers remain usable even when one server fails
			
 
				+
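A minimal sketch of structured degraded-mode reporting, assuming each server is represented by a hypothetical zero-argument `start` callable that raises on failure. The report shape and the `healthy` / `degraded` / `failed` status values are illustrative.

```python
def start_servers(starters):
    """Start every server independently and report per-server outcomes.

    starters: dict mapping a server name to a zero-argument callable
              that raises an exception when startup fails.
    """
    report = {"status": "healthy", "available": [], "failed": []}
    for name, start in starters.items():
        try:
            start()
            report["available"].append(name)
        except Exception as exc:
            # One broken server must not take the others down with it.
            report["failed"].append(
                {"server": name, "error": f"{type(exc).__name__}: {exc}"}
            )
    if report["failed"]:
        report["status"] = "degraded" if report["available"] else "failed"
    return report
```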
			
 
				+### 14. MCP end-to-end lifecycle parity
			
 
				+Close gaps from:
			
 
				+- config load
			
 
				+- server registration
			
 
				+- spawn/connect
			
 
				+- initialize handshake
			
 
				+- tool/resource discovery
			
 
				+- invocation path
			
 
				+- error surfacing
			
 
				+- shutdown/cleanup
			
 
				+
			
 
				+Acceptance:
			
 
				+- parity harness and runtime tests cover healthy and degraded startup cases
			
 
				+- broken servers are surfaced as structured failures, not opaque warnings
			
 
				+
			
 
				+## Immediate Backlog (from current real pain)
			
 
				+
			
 
				+1. Worker readiness handshake + trust resolution
			
 
				+2. Prompt misdelivery detection and recovery
			
 
				+3. Canonical lane event schema in clawhip
			
 
				+4. Failure taxonomy + blocker normalization
			
 
				+5. Stale-branch detection before workspace tests
			
 
				+6. MCP structured degraded-startup reporting
			
 
				+7. Structured task packet format
			
 
				+8. Lane board / machine-readable status API
			
 
				+
			
 
				+## Suggested Session Split
			
 
				+
			
 
				+### Session A — worker boot protocol
			
 
				+Focus:
			
 
				+- trust prompt detection
			
 
				+- ready-for-prompt handshake
			
 
				+- prompt misdelivery detection
			
 
				+
			
 
				+### Session B — clawhip lane events
			
 
				+Focus:
			
 
				+- canonical lane event schema
			
 
				+- failure taxonomy
			
 
				+- summary compression
			
 
				+
			
 
				+### Session C — branch/test intelligence
			
 
				+Focus:
			
 
				+- stale-branch detection
			
 
				+- green-level contract
			
 
				+- recovery recipes
			
 
				+
			
 
				+### Session D — MCP lifecycle hardening
			
 
				+Focus:
			
 
				+- startup/handshake reliability
			
 
				+- structured failed-server reporting
			
 
				+- degraded-mode runtime behavior
			
 
				+- lifecycle tests/harness coverage
			
 
				+
			
 
				+### Session E — typed task packets + policy engine
			
 
				+Focus:
			
 
				+- structured task format
			
 
				+- retry/merge/escalation rules
			
 
				+- autonomous lane closure behavior
			
 
				+
			
 
				+## MVP Success Criteria
			
 
				+
			
 
				+We should consider claw-code materially more clawable when:
			
 
				+- a claw can start a worker and know with certainty when it is ready
			
 
				+- claws no longer accidentally type tasks into the shell
			
 
				+- stale-branch failures are identified before they waste debugging time
			
 
				+- clawhip reports machine states, not just tmux prose
			
 
				+- MCP/plugin startup failures are classified and surfaced cleanly
			
 
				+- a coding lane can self-recover from common startup and branch issues without human babysitting
			
 
				+
			
 
				+## Short Version
			
 
				+
			
 
				+claw-code should evolve from:
			
 
				+- a CLI a human can also drive
			
 
				+
			
 
				+to:
			
 
				+- a **claw-native execution runtime**
			
 
				+- an **event-native orchestration substrate**
			
 
				+- a **plugin/hook-first autonomous coding harness**