Scaffold OS v5.6 is the current stable release: the reusable platform foundation plus project intelligence, branch-aware delivery, engine-owned context summaries, deploy coordination, recommendation mode, and release-readiness hardening. What comes next focuses on autonomous delivery, higher-speed execution, and deeper operational coverage. Each roadmap item is a category shift, not a feature addition.
v5.6 already makes deployment coordination real: the engine can describe what should ship, which checks should run, and what results should come back. The next step is full provider-side execution. Given server access and DNS configuration, the delivery layer should carry the whole release through to a live URL: backend, frontend, automation workflows, and external platform integrations, all coordinated, all at once.
This is not "deploy a single app." It is deploying an entire product — every system, every integration, every service — in one coordinated operation. That is the next category shift beyond the deploy intelligence that is already live.
Today's build model is sequential: one agent, one step at a time. The evolution is parallel agents running simultaneously: one handling backend infrastructure while another builds the frontend and a third configures automation workflows, all coordinated by the contract system that already exists.
The contract enforcement layer was built for exactly this purpose. Parallel agents can't overwrite each other's work because every shared interface is governed by an explicit, acknowledged contract. On complex projects, we project build times dropping by 60-70%.
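To make the coordination mechanism concrete, here is a minimal sketch of how a contract registry could gate writes from parallel agents. The class and method names (`ContractRegistry`, `can_write`, and so on) are illustrative assumptions, not the engine's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Contract:
    """A declared shared interface (hypothetical shape)."""
    name: str
    schema: str                              # e.g. an endpoint or type signature
    acknowledged_by: set = field(default_factory=set)

class ContractRegistry:
    """Sketch: an agent must acknowledge a contract before it may
    write to the interface that contract governs."""

    def __init__(self):
        self._contracts = {}

    def declare(self, contract):
        self._contracts[contract.name] = contract

    def acknowledge(self, agent_id, name):
        self._contracts[name].acknowledged_by.add(agent_id)

    def can_write(self, agent_id, name):
        # Writes are rejected until the agent has acknowledged
        # the governing contract.
        c = self._contracts.get(name)
        return c is not None and agent_id in c.acknowledged_by

registry = ContractRegistry()
registry.declare(Contract("POST /api/orders", "OrderIn -> OrderOut"))
registry.acknowledge("frontend-agent", "POST /api/orders")

print(registry.can_write("frontend-agent", "POST /api/orders"))  # True
print(registry.can_write("backend-agent", "POST /api/orders"))   # False
```

The point of the design is that the shared surface is explicit: an unacknowledged agent simply cannot touch it, so parallelism never degrades into overwrites.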
After building enough projects, patterns emerge. "SaaS with auth, billing, and dashboard" accounts for 60% of builds. "Marketplace with two-sided payments" for another 20%. Pre-building these as scaffold templates means the brainstorm phase is 80% pre-done before you start. Build time drops further.
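A scaffold template might look like the sketch below: a bundle of pre-answered brainstorm decisions, so that only the project-specific ones remain. The template shape and field names are assumptions for illustration:

```python
# Hypothetical scaffold template: pre-answers the brainstorm decisions
# that a "SaaS with auth, billing, and dashboard" build would
# otherwise resolve from scratch.
SAAS_TEMPLATE = {
    "name": "saas-core",
    "modules": ["auth", "billing", "dashboard"],
    "decisions": {
        "auth": "email + OAuth",
        "billing": "subscriptions, monthly/annual",
        "dashboard": "per-user metrics",
    },
}

def remaining_brainstorm(template, required_decisions):
    """Return only the decisions the template does not pre-answer."""
    return [d for d in required_decisions if d not in template["decisions"]]

todo = remaining_brainstorm(
    SAAS_TEMPLATE,
    ["auth", "billing", "dashboard", "onboarding", "pricing-page"],
)
print(todo)  # ['onboarding', 'pricing-page']
```

The template does not skip brainstorming; it shrinks it to the decisions that actually differ from the common case.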
Once the DevOps agent is deploying apps, the natural extension is: keep watching. Monitor uptime, error rates, and response times against the declared spec. When production behavior drifts from specification, the system opens a diagnostic session, traces the root cause through logs, proposes a fix, and deploys the patch.
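The drift check itself can be sketched as a comparison of observed metrics against the declared spec. The slack multipliers and metric names here are illustrative assumptions, not the engine's real thresholds:

```python
def detect_drift(spec, observed, latency_slack=1.25, error_slack=2.0):
    """Compare observed production metrics against the declared spec.
    Returns (endpoint, category) pairs for everything out of bounds."""
    findings = []
    for endpoint, declared in spec.items():
        seen = observed.get(endpoint, {})
        if seen.get("p95_ms", 0) > declared["p95_ms"] * latency_slack:
            findings.append((endpoint, "latency"))
        if seen.get("error_rate", 0) > declared["error_rate"] * error_slack:
            findings.append((endpoint, "errors"))
    return findings

spec = {"/checkout": {"p95_ms": 200, "error_rate": 0.01}}
observed = {"/checkout": {"p95_ms": 280, "error_rate": 0.005}}

print(detect_drift(spec, observed))  # [('/checkout', 'latency')]
```

Each finding would then seed a diagnostic session rather than paging a human by default.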
A user opens the dashboard: "App health: 94. Two endpoints responding 40% slower than spec. One feature's error rate up 3x since last deploy." That dashboard already exists for build time; extending it to production turns the health score into a live operational metric.
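As a toy illustration of how such a score could be derived (the real weighting is unspecified; the linear penalty here is an assumption):

```python
def health_score(failing_checks, base=100, penalty=3):
    """Toy scoring: start from a perfect base and subtract a flat
    penalty per failing check, floored at zero. Illustrative only."""
    return max(0, base - penalty * failing_checks)

# Two slow endpoints failing their latency checks:
print(health_score(2))  # 94
```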
A freelancer or small agency builds apps for clients. They manage 10-20 client projects from one dashboard — each with its own health score, drift state, and phase status. Client-facing views show project progress without exposing internal planning. White-label options for agency branding.
Each completed build feeds outcome data back into skill quality scoring. Skills that consistently correlate with successful builds get weighted higher in future project profiles. The system gets smarter from its own work.
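One simple way to picture the feedback loop is an exponential moving average over build outcomes; the update rule and learning rate below are assumptions for illustration, not the actual scoring model:

```python
def update_skill_weight(weight, build_succeeded, lr=0.1):
    """Sketch: nudge a skill's weight toward 1.0 when it appears in a
    successful build and toward 0.0 when the build fails."""
    target = 1.0 if build_succeeded else 0.0
    return weight + lr * (target - weight)

w = 0.5  # neutral starting weight
for outcome in [True, True, False, True]:
    w = update_skill_weight(w, outcome)
print(round(w, 3))
```

Skills that consistently co-occur with successful builds drift upward and get selected into future project profiles more often.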
Health scores become comparable across a portfolio of projects. A dashboard shows which projects are drifting, which are stable, and which need immediate attention — across all projects, not just the one open in the current session.
Describe a specialist capability in plain English. The system generates a structured skill definition, tests it against historical project types, and proposes which quality gate category it belongs to. Human review gates acceptance.
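The output of that pipeline might be a structured record like the one below. The field names are assumptions, and the keyword routing is a deliberately naive stand-in for the model-driven classifier:

```python
from dataclasses import dataclass

@dataclass
class SkillDefinition:
    """Illustrative shape for a generated skill definition."""
    name: str
    description: str
    gate_category: str
    approved: bool = False   # human review gates acceptance

# Naive keyword routing stands in for the real classifier.
GATE_KEYWORDS = {
    "security": ["auth", "encryption", "secrets"],
    "performance": ["latency", "cache", "index"],
}

def propose_gate(description):
    text = description.lower()
    for gate, words in GATE_KEYWORDS.items():
        if any(w in text for w in words):
            return gate
    return "general"

desc = "Reviews slow queries and proposes index changes"
skill = SkillDefinition(
    name="query-indexing",
    description=desc,
    gate_category=propose_gate(desc),
)
print(skill.gate_category)  # performance
print(skill.approved)       # False until a human signs off
```

The important property is the last field: a generated skill is inert until a reviewer flips `approved`.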
Before architecture is locked, the planning layer scores the risk profile of the declared architecture — not just identifying specific issues, but predicting which categories of problems are most likely to surface in the build phase, based on patterns from similar project types.
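A minimal sketch of category-level risk prediction, assuming the system keeps per-project-type frequencies of past problem categories (the data and names below are invented for illustration):

```python
# Hypothetical historical data: how often each problem category
# surfaced in past builds of a given project type.
HISTORY = {
    "marketplace": {
        "payments-edge-cases": 0.6,
        "race-conditions": 0.3,
        "schema-churn": 0.1,
    },
}

def risk_profile(project_type, top_n=2):
    """Return the most likely problem categories for this project type."""
    freqs = HISTORY.get(project_type, {})
    return sorted(freqs, key=freqs.get, reverse=True)[:top_n]

print(risk_profile("marketplace"))  # ['payments-edge-cases', 'race-conditions']
```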
The build engine learns how much work typically fits within a session for a given project complexity level, and adjusts step granularity automatically — breaking large steps into smaller ones when context pressure is detected, and batching small steps when headroom is ample.
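The split-and-batch behavior can be sketched as a single pass over planned steps against a session budget. The cost units and thresholds are illustrative assumptions:

```python
def resize_steps(steps, budget):
    """Sketch: split any step whose estimated cost exceeds the session
    budget; batch a step into the previous one when their combined
    cost stays small (under a quarter of the budget here)."""
    out = []
    for name, cost in steps:
        if cost > budget:
            half = cost // 2
            out += [(name + "-a", half), (name + "-b", cost - half)]
        elif out and out[-1][1] + cost <= budget // 4:
            prev_name, prev_cost = out.pop()
            out.append((prev_name + "+" + name, prev_cost + cost))
        else:
            out.append((name, cost))
    return out

plan = [("schema-migration", 180), ("icons", 10), ("copy-edits", 12)]
print(resize_steps(plan, budget=100))
# [('schema-migration-a', 90), ('schema-migration-b', 90), ('icons+copy-edits', 22)]
```

The real engine would presumably estimate costs from observed context pressure rather than fixed numbers, but the shape of the adjustment is the same: split when over budget, batch when there is headroom.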
We're building this in the open. Reach out and talk directly to the founding team.