Not a chatbot. Not a copilot. A coordination engine that defines exactly what every AI agent must do, check, and verify before each session even begins.
Scaffold OS now operates with six flows, not five. What changed across v5.5 and v5.6 is the layer before and around those flows: the engine can begin from trusted context, coordinate deployment handoff, enter recommendation mode for the next version, surface richer product signals, and verify that the environment is truly ready before serious project work starts.
Version 2.2 was already beyond simple prompting, but the protocol still had visible ceilings. v3 solved multi-system coordination and first-class build targets. The later v3 releases added ML workflow coverage, better project structure, tiered audits, UI-readable state, and safer upgrades. v4 then made planning sharper, specialization real, and product-surface compatibility much cleaner. v4.8 through v5.0 turned the engine into a reusable protocol platform, and v5.1 through v5.6 made it more queryable, delivery-aware, recommendation-capable, and operationally ready for real project work.
Multiple entry point files caused friction. Users didn't know which to use. The wrong choice wasted the session.
Platforms like n8n and WordPress were integration targets. Agents called them, but never built on them as first-class surfaces.
API contract changes required manual user coordination. No automatic dependency scanner. No enforcement window.
The system assumed one backend. Multi-system architectures (microservices, external platforms, ML pipelines) had to be hacked in.
ML projects were treated as regular code. No training pipeline types, no evaluation gates, no model drift tracking.
One file. The system reads your project state and auto-routes to the correct flow. Zero user decision required. Zero wasted sessions.
Salesforce, Snowflake, n8n, WordPress — the agent builds artifacts directly on these platforms and drift-tracks them identically to code.
Any contract change triggers auto-dependency scan, alert propagation, 24-hour ACK window, and build block until all dependents are updated.
Coordinate backend, frontend, cloud infrastructure, external SaaS platforms, ML pipelines — all under one protocol, one health score, one drift system.
Auto-detected from architecture. Training + evaluation as first-class step types. Model drift tracked as a structural contract alongside code drift.
No project detected → full brainstorm + architecture + planning + audit + build protocol activated.
Architecture + plan exist → resume briefing, drift check, next build steps executed.
Architecture exists, no plan → validate architecture, generate phased build plan, enter audit.
Code exists, no plan → read codebase, reconstruct full planning document set by inference.
Live product detected → start from drift, health, backlog, and current operating condition before new work begins.
Live product asks what comes next → study shipped work, backlog, health, and priorities to recommend the next version.
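The six routing conditions above can be pictured as a small state detector. This is a minimal illustrative sketch, not Scaffold OS internals: the marker file names (`architecture.md`, `build_plan.md`, `deployed.flag`) and flow identifiers are assumptions, and the sixth flow (recommendation mode) is entered by explicit user request rather than file state.

```python
from pathlib import Path

def route(project: Path) -> str:
    """Pick the entry flow from observable project state (illustrative markers)."""
    has_arch = (project / "architecture.md").exists()   # assumed marker file
    has_plan = (project / "build_plan.md").exists()     # assumed marker file
    has_code = any(project.glob("src/**/*"))
    is_live  = (project / "deployed.flag").exists()     # assumed deployment marker
    # Recommendation mode is not file-detectable: it fires when a live
    # product's owner asks "what should we build next?"

    if is_live:
        return "LIVE_OPS"          # drift, health, backlog before new work
    if has_code and not has_plan:
        return "REVERSE_ENGINEER"  # reconstruct planning docs by inference
    if has_arch and has_plan:
        return "RESUME_BUILD"      # resume briefing, drift check, next steps
    if has_arch:
        return "PLAN"              # validate architecture, generate phased plan
    return "BRAINSTORM"            # full brainstorm + architecture + planning
```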
Each phase is a defined protocol — not a suggestion. An agent executing Scaffold OS cannot skip a phase, cannot reorder steps, and cannot proceed without confirmed state.
Every session starts here. The agent reads the project folder, detects the current state, and routes to the correct flow — automatically. No user decision required. No ambiguity about where to start.
The system runs a structured cognitive protocol to extract architecture. It infers from everything in the folder first — existing docs, vocabulary, tech signals — before asking a single question. It detects your technical level and adapts the entire interview on the fly.
Before touching architecture, the system validates real demand: it asks who actually needs this product, what they do today without it, and what the minimum useful version looks like. Then an adversarial review argues against the plan before it's locked — surfacing the strongest objections while you can still change course, not after weeks of building.
Output: a complete production architecture with confirmed tech stack, full system topology, every external service with access method and agent role, security model, error and failure path map, observability plan, environment variables, and deployment target. Zero TBDs. Zero vague notes. Any gap found = critical finding.
Planning Mode: You choose how the engine approaches your plan — expand and surface what's missing, hold scope and make it bulletproof exactly as specified, or cut ruthlessly to the narrowest useful version. Same project, different modes for different moments.
Before a single line of code is written, the system runs two rounds of consistency checks across 13 audit dimensions. These range from schema-interface alignment and endpoint-feature mapping to deployment pipeline consistency and error code registry coherence.
Critical findings are fixed and re-audited. The build only begins when audit verdict is CONSISTENT. Issues found here take minutes to fix. The same issues found in production take days.
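The two-round gate can be pictured as a loop over check functions: criticals are fixed and re-audited, and the verdict is only CONSISTENT when a round comes back clean. A hedged sketch under assumed types, not the actual audit implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    dimension: str   # e.g. "schema-interface alignment" (one of the 13)
    severity: str    # "CRITICAL" blocks the build; "WARNING" is flagged only

def audit(checks: list[Callable[[], list[Finding]]],
          fix: Callable[[Finding], None], max_rounds: int = 2) -> str:
    """Run consistency checks; fix criticals and re-audit. Illustrative only."""
    for _ in range(max_rounds):
        findings = [f for check in checks for f in check()]
        criticals = [f for f in findings if f.severity == "CRITICAL"]
        if not criticals:
            return "CONSISTENT"   # the build may begin
        for f in criticals:
            fix(f)                # each critical must be resolved...
    return "BLOCKED"              # ...and confirmed by a clean re-audit
```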
The build executes against a phased plan where every step is a clearly defined unit of work with explicit entry conditions, success criteria, and completion signals. Agents don't interpret the plan — they execute it.
Shared interfaces between systems are governed by contracts. If a contract changes — an API endpoint is renamed, a database schema shifts — an 8-step protocol fires: alert published, dependency scan across every system, 24-hour acknowledgment window. Unacknowledged = build blocked.
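The enforcement gate at the heart of that protocol can be sketched as follows. The function name, return shape, and field names are illustrative assumptions, not the actual contract file format:

```python
def on_contract_change(contract: str, consumers: dict[str, bool]) -> dict:
    """Sketch of the gate: publish alert, scan dependents, block on unacked."""
    unacked = sorted(name for name, acked in consumers.items() if not acked)
    return {
        "alert": f"CONTRACT_CHANGED:{contract}",  # alert published
        "dependents": sorted(consumers),          # dependency scan result
        "build_blocked": bool(unacked),           # any unacked system blocks the build
        "awaiting_ack": unacked,                  # must ack within the 24-hour window
    }
```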
Every session begins with a mandatory drift check across every feature and every external system. This cannot be skipped. The system compares what was specified against what was built — and produces a per-feature drift state.
IN_SYNC: Code matches spec exactly. Proceed normally.
DRIFTING: Minor deviation. Score -5. Investigate before new work.
DIVERGED: Significant gap. Score -20. Must resolve first.
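A minimal sketch of how a per-feature drift state could be derived, assuming the spec and the built state are both available as flat key-value maps. The comparison logic is illustrative; only the state names and scores come from the protocol:

```python
DRIFT_PENALTY = {"IN_SYNC": 0, "DRIFTING": -5, "DIVERGED": -20}  # per-feature scores

def classify_drift(spec: dict, built: dict) -> str:
    """Classify one feature by comparing specified fields to built fields (sketch)."""
    missing = spec.keys() - built.keys()
    changed = {k for k in spec.keys() & built.keys() if spec[k] != built[k]}
    if not missing and not changed:
        return "IN_SYNC"    # code matches spec exactly
    if missing:
        return "DIVERGED"   # significant gap: specified behavior absent
    return "DRIFTING"       # minor deviation: present but altered
```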
After drift is assessed, every post-launch work request routes to one of four structured tracks — each with its own execution protocol:
Timed hypothesis protocol with a hard 3-strike limit. After 3 failed attempts, the agent stops and presents a structured escalation block — hypotheses tested, outcomes, best current guess — rather than continuing to guess.
Full mini-spec cycle before any code is written. Each new feature gets a spec, acceptance criteria, and drift tracking from day one — same standard as the original build.
Debt items are weighed against active feature drift states and sprint capacity before scheduling. Debt that creates new risk gets elevated priority automatically.
Any change to existing feature behavior requires a brief evolution gate before execution — purpose review, side-effect analysis, spec update. Prevents silent drift accumulating through "quick changes."
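Taken together, the four tracks amount to a routing function with one stateful rule: the 3-strike hard limit on the bug track. A hedged sketch, with illustrative request-kind names:

```python
def route_request(kind: str, strikes: int = 0) -> str:
    """Route a post-launch work request to its track (kind values are assumptions)."""
    if kind == "bug":
        return "ESCALATE" if strikes >= 3 else "TRACK_A"  # hard 3-strike limit
    if kind == "feature":
        return "TRACK_B"   # mini-spec cycle before any code
    if kind == "debt":
        return "TRACK_C"   # weighed against drift states and capacity
    if kind == "behavior_change":
        return "TRACK_D"   # evolution gate before execution
    raise ValueError(f"unknown request kind: {kind}")
```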
If a session crashes, context is lost, or a new agent enters mid-build — the system recovers without human intervention. A structured recovery summary and progress markers become the source of truth for the restart.
Read the latest recovery summary and progress markers: the engine's explicit record of what finished cleanly and what comes next.
Read current project state, active queue context, and pending decisions to confirm the exact status of the build before doing more work.
Resume from the last completed step, then validate any in-flight work before continuing. Completed work is not blindly rerun.
Present the exact next action and continue. No re-explaining. No lost decisions. No starting over from phase zero.
Works across agent crashes, context window resets, tool restarts, and new agents joining an active build mid-session.
Every project is different. Scaffold OS detects its current state and routes automatically. You never have to know which file to run or which mode to enable.
No project detected. Routes to full brainstorm protocol — adaptive interview, challenge phase, complete architecture output.
Architecture file found, no build plan. Validates architecture for completeness, then generates phase-by-phase build plan.
Active build found. Runs a structured resume protocol: drift check, state reconstruction, and pickup at the exact interruption point.
Code exists without plan. System reverse-engineers the architecture from code, then manages it under full Scaffold OS protocol going forward.
Project is in production. Session begins with full drift audit, health score calculation, and backlog prioritization. Then every work request routes to the right track: Track A for bugs (3-strike escalation), Track B for new features (mini-spec required), Track C for tech debt (capacity-weighted), Track D for changes to existing behavior (evolution gate). Drift resolved before new work begins.
Project is live and the question is no longer "what are we building now?" but "what should we build next?" The engine studies shipped work, backlog, current health, and open decisions, then proposes ranked next-version options with one clear recommendation.
Scaffold OS activates the right skills for your project type — not a fixed list of roles that always run. 36 curated skills across 4 categories. 11 project profiles. Skills that match what you're building. Explore the skills system →
When the session begins on a new project, Scaffold OS acts first as a Solutions Architect. It reads everything already in the folder — existing documents, tech stack signals, vocabulary — and infers as much as possible before asking a single question. The structured brainstorm protocol extracts tech stack, system topology, every external service with its access method and agent role, security model, data flows, environment variables, deployment target, and scalability approach. The output is never vague. Every field is declared. Zero TBDs. Any gap found during architecture creation is a critical finding.
Before a single line of code is written, the system becomes a rigorous QA auditor. Two full rounds of consistency checks run across 13 compliance dimensions — schema-interface alignment, endpoint-feature coverage mapping, pipeline script validation, error code registry coherence, deployment artifact consistency, and more. Each issue is ranked by severity: CRITICAL (build blocked) or WARNING (flagged but continuable). Any CRITICAL finding must be fixed and re-audited before the build begins. Issues found here take minutes. The same issues found in production take days to weeks.
Builds the complete server-side system — API architecture and routing, database schema and migrations, authentication flows (JWT, OAuth, session handling), role-based access control, middleware chains, background job systems, caching layers, and rate limiting. Every endpoint is declared against the system's architecture specification before it's built. The agent knows exactly which endpoints belong to which feature phase and cannot implement ahead of plan. Completion signals are written to disk. The session doesn't move on until they are.
Builds the complete client-side application — component architecture, routing, state management, API integration layer, form handling, authentication flows, responsive layouts, and performance optimization. The frontend build step knows which backend endpoints are live and which are being built simultaneously — and never calls endpoints outside the declared contract. UI components are built against feature specifications, not developer interpretation.
Manages all external system connections as build artifacts — not as afterthought integrations. Every external service (API providers, SaaS platforms, webhook targets, third-party data sources) is declared in the architecture with its access method, authentication mechanism, rate limit constraints, and error handling strategy. The integration specialist role provisions access, builds the connection layer, and tests each endpoint against declared behavior — all as tracked build steps with completion signals.
Handles the full data layer — relational schema design with normalized relationships and proper indexing strategies, migration sequencing, data transformation pipelines (ETL/ELT), analytics view construction, data contract definition across services, and seeding strategy. When the project includes data warehouse components (Snowflake, Databricks, dbt), this role provisions those schemas and transformation models directly inside those platforms through their APIs — with the same drift tracking applied to code.
Security isn't a single phase — it's enforced throughout. The security model is declared in the architecture (auth approach, permission model, data isolation rules, encryption strategy, secret management). The pre-build audit includes security-specific dimensions: checking that every API endpoint has an explicit auth requirement, that IAM roles follow least-privilege principles, that secrets are never hardcoded. For enterprise projects with AWS or similar infrastructure, IAM policies are generated and tracked as build artifacts — not copy-pasted from docs and forgotten.
When a project includes machine learning components — detected automatically from architecture signals like PyTorch, TensorFlow, XGBoost, LangChain, or vector database declarations — the system activates ML-specific build protocols. A separate specification layer defines model architecture, training data sources, performance thresholds, and evaluation criteria. Training pipelines are built as first-class build step types. Evaluation gates enforce pass/fail against declared metrics before promotion. Model drift is tracked separately from code drift: BASELINE → DEGRADED → RETRAIN_REQUIRED.
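The three model drift states can be modeled as thresholds on metric decay relative to the declared baseline. The 5% and 15% cutoffs below are assumptions for illustration, not Scaffold OS defaults:

```python
def model_drift_state(baseline_metric: float, current_metric: float,
                      degraded_pct: float = 0.05, retrain_pct: float = 0.15) -> str:
    """Map relative metric decay to BASELINE / DEGRADED / RETRAIN_REQUIRED (sketch)."""
    drop = (baseline_metric - current_metric) / baseline_metric
    if drop >= retrain_pct:
        return "RETRAIN_REQUIRED"  # evaluation gate blocks promotion
    if drop >= degraded_pct:
        return "DEGRADED"          # tracked as structural drift, like code drift
    return "BASELINE"
```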
Provisions cloud infrastructure as structured, versioned artifacts — not as one-off CLI commands that can't be reconstructed. AWS IAM roles, S3 bucket policies, Lambda functions, API Gateway routes, Cloudflare WAF rules, Kubernetes manifests, Docker configurations — all declared against the architecture spec and built as tracked steps. Infrastructure changes trigger the same contract protocol as code changes: scan for dependencies, alert affected systems, acknowledgment window before proceeding.
Creates automation workflows on external platforms as version-controlled build artifacts via their APIs — not manual GUI point-and-click configurations. n8n workflows are generated from structured JSON specs. Zapier Zaps, Make scenarios, and similar automation logic are built programmatically, tracked in the project's drift system, and compared against declared trigger/action logic on every session. A workflow that diverges from its spec is flagged as drift — same as any other feature.
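Drift detection for an automation workflow reduces to diffing the declared trigger/action spec against the workflow JSON fetched from the platform's API. A sketch of the comparison step only; the fetch itself and the spec shape shown here are assumptions:

```python
def workflow_drift(declared: dict, deployed: dict) -> list[str]:
    """Diff declared trigger/action logic against the deployed workflow (sketch)."""
    issues = []
    if declared.get("trigger") != deployed.get("trigger"):
        issues.append("trigger diverges from spec")
    declared_actions = [a["type"] for a in declared.get("actions", [])]
    deployed_actions = [a["type"] for a in deployed.get("actions", [])]
    if declared_actions != deployed_actions:
        issues.append("action sequence diverges from spec")
    return issues   # any non-empty result flags the workflow as drift
```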
Every session, every time — a structured state check runs before any build activity. This is the Technical PM function: what is the current health of the project? What features are IN_SYNC, DRIFTING, or DIVERGED? What's the exact phase of the build plan? What remains? The health score is calculated every session. Backlog is reprioritized based on current debt. No build session starts without knowing exactly where things stand. This is what prevents "I thought we built that" moments months into a project.
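The per-session health score and backlog reprioritization could look like the following. The 100-point scale and the backlog fields are illustrative assumptions; only the state names and drift scores come from the protocol:

```python
def health_score(feature_states: dict[str, str]) -> int:
    """Aggregate per-feature drift into a session health score (scale assumed)."""
    penalty = {"IN_SYNC": 0, "DRIFTING": -5, "DIVERGED": -20}  # per-feature scores
    return max(100 + sum(penalty[s] for s in feature_states.values()), 0)

def prioritize(backlog: list[dict]) -> list[dict]:
    """Reprioritize: items tied to diverged features float to the top (sketch)."""
    return sorted(backlog, key=lambda item: item.get("diverged", False), reverse=True)
```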
For projects with content management requirements, the CMS Architect role structures the entire CMS programmatically via CLI and REST API — not a human clicking through admin panels. WordPress custom post types, taxonomies, ACF field groups, metabox configurations. Strapi content types and permission policies. Ghost membership and tag structures. All scaffolded from code, tracked as build artifacts, and compared against declared content schema on each subsequent session. CMS schema drift is as detectable as API drift.
The most unique role — and the one no other system has. Every shared interface between systems (an API endpoint consumed by the frontend, a message queue schema used by two backend services, a database table read by the data pipeline) is governed by an explicit, versioned contract file. If any contract changes — an API endpoint renamed, a field added to a shared schema — an 8-step protocol fires automatically: alert published, dependency scan across all consuming systems, 24-hour acknowledgment window opened. Any unacknowledged system blocks the entire build. No silent breaking changes. Ever.
For projects that fit the standard stack profile, the system auto-detects eligibility and generates a fully configured local infrastructure setup — database, authentication provider, blob storage, and API gateway — all pre-wired and MCP-connected. This role is unique in what it eliminates: the "infrastructure tax" that typically consumes 30–50% of an AI build session before business logic is ever written. Non-technical users skip Docker entirely. The infrastructure materializes, the agents connect to it, and every session starts with a fully operational environment.
Why this matters: A typical raw agent spends hours debugging why a local database won't accept connections. That time is budget consumed building zero features. The infrastructure layer redirects 100% of the build budget toward functionality.
You have an existing codebase. No planning files. No architecture document. Maybe it was built before Scaffold OS existed. Maybe it was handed to you from a previous team. Maybe you built it yourself months ago and the context is gone. The Code Archaeologist reverses the problem: instead of planning then building, it reads what was built, infers what the planning must have been, and reconstructs a complete set of planning files from the codebase.
Every inferred section is marked with a verification flag. Depth scales with codebase size: full read for smaller codebases, intelligent sampling for larger ones. Output: complete architecture specification, schema reconstruction from migrations/models, feature folders for detected feature areas, and API contract reconstruction from routes.
No other AI build system has a formal recovery protocol. When a session crashes mid-build — context limit hit, timeout, agent error — raw systems require you to explain everything again from scratch. Scaffold OS has a defined recovery sequence.
Read the latest recovery markers and locate the last safe progress point
Read the current recovery summary → reconstruct current build position, open blockers, and active alerts
Run drift check — verify what was built since last session matches what was planned, surface any divergence
Present a concise resume summary → what completed, what was in progress, and what comes next
Continue. No re-explanation. No guessing. The next agent picks up exactly where the last one stopped.
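The sequence above can be sketched as a read of two on-disk artifacts. The file names and marker schema here are assumptions for illustration, not the actual recovery format:

```python
import json
from pathlib import Path

def recover(project: Path) -> dict:
    """Rebuild session context from on-disk recovery artifacts (names assumed)."""
    summary = json.loads((project / "recovery_summary.json").read_text())
    markers = json.loads((project / "progress_markers.json").read_text())
    done = [m["step"] for m in markers if m["status"] == "COMPLETE"]
    in_flight = [m["step"] for m in markers if m["status"] == "IN_PROGRESS"]
    return {
        "last_safe_point": done[-1] if done else None,  # completed work is not rerun
        "validate_first": in_flight,                    # in-flight work checked first
        "next_action": summary.get("next_action"),      # presented, then continue
    }
```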
"We were 60% through building the order management feature when the session crashed. The next session didn't know what had been done. I had to manually read through 800 lines of code and re-explain what was built. Even then, the agent made conflicting assumptions about what the next feature should touch."
Every build step leaves behind structured recovery markers so the system can recover even if an agent crashes mid-feature, not just between phases. Future sessions inherit a clean paper trail instead of reconstructing state from scratch.
Scaffold OS is built on top of advanced AI capabilities — but uses them with surgical precision. The right cognitive function is invoked at exactly the right moment in the protocol. Not general-purpose chat. Purpose-built intelligence firing at the moments it actually matters.
The brainstorm protocol reads your vocabulary, response patterns, and the confidence with which you describe technical concepts — and calibrates the entire interview on the fly. Detected as NON_TECHNICAL: the agent takes on full architectural decision-making — it doesn't ask you to choose between Postgres and MySQL. Detected as TECHNICAL: peer-level debate and structured disagreement, right up to the challenge phase.
Three technical levels. One automatic detection. No configuration.
Before any architecture is finalized, the system enters a formal challenge session. It raises up to 8 structured challenges across blind spots, scaling risks, security concerns, and architectural contradictions. The agent is explicitly designed to disagree here — not to validate your choices. Every challenge that survives makes the architecture stronger. Every challenge that fails gives you confidence your original decision was sound.
Disagreement as a feature. Not a bug.
Certain protocol moments require specialized cognitive depth: architectural validation, deep security review, ML model specification, infrastructure topology reasoning. At these exact moments, Scaffold OS invokes purpose-built intelligence functions — not the same general model that writes code. The brainstorm intelligence that extracts architecture is not the same function that audits it or builds it. The right cognitive tool at the right moment.
Specialized modules. Precise invocation timing.
AI context windows are finite. Scaffold OS solves this not by ignoring it, but by compressing aggressively at phase boundaries. Between brainstorm and build, between sessions, between major milestones — the system distills the full session history down to exactly the information subsequent phases need. Each new session starts with precise, dense, relevant state — not raw conversation history from three sessions ago. This is why sessions don't degrade as a project grows.
Phase-boundary compression → consistent quality across all sessions.
The contract enforcement layer was built specifically to enable this. Once active, multiple specialized agents run simultaneously — a backend agent and frontend agent building in parallel, each with their own context slice, governed by the same contract system. When the backend changes a shared endpoint, the contract protocol fires — the frontend agent is notified, awaits acknowledgment, and the build continues. Parallel builds. Zero silent breaking changes. Build time drops 60-70% on complex projects.
The contract system was designed from day one for this moment.
From SaaS products to ML platforms. Every complex software system is supported.