No marketing fluff. If you're wondering whether Scaffold OS is the right tool for you — this page will give you a straight answer.
Scaffold OS is a coordination protocol — a structured, closed-source engine that transforms how AI agents approach software development. Instead of giving an AI a vague description of what you want and hoping for the best, Scaffold OS runs agents through a defined sequence: a structured brainstorm to extract architecture, a pre-build audit to catch contradictions before code is written, and a build plan where every step has explicit entry conditions and verifiable completion signals.
As of v4.9, Scaffold OS is also the canonical owner of workflow state — maintaining machine-readable records for project status, decisions, review readiness, and deploy readiness that any surface can consume without parsing internal files. As of v5.1, it's a reusable protocol platform: multiple products and surfaces can sit on top of the engine without requiring access to its internal implementation.
The result is that complex, multi-system software projects — the kind that require five engineers over three months under normal conditions — can be built in a fraction of the time with a fraction of the team. Not because the AI is smarter, but because the structure forces it to think before it acts.
Scaffold OS is currently in transition from a proprietary protocol system to a SaaS platform (launching soon). The system is a coordination engine — a structured set of protocols, planning files, and session state management that runs on top of AI coding agents.
You don't install it like a traditional app, and it's not a simple prompt you paste into ChatGPT. It's a complete build methodology that changes how the entire session is structured, what gets documented, and how build decisions get verified.
Solo developers and small teams who want to build production-grade systems that would normally take a full engineering team. If you're technical enough to know what you want to build but have struggled to get AI coding tools to actually produce it consistently — Scaffold OS is for you.
Non-technical founders building SaaS products. The optional infrastructure layer and technical-level detection in the brainstorm protocol mean you don't need to know Docker or PostgreSQL configuration to get a working backend.
Freelancers and agencies building complex client software. The multi-project management features on the roadmap are designed specifically for this use case.
Scaffold OS is not for people who want to generate a one-page landing page or a simple CRUD app. For small, simple projects — regular AI coding tools are probably enough.
Scaffold OS is stack-agnostic by design. The protocol governs how builds happen — not what tech stack is used. In practice, it works best with the most common production stacks:
Backend: Node.js, Python (FastAPI, Django, Flask), Go, Ruby
Frontend: React, Next.js, Vue, Svelte
Databases: PostgreSQL, MySQL, MongoDB, Supabase, Firebase
Cloud: AWS, GCP, Vercel, Railway, Render
ML/AI: PyTorch, TensorFlow, LangChain, Hugging Face, vector databases
The stack is selected during the brainstorm phase and locked into the architecture file. Every subsequent session builds against that spec — no drift, no confusion about what technology to use.
Cursor and Copilot are code completion and editing tools. They're excellent at helping you write code faster in a file you already have open. They don't know what your overall system architecture should look like, they don't track which features are built vs. pending, and they don't prevent you from changing a shared API interface without updating everything that depends on it.
Scaffold OS is a build coordination system. It operates at the project level, not the file level. It manages architecture decisions before code is written, tracks the state of each feature area across sessions, enforces contracts between systems, and prevents the build drift that makes long-running AI-assisted projects collapse.
The honest framing: Cursor and Copilot help human developers write code faster. Scaffold OS orchestrates AI agents to build entire systems from specification to production.
Devin is an autonomous AI agent that takes a task description and attempts to complete it independently. The demo results are impressive. The real-world results on complex projects are mixed — because autonomous agents without structured protocols run into the same problems: they lose context, they make architectural decisions mid-build that contradict earlier decisions, and they produce code that works in isolation but fails in integration.
Scaffold OS's approach is different: rather than maximizing autonomy, it maximizes the quality of the protocol the agents execute. The agent doesn't decide what the architecture should be — you and the system decide together in a structured brainstorm. The agent doesn't decide when a step is complete — a real verification command decides that. The agent doesn't decide how to handle a contract change — an 8-step protocol handles that.
The result is more consistent, more auditable, and more controllable than fully autonomous agents — which matters when you're building production systems, not demos.
If you're building small, well-scoped projects (landing pages, simple CRUD apps, single-endpoint APIs) — you probably don't need Scaffold OS. Existing tools are great for that.
The pattern that Scaffold OS solves is: you start a complex project with an AI tool, it goes well for the first session, starts losing coherence by session 3, and by session 6 you're fighting against the code that was generated in session 2 rather than building new features.
If that pattern sounds familiar — multi-session projects, multi-system architectures, external platform integrations, teams of more than one person working on the same codebase — that's the exact problem Scaffold OS was built to solve.
The most expensive failure mode in software development isn't technical — it's building the right product for the wrong audience, or a perfectly engineered solution to a problem nobody actually has. Previous versions of Scaffold OS produced thorough architecture documents without ever challenging whether the product was worth building.
v4.1 adds three mandatory questions before product discovery even begins:
Status Quo: What do target users do today without this product? Walk through their actual current process.
Desperate Specificity: Describe one specific person who would pay for this right now. Not a user segment — a person.
Narrowest Wedge: What is the absolute minimum version that would be genuinely useful to that person this week?
The narrowest wedge answer becomes the reference point for every scope decision throughout the project. When REDUCTION planning mode is active, features are measured against it. When scope creep is suggested, it's challenged against it.
If the user can't answer these questions specifically — gaps are flagged and recorded, not silently accepted. The plan proceeds with explicit notes about what's unvalidated.
After the challenge phase and before the architecture is locked, the system argues against its own plan — presenting 2–3 specific, honest objections to the approach. These aren't generic "have you considered security?" warnings. They're targeted: "Your primary user is X, but this part of the architecture is designed for Y — that mismatch will cost you in v2." Or: "The plan doesn't address the hardest problem you identified — it works around it."
You then respond: unchanged (objections don't change anything), adjusted (something should change), or partial. The objections and your response are recorded in the session record. You can't skip the adversarial review — but you can respond with "no changes" and move on immediately.
The purpose isn't to block you. It's to make sure the strongest case against the plan has been heard and considered before weeks of building commit you to it. Architecture decisions made before code is written cost nothing to change. The same decisions made in week 3 can cost everything.
Before generating planning files, you choose how the engine approaches your plan. There are three modes:
HOLD SCOPE (default): Make what's in the architecture bulletproof exactly as specified. No scope additions, maximum rigor. Use this when you've thought through the architecture carefully and want the engine to harden it, not expand it.
EXPANSION: Surface what's missing. After generating required files, the engine runs a pass identifying gaps — unconsidered edge cases, missing error paths, features you might want — and asks if you want to add them. Declined suggestions are still recorded in the backlog. Use this early in a project when you want a thorough first pass.
REDUCTION: Cut ruthlessly. Features not required to prove the narrowest wedge are moved to the backlog's Deferred section and removed from Phase 1. Everything is cross-referenced against the minimum viable version. Use this when you're over-scoped and need to get to a shippable v1 fast.
Drift happens when what's actually built diverges from what was specified. In a long-running project, a feature might be partially implemented, then a later build step changes how a shared interface works, and now the earlier feature is silently broken — but no one knows because no system is tracking it.
Scaffold OS tracks every feature area against its specification using a three-state system: IN_SYNC (built matches spec exactly), DRIFTING (divergence detected, within tolerance), or DIVERGED (significant departure from spec, build blocked until resolved). This check runs at the start of every session — before any new code is written.
The practical impact: bugs that would have been discovered at integration time (when they're expensive to fix) are caught at session start (when they're cheap to fix).
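As an illustration, the three-state check described above can be sketched in Python. The state names come from the source; the classification logic, the tolerance value, and every function name here are assumptions for illustration, not the engine's closed-source internals:

```python
from enum import Enum

class DriftState(Enum):
    IN_SYNC = "IN_SYNC"      # built matches spec exactly
    DRIFTING = "DRIFTING"    # divergence detected, within tolerance
    DIVERGED = "DIVERGED"    # significant departure; build blocked

def classify_feature(spec_fields: set, built_fields: set,
                     tolerance: int = 1) -> DriftState:
    """Toy classifier: compare declared vs. implemented interface fields."""
    missing = spec_fields - built_fields   # promised but not built
    extra = built_fields - spec_fields     # built but never specified
    divergence = len(missing) + len(extra)
    if divergence == 0:
        return DriftState.IN_SYNC
    if divergence <= tolerance:
        return DriftState.DRIFTING
    return DriftState.DIVERGED

def session_start_gate(features: dict) -> list:
    """Run the drift check before any new code is written; return blockers."""
    return [name for name, (spec, built) in features.items()
            if classify_feature(spec, built) is DriftState.DIVERGED]
```

The point of the sketch is the ordering: classification runs at session start, so a DIVERGED feature blocks the session before new work compounds the divergence.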
Any time a shared interface changes — an API endpoint signature, a database schema field, an event payload format — the 8-step contract change protocol fires automatically. The steps are: (1) stage the change, (2) scan all systems that depend on this interface, (3) publish an alert to every dependent, (4) open a 24-hour acknowledgment window, (5) coordinator review, (6) update each dependent system, (7) audit to confirm consistency, (8) resume build.
No code that depends on a changed interface ships until every dependent system has been updated and the change has been audited. This is what prevents the "we changed the API and now the frontend is broken" problem that makes multi-system projects painful.
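The eight steps above can be sketched as a small state machine that refuses out-of-order transitions and blocks the final resume until every dependent has acknowledged. The class, step names, and enforcement logic are illustrative assumptions, not the proprietary implementation:

```python
CONTRACT_CHANGE_STEPS = [
    "stage_change", "scan_dependents", "publish_alert", "ack_window",
    "coordinator_review", "update_dependents", "audit", "resume_build",
]

class ContractChange:
    """Toy state machine: the eight steps must complete strictly in order."""
    def __init__(self, interface: str, dependents: list):
        self.interface = interface
        self.dependents = dependents
        self.completed = []
        self.acks = set()

    def acknowledge(self, dependent: str) -> None:
        self.acks.add(dependent)

    def complete(self, step: str) -> None:
        expected = CONTRACT_CHANGE_STEPS[len(self.completed)]
        if step != expected:
            raise RuntimeError(f"out of order: expected {expected!r}, got {step!r}")
        if step == "resume_build" and self.acks != set(self.dependents):
            raise RuntimeError("cannot resume: unacknowledged dependents remain")
        self.completed.append(step)

    @property
    def shippable(self) -> bool:
        return self.completed == CONTRACT_CHANGE_STEPS
```

Modeling the protocol as a sequence rather than a checklist is what gives the "nothing ships until every dependent is updated" guarantee: there is simply no path to `resume_build` that skips the acknowledgment and audit steps.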
The session recovery protocol handles this automatically. When a new session starts, the engine reads the latest recovery summary and progress markers first, then reconstructs the exact build status from current workflow state and pending decisions. It resumes from the next incomplete step instead of blindly replaying already-finished work.
No re-explanation required. No guessing about what was done in the previous session. The build continues from the exact next action with the right context already prepared. This is one of the most underrated features in production use: crashes happen, and without it every crash loses significant context.
Scaffold OS uses a 3-tier audit architecture that runs continuously — not just before releases.
Tier 1 — Spot Audit (5–10 min): Five rapid checks that run automatically after every build cycle as part of continuous mode. Catches regressions in spec consistency, contract integrity, security flags, test coverage, and dependency drift before they compound into bigger problems.
Tier 2 — Focused Audit (30–60 min): Six-dimension deep investigation triggered automatically when Spot Audit flags risk, before a major feature merge, or when contract changes occur across systems. Covers data flow tracing, cross-system consistency, security end-to-end, agent dry-run simulation, spec/dependency cross-validation, and fix verification.
Tier 3 — Full Pre-Build Audit (2–4 hrs): Two-round 13-dimension gate that runs before major build phases. Round 1 covers 7 dimensions (internal consistency, completeness, verification command quality, security model, build sequence logic, setup executability, feature spec quality). Round 2 covers 6 dimensions (data flow trace, Round 1 fix verification, spec/dependency cross-validation, agent dry-run, security end-to-end, machine column audit). Build begins only when both rounds return a CONSISTENT verdict.
Tier selection is automatic — the continuous mode engine decides which tier to invoke based on what changed in the last cycle. Architecture problems caught at Tier 3 cost nothing to fix. The same problems caught at session 8 can cost days.
Scaffold OS was built to handle the projects that raw AI prompting can't reliably produce. Specifically:
Full-stack SaaS products — backend API, frontend application, authentication, database, payment processing, and email — all in one coordinated build, not as isolated pieces.
Enterprise systems — Salesforce implementations, Snowflake data warehouses, dbt transformation pipelines, AWS infrastructure — built as tracked artifacts with the same contract enforcement as application code.
ML/AI products — training pipelines, evaluation gates, RAG systems, agentic architectures, fine-tuning workflows — with ML-specific drift tracking separate from code drift.
Automation-first systems — n8n workflow automation, webhook-driven architectures, scheduled job orchestration — as first-class build outputs, not afterthoughts.
Existing codebases — archaeology mode reverse-engineers what's already built, reconstructs the planning files, and brings the codebase under full protocol management without rewriting anything.
Code Archaeology is the dedicated v5.3 flow for importing an existing repository before any build work begins. It reads the codebase, infers the architecture from what was actually built, and reconstructs a complete set of planning files: architecture specification, schema reconstruction, feature area folders, and API contract registry.
Every inferred decision is marked with a [VERIFY] flag, so you can review what the system reconstructed against what you know to be true before continuing.
Once the archaeology session completes, the codebase is under full v5.3 protocol management: drift detection, contract enforcement, health scoring, structured resumes, and build planning from that point forward. Your existing codebase effectively gets a modern planning system retrofitted onto it.
It depends heavily on project complexity. A simple SaaS with standard auth, CRUD API, and basic frontend might complete the core in 3–5 build sessions. A complex multi-system product (backend + frontend + payment processing + automation + cloud infrastructure + external platform integrations) might take 15–25 sessions across a few weeks.
What Scaffold OS changes isn't just the speed — it's the predictability. Traditional AI-assisted builds often hit a wall at session 5–8 where progress slows dramatically because earlier decisions are fighting new ones. With Scaffold OS, session velocity stays consistent because the protocol prevents the architectural decay that makes late sessions slow.
Scaffold OS is currently in early access — we're onboarding teams directly before the public SaaS launch. Request access via the form on the homepage and we'll reach out directly.
Early access gives you: direct communication with the founding team, input on which features and build surfaces to prioritize, priority support during your builds, and preferred pricing locked in before public launch.
Pricing is actively being figured out — we'd rather get this right than rush it. We have a dedicated pricing exploration page where we're thinking through the options transparently. The short version: it will be usage-based in some form, with the goal of aligning cost closely with value delivered (complexity of what you're building, not just raw compute).
Early access teams are being offered preferred pricing that locks in before public launch. See the pricing exploration page →
We're exploring this as part of the pricing model. Our current thinking is that a free tier makes sense for getting familiar with the protocol (first project, limited build surfaces), with paid tiers for production use. Nothing is finalized yet.
If you join early access now, you'll have direct input on what the trial/free tier looks like and guaranteed access to it when it launches.
v4.8 introduced the first-class Update Cycle — a named, structured model for post-build work. Before v4.8, changes after launch were managed as a rolling backlog. There was no formal record of what each change round involved, where it stood, or what it produced beyond session files.
The Update Cycle changes this: each round of post-launch changes is now a defined cycle with its own scope, workspace, and durable record. That record covers the request, context, plan, build, review, audit, and result — so every change round is independently traceable from start to finish.
In practice: when a build is live and you want to add a feature or fix a bug, the engine opens a named Update Cycle rather than just continuing in an open-ended session. The cycle runs to completion, gets archived with its full record, and the next change round starts fresh — with no ambiguity about what the previous round established.
Before v4.9, any product or surface built on top of Scaffold OS had to infer project state by parsing scattered session files — which meant making assumptions about internal structure that could break when the engine evolved. This was a real problem for anyone building integrations or dashboards on top of the engine.
v4.9 fixes this by making Scaffold OS the canonical authoritative owner of workflow state. The engine now maintains compact, machine-readable records for every meaningful workflow signal: what phase the project is in, what the current health score is, what decisions are pending, whether the build is ready for review, whether it's ready to deploy, and what the session execution contract looks like.
The practical effect: any surface — a dashboard, an integration, an automation — can read current project state from one place, in a stable format, without parsing session files or making guesses about internal structure. Workflow state is no longer inferred. It's declared by the engine directly.
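A hypothetical sketch of what consuming declared state might look like from a surface's side. The record shape and field names are invented for illustration; the engine's actual schema is not public:

```python
import json

# Hypothetical shape of a declared workflow-state record.
state_record = {
    "phase": "build",
    "health_score": 84,
    "pending_decisions": ["choose email provider"],
    "review_ready": False,
    "deploy_ready": False,
}

def read_state(raw: str) -> dict:
    """A surface consumes declared state instead of parsing session files."""
    state = json.loads(raw)
    required = {"phase", "health_score", "pending_decisions",
                "review_ready", "deploy_ready"}
    missing = required - state.keys()
    if missing:
        raise ValueError(f"malformed state record, missing: {sorted(missing)}")
    return state

dashboard_view = read_state(json.dumps(state_record))
```

The validation step is the whole idea in miniature: a surface checks one stable contract, rather than reverse-engineering internal files that may change between engine versions.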
v5.1 makes Scaffold OS a reusable protocol foundation that multiple products and surfaces can sit on top of. Before v5.0, the engine was designed around a single-surface model — one surface consuming the protocol. v5.1 adds the infrastructure needed to support multiple surfaces, products, and integrations simultaneously.
Concretely, v5.1 adds: a target registry (canonical manifest of all declared build targets with per-target readiness), an integration registry (per-integration health and config tracking), multi-track work queue management (parallel workstreams without collision), a lifecycle event history (wrapper-safe, append-only event stream any surface can query), and visibility and packaging controls (fine-grained rules for what leaves the project boundary).
The wrapper execution contract is also extended in v5.1 — surface identity, capability declarations, queue position, and event cursor metadata are now part of the contract. This means any wrapper surface can declare what it can do, where it is in the lifecycle, and what events it has already consumed. The engine handles the rest.
Scaffold OS is a closed-source, proprietary protocol engine. The internal implementation — the protocol files, skill definitions, artifact formats, and workflow state schemas — is not publicly available and not open source.
What is public is the capability surface: what Scaffold OS can build, how the protocol phases work conceptually, what the quality gate system enforces, and what the current engine capabilities are. Everything on this site describes the engine from the outside in — what it does and how you interact with it, not how it's internally implemented.
This is intentional. The engine's protocol design is the proprietary element of real value. Making it available publicly would commoditize the approach, which we're not willing to do. If you're evaluating Scaffold OS, the right question is whether the outputs and the guarantee structure are worth the access cost — not whether you can read the source files.
Scaffold OS uses multiple model tiers through an internal routing engine — the right class of model is selected automatically based on what the current phase requires. Architecture and reasoning-heavy phases (brainstorm, challenge, audit) use frontier-class reasoning models. Code generation and scaffolding phases use faster, lower-cost tiers optimized for output velocity.
We don't expose specific model names because we route across providers and update the routing as better models become available. You get the best available capability for each task without needing to think about model selection.
External platform connections happen through MCP (Model Context Protocol) servers — a standardized, bidirectional connection layer that lets agents interact directly with external systems. When Salesforce is declared as a build target in the architecture, the agent connects to it via an MCP server and creates the declared schema objects directly, with live confirmation.
For platforms that require browser-based management (dashboards, admin consoles), browser automation handles the verification step — the agent navigates to the platform's interface and confirms what was built matches the spec. This is what separates a real build target from an integration you merely hope works.
v4.6 introduced a formal domain skills system that replaces the older "specialist roles" framing. The key shift: instead of a fixed list of roles the protocol always runs, Scaffold OS now activates skills selectively based on your project type.
There are 36 curated skills across four categories — Planning, Specialist, Quality Gate, and Domain — that are matched to your project using one of 11 project profiles (SaaS Product, Enterprise Platform, Mobile App, ML Platform, etc.). When your project is detected as an ML Platform, ML-specific skills activate automatically. When it's a SaaS product, the SaaS-specific skills load instead.
Beyond the 36 curated skills, there's an extended catalog of 1,300+ on-demand skills across 81 classified groups that agents can request from a central skill server when they need specialist capability not included in the core set. None of these reference internal tool names or file paths — they're all capability descriptions focused on outcomes.
Quality gate skills enforce PASS / ADVISORY / FAIL outcomes rather than just flagging issues. If a skill returns FAIL, the build is blocked. ADVISORY results can be overridden with explicit acknowledgment. PASS clears normally. See the full skills system →
At the end of the brainstorm phase, the protocol matches your project to one of 11 project profiles based on the architecture you described — SaaS Product, ML Platform, E-Commerce, Mobile App, etc. The matched profile determines which domain skill loads automatically (e.g., the SaaS Product Specialist for a SaaS build, the ML Platform Specialist for an ML build).
Beyond the profile-matched domain skill, planning skills run automatically for every project (demand validation, adversarial review, failure path mapping). Specialist skills activate when the architecture calls for them — if your project uses a database, the Database Architect activates. If it touches external APIs, the Integration Specialist activates.
Quality gate skills run at defined checkpoints. Which gates are active and what their thresholds are is declared in your architecture document before the build starts — so there are no surprise blocks mid-build.
The build stops. Not pauses — stops. A FAIL verdict from a quality gate means the declared standard wasn't met and the build cannot continue until it is resolved. The agent surfaces exactly what failed and what needs to change, but it does not proceed to the next step.
ADVISORY verdicts are different — they surface an issue but allow you to acknowledge it and continue if you have a reason to. The acknowledgment is logged with a timestamp and your stated reason. This creates an audit trail of every override — nothing is silently bypassed.
PASS verdicts continue the build normally. Most gates return PASS on well-structured projects with a complete architecture spec.
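The three-verdict semantics can be sketched directly. The outcomes are from the description above; the function, gate names, and log shape are illustrative assumptions:

```python
import time
from enum import Enum

class Verdict(Enum):
    PASS = "PASS"
    ADVISORY = "ADVISORY"
    FAIL = "FAIL"

audit_log: list = []

def apply_verdict(gate: str, verdict: Verdict, ack_reason: str = None) -> bool:
    """Return True if the build may continue. FAIL blocks unconditionally;
    ADVISORY requires an explicit, logged acknowledgment; PASS clears."""
    if verdict is Verdict.FAIL:
        return False  # build stops, not pauses, until resolved
    if verdict is Verdict.ADVISORY:
        if ack_reason is None:
            return False  # advisory without acknowledgment does not proceed
        # every override is recorded with a timestamp; nothing is silently bypassed
        audit_log.append({"gate": gate, "reason": ack_reason, "ts": time.time()})
        return True
    return True
```

Note the asymmetry: a FAIL has no override path at all in this sketch, while an ADVISORY trades continuation for a permanent audit-trail entry.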
No. The skills system is fully automatic. You describe your project in the brainstorm phase exactly as you always have. The protocol detects the project type, matches it to a profile, and loads the relevant skills without any configuration required from you.
The only decision you make is confirming or adjusting the detected profile at the end of brainstorm — a single question with a recommended answer. If the auto-detected profile is correct, you confirm it. If it's slightly off, you select the right one. That's the full extent of skills configuration.
The skills themselves run in the background. You won't see "skill invoked" messages in the middle of a build — you'll just notice that the architecture review is more thorough, quality gates actually block bad moves, and specialist knowledge appears exactly when it's needed.
Each role represents a distinct cognitive mode the system switches into at the appropriate phase. You don't manually invoke them — the protocol activates the right role based on what's being built.
When the architecture is being designed: Solutions Architect and AI Engineer mode. When security decisions are being validated: Security Engineer mode. When the database schema is being built: Data Engineer mode. When ML pipelines are detected: ML/AI Engineer mode. When an existing codebase is being read: Code Archaeologist mode.
In practice this means: the same session that plans the architecture, designs the database, writes the security model, and scaffolds the ML pipeline is doing so with a different cognitive approach for each — not just one "general coding assistant" mode applied uniformly to everything.
Yes. Your architecture files, build plans, session state, and code are your own. Scaffold OS is a coordination protocol running on top of your environment — your project data stays in your repository and your local planning folder. We will publish a clear data handling and privacy policy before the public SaaS launch.
After every build session, Scaffold OS computes a 0–100 health score for the project based on four inputs: sync drift (features that have diverged from their spec cost the most points), code drift (features that are drifting but not yet broken cost fewer), open decision debt (unresolved decisions from past sessions), and session staleness (time elapsed since the last active work session).
The score is always available as a single, machine-readable signal. Any surface built on top of the engine reads this signal directly — there's no recomputation happening on the surface side. The formula is canonical and consistent across every project and every surface. A score of 80+ is considered healthy for active projects. Below 60 typically means accumulated drift or significant open decisions that need resolving.
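A toy version of such a scoring function, using the four inputs described above. The relative ordering (diverged features cost the most, drifting features less) is from the source; the specific weights and caps are invented for illustration, and the canonical formula is internal to the engine:

```python
def health_score(diverged: int, drifting: int,
                 open_decisions: int, days_stale: int) -> int:
    """Toy 0-100 project health score from the four described inputs."""
    score = 100
    score -= diverged * 15        # sync drift: spec-divergent features cost most
    score -= drifting * 5         # code drift: drifting but not yet broken
    score -= open_decisions * 4   # open decision debt from past sessions
    score -= min(days_stale, 30)  # session staleness, capped
    return max(score, 0)
```

Under these invented weights, a project with two diverged features, one drifting feature, one open decision, and three stale days scores 58 — below the healthy threshold, which matches the intuition that diverged features dominate the signal.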
The health score is supplemented by a complexity signal generated before the first build session — a structured estimate of how large the build will be: feature count, integration count, estimated number of sessions needed, and a complexity tier (low/medium/high/extreme). This gives you realistic expectations before any code is written.
Yes. In v5.3, one brainstorm question sets the project's git strategy: GitHub Flow, trunk-based development, GitFlow, main-only, or a custom model. The engine can recommend the common default when nothing unusual is detected, but you still confirm the final branching model before the project uses it everywhere.
From that point, deploy steps, PR creation, and release tagging are personalized to your declared model. If your project uses GitHub Flow or another PR-required setup, the engine drafts the full pull request content at the end of each feature session: title, summary, files changed, testing instructions, and merge conditions. The surface layer submits it; the engine drafts it. Nothing is hardcoded on the surface side for CI/CD behavior.
Release note drafts work the same way: when release tagging is enabled, a structured release note draft is generated at build completion, including version numbers formatted according to the project's declared tagging format.
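A minimal sketch of what an engine-drafted pull request payload might look like, covering the five pieces listed above. The function and field names are hypothetical, chosen only to show the engine-drafts, surface-submits split:

```python
def draft_pull_request(feature: str, summary: str, files: list,
                       tests: list, merge_conditions: list) -> dict:
    """Assemble a PR draft; a wrapper surface would submit this payload."""
    return {
        "title": f"feat: {feature}",
        "body": "\n\n".join([
            summary,
            "Files changed:\n" + "\n".join(f"- {f}" for f in files),
            "Testing:\n" + "\n".join(f"- {t}" for t in tests),
            "Merge conditions:\n" + "\n".join(f"- {m}" for m in merge_conditions),
        ]),
    }
```

The draft is plain data, which is the design point: any surface can carry it to GitHub, GitLab, or anywhere else without the engine knowing provider-specific APIs.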
v5.4 adds a dedicated context layer. Instead of treating uploads as one-off brainstorm attachments, the engine registers each source, tracks how it was interpreted, and prepares a compact trusted summary for later sessions. That means the next run can resume from normalized context instead of starting cold.
For spreadsheets specifically, the engine can now inspect workbook shape, named ranges, formula relationships, and likely automation patterns before planning begins.
Existing repositories now produce richer continuity signals during archaeology. Instead of only receiving a narrative summary, product surfaces can work with hotspots, ownership clues, dependency risk, entry points, and clearer extend-versus-rebuild guidance.
That makes existing-codebase onboarding feel more honest and more usable, especially when a team is inheriting a system that already has meaningful history and technical debt.
Many businesses already encode their real workflow inside spreadsheets. Earlier systems could read those files, but they mostly looked like attachments. v5.4 treats them more structurally, which helps the engine infer process shape, data flow, reporting logic, and likely automation needs earlier.
In practice, that means smarter brainstorm questions and a better chance of turning a spreadsheet-driven manual workflow into a well-scoped software system.
Yes. v5.4 adds an engine-owned secret scan artifact written before any git push or wrapper-managed publish action. If a blocking secret is detected, the engine pauses the workflow and surfaces that state explicitly instead of letting a release proceed silently.
If optional security tooling is available during audit, the engine can also write a structured security scan artifact for wrapper UI and review workflows.
No. The five-flow model stays intact. What changes is the preflight layer before those flows proceed. The engine understands more context up front, so each flow starts from better material.
That makes the release broad without being disruptive: teams get better intake, better continuity, and safer release behavior without relearning project entry from scratch.
v5.5 moves more deployment judgment into the engine itself. Instead of leaving delivery semantics to each surface, the protocol can now describe what should ship, which environment expectations matter, what checks should run, and what result the surface needs to hand back.
That makes deployment follow-through more consistent across wrappers without pretending the engine itself has to own every provider-specific implementation detail.
Yes. v5.5 makes rollback planning more explicit. The engine can now carry a clearer rollback path as part of delivery follow-through, which is much safer than leaving recovery logic entirely to product-side guesswork.
It also stays honest about limits: some recovery work can be coordinated well, while some high-risk data changes still need deliberate human oversight.
Recommendation mode is the new next-version planning workflow added in v5.5. Instead of opening a freeform conversation about "what should we build next," the engine can study shipped work, backlog pressure, health, prior decisions, and open product gaps, then rank the strongest next-version options.
In practice, that means a live product can ask for its next best move and receive structured candidates with one clear recommendation instead of a loose brainstorm.
Because after a release as broad as v5.5, the main remaining risk is false confidence. v5.6 exists to prove the environment, setup path, and release-readiness checks are actually trustworthy before real project work begins.
It makes the platform more serious operationally: less setup drift, less hidden confusion, and a much clearer answer to "are we truly ready to start?"
Yes, just in a different way. Users and wrapper teams get clearer environment checks, stronger release validation, and a cleaner first-run setup path. Those are not flashy UI features, but they meaningfully reduce wasted time and bad starts.
For serious projects, that matters as much as a brand-new workflow surface because it decides whether the engine can be trusted before the work even begins.
Reach out and we'll walk through your use case directly.