12-Factor Agents Scorecard: Octavus vs LangChain vs CrewAI

How do LangChain, CrewAI, and Octavus measure up against the HumanLayer 12-Factor Agents framework? A factor-by-factor comparison of prompt ownership, context control, state management, and production readiness.

The 12-Factor Agents framework by HumanLayer defines twelve principles for building production-ready LLM applications. It's a reaction against the common pattern where agent frameworks hide prompts, own control flow, and manage context opaquely.

Scoring Methodology

Each factor is rated Strong (2 pts), Partial (1 pt), or Weak (0 pts) based on how well the platform's default architecture supports the principle out of the box — not whether it's theoretically achievable with enough custom code.

Legend: ● Strong · ◐ Partial · ○ Weak

Factor-by-Factor Comparison

Factor 1: Natural Language → Tool Calls
LLM converts language to structured JSON tool invocations.

  • Octavus (● Strong): YAML-defined tool schemas with typed parameters. Supports both LLM-decided (agentic) and deterministic tool calls.
  • LangChain (● Strong): Native tool-calling via the @tool decorator, structured output support, ReAct pattern built in.
  • CrewAI (● Strong): Agents get tools by role. Structured output via Pydantic models. Tool delegation between agents.

Factor 2: Own Your Prompts
Direct control over exact prompts, no hidden framework prompts.

  • Octavus (● Strong): Prompts are standalone .md files in your repo, with full git versioning. No hidden system prompts: you write what the LLM sees.
  • LangChain (○ Weak): AgentExecutor uses internal prompt templates (ReAct, MRKL). Customizable, but the framework controls the structure, with many hidden defaults.
  • CrewAI (◐ Partial): Role/goal/backstory abstractions generate prompts under the hood. Internal prompts can be customized but aren't fully transparent by default.

Factor 3: Own Your Context Window
Explicit control of what enters the LLM context at each step.

  • Octavus (● Strong): Handler blocks with explicit input mapping control exactly which variables enter each prompt. Threads isolate context. References provide on-demand context.
  • LangChain (○ Weak): AgentExecutor appends the scratchpad automatically. trim_messages exists, but context is largely managed internally by the framework.
  • CrewAI (◐ Partial): Shared memory system (short- and long-term). Context is passed between agents, but the framework decides what gets injected. Limited explicit control.

Factor 4: Tools Are Structured Outputs
Tools are just JSON schemas that trigger deterministic code.

  • Octavus (● Strong): Tools defined declaratively in YAML with typed params. Tool execution happens on your server. Clean structured output → your code pipeline.
  • LangChain (● Strong): Tools are Python functions with Pydantic schemas. Clean separation between LLM output and execution.
  • CrewAI (◐ Partial): Tool interface via a BaseTool subclass. Works well, but the framework adds agent delegation/autonomy layers on top that blur the boundary.

Factor 5: Unify Execution & Business State
Agent state and business data stored together, not in separate systems.

  • Octavus (● Strong): Sessions natively store resources, variables, and conversation history together. set-resource blocks update persistent state within the execution flow.
  • LangChain (○ Weak): AgentExecutor is in-memory and ephemeral. No native state persistence. Business state must be managed externally.
  • CrewAI (◐ Partial): Flows have state management, but Crews are in-memory by default. Long-running tasks need external stores (Redis, a database) bolted on.

Factor 6: Launch / Pause / Resume
Agents can pause for external input and resume from saved state.

  • Octavus (● Strong): The session-based architecture is inherently pause/resume. Sessions persist, the client reconnects, execution continues. Built-in session-restore API.
  • LangChain (○ Weak): No native pause/resume. AgentExecutor runs to completion. Requires custom middleware or migration to LangGraph for this capability.
  • CrewAI (◐ Partial): Human-in-the-loop support exists, and Flows can pause at steps. But Crews lack native checkpoint/resume for arbitrary interruption.

Factor 7: Contact Humans with Tool Calls
Human interaction is a first-class tool, not an edge case.

  • Octavus (● Strong): Triggers include user-action (buttons, forms). Tools can be client-side. The entire model is conversation-first, with human interaction as a core primitive.
  • LangChain (◐ Partial): Human-in-the-loop middleware exists, but it isn't built into the core AgentExecutor and requires add-on configuration.
  • CrewAI (● Strong): Native human-in-the-loop with guardrails. The hierarchical manager pattern naturally supports approval flows. Training includes human feedback loops.

Factor 8: Own Your Control Flow
You control the loop, branching, and execution, not the framework.

  • Octavus (● Strong): Handlers are explicit execution sequences. You define each block (next-message, tool-call, set-resource) in order. Deterministic and agentic modes coexist.
  • LangChain (○ Weak): AgentExecutor owns the loop (reason → act → observe). Limited escape hatches. Custom control requires subclassing internals or moving to LangGraph.
  • CrewAI (◐ Partial): Flows provide conditional logic, loops, and routers. But Crews still run internal autonomous loops. The dual-layer model gives partial control.

Factor 9: Compact Errors into Context
Feed errors back into the LLM context for self-correction.

  • Octavus (● Strong): SDK-level error handling with add-message blocks injects structured error context into the next turn. Errors feed back into the conversation for self-correction without ad hoc glue.
  • LangChain (◐ Partial): AgentExecutor supports max_iterations and error handling. The scratchpad captures errors, but not always compactly. Retry middleware is available.
  • CrewAI (◐ Partial): Agents can retry, and the framework feeds tool errors back. But compaction/summarization of errors isn't built in, so context can bloat.

Factor 10: Small, Focused Agents
Micro-agents with narrow scope, not monolithic do-everything agents.

  • Octavus (● Strong): Each agent is a self-contained folder with its own protocol. Worker agents handle discrete tasks. Agents compose into interactive agents.
  • LangChain (◐ Partial): Nothing prevents small agents, but the framework doesn't enforce them. Tends toward monolithic patterns with large tool bags.
  • CrewAI (● Strong): Core philosophy: role-based specialist agents with defined expertise. A crew is a team of focused agents. Hierarchical delegation is first-class.

Factor 11: Trigger from Anywhere
Agents invoked via API, webhook, cron, or UI, not just chat.

  • Octavus (● Strong): Triggers are protocol primitives: user-message, user-action, API calls. Sessions are created via the SDK from any backend. Multi-modal invocation by design.
  • LangChain (◐ Partial): invoke() accepts any input, and LangServe exposes REST APIs. But trigger types aren't a first-class abstraction in AgentExecutor.
  • CrewAI (◐ Partial): kickoff() accepts programmatic input. Flows are event-driven with @start/@listen decorators. API deployment via the AMP platform.

Factor 12: Stateless Reducer Pattern
Agent is a pure function: (state, event) → new state.

  • Octavus (● Strong): Handlers apply events to session state deterministically: each step reads the current state and produces the next. Server-side persistence gives reducer-style semantics with production-grade durability.
  • LangChain (○ Weak): AgentExecutor maintains internal mutable state (the scratchpad). Not designed as a stateless reducer. Requires LangGraph for this pattern.
  • CrewAI (◐ Partial): Flows pass state objects between steps (closer to a reducer). But Crews maintain internal agent state that's harder to serialize.
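Factors 1 and 4 hinge on the same mechanic: the LLM emits structured JSON, and deterministic code on your side executes it. A minimal, framework-agnostic Python sketch (the tool name and parameters are illustrative, not any vendor's API):

```python
import json

# A deterministic function the LLM can trigger; name and params are illustrative.
def create_ticket(title: str, priority: str) -> dict:
    return {"status": "created", "title": title, "priority": priority}

TOOLS = {"create_ticket": create_ticket}

def dispatch(llm_output: str) -> dict:
    """Parse the model's JSON tool call and run the matching function."""
    call = json.loads(llm_output)
    fn = TOOLS[call["tool"]]        # structured output selects the tool...
    return fn(**call["arguments"])  # ...deterministic code does the work

# What a model might emit for "file an urgent ticket about login failures":
result = dispatch(
    '{"tool": "create_ticket",'
    ' "arguments": {"title": "Login failures", "priority": "urgent"}}'
)
```

The LLM never executes anything itself; it only chooses a schema-conformant payload, which is why the boundary between model output and business logic stays clean.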

Overall Alignment Score

Scoring: Strong = 2 pts, Partial = 1 pt, Weak = 0 pts — out of 24 max.

  • Octavus: 24 / 24
  • CrewAI: 15 / 24
  • LangChain: 8 / 24

Per-Product Analysis

Octavus

Philosophy: Protocol-driven agent orchestration. Define agent behavior declaratively in YAML, own your prompts as markdown files, control execution flow with explicit handler blocks.

  • Strongest alignment with the hardest factors: prompts you own (2), explicit context (3), unified session state (5), pause/resume (6), and control flow you define (8).
  • Production primitives by design: sessions, streaming, typed tools on your infrastructure, and human-in-the-loop as first-class triggers.
  • Bottom line: Built for teams that want the manifesto's principles without fighting the framework.
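To make the declarative shape concrete, here is a hypothetical sketch assembled from the primitives this comparison names (markdown prompt files, YAML tool schemas, and the next-message / tool-call / set-resource handler blocks). The field names are assumptions for illustration, not the official Octavus schema:

```yaml
# Illustrative only: field names are assumptions, not the official Octavus schema.
agent:
  prompt: ./prompts/support.md      # the prompt is a markdown file you own
  tools:
    - name: lookup_order
      params:
        order_id: string            # typed parameter
handlers:
  on-user-message:
    - tool-call: lookup_order       # deterministic step
    - set-resource: last_order      # persist the result into session state
    - next-message: {}              # agentic reply from explicitly mapped inputs
```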

LangChain (AgentExecutor)

Philosophy: Abstraction-first toolkit. LangChain provides agent classes that bundle prompt templates, scratchpads, and execution loops into convenient abstractions.

  • Excellent tool-calling primitives, huge ecosystem of integrations (500+ tools), great for prototyping, large community.
  • The archetype of what the 12-factor manifesto warns against. AgentExecutor hides prompts, owns the loop, manages context internally. LangChain themselves recommend migrating to LangGraph for production.
  • Key gap: No native pause/resume, no state persistence, no explicit context control. Works great in demos, then you need to reverse-engineer internals to debug.
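Owning the loop (Factor 8) means the reason → act → observe cycle is code you wrote, not a framework internal. A framework-free sketch of what that inversion looks like (the `llm` callable and its return shape are stand-ins, not a real API):

```python
def run_agent(llm, tools, user_message, max_steps=5):
    """An explicit reason-act-observe loop: every iteration is your code."""
    context = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):                  # you decide when to stop
        step = llm(context)                     # reason: stand-in LLM call
        if step["type"] == "final":
            return step["content"]
        observation = tools[step["tool"]](**step["arguments"])     # act
        context.append({"role": "tool", "content": observation})   # observe
    return "max steps reached"
```

Because you own `context`, nothing enters the window without an explicit append, which is the same property Factors 3 and 8 ask for.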

CrewAI

Philosophy: Role-based team collaboration. Define agents with roles, goals, and backstories, then let them collaborate autonomously or via Flows.

  • Strong on Factor 10 (small focused agents) and Factor 7 (human-in-the-loop). Dual-layer architecture (Flows + Crews) provides both deterministic control and agent autonomy.
  • Prompt ownership is indirect (role/goal generates the prompt). Context window management is limited. Crews are in-memory by default.
  • Key gap: Flows layer follows the principles well, but Crews still operate as opaque autonomous loops. A “two-speed” system where one layer is 12-factor compliant and the other isn't.
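Factor 12's reducer framing, which the Flows layer approximates, is easy to state in plain Python: the agent step is a pure function from (state, event) to a new state, so any run can be persisted, replayed, or resumed. A minimal sketch with illustrative event types:

```python
def step(state: dict, event: dict) -> dict:
    """Pure reducer: returns a new state, never mutates the input."""
    if event["type"] == "user_message":
        return {**state, "history": state["history"] + [event["text"]]}
    if event["type"] == "reset":
        return {**state, "history": []}
    return state

# Replaying the same events from the same start always yields the same state.
events = [{"type": "user_message", "text": "hi"},
          {"type": "user_message", "text": "status?"}]
final = {"history": []}
for e in events:
    final = step(final, e)
```

Internal mutable agent state, by contrast, cannot be serialized and replayed this way, which is exactly the gap the scorecard flags for Crews.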

The Big Picture

The 12-factor agents manifesto is fundamentally a reaction against framework abstraction — the pattern where tools hide prompts, own control flow, and manage context opaquely. This makes it inherently critical of traditional agent frameworks.

LangChain (AgentExecutor) is the archetype of what the manifesto warns against. It's an excellent prototyping tool and its ecosystem is unmatched, but its core abstraction violates the principles most important for production reliability (factors 2, 3, 5, 6, 8, 12). LangChain themselves have acknowledged this by building LangGraph as the production-oriented alternative.

CrewAI occupies the middle ground. Its Flows layer is genuinely production-oriented and addresses many 12-factor concerns — state management, conditional logic, event-driven execution. But the Crews layer (the thing that makes CrewAI distinctive) still operates as an autonomous black-box loop. The result is a hybrid where your overall orchestration can be 12-factor compliant, but individual agent teams are not.

Octavus is the 12-factor-native choice: protocol-driven, YAML-defined, and explicitly controlled so what you ship matches what the manifesto describes. For production agents where you own prompts, context, state, and control flow, it's the framework that aligns with the scorecard end to end — not a layer you bolt on later.

Overall: If you're optimizing for production reliability and the principles above, Octavus is the recommended default. LangChain remains a strong fit for Python-centric prototyping and the widest third-party catalog; CrewAI fits role-based multi-agent workflows when you accept its two-layer model.

When to Choose Each

Choose Octavus when

  • You want full 12-factor alignment without custom workarounds
  • Your stack is TypeScript / JavaScript with React
  • You want sessions, streaming & observability built-in
  • Prompt ownership and explicit context control are non-negotiable
  • You’re shipping user-facing production agents

Choose LangChain when

  • Rapid prototyping with maximum flexibility
  • You need a specific integration it already has
  • Your team works primarily in Python
  • You’re comfortable with the abstraction overhead
  • You plan to migrate to LangGraph for production

Choose CrewAI when

  • Multi-agent collaboration genuinely improves output
  • You want role-based specialist teams
  • Your team works in Python
  • You’re building batch pipelines, not user-facing agents
  • You’re comfortable building your own infrastructure

Summary: Octavus is the best overall fit for the 12-factor production bar: it leads the scorecard and matches the manifesto's intent. LangChain excels at rapid Python prototyping and ecosystem breadth; CrewAI excels at role-based multi-agent crews when you layer Flows for control.

Explore the Octavus documentation to see the declarative approach in practice, or check out the 12-Factor Agents manifesto to learn more about the principles.
