# 12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer

Tom Brewer

These notes are based on the YouTube video published by AI Engineer.


Key Takeaways

  • Agents are just software – building reliable LLM‑driven agents boils down to classic software‑engineering practices: clear control flow, explicit state, and deterministic orchestration.
  • Own every token – the quality of an agent hinges on the prompts and context you feed the model. Treat prompts like hand‑crafted code; iterate, test, and optimise them. As explained in Why Your AI Gets Dumber After 10 Minutes, context management matters—owning each token pays off in cost and reliability.
  • Treat tools as deterministic functions – rather than “magical” external calls, view tool usage as a pure JSON‑in → JSON‑out transformation that your code executes.
  • Control‑flow ownership is critical – implement your own loops, switches, and DAG orchestration so you can pause, resume, retry, and serialize state safely.
  • Small, focused agents win – keep individual agent loops short (roughly 3‑10 steps) and embed them in an otherwise deterministic pipeline.
  • Error handling & context hygiene – surface tool errors to the model deliberately, but prune noisy stack traces before they re‑enter the context window.
  • Human‑in‑the‑loop is a first‑class feature – expose a clear “tool‑or‑human” decision point early in the generation so the model can ask for clarification or escalation.
  • Multi‑channel reach – let agents surface through email, Slack, Discord, SMS, etc., meeting users where they already work. For practical tips on shipping such multi‑channel agents, see Ship Production Software in Minutes, Not Months.
  • Stateless reducers → owned state – keep the LLM itself stateless; persist and manage all execution state yourself (e.g., in a DB).

Detailed Explanations of Core Concepts

1. The “JSON‑from‑sentence” primitive (Factor 1)

The most reliable thing an LLM can do today is translate a natural‑language request into a well‑structured JSON payload:

{
  "action": "create_ticket",
  "priority": "high",
  "summary": "User cannot log in"
}

The downstream code consumes this JSON deterministically. The surrounding factors (prompt design, error handling, etc.) make the whole system robust.
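
As a minimal sketch of that downstream consumption, assuming the model was instructed to emit only the JSON object above (the field names and the create_ticket helper are illustrative, not from the talk):

import json

REQUIRED_FIELDS = {"action", "priority", "summary"}

def parse_action(raw_output: str) -> dict:
    """Parse and validate the model's JSON before any deterministic code runs."""
    payload = json.loads(raw_output)              # raises json.JSONDecodeError on malformed output
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    return payload

payload = parse_action('{"action": "create_ticket", "priority": "high", "summary": "User cannot log in"}')
if payload["action"] == "create_ticket":
    create_ticket(priority=payload["priority"], summary=payload["summary"])   # hypothetical downstream function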


2. Own Your Prompts (Factor 2)

  • Prompt as code: Think of a prompt as a function definition. The fewer “hand‑wavy” instructions, the more predictable the output.
  • Iterative refinement: Use A/B tests, token‑level analysis, and prompt‑engineering tools to converge on a “banger” prompt. The five techniques that separate top agentic engineers—covered in The 5 Techniques Separating Top Agentic Engineers Right Now—are especially useful here.
  • Context engineering: Decide how to pack history, memory, RAG results, and system instructions into the OpenAI messages format (or equivalent). Example layout:
[
  {"role": "system", "content": "You are a helpful assistant that returns JSON only."},
  {"role": "user", "content": "Schedule a meeting with Alice tomorrow at 10 am."},
  {"role": "assistant", "content": "{\"action\":\"schedule_meeting\",\"participants\":[\"Alice\"],\"time\":\"2026-01-18T10:00:00\"}"}
]
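
One sketch of the "prompt as code" idea is a context-builder that assembles this message list explicitly on every call; the function name and arguments below are illustrative, not from the talk:

SYSTEM_PROMPT = "You are a helpful assistant that returns JSON only."

def build_messages(history: list[dict], rag_snippets: list[str], user_input: str) -> list[dict]:
    """Deterministically pack system instructions, retrieved context, and history."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    if rag_snippets:
        # Retrieved documents go in as explicit, inspectable context; no hidden tokens.
        messages.append({"role": "system", "content": "Relevant context:\n" + "\n".join(rag_snippets)})
    messages.extend(history)                      # prior turns, already trimmed or summarised by you
    messages.append({"role": "user", "content": user_input})
    return messages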

3. Tool Use Is Not “Magical” (Factor 4)

  • Misconception: Treating tool calls as a mystical “agent‑to‑world” interface leads to brittle pipelines.
  • Correct view: The LLM emits JSON that your deterministic code executes (e.g., an HTTP request, a database query). The tool itself is just another pure function.
  • Implementation sketch:
def run_step(model_output):
    # model_output is parsed JSON, e.g. {"tool": "search", "query": "latest LLM papers"}
    if model_output["tool"] == "search":
        return web_search(model_output["query"])
    elif model_output["tool"] == "db_query":
        return db.execute(model_output["sql"])
    # … more deterministic branches
    raise ValueError(f"unknown tool: {model_output['tool']}")
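
Used with the message list from Factor 2, a single turn then looks roughly like this (a sketch, not the talk's exact code):

model_output = {"tool": "search", "query": "latest LLM papers"}   # parsed from the model's response
observation = run_step(model_output)                              # deterministic execution
messages.append({"role": "user", "content": f"Tool result: {observation}"})   # result re-enters the context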

4. Own the Control Flow (Factor 8)

  • DAG vs. naïve loop: A simple “LLM decides next step, feed back, repeat” works only for tiny workflows.
  • Explicit DAG orchestration: Model each step as a node with defined inputs/outputs. Use a lightweight orchestrator (or a custom state machine) to enforce ordering, retries, and branching.
  • Pause/Resume: Serialize the current context window and any pending external calls to a DB. When a long‑running tool finishes, load the saved state, append the result, and continue.
graph LR
A[Receive Event] --> B[Prompt LLM for next step]
B --> C{Tool Call?}
C -->|Yes| D[Execute deterministic tool]
D --> E[Append result to context]
E --> B
C -->|No| F[Return final answer]
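
A sketch of the pause/resume idea, assuming the run state is a plain JSON-serialisable dict and the store is a simple key/value table (all names here are illustrative):

import json, sqlite3

db = sqlite3.connect("agent_state.db")
db.execute("CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, state TEXT)")

def pause(run_id: str, state: dict) -> None:
    """Serialize the context window and any pending call so the process can exit safely."""
    db.execute("INSERT OR REPLACE INTO runs VALUES (?, ?)", (run_id, json.dumps(state)))
    db.commit()

def resume(run_id: str, tool_result: dict) -> dict:
    """Reload the saved state, append the long-running tool's result, and continue."""
    (raw,) = db.execute("SELECT state FROM runs WHERE run_id = ?", (run_id,)).fetchone()
    state = json.loads(raw)
    state["messages"].append({"role": "user", "content": f"Tool result: {json.dumps(tool_result)}"})
    return state   # hand back to the orchestrator loop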

5. Small, Focused Agents & Hybrid Pipelines (Factor 10)

  • Pattern: Deterministic CI/CD pipeline → LLM decides a small, ambiguous step → human approval (if needed) → back to deterministic code.
  • Benefits: Keeps context windows tiny, isolates uncertainty to a bounded sub‑task, and makes debugging straightforward.
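
A sketch of such a bounded loop embedded in otherwise deterministic code, reusing the sketches from the earlier factors (call_llm and the exact step budget are illustrative):

MAX_STEPS = 10   # keep the agent loop small and bounded

def resolve_ambiguous_step(task: str) -> dict:
    """Let the LLM handle only the ambiguous sub-task; everything around it stays deterministic."""
    messages = build_messages(history=[], rag_snippets=[], user_input=task)
    for _ in range(MAX_STEPS):
        decision = call_llm(messages)                # returns parsed JSON (Factor 1)
        if decision.get("done"):
            return decision                          # hand control back to the deterministic pipeline
        observation = run_step(decision)             # deterministic tool execution (Factor 4)
        messages.append({"role": "user", "content": f"Tool result: {observation}"})
    raise RuntimeError("agent exceeded step budget; escalate to a human")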

6. Error Propagation & Context Hygiene (Factor 9)

  • Surface errors: When a tool fails, inject a concise error summary into the next prompt so the model can retry or ask for clarification.
  • Avoid noise: Do not dump full stack traces into the context; they consume tokens and confuse the model.
error_summary = {"error":"Timeout while calling external API","retry":True}
prompt_context.append(error_summary) # concise, useful info only
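
One way to produce that summary, assuming full tracebacks go to your own logs rather than the context window (a sketch; names are illustrative):

import logging, traceback

logger = logging.getLogger("agent")

def safe_tool_call(fn, *args, **kwargs):
    """Run a tool; on failure, log the full trace but hand the model only a compact summary."""
    try:
        return {"ok": True, "result": fn(*args, **kwargs)}
    except Exception as exc:
        logger.error("tool failed: %s", traceback.format_exc())   # full detail stays out of the prompt
        return {"ok": False, "error": type(exc).__name__, "detail": str(exc)[:200], "retry": True}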

7. Human‑in‑the‑Loop as a First‑Class Decision (Factor 7)

  • Early branching: The model should decide immediately whether to:

    1. Return a final answer,
    2. Ask the user for clarification,
    3. Escalate to a human operator.
  • Natural‑language token: Encode this decision in the first token(s) of the output, e.g., "human" vs. "tool" vs. "done".
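
One way to make that decision explicit is to require a small envelope in the model's output and branch on it before anything else; the field names and helper functions below are illustrative, not a prescribed schema:

decision = call_llm(messages)        # e.g. {"next": "human", "question": "Which Alice did you mean?"}

if decision["next"] == "done":
    deliver(decision["answer"])                      # final answer back to the user
elif decision["next"] == "human":
    ask_for_clarification(decision["question"])      # pause the run and wait for a person
elif decision["next"] == "tool":
    observation = run_step(decision["tool_call"])    # deterministic tool execution (Factor 4)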


8. Multi‑Channel Delivery (Factor 11)

Agents should be reachable via the channels users already use—Slack, Discord, email, SMS, etc. The underlying logic stays the same; only the transport layer changes. For a deeper dive on building multi‑channel bots quickly, see Ship Production Software in Minutes, Not Months.
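
One way to keep the agent logic channel-agnostic is a thin transport adapter per channel; the interface below is a sketch, not a prescribed API:

from typing import Protocol

class Channel(Protocol):
    def receive(self) -> str: ...           # raw user message from Slack, email, SMS, ...
    def send(self, text: str) -> None: ...  # deliver the agent's reply on the same channel

def serve(channel: Channel) -> None:
    """Same agent logic regardless of transport; only the adapter differs."""
    user_input = channel.receive()
    result = resolve_ambiguous_step(user_input)      # the bounded loop from Factor 10
    channel.send(result["answer"])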


9. Stateless LLM, Owned State (Factor 12)

  • Statelessness: The LLM never retains memory between calls. All conversation history, business state, and workflow progress live in your persistence layer.
  • Transducer pattern: Think of the LLM as a pure function that transforms the current state (JSON) into the next action.
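
Seen that way, each turn is a pure reduction over state you own, roughly like this sketch (the state shape and call_llm are illustrative):

import json

def reduce_step(state: dict) -> dict:
    """Pure 'transducer': current state in, next state out. The LLM call itself holds no memory."""
    decision = call_llm(state["messages"])               # stateless model call
    new_messages = state["messages"] + [{"role": "assistant", "content": json.dumps(decision)}]
    return {**state, "messages": new_messages, "last_decision": decision}

# The caller persists the returned state (e.g. via pause() above) before the next reduction.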

10. “Create 12‑Factor Agent” – A Scaffold, Not a Wrapper (Factor 13)

  • Scaffold, not bootstrap: Provide the minimal plumbing (state store, orchestrator, API surface) and let developers own the actual agent code.
  • Goal: Shift the boilerplate (state storage, orchestration plumbing, API surface) to the scaffold so teams keep ownership of the hard AI parts (prompt engineering, flow control) and can focus on domain‑specific logic.

Summary

Dex Horthy’s talk reframes LLM agents as ordinary software systems that happen to use a stateless language model as a pure function. Reliability comes from:

  1. Explicit ownership of prompts, context windows, and execution state.
  2. Deterministic orchestration of tool calls via JSON contracts and a clear control‑flow graph.
  3. Small, focused agent loops that keep LLM involvement limited to the parts of a workflow that truly need natural‑language reasoning.
  4. Robust error handling and human‑in‑the‑loop pathways that are baked into the model’s output format.
  5. Multi‑channel exposure so agents meet users where they already work.

By treating agents as modular, testable software components and applying classic engineering patterns (DAGs, state serialization, retry logic), developers can build production‑grade, customer‑facing LLM applications without relying on heavyweight “agent frameworks.” The 12‑factor checklist serves as a practical wish‑list for any framework aiming to support high‑velocity, high‑reliability AI agent development.

🔗 See Also: Why Your AI Gets Dumber After 10 Minutes 💡 Related: Claude Code Agents: The Feature That Changes Everything


Thanks for reading my notes! Feel free to check out my other notes or contact me via the social links in the footer.

# Frequently Asked Questions

What does “own every token” mean and how can I practically apply it when building an LLM‑driven agent?

“Own every token” means treating the prompt and context as code you control line‑by‑line, rather than leaving the model to guess what to include. Start by writing a minimal system prompt that forces JSON‑only output, then iteratively A/B test variations while measuring token usage and output quality. Use token‑level analysis tools to prune unnecessary words, cache static parts, and explicitly manage RAG snippets so you know exactly which tokens are sent to the model each call.

Why should tool calls be viewed as deterministic JSON‑in → JSON‑out functions, and how do I implement that pattern?

Viewing tools as pure functions removes the “magical” uncertainty of external calls; the LLM only decides *what* to do, not *how* it happens. Define a schema for each tool (e.g., {"tool":"search","query":"…"}) and write a dispatcher that parses the JSON and invokes a deterministic function such as an HTTP request or DB query. The function returns a clean JSON payload that you feed back to the model, keeping the whole pipeline reproducible and testable.

What is meant by “control‑flow ownership” and what are the key steps to build my own loops or DAG orchestration for an agent?

Control‑flow ownership means you, not the LLM, manage the execution sequence, retries, and state persistence. Implement explicit loops (e.g., while step_count < max_steps) that read the model’s JSON output, decide the next action, and optionally pause for human input. Use a DAG library or simple state machine to serialize each step’s inputs/outputs to a database, allowing you to resume, debug, or replay the workflow reliably.

How should I handle errors and keep the LLM’s context window clean to avoid noisy outputs?

Surface tool errors to the model in a concise JSON format (e.g., {"error":"timeout","detail":"search API 504"}) so the LLM can decide whether to retry or ask the user for clarification. Before appending the error back into the conversation, strip stack traces and large payloads that would consume tokens without adding value. Persist full error logs elsewhere (e.g., log service) for debugging while keeping the model’s context focused on actionable information.

Why do small, focused agents (3‑10 steps) work better, and how do I decide the right granularity for my agent’s loop?

Short loops limit the amount of state the LLM must keep in memory, reducing drift and token cost, while making retries and human‑in‑the‑loop hand‑offs simpler. Break a complex task into micro‑agents that each perform a single, well‑defined JSON transformation, then chain them in a deterministic pipeline. Measure average step count during testing; if a loop frequently exceeds ten iterations, consider refactoring the logic into separate agents or adding explicit checkpoints.
