
Output Verification Loop

Verify LLM outputs by extracting individual claims, checking each against evidence sources, and returning per-claim trust scores before acting on the result.

By John Weston (@JohnnyTarrr)

Cite This Pattern
APA
John Weston (@JohnnyTarrr) (2026). Output Verification Loop. In *Awesome Agentic Patterns*. Retrieved April 24, 2026, from https://agentic-patterns.com/patterns/output-verification-loop
BibTeX
@misc{agentic_patterns_output-verification-loop,
  title = {Output Verification Loop},
  author = {John Weston (@JohnnyTarrr)},
  year = {2026},
  howpublished = {\url{https://agentic-patterns.com/patterns/output-verification-loop}},
  note = {Awesome Agentic Patterns}
}
01

Problem

LLM agents confidently produce outputs that contain factual errors, hallucinated citations, or unsupported claims. In multi-agent pipelines the problem compounds: one agent's hallucination becomes another agent's input. Standard reflection loops catch stylistic issues but lack grounding against external evidence, so factual errors pass through unchallenged.

02

Solution

Insert a verification step between generation and action. The step works in three phases:

  1. Claim extraction -- decompose the LLM output into individual, atomic claims.
  2. Evidence retrieval -- for each claim, retrieve supporting or contradicting evidence from authoritative sources (APIs, knowledge bases, RAG stores).
  3. Scoring -- assign a per-claim trust score based on evidence alignment and return an aggregate confidence for the full output.
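The three phases above can be sketched in a few lines of Python. This is a minimal illustration, not a real library: `EVIDENCE` stands in for an authoritative source (in practice an API, knowledge base, or RAG store), claim extraction is a naive sentence split (production systems typically use an LLM call to produce truly atomic claims), and `VerificationResult` is an assumed name.

```python
from dataclasses import dataclass, field

# Toy evidence source; a real deployment would query an API, KB, or RAG store.
EVIDENCE = {
    "water boils at 100 c at sea level": True,
    "the eiffel tower is in berlin": False,
}

@dataclass
class VerificationResult:
    trust_score: float
    flagged_claims: list = field(default_factory=list)

def extract_claims(output: str) -> list[str]:
    # Phase 1: decompose the output into atomic claims.
    # Naive sentence split here; real systems use an LLM for this step.
    return [s.strip() for s in output.split(".") if s.strip()]

def check_claim(claim: str) -> bool:
    # Phase 2: retrieve supporting or contradicting evidence.
    # Unknown claims count as unsupported, not as verified.
    return EVIDENCE.get(claim.lower(), False)

def verify(output: str) -> VerificationResult:
    claims = extract_claims(output)
    flagged = [c for c in claims if not check_claim(c)]
    # Phase 3: aggregate confidence = fraction of supported claims.
    score = 1 - len(flagged) / len(claims) if claims else 0.0
    return VerificationResult(trust_score=score, flagged_claims=flagged)
```

Note the conservative default in `check_claim`: a claim with no matching evidence is flagged rather than trusted, which biases the loop toward retries over silent errors.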

The agent (or orchestrator) uses the scores to decide whether to proceed, retry with a corrected prompt, or escalate to a human.

output = agent.generate(prompt)
result = verify(output, context)  # per-claim trust scores + flagged claims

if result.trust_score >= threshold:
    proceed(output)
else:
    retry_with_feedback(result.flagged_claims)

In multi-agent systems, run verification at each hand-off between agents so errors don't propagate through the pipeline. Verification receipts (signed, timestamped results) provide an audit trail for compliance.
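A verification receipt can be as simple as a signed, timestamped summary of the result. The sketch below uses an HMAC over a canonical JSON payload; `SIGNING_KEY`, `issue_receipt`, and `check_receipt` are illustrative names, and in production the key would come from a secrets manager rather than a constant.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-a-managed-secret"  # assumption: loaded from a secrets store

def issue_receipt(output: str, trust_score: float, flagged_claims: list[str]) -> dict:
    # Bind the verified output to its score and a timestamp, then sign the
    # payload so downstream agents (and auditors) can detect tampering.
    payload = {
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "trust_score": trust_score,
        "flagged_claims": flagged_claims,
        "verified_at": time.time(),
    }
    body = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return payload

def check_receipt(receipt: dict) -> bool:
    # Recompute the signature over everything except the signature itself.
    receipt = dict(receipt)
    sig = receipt.pop("signature")
    body = json.dumps(receipt, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Because the receipt hashes the output rather than embedding it, it stays small enough to pass along at every agent hand-off.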

03

How to use it

  • Place a verification call after any agent step that produces factual claims others will depend on.
  • Set a trust-score threshold appropriate to the domain (higher for medical/financial, lower for exploratory research).
  • In a swarm or multi-agent pipeline, verify at every agent boundary rather than only at the final output.
  • Store verification receipts when you need a compliance audit trail.
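The threshold guidance above can be encoded as a small decision table. The domains, numbers, and the near-miss retry band here are illustrative assumptions, not recommendations; tune them to your own risk tolerance.

```python
# Illustrative per-domain thresholds; tune for your risk tolerance.
THRESHOLDS = {
    "medical": 0.95,
    "financial": 0.90,
    "exploratory": 0.60,
}

def decide(trust_score: float, domain: str) -> str:
    threshold = THRESHOLDS.get(domain, 0.80)  # conservative default
    if trust_score >= threshold:
        return "proceed"
    if trust_score >= threshold - 0.2:
        return "retry"     # near miss: re-prompt with the flagged claims
    return "escalate"      # far below threshold: hand off to a human
```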

04

Trade-offs

  • Pros: Catches factual errors that reflection loops miss; provides quantified confidence rather than binary pass/fail; audit trail via receipts; composable across multi-agent pipelines.
  • Cons: Adds latency per verification call; evidence quality bounds verification quality; requires an evidence source relevant to the domain.
05

References