4.4K
Security & Safety emerging

Deterministic Threat Rule Scanning

Apply deterministic regex rules as a first-pass security layer to detect known threat patterns in AI agent tool calls and skill definitions.

By eeee2345 (@eeee2345)
Add to Pack
or

Saved locally in this browser for now.

Cite This Pattern
APA
eeee2345 (@eeee2345) (2026). Deterministic Threat Rule Scanning. In *Awesome Agentic Patterns*. Retrieved April 24, 2026, from https://agentic-patterns.com/patterns/deterministic-threat-rule-scanning
BibTeX
@misc{agentic_patterns_deterministic-threat-rule-scanning,
  title = {Deterministic Threat Rule Scanning},
  author = {eeee2345 (@eeee2345)},
  year = {2026},
  howpublished = {\url{https://agentic-patterns.com/patterns/deterministic-threat-rule-scanning}},
  note = {Awesome Agentic Patterns}
}
01

Problem

AI agents that invoke external tools (MCP servers, function calls, skill files) are exposed to a class of attacks where malicious content is embedded in tool descriptions, responses, or skill definitions. These attacks include prompt injection via tool output, data exfiltration through encoded parameters, privilege escalation via hidden instructions, and tool poisoning through manipulated descriptions.

LLM-based detection alone is unreliable for these threats: adversaries can craft payloads that exploit the same reasoning flexibility that makes LLMs useful. Security requires a deterministic baseline that cannot be bypassed by clever prompt engineering.

02

Solution

Maintain a library of deterministic regex-based rules, each targeting a specific threat pattern observed in real-world agent attacks. These rules run against tool descriptions, tool call arguments, tool responses, and skill definition files before the agent processes them.

Each rule specifies:

  • A threat category (e.g., prompt injection, data exfiltration, privilege escalation)
  • One or more regex patterns matching known attack signatures
  • Severity level and recommended action (block, warn, log)
  • Test cases for both true positives and false positives
function scan(content, rules):
    findings = []
    for rule in rules:
        for pattern in rule.patterns:
            matches = regex_match(pattern, content)
            if matches:
                findings.append({
                    rule_id: rule.id,
                    severity: rule.severity,
                    match: matches[0],
                    action: rule.action
                })
    return findings

// Integration point: intercept before agent processes tool output
function on_tool_response(tool_name, response):
    findings = scan(response, threat_rules)
    critical = findings.filter(f => f.severity == "CRITICAL")
    if critical:
        block_and_alert(tool_name, critical)
    return response

The scanning layer sits between the agent and its tools, inspecting content at well-defined interception points:

graph TD A[External Tool / MCP Server] --> B[Tool Response] B --> C[Threat Rule Scanner] C --> D{Matches Found?} D -->|CRITICAL| E[Block + Alert] D -->|HIGH/MEDIUM| F[Warn + Log] D -->|None| G[Pass to Agent] H[Skill Definition File] --> C I[Tool Description] --> C

The key insight is layered defense: deterministic rules catch known patterns with near-perfect precision, while LLM-based review handles novel or ambiguous threats as a second layer. The deterministic layer provides a floor of protection that does not degrade with adversarial prompt engineering.

03

How to use it

  1. Choose interception points based on your agent framework:

    • Tool descriptions (scan when tools are registered)
    • Tool call arguments (scan before execution)
    • Tool responses (scan before agent processes output)
    • Skill/plugin definition files (scan at install time)
  2. Start with high-confidence rules targeting well-documented attack patterns:

    • Base64-encoded payloads in tool responses
    • Hidden instructions using Unicode or whitespace tricks
    • Cross-tool data exfiltration patterns
    • Privilege escalation via sudo, chmod, or role manipulation
  3. Integrate into your agent pipeline as a synchronous check:

    • Block on CRITICAL findings (known dangerous patterns)
    • Warn on HIGH findings (suspicious but may be legitimate)
    • Log everything for post-incident analysis
  4. Evolve rules continuously as new attack techniques emerge. Each rule should include test cases to prevent false positives from regressing precision.

04

Trade-offs

  • Pros:

    • Deterministic and auditable: every detection decision is traceable to a specific rule
    • Near-zero latency compared to LLM-based security review
    • Cannot be bypassed by prompt injection (rules run outside the LLM context)
    • Rules are composable and independently testable
    • Complements LLM-based detection as part of a layered defense
  • Cons:

    • Regex rules only catch known patterns; novel attacks require rule updates
    • Recall is inherently limited by the rule library's coverage
    • May produce false positives on legitimate content that resembles attack patterns
    • Requires ongoing maintenance as the threat landscape evolves
    • Pattern matching cannot understand semantic intent (e.g., distinguishing a legitimate base64 string from an exfiltration payload)
06

References