CLI-First Skill Design

Problem

When building agent skills (reusable capabilities), there's tension between:

API-first design: Skills as functions/classes—great for programmatic use, but hard to debug and test manually
GUI-first design: Skills as visual tools—easy for humans, but agents can't invoke them

Teams end up building two interfaces or choosing one audience over the other.

Solution

Design all skills as CLI tools first. A well-designed CLI is naturally dual-use: humans can invoke it from the terminal, and agents can invoke it via shell commands.

graph LR
    A[Skill Logic] --> B[CLI Interface]
    B --> C[Human: Terminal]
    B --> D[Agent: Bash Tool]
    B --> E[Scripts: Automation]
    B --> F[Cron: Scheduled]

Core principles:

One script, one skill: Each capability is a standalone executable
Subcommands for operations: skill.sh list, skill.sh get <id>, skill.sh create
Structured output: JSON for programmatic use, human-readable for TTY (auto-detect via isatty())
Exit codes: 0 for success, 1 for errors, 2 for incorrect usage (127 is what the shell itself returns when the command is not found, so skills should not reuse it)
Environment config: Credentials via env vars, not hardcoded
Default non-interactive: Avoid prompts; provide --yes or --force flags instead
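
The structured-output and non-interactive principles above can be sketched in bash roughly as follows. This is a minimal illustration, not a definitive implementation: `emit_board`, `confirm`, and the `ASSUME_YES` variable (assumed to be set to 1 when `--yes` is parsed) are hypothetical names, and the values are assumed to contain no characters that need JSON escaping. `[ -t 1 ]` is the shell's counterpart to isatty() on stdout.

```bash
#!/bin/bash
# Sketch only: emit_board, confirm, and ASSUME_YES are hypothetical names.

# Print one board as a table row on a terminal, or as a JSON object otherwise.
# Assumes id and name need no JSON escaping; a real skill might build JSON with jq.
emit_board() {
  local id="$1" name="$2"
  if [ -t 1 ]; then
    printf '%-10s %s\n' "$id" "$name"                      # stdout is a TTY
  else
    printf '{"id": "%s", "name": "%s"}\n' "$id" "$name"    # piped or captured
  fi
}

# Ask before a destructive action unless --yes was given. When stdin is not
# a terminal (agent, cron), refuse to prompt and report a usage error instead.
confirm() {
  local prompt="$1" reply
  [ "${ASSUME_YES:-0}" = "1" ] && return 0
  if [ ! -t 0 ]; then
    echo "error: refusing to prompt non-interactively; pass --yes" >&2
    return 2
  fi
  read -r -p "$prompt [y/N] " reply
  [ "$reply" = "y" ] || [ "$reply" = "Y" ]
}
```

Piping the output (for example `emit_board abc123 Personal | cat`) takes the JSON branch, which is the path an agent's Bash tool would normally hit.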

# Example: Trello skill as CLI
trello.sh boards                    # List all boards
trello.sh cards <BOARD_ID>          # List cards on board
trello.sh create <LIST_ID> "Title"  # Create card
trello.sh move <CARD_ID> <LIST_ID>  # Move card

# Human usage
$ trello.sh boards
{"id": "abc123", "name": "Personal", "url": "..."}
{"id": "def456", "name": "Work", "url": "..."}

# Agent usage (via Bash tool)
Bash: trello.sh cards abc123 | jq -s '.[0].name'   # -s slurps one-object-per-line output into an array

How to use it

Skill structure:

~/.claude/skills/
├── trello/
│   └── scripts/
│       └── trello.sh          # Main CLI entry point
├── asana/
│   └── scripts/
│       └── asana.sh
├── honeybadger/
│   └── scripts/
│       └── honeybadger.sh
└── priority-report/
    └── scripts/
        └── priority-report.sh  # Composes other skills

CLI design checklist:

[ ] Standalone executable with shebang (#!/bin/bash)
[ ] Help text via --help or no-args
[ ] Subcommands for CRUD operations
[ ] JSON output (or TTY auto-detection: [ -t 1 ] in bash, sys.stdout.isatty() in Python, process.stdout.isTTY in Node.js)
[ ] Credentials from ~/.envrc or environment
[ ] Meaningful exit codes (0=success, 1=error, 2=usage; 127 is reserved by the shell for "command not found")
[ ] Stderr for errors, stdout for data
[ ] Non-interactive mode with --yes/--force flags
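
As a rough illustration of the checklist, here is a minimal skeleton that covers help text, subcommand dispatch, environment credentials, exit codes, and the stdout/stderr split. Everything specific in it is an assumption made for the example: the `notes` service, the `NOTES_TOKEN` variable, and the canned data standing in for real API calls.

```bash
#!/bin/bash
# Sketch of a skill CLI. "notes", NOTES_TOKEN, and the canned data are
# hypothetical stand-ins for a real service and its API calls.

usage() {
  cat <<'EOF'
Usage: notes.sh <command> [args]

Commands:
  list        List all notes (JSON, one object per line)
  get <id>    Print a single note as JSON
EOF
}

# Fail early, on stderr, when the credential is missing from the environment.
require_token() {
  if [ -z "${NOTES_TOKEN:-}" ]; then
    echo "error: NOTES_TOKEN is not set" >&2
    return 1
  fi
}

cmd_list() {
  require_token || return 1
  # A real skill would call the service API here.
  printf '%s\n' '{"id": "n1", "title": "First"}' '{"id": "n2", "title": "Second"}'
}

cmd_get() {
  if [ $# -ne 1 ]; then
    echo "usage: notes.sh get <id>" >&2
    return 2                      # incorrect usage
  fi
  require_token || return 1
  cmd_list | grep -F "\"id\": \"$1\"" || {
    echo "error: no note with id $1" >&2
    return 1                      # runtime error
  }
}

main() {
  case "${1:-}" in
    ""|-h|--help) usage ;;
    list) shift; cmd_list "$@" ;;
    get)  shift; cmd_get "$@" ;;
    *)    echo "error: unknown command: $1" >&2; usage >&2; return 2 ;;
  esac
}

main "$@"
```

Because the script ends with `main "$@"`, its exit status is whatever the chosen subcommand returned, so callers can branch on `$?` without parsing output.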

Composition example:

#!/bin/bash
# priority-report.sh composes multiple skill CLIs
echo "-- GitHub --"
gh pr list --search "review-requested:@me"

echo "-- Trello --"
~/.claude/skills/trello/scripts/trello.sh cards abc123

echo "-- Asana --"
~/.claude/skills/asana/scripts/asana.sh tasks personal
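
A composed report may want to keep going when one skill fails and still signal the failure through its own exit code. The helper below is one possible sketch of that idea; `run_section` and the `failed` counter are hypothetical names, not part of any existing skill.

```bash
#!/bin/bash
# Sketch only: run_section and the "failed" counter are hypothetical helpers.

failed=0

# Print a header, run the given command, and record a failure without
# aborting the rest of the report.
run_section() {
  local title="$1" status=0
  shift
  echo "-- $title --"
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "warning: $title failed with exit code $status" >&2
    failed=$((failed + 1))
  fi
}

# In priority-report.sh the sections would wrap the skill CLIs, for example:
#   run_section "Trello" ~/.claude/skills/trello/scripts/trello.sh cards abc123
#   run_section "Asana"  ~/.claude/skills/asana/scripts/asana.sh tasks personal
# and the script could end with:  [ "$failed" -eq 0 ]
```

Warnings go to stderr, so the report on stdout stays clean for whoever consumes it.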

Trade-offs

Pros:

Dual-use by default: Same interface for humans and agents
Debuggable: Run manually to test, inspect output
Composable: Pipe, chain, and combine with Unix tools
Portable: Runs anywhere bash and the few tools the script calls (for example curl or jq) are available; no language runtime or package install needed
Transparent: Agent's tool calls are visible shell commands
Testable: Easy to write integration tests
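
To illustrate the testability point, an integration test for a CLI skill can be as small as running a command and comparing its exit code. In this sketch, `expect_exit` is a hypothetical helper, and `demo_skill` is a stand-in for a real skill script such as trello.sh so that the example runs on its own.

```bash
#!/bin/bash
# Sketch only: expect_exit and demo_skill are hypothetical. demo_skill stands
# in for a real skill CLI so the example is self-contained.

demo_skill() {
  case "${1:-}" in
    boards) echo '{"id": "abc123", "name": "Personal"}' ;;
    *)      echo "usage: demo_skill boards" >&2; return 2 ;;
  esac
}

tests_failed=0

# expect_exit <expected-code> <command...>: run the command quietly and
# compare its exit code with the expected one.
expect_exit() {
  local expected="$1" actual=0
  shift
  "$@" >/dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    echo "ok   - $* (exit $actual)"
  else
    echo "FAIL - $* (expected $expected, got $actual)"
    tests_failed=$((tests_failed + 1))
  fi
}

expect_exit 0 demo_skill boards
expect_exit 2 demo_skill no-such-command
echo "failures: $tests_failed"
```

Swapping `demo_skill` for the path to a real skill script would turn the same two lines into an end-to-end check.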

Cons:

Shell limitations: Complex data structures awkward in bash
Error handling: Less structured than exceptions
Performance: Process spawn overhead vs function calls
State management: No persistent state between invocations
Windows compatibility: Requires WSL or Git Bash

When to use something else:

High-frequency calls (>100/sec): Use in-process functions
Complex object graphs: Use structured API
Real-time streaming: Use WebSocket/SSE

References

Unix Philosophy (Doug McIlroy): "Write programs that do one thing and do it well"
POSIX exit code conventions: IEEE Std 1003.1
Dual-Use Tool Design pattern
Intelligent Bash Tool Execution pattern
12-Factor App: Config via environment
Claude Code skills directory structure

Primary source: https://github.com/anthropics/claude-code
anthropics/skills: https://github.com/anthropics/skills