Problem
Large files (PDFs, DOCXs, images) overwhelm the context window when loaded naively: a 5-10 MB PDF may contain only 10-20 KB of relevant text and tables, yet the entire file is often pushed into context, wasting tokens and degrading performance.
Solution
Apply progressive disclosure: load file metadata first, then provide tools to load content on-demand.
Core approach:
- Always include file metadata in the prompt (not full content):

  Files:
  - id: f_a1  name: my_image.png  size: 500,000  preloaded: false
  - id: f_b3  name: report.pdf  size: 8,500,000  preloaded: false

- Optionally preload the first N KB of appropriate mimetypes (configurable per workflow, can be 0)
- Provide three file operations:
  - load_file(id) - load the entire file into context
  - peek_file(id, offset, length) - load a section of the file
  - extract_file(id) - transform PDF/DOCX/PPT into simplified text
- Include a large_files skill explaining when and how to use these tools
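The metadata-first prompt above can be rendered by a small helper that lists file records without touching file content. This is a minimal sketch; the FileMeta class and the exact rendering format are assumptions, not part of the pattern:

```python
from dataclasses import dataclass

@dataclass
class FileMeta:
    id: str
    name: str
    size: int          # size in bytes
    preloaded: bool = False

def render_file_metadata(files: list[FileMeta]) -> str:
    """Render metadata lines for the prompt -- no file content is loaded."""
    lines = ["Files:"]
    for f in files:
        lines.append(
            f"- id: {f.id} name: {f.name} size: {f.size:,} "
            f"preloaded: {str(f.preloaded).lower()}"
        )
    return "\n".join(lines)
```

The prompt stays a few hundred bytes per file regardless of file size; the agent decides what to load next.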
# Agent workflow for document comparison
1. Prompt includes file metadata for report_2024.pdf and report_2025.pdf
2. Agent sees large PDFs, checks large_files skill
3. Agent calls: extract_file("report_2024.pdf")
4. Agent calls: extract_file("report_2025.pdf")
5. Agent compares extracted summaries using minimal context
# Agent workflow for image analysis
1. Prompt includes metadata for screenshot.png
2. Agent sees PNG type, calls: load_file("screenshot.png")
3. Image content is loaded, agent analyzes visual content
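The two workflows above differ only in which operation the large_files skill steers the agent toward for a given mimetype and size. A minimal routing heuristic could look like this; the function name, extension sets, and the 100 KB threshold are illustrative assumptions:

```python
def choose_operation(name: str, size: int) -> str:
    """Pick a default file operation from name and size, per the workflows above."""
    ext = name.rsplit(".", 1)[-1].lower()
    if ext in {"png", "jpg", "jpeg", "gif"}:
        return "load_file"        # images are loaded whole for visual analysis
    if ext in {"pdf", "docx", "pptx"} and size > 100_000:
        return "extract_file"     # large documents become simplified text
    # small files fit in context; otherwise peek at a range first
    return "load_file" if size < 100_000 else "peek_file"
```

In practice this logic lives in the skill as prose guidance rather than code; the agent applies it when it reads the file metadata.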
How to use it
Best for:
- Document comparison workflows (multiple PDFs)
- Ticket systems with file attachments (images, PDFs)
- Data export analysis (large reports in various formats)
- Any workflow where agents need file content but files are large
Implementation considerations:
- File id should be a stable reference for tool calls
- extract_file should return simplified text (tables, text content)
- Consider making extract_file return a virtual file_id for very large extractions
- Preloading the first N KB is optional - it can give the agent initial context without a full load
- Recommended preload amounts: text 10-50 KB; PDF first page (~5 KB); images metadata only
- Cache extracted content to avoid re-processing (TTL: text 24h, tables 7 days, metadata 1h)
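The caching recommendation above can be sketched as a TTL cache keyed by file id and extraction type. The TTL values follow the list above; the function names and the in-memory dict are assumptions for illustration:

```python
import time

# TTLs per extraction type, in seconds, per the recommendations above
TTL = {"text": 24 * 3600, "tables": 7 * 24 * 3600, "metadata": 3600}

# (file_id, extraction) -> (stored_at, result)
_cache: dict[tuple[str, str], tuple[float, str]] = {}

def cached_extract(file_id: str, extraction: str, extractor) -> str:
    """Return a cached extraction if still fresh; otherwise re-run and store it."""
    key = (file_id, extraction)
    entry = _cache.get(key)
    now = time.monotonic()
    if entry is not None and now - entry[0] < TTL.get(extraction, 3600):
        return entry[1]
    result = extractor(file_id, extraction)
    _cache[key] = (now, result)
    return result
```

A production version would key on a content hash rather than file id so that re-uploaded files invalidate stale entries.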
Tool design:
def load_file(file_id: str, format: str = "text") -> str:
    """Load entire file content into the context window."""

def peek_file(file_id: str, offset: int, length: int, unit: str = "bytes") -> str:
    """Load a specific range from a file. Unit options: bytes, lines, pages, tokens."""

def extract_file(file_id: str, extraction: str = "text") -> str:
    """Convert PDF/DOCX/PPT to a simplified representation.
    Extraction options: text, structure, tables, summary."""
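Of the three signatures, peek_file is the simplest to flesh out. A minimal sketch for the bytes and lines units follows; the FILE_REGISTRY lookup from id to path is an assumption, and pages/tokens would need format-aware readers:

```python
from pathlib import Path

# Hypothetical registry mapping stable file ids to paths on disk
FILE_REGISTRY: dict[str, Path] = {}

def peek_file(file_id: str, offset: int, length: int, unit: str = "bytes") -> str:
    """Load a range from the file. Supports the bytes and lines units only."""
    path = FILE_REGISTRY[file_id]
    if unit == "bytes":
        with path.open("rb") as fh:
            fh.seek(offset)
            return fh.read(length).decode("utf-8", errors="replace")
    if unit == "lines":
        lines = path.read_text(encoding="utf-8").splitlines()
        return "\n".join(lines[offset : offset + length])
    raise ValueError(f"unsupported unit: {unit}")
```

Decoding with errors="replace" matters for the bytes unit, since an arbitrary byte offset can split a multi-byte UTF-8 character.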
Trade-offs
Pros:
- Enables working with files much larger than context window
- Agent has control over what/when to load
- Reusable across workflows via the large_files skill
- Extracted content is often 100x smaller than the original file
Cons:
- Adds tool call overhead (multiple round-trips)
- Requires preloading heuristics (how much is enough?)
- Extraction from complex formats (DOCX) can be slow without native dependencies
- Agent may make poor loading decisions without proper guidance
Trade-offs in preloading:
- Preloading: Gives agent immediate context but reduces control
- No preloading: Maximum agent control but requires explicit load calls
References
- Building an internal agent: Progressive disclosure and handling large files - Will Larson (2025)
- Related: Progressive Tool Discovery - Similar lazy-loading concept for tools
- Related: Context-Minimization Pattern - Complementary pattern for reducing context bloat
- Yang et al. (2016). "Hierarchical Attention Networks for Document Classification." NAACL - Academic foundation for hierarchical processing
- LangChain - Document loaders with metadata-first approach (github.com/langchain-ai/langchain)