This directory contains comprehensive fuzz testing for the vb6parse library using cargo-fuzz and libFuzzer. Fuzzing helps discover edge cases, malformed input handling, and potential panics that traditional unit tests might miss.
VB6Parse processes untrusted input from legacy VB6 projects, making robustness critical:
- Legacy code variety: Real-world VB6 projects contain encoding quirks, malformed syntax, and IDE-generated edge cases
- Binary formats: FRX files are binary with multiple header formats that must be parsed safely
- Partial success model: Parsers should handle malformed input gracefully, returning partial results rather than panicking
- Complex state machines: Tokenization and CST construction involve intricate state transitions that benefit from mutation testing
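The partial success model listed above can be pictured as a return type that carries both whatever was parsed and the problems met along the way. The sketch below is illustrative only: `ParseOutcome`, `ParseFailure`, and `parse_key_values` are hypothetical names chosen for this example, not vb6parse's actual API.

```rust
/// Hypothetical outcome type: a best-effort value plus every problem met.
#[derive(Debug, PartialEq)]
pub struct ParseOutcome<T> {
    pub value: Option<T>,
    pub failures: Vec<ParseFailure>,
}

/// Hypothetical failure record pointing at a 1-based line number.
#[derive(Debug, PartialEq)]
pub struct ParseFailure {
    pub line: usize,
    pub message: String,
}

/// Parses `Key=Value` lines, keeping the good lines and recording the bad
/// ones instead of panicking or stopping at the first error.
pub fn parse_key_values(input: &str) -> ParseOutcome<Vec<(String, String)>> {
    let mut pairs = Vec::new();
    let mut failures = Vec::new();
    for (index, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        match line.split_once('=') {
            Some((key, value)) if !key.trim().is_empty() => {
                pairs.push((key.trim().to_string(), value.trim().to_string()));
            }
            _ => failures.push(ParseFailure {
                line: index + 1,
                message: format!("expected Key=Value, found {:?}", line),
            }),
        }
    }
    ParseOutcome { value: Some(pairs), failures }
}
```

Under this shape, a fuzz target only needs to call the parser and discard the result: any panic is a bug, while a non-empty `failures` list is expected behavior.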
- Install cargo-fuzz (requires nightly Rust):
cargo install cargo-fuzz
- Ensure you have the nightly toolchain:
rustup install nightly
List all available fuzz targets:
cargo +nightly fuzz list
Run a specific fuzzer (replace <target> with any target below):
# I/O Layer targets
cargo +nightly fuzz run sourcefile_decode
cargo +nightly fuzz run sourcestream
# Lexer Layer target
cargo +nightly fuzz run tokenize
# Parsers Layer target
cargo +nightly fuzz run cst_parse
# Files Layer targets (high-level parsers)
cargo +nightly fuzz run project_file
cargo +nightly fuzz run class_file
cargo +nightly fuzz run module_file
cargo +nightly fuzz run form_file
cargo +nightly fuzz run form_resource
Run with a time limit (recommended for development):
# Run for 60 seconds
cargo +nightly fuzz run sourcefile_decode -- -max_total_time=60
# Run for 5 minutes
cargo +nightly fuzz run cst_parse -- -max_total_time=300
Run with specific options for deeper fuzzing:
cargo +nightly fuzz run form_resource -- \
-max_total_time=3600 \
-timeout=60 \
-rss_limit_mb=4096 \
-print_final_stats=1 \
-jobs=4
Common libFuzzer options:
- -max_total_time=<seconds>: Stop after N seconds
- -timeout=<seconds>: Timeout for individual test cases (default: 1200)
- -rss_limit_mb=<MB>: Memory limit per test case
- -jobs=<N>: Number of parallel fuzzing jobs
- -print_final_stats=1: Show statistics when done
- -dict=<file>: Use a dictionary for mutation guidance
VB6Parse has 9 fuzz targets covering all layers of the parsing pipeline:
Tests Windows-1252 decoding and character encoding robustness.
What it tests:
- Arbitrary byte sequences that may not be valid Windows-1252
- Invalid UTF-8 sequences
- Null bytes and control characters
- Encoding boundary conditions
- Replacement character insertion for invalid bytes
Why it matters: VB6 projects use Windows-1252 encoding, which leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined. The decoder must handle these gracefully.
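As a rough illustration of replacement character insertion, the sketch below decodes Windows-1252 bytes and substitutes U+FFFD for the byte values the code page leaves undefined. The function name `decode_windows_1252_lossy` is hypothetical, and this is not necessarily how vb6parse implements decoding.

```rust
/// Unicode mappings for bytes 0x80..=0x9F in Windows-1252; `None` marks the
/// byte values the code page leaves undefined.
const HIGH_CONTROL_MAP: [Option<char>; 32] = [
    Some('\u{20AC}'), None,             Some('\u{201A}'), Some('\u{0192}'),
    Some('\u{201E}'), Some('\u{2026}'), Some('\u{2020}'), Some('\u{2021}'),
    Some('\u{02C6}'), Some('\u{2030}'), Some('\u{0160}'), Some('\u{2039}'),
    Some('\u{0152}'), None,             Some('\u{017D}'), None,
    None,             Some('\u{2018}'), Some('\u{2019}'), Some('\u{201C}'),
    Some('\u{201D}'), Some('\u{2022}'), Some('\u{2013}'), Some('\u{2014}'),
    Some('\u{02DC}'), Some('\u{2122}'), Some('\u{0161}'), Some('\u{203A}'),
    Some('\u{0153}'), None,             Some('\u{017E}'), Some('\u{0178}'),
];

/// Decodes every byte to exactly one `char`; undefined bytes become U+FFFD.
/// Total over all inputs, so arbitrary fuzzer bytes cannot make it panic.
pub fn decode_windows_1252_lossy(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&byte| match byte {
            0x80..=0x9F => HIGH_CONTROL_MAP[(byte - 0x80) as usize].unwrap_or('\u{FFFD}'),
            // 0x00..=0x7F and 0xA0..=0xFF share their code points with Latin-1.
            _ => char::from(byte),
        })
        .collect()
}
```

Because the function is defined for every possible byte, the interesting fuzzing question becomes whether downstream code copes with U+FFFD and control characters in the decoded text.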
Tests low-level character stream navigation and pattern matching.
What it tests:
- Character peeking at arbitrary offsets
- Pattern matching with malformed patterns
- Forward/backward navigation edge cases
- Case-insensitive comparisons
- Line/column tracking accuracy
- Offset boundary conditions
Why it matters: SourceStream is the foundation for all parsing. Bugs here affect everything built on top.
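The boundary conditions listed above mostly come down to never indexing past the end of the input or into the middle of a multi-byte character. Here is a minimal sketch of that idea; `CharStream` and its methods are hypothetical stand-ins and do not mirror the real SourceStream API.

```rust
/// Hypothetical character stream with bounds-checked peeking.
pub struct CharStream<'a> {
    text: &'a str,
    offset: usize, // byte offset, always kept on a char boundary
    line: usize,
    column: usize,
}

impl<'a> CharStream<'a> {
    pub fn new(text: &'a str) -> Self {
        Self { text, offset: 0, line: 1, column: 1 }
    }

    /// Returns the next `count` characters without consuming them,
    /// or `None` if fewer than `count` characters remain.
    pub fn peek(&self, count: usize) -> Option<&'a str> {
        let rest = &self.text[self.offset..];
        let end = rest
            .char_indices()
            .map(|(index, _)| index)
            .chain(std::iter::once(rest.len()))
            .nth(count)?;
        Some(&rest[..end])
    }

    /// ASCII case-insensitive comparison against the upcoming text.
    pub fn matches_ignore_case(&self, pattern: &str) -> bool {
        self.text[self.offset..]
            .get(..pattern.len())
            .map_or(false, |head| head.eq_ignore_ascii_case(pattern))
    }

    /// Consumes one character and updates the line/column position.
    pub fn advance(&mut self) -> Option<char> {
        let ch = self.text[self.offset..].chars().next()?;
        self.offset += ch.len_utf8();
        if ch == '\n' {
            self.line += 1;
            self.column = 1;
        } else {
            self.column += 1;
        }
        Some(ch)
    }

    /// Current 1-based (line, column) position.
    pub fn position(&self) -> (usize, usize) {
        (self.line, self.column)
    }
}
```

Returning `Option` from every lookahead keeps out-of-range offsets from becoming panics, which is the property a fuzzer would check.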
Tests tokenization of arbitrary VB6-like text input.
What it tests:
- Invalid VB6 syntax combinations
- Unterminated string literals
- Malformed numeric literals (e.g., &H, &O with no digits)
- Line continuation edge cases (_ at unexpected positions)
- Comment handling (single quote, Rem statement)
- Keyword vs identifier ambiguity
- Whitespace and newline handling
Why it matters: The tokenizer must never panic on malformed source, even when syntax is completely invalid.
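To make the "never panic" requirement concrete, here is a small sketch of lexing `&H` and `&O` literals in which a prefix with no digits becomes an explicit invalid token. The `Token` enum and `lex_radix_literal` are hypothetical and simplified; the real tokenizer's token types will differ.

```rust
/// Hypothetical token type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    HexLiteral(&'a str),
    OctalLiteral(&'a str),
    /// Malformed input, carried forward as a token rather than a panic.
    Invalid(&'a str),
}

/// Lexes a literal that starts with `&H` or `&O`.
/// Returns the token and the number of bytes consumed, or `None` when the
/// input does not start with a radix prefix at all.
pub fn lex_radix_literal(input: &str) -> Option<(Token<'_>, usize)> {
    let mut chars = input.chars();
    if chars.next()? != '&' {
        return None;
    }
    let radix: u32 = match chars.next() {
        Some('H') | Some('h') => 16,
        Some('O') | Some('o') => 8,
        _ => return None,
    };
    // The prefix is two ASCII bytes, so slicing at 2 is on a char boundary.
    let digits = input[2..].chars().take_while(|c| c.is_digit(radix)).count();
    let end = 2 + digits; // digits are ASCII: one byte each
    let text = &input[..end];
    let token = match (digits, radix) {
        (0, _) => Token::Invalid(text),
        (_, 16) => Token::HexLiteral(text),
        _ => Token::OctalLiteral(text),
    };
    Some((token, end))
}
```

For example, `lex_radix_literal("&H")` yields `Token::Invalid("&H")` with two bytes consumed, so the caller can keep tokenizing after the bad literal.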
Tests Concrete Syntax Tree construction from token streams.
What it tests:
- Invalid VB6 syntax patterns
- Mismatched control structures (If without End If, For without Next)
- Deeply nested code structures (potential stack overflow)
- Incomplete statements and expressions
- Complex expressions with unusual operator combinations
- Unexpected token sequences
- Missing required tokens
Why it matters: CST construction involves complex state machines. Fuzzing helps find unexpected token sequences that could cause panics or infinite loops.
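One common defense against the stack overflow risk from deeply nested input is an explicit depth limit in the recursive descent. The sketch below shows the idea on nested parentheses only; `MAX_DEPTH`, `NestError`, and `max_nesting` are illustrative names, and the limit of 256 is an arbitrary assumption rather than a value taken from vb6parse.

```rust
/// Arbitrary illustrative limit on recursion depth.
const MAX_DEPTH: usize = 256;

#[derive(Debug, PartialEq)]
pub enum NestError {
    TooDeep,
    Unbalanced,
}

/// Recursively walks one parenthesized group and returns the deepest
/// nesting level seen, refusing to recurse past `MAX_DEPTH`.
fn parse_group(bytes: &[u8], pos: &mut usize, depth: usize) -> Result<usize, NestError> {
    if depth > MAX_DEPTH {
        return Err(NestError::TooDeep);
    }
    let mut deepest = depth;
    while let Some(&byte) = bytes.get(*pos) {
        *pos += 1;
        match byte {
            b'(' => deepest = deepest.max(parse_group(bytes, pos, depth + 1)?),
            b')' => {
                return if depth == 0 {
                    Err(NestError::Unbalanced)
                } else {
                    Ok(deepest)
                };
            }
            _ => {}
        }
    }
    // End of input: only valid if every group was closed.
    if depth == 0 {
        Ok(deepest)
    } else {
        Err(NestError::Unbalanced)
    }
}

/// Returns the maximum parenthesis nesting depth of `input`.
pub fn max_nesting(input: &str) -> Result<usize, NestError> {
    parse_group(input.as_bytes(), &mut 0, 0)
}
```

With the guard in place, an input of 100,000 opening parentheses produces `Err(NestError::TooDeep)` instead of exhausting the stack, which is the kind of input a fuzzer tends to generate quickly.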
Tests VB6 project file (.vbp) parsing.
What it tests:
- Malformed project file syntax
- Invalid property names and values
- Missing required sections (e.g., Type=, Form=)
- Duplicate entries
- Incorrect reference formats
- Version number edge cases
Why it matters: Project files are the entry point to VB6 codebases. Robust parsing ensures the library can handle projects from any VB6 version or IDE quirk.
Tests VB6 class module (.cls) parsing.
What it tests:
- Malformed VERSION lines
- Invalid Attribute statements
- Missing or duplicate CLASS attribute
- Properties: MultiUse, Persistable, DataBindingBehavior, DataSourceBehavior
- Combination of header and code parsing
- Invalid VB6 code in class body
Why it matters: Class files have a unique header structure that differs from modules. Fuzzing ensures robust handling of all class-specific properties.
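As an example of what tolerating a malformed VERSION line can mean, the sketch below parses a header such as `VERSION 1.0 CLASS` and reports every deviation as an error value. `Version` and `parse_version_line` are hypothetical, and the accepted grammar is a simplification assumed for illustration.

```rust
/// Hypothetical parsed form of a `VERSION <major>.<minor> [CLASS]` line.
#[derive(Debug, PartialEq)]
pub struct Version {
    pub major: u8,
    pub minor: u8,
    pub is_class: bool,
}

/// Parses a VERSION header line, returning a descriptive error rather than
/// panicking on missing, oversized, or unexpected parts.
pub fn parse_version_line(line: &str) -> Result<Version, String> {
    let mut words = line.split_whitespace();
    match words.next() {
        Some(word) if word.eq_ignore_ascii_case("VERSION") => {}
        other => return Err(format!("expected VERSION, found {:?}", other)),
    }
    let number = words.next().ok_or("missing version number")?;
    let (major, minor) = number
        .split_once('.')
        .ok_or("version number needs a '.'")?;
    // `parse::<u8>` rejects empty, non-numeric, and out-of-range parts.
    let major = major
        .parse::<u8>()
        .map_err(|e| format!("bad major version: {}", e))?;
    let minor = minor
        .parse::<u8>()
        .map_err(|e| format!("bad minor version: {}", e))?;
    let is_class = match words.next() {
        None => false,
        Some(word) if word.eq_ignore_ascii_case("CLASS") => true,
        Some(word) => return Err(format!("unexpected trailing text {:?}", word)),
    };
    Ok(Version { major, minor, is_class })
}
```

A header like `VERSION 999.0 CLASS` fails with an out-of-range error instead of overflowing, which is the sort of edge case mutation tends to uncover.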
Tests VB6 standard module (.bas) parsing.
What it tests:
- Malformed VERSION lines in modules
- Invalid Attribute VB_Name statements
- Module-level variable declarations
- Public/Private procedure definitions
- Option statements (Option Explicit, Option Base)
- Invalid code in module body
Why it matters: Modules are the simplest VB6 file type, but still have header/body structure that must be parsed correctly.
Tests VB6 form file (.frm) parsing - the most complex file type.
What it tests:
- Form header with control hierarchy
- Nested control structures (Forms → Frames → Controls)
- Property parsing for 50+ control types
- Menu control definitions
- Begin/End block matching
- Combination of visual designer output and VB6 code
- Missing or malformed control properties
- Invalid control types
Why it matters: Form files are the most complex VB6 file type, containing both visual designer output and code. They have the most edge cases and IDE-generated quirks.
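Begin/End block matching can be checked iteratively with an explicit stack, so that nesting depth is bounded by heap memory rather than by the call stack. The sketch below is a simplified, hypothetical checker (`check_blocks` and `BlockError` are illustrative names); it looks only at lines that start with `Begin` or consist of a lone `End`, and ignores properties and code.

```rust
#[derive(Debug, PartialEq)]
pub enum BlockError {
    /// An `End` line appeared with no open block.
    UnexpectedEnd { line: usize },
    /// Input ended with this block (the innermost one) still open.
    Unclosed { name: String, line: usize },
}

/// Verifies that `Begin ...` and `End` lines pair up and returns the
/// deepest nesting level reached.
pub fn check_blocks(source: &str) -> Result<usize, BlockError> {
    // Each entry holds the control name and the 1-based line of its `Begin`.
    let mut open: Vec<(String, usize)> = Vec::new();
    let mut deepest = 0;
    for (index, line) in source.lines().enumerate() {
        let mut words = line.split_whitespace();
        let first = words.next();
        let rest: Vec<&str> = words.collect();
        match (first, rest.as_slice()) {
            (Some(word), rest) if word.eq_ignore_ascii_case("Begin") => {
                let name = rest.last().copied().unwrap_or("").to_string();
                open.push((name, index + 1));
                deepest = deepest.max(open.len());
            }
            // A lone `End` closes a block; `End Sub` and similar are ignored.
            (Some(word), []) if word.eq_ignore_ascii_case("End") => {
                if open.pop().is_none() {
                    return Err(BlockError::UnexpectedEnd { line: index + 1 });
                }
            }
            _ => {}
        }
    }
    match open.pop() {
        Some((name, line)) => Err(BlockError::Unclosed { name, line }),
        None => Ok(deepest),
    }
}
```

A form with a Label inside a Frame inside the Form reports a depth of 3, while a stray `End` or a missing one is returned as an error that names the offending line.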
Tests VB6 form resource file (.frx) parsing - pure binary format.
What it tests:
- Invalid binary data sequences
- Multiple FRX header formats (12-byte, 8-byte, 4-byte, 3-byte, 1-byte)
- Corrupted header fields
- Entry size mismatches
- Property GUID lookups
- String data with invalid encoding
- Binary blob handling (icons, images, etc.)
- Truncated files and incomplete entries
Why it matters: FRX files are binary with multiple header formats used across VB6 versions. This is the most crash-prone area, making fuzzing essential.
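Entry size mismatches and truncated files are typically handled by checking every length against the bytes that actually remain. The sketch below assumes a made-up record layout (a 4-byte little-endian length followed by the payload) purely for illustration; it is not the real FRX format, and `read_record` and `RecordError` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum RecordError {
    /// Fewer than 4 bytes were available for the length field.
    TruncatedHeader,
    /// The header declared more payload bytes than the input contains.
    SizeMismatch { declared: usize, available: usize },
}

/// Reads one length-prefixed record starting at `offset`.
/// Assumed layout (illustrative only): u32 little-endian length, then payload.
pub fn read_record(data: &[u8], offset: usize) -> Result<&[u8], RecordError> {
    // `checked_add` and `get` turn overflow and out-of-range offsets into
    // `None` instead of a panic.
    let header = offset
        .checked_add(4)
        .and_then(|end| data.get(offset..end))
        .ok_or(RecordError::TruncatedHeader)?;
    let declared = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    // Safe to slice: the header lookup proved that offset + 4 <= data.len().
    let body = &data[offset + 4..];
    body.get(..declared).ok_or(RecordError::SizeMismatch {
        declared,
        available: body.len(),
    })
}
```

The declared length is never used to allocate or index before it has been compared with the available bytes, which addresses both the truncation and the unbounded allocation cases.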
The corpus/ directory contains seed inputs for each fuzzer. The corpus is crucial for effective fuzzing:
- Seed corpus: Initial inputs in corpus/<target>/ provide starting points for mutation
- Automatic expansion: libFuzzer discovers new "interesting" inputs during fuzzing and adds them to the corpus
- Coverage-guided: Inputs that trigger new code paths are kept; redundant ones are discarded
- Persistent: Corpus grows over time, improving fuzzing effectiveness in future runs
Initial corpus is seeded from:
- tests/data/: Real VB6 project files (submodules)
- Hand-crafted edge cases
- Previously discovered crash cases (minimized)
As you fuzz, the corpus automatically grows:
# Before fuzzing
$ ls corpus/form_file/ | wc -l
15
# After fuzzing for 1 hour
$ ls corpus/form_file/ | wc -l
247
View corpus statistics:
# Count corpus entries
ls -1 corpus/<target>/ | wc -l
# Show total corpus size
du -sh corpus/<target>/
Minimize corpus (remove redundant entries):
cargo +nightly fuzz cmin <target>
Merge corpus from multiple runs:
# Merge corpus from another machine or CI run
cargo +nightly fuzz cmin <target> -- corpus/<target>/ other_corpus/<target>/
If a fuzzer discovers a crash or timeout, artifacts are saved in artifacts/<fuzzer_name>/:
artifacts/
├── form_resource/
│ ├── crash-da39a3ee5e6b4b0d # Crash-causing input
│ ├── timeout-8b3f9c1a7d2e4f # Input that caused timeout
│ └── ...
Run the fuzzer with the crash file to reproduce:
cargo +nightly fuzz run form_resource artifacts/form_resource/crash-da39a3ee5e6b4b0d
This will:
- Load the crash-causing input
- Re-run the fuzzer with that exact input
- Show the panic/error message and stack trace
Crash inputs often contain redundant bytes. Minimize them for easier debugging:
cargo +nightly fuzz tmin form_resource artifacts/form_resource/crash-da39a3ee5e6b4b0d
This produces the smallest input that still triggers the crash, making it easier to:
- Understand root cause
- Write a minimal reproduction test case
- Fix the bug
- Reproduce to confirm the crash
- Minimize to get the smallest crashing input
- Debug with the minimized input:
# Run with debugger
rust-lldb target/x86_64-unknown-linux-gnu/release/form_resource artifacts/form_resource/crash-minimized
# Or add print debugging to the fuzz target
- Create test case from the crash to prevent regression
- Fix the bug in the parser
- Verify the fix:
cargo +nightly fuzz run form_resource artifacts/form_resource/crash-da39a3ee5e6b4b0d -- -runs=1
- Delete artifact once fixed
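The "create test case from the crash" step in the workflow above might look roughly like the following. Everything here is a placeholder: `parse_resource` is a stand-in for the real parser entry point, and the byte strings are invented examples rather than actual artifacts. In a real test, the minimized artifact would more likely be copied into the test data and the check wrapped in a `#[test]` function.

```rust
/// Stand-in for the parser under test (hypothetical; substitute the real
/// entry point). Layout assumed here: one length byte, then the payload.
fn parse_resource(data: &[u8]) -> Result<usize, String> {
    let declared = *data.first().ok_or("empty input")? as usize;
    data.get(1..1 + declared)
        .map(|body| body.len())
        .ok_or_else(|| format!("declared {} bytes, found {}", declared, data.len() - 1))
}

/// Minimized crash inputs embedded inline (placeholder bytes, not real
/// artifacts), each with a name that says what it exercises.
const REGRESSION_INPUTS: &[(&str, &[u8])] = &[
    ("empty", b""),
    ("length_past_end", b"\x05ab"),
];

/// Body of a regression test: every former crash input must now be
/// rejected with an error. A panic inside the parser would fail the test.
pub fn regression_inputs_are_handled() {
    for (name, input) in REGRESSION_INPUTS {
        let outcome = parse_resource(input);
        assert!(outcome.is_err(), "{}: malformed input should be rejected", name);
    }
}
```

Keeping the inputs named and in one table makes it cheap to add the next minimized crash as a single new line.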
See Recent_Failures.md for a log of recent fuzzing discoveries and their fixes.
For day-to-day development, quick fuzzing sessions help catch issues early:
# Quick smoke test (1 minute per target)
for target in sourcefile_decode sourcestream tokenize cst_parse project_file class_file module_file form_file form_resource; do
  echo "Fuzzing $target..."
  cargo +nightly fuzz run $target -- -max_total_time=60
done
When to fuzz during development:
- After implementing a new parser or significant refactor
- Before committing changes to critical parsing code
- When fixing a bug to ensure the fix doesn't introduce new issues
- After updating dependencies that affect parsing logic
For thorough testing, run longer sessions on specific targets:
# Focus on the most complex parsers
cargo +nightly fuzz run form_file -- -max_total_time=3600 # 1 hour
cargo +nightly fuzz run form_resource -- -max_total_time=3600 # 1 hour
cargo +nightly fuzz run cst_parse -- -max_total_time=1800 # 30 minutes
Recommended priorities:
- form_resource - Binary parsing, highest crash risk
- form_file - Most complex file format
- cst_parse - Complex state machine
- project_file - Entry point, critical for library users
- tokenize - Foundation for all parsing
If running in CI/CD or overnight, use longer durations:
# Overnight session (8 hours per target)
cargo +nightly fuzz run form_resource -- -max_total_time=28800 -jobs=4
Benefits of longer runs:
- Discover rare edge cases
- Build more comprehensive corpus
- Achieve deeper code coverage
- Find timing-dependent issues
View code coverage achieved by fuzzing:
# Generate coverage report
cargo +nightly fuzz coverage form_resource
# View HTML coverage report
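# cargo fuzz coverage writes merged profile data to fuzz/coverage/form_resource/coverage.profdata;
# render it to HTML with llvm-cov show (--format=html) before opening the report
open fuzz/coverage/form_resource/index.html
This shows: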
- Which code paths the fuzzer exercised
- Uncovered branches that might need seed inputs
- Comparison with unit test coverage
Note: Fuzzing coverage often differs from unit test coverage:
- Fuzzers discover edge cases unit tests miss
- Some code paths may require specific seeds to reach
- Coverage complements but doesn't replace traditional testing
Monitor fuzzing performance during runs:
# Run with statistics
cargo +nightly fuzz run form_file -- -print_final_stats=1
# Watch live progress (libFuzzer prints status lines to stderr by default; no extra flag is needed)
cargo +nightly fuzz run form_file
Key metrics:
- exec/s: Executions per second (higher is better, indicates fuzzer efficiency)
- cov: Coverage (number of code blocks or edges reached so far)
- corp: Corpus size (interesting inputs discovered)
Typical exec/s rates:
- sourcefile_decode: 50,000+ exec/s (simple, fast)
- tokenize: 10,000-20,000 exec/s (moderate complexity)
- form_file: 1,000-5,000 exec/s (complex parsing)
- form_resource: 5,000-10,000 exec/s (binary format)
If exec/s is very low (<100), the fuzzer may be hitting timeouts or slow paths frequently.
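These rates translate directly into how long a session needs to run to reach a given execution count. As a back-of-the-envelope sketch (the helper `time_to_reach` is hypothetical, not part of any tool): at 1,000 exec/s, one million executions take about 17 minutes, while at 50,000 exec/s they take about 20 seconds.

```rust
use std::time::Duration;

/// Estimates the wall-clock time needed to reach `executions` at a steady
/// `exec_per_sec` rate, rounded up to whole seconds.
/// Returns `None` for a rate of zero.
pub fn time_to_reach(executions: u64, exec_per_sec: u64) -> Option<Duration> {
    if exec_per_sec == 0 {
        return None;
    }
    // Ceiling division without risking overflow in the numerator.
    let whole = executions / exec_per_sec;
    let extra = u64::from(executions % exec_per_sec != 0);
    Some(Duration::from_secs(whole + extra))
}
```

This assumes a constant rate; in practice exec/s tends to drop as the corpus grows and inputs get larger, so treat the result as a lower bound.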
Don't commit to long fuzzing sessions initially. Run 1-5 minutes first to catch obvious issues.
Some inputs can cause excessive memory allocation. Set reasonable limits:
cargo +nightly fuzz run form_file -- -rss_limit_mb=2048
Multiple jobs speed up fuzzing but increase resource usage:
# Good for overnight runs
cargo +nightly fuzz run form_resource -- -jobs=4 -max_total_time=28800
# Bad: too many jobs can thrash CPU
cargo +nightly fuzz run form_resource -- -jobs=32 # Probably overkill
When you find a crash:
- Copy it to a safe location (artifacts can be overwritten)
- Minimize it immediately
- Create a test case before deleting
Don't delete corpus entries unless they're truly redundant. A rich corpus makes future fuzzing more effective.
Not all fuzz targets need equal attention:
- High priority: Binary formats (form_resource), complex parsers (form_file, cst_parse)
- Medium priority: File parsers (project_file, class_file, module_file)
- Lower priority: Foundation layers (sourcefile_decode, sourcestream) - these are simpler and well-tested
Fuzzing complements but doesn't replace:
- Unit tests (specific scenarios)
- Integration tests (real-world files)
- Property-based tests (invariants)
- Manual testing (usability)
- ✅ No crashes after reasonable fuzzing time (5+ minutes)
- ✅ Corpus grows steadily then plateaus (coverage maximized)
- ✅ High exec/s rate (fuzzer is efficient)
- ✅ Good coverage of target code
- ⚠️ Repeated timeouts (infinite loops or very slow paths)
- ⚠️ Memory limit hits (unbounded allocation)
- ⚠️ Corpus grows without bound (fuzzer finding too many "interesting" inputs)
- ⚠️ Very low exec/s (<100) (fuzzer spending too much time per input)
- Corpus size stabilizes (no new inputs for 10+ minutes)
- Coverage plateaus (no new code paths discovered)
- Time limit reached
- Acceptable exec count achieved (e.g., 1M+ executions)
- Check if inputs are triggering slow code paths
- Add timeouts: -timeout=10
- Profile the fuzz target to find bottlenecks
- Reduce memory limit: -rss_limit_mb=1024
- Check for unbounded allocations in parser
- Minimize inputs before fuzzing
- Corpus may be exhausted for this seed set
- Try running a different target
- Add new seed inputs from real VB6 projects
- Check if these are expected panics (e.g., unimplemented!())
- Adjust parser to return errors instead of panicking
- Use std::panic::catch_unwind if intentional panics are acceptable
# List all targets
cargo +nightly fuzz list
# Quick test (1 minute)
cargo +nightly fuzz run <target> -- -max_total_time=60
# Deep test (1 hour)
cargo +nightly fuzz run <target> -- -max_total_time=3600
# Reproduce crash
cargo +nightly fuzz run <target> artifacts/<target>/<crash_file>
# Minimize crash
cargo +nightly fuzz tmin <target> artifacts/<target>/<crash_file>
# View coverage
cargo +nightly fuzz coverage <target>
# Minimize corpus
cargo +nightly fuzz cmin <target>
For ad-hoc fuzzing sessions, test in this order:
- form_resource - Binary format, highest risk
- form_file - Most complex text format
- cst_parse - Core parsing logic
- project_file - Library entry point
- class_file - Common file type
- module_file - Common file type
- tokenize - Foundation layer
- sourcestream - Low-level operations
- sourcefile_decode - Character decoding
Found a bug with fuzzing? Great! Please:
- Minimize the crash input
- Create a regression test from the minimized input
- File an issue with:
  - The minimized crash input (or attach it)
  - Fuzzer target that found it
  - Error message/stack trace
- Submit a PR with the fix and regression test
Crashes found by fuzzing are valuable - they represent real edge cases that could affect users with legacy VB6 projects.