Test Suite Memory Management
Problem
Running the full test suite with parallel execution can cause Out-of-Memory (OOM) crashes on EC2 instances, especially smaller instances with limited RAM.
Root Cause
The test suite includes:
- 50+ test files with extensive test code
- Heavy dependencies including:
  - Sentence-transformers embedding models (~120-200MB per process)
  - FAISS vector indexes
  - Full FastAPI application stack
When using pytest-xdist with -n auto, pytest spawns one worker process per CPU core (4 workers on a 4-core EC2 instance). Each worker loads:
- The embedding model
- FAISS indexes
- Test fixtures and data
- The full application
Memory multiplication: 4 workers × ~500MB per worker = ~2GB+ just for test processes
This can overwhelm EC2 instances with 8-16GB of RAM, especially when the OS and other services are also running.
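The multiplication above can be written as a small back-of-the-envelope calculation. This is only a rough sketch: the ~500MB per-worker figure is the estimate quoted above, while the function names and the `reserved_mb` headroom for the OS and other services are illustrative assumptions rather than measured values.

```python
def estimate_test_memory_mb(workers: int, per_worker_mb: int = 500) -> int:
    """Rough total memory used by pytest-xdist workers, in MB."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return workers * per_worker_mb


def fits_in_memory(workers: int, total_mb: int, reserved_mb: int = 2048,
                   per_worker_mb: int = 500) -> bool:
    """True if the estimated worker memory fits after reserving headroom.

    reserved_mb is an assumed allowance for the OS and other services.
    """
    budget_mb = total_mb - reserved_mb
    return estimate_test_memory_mb(workers, per_worker_mb) <= budget_mb


if __name__ == "__main__":
    # 4 workers x ~500MB each, as in the estimate above
    print(estimate_test_memory_mb(4))
```

Actual per-worker usage depends on which models and indexes a given test run loads, so the defaults here are a starting point to adjust after measuring.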
Solution
Default Behavior (Serial Execution)
The test suite now runs serially by default to prevent OOM crashes. For example, python scripts/test.py full runs the full suite without spawning extra workers.
Parallel Execution (Use with Caution)
If you have sufficient memory (16GB+ RAM), you can enable parallel execution:
# Run with 2 workers (safer for smaller EC2 instances)
python scripts/test.py full -n 2
# Run fast tests with 2 workers
python scripts/test.py fast
# Run unit tests with 4 workers (requires more memory)
python scripts/test.py unit -n 4
Monitoring Memory Usage
Before running tests with parallelization, check available memory:
# Check memory usage
free -h
# Monitor memory in real-time
watch -n 1 free -h
# Check processes by memory usage
ps aux --sort=-%mem | head -20
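The same check can be scripted, for instance before a test runner decides whether to enable workers. The sketch below parses the `MemAvailable` field from `/proc/meminfo`-style text. It assumes a Linux host, and the helper name `mem_available_mb` is hypothetical, not part of the project's scripts.

```python
from pathlib import Path
from typing import Optional


def mem_available_mb(meminfo_text: str) -> Optional[int]:
    """Return MemAvailable in MB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Lines look like "MemAvailable:    1234567 kB"
            kib = int(line.split()[1])
            return kib // 1024
    return None


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(f"Available memory: {mem_available_mb(meminfo.read_text())} MB")
```

`MemAvailable` is the kernel's estimate of memory that can be used without swapping, which is usually a better guide than the "free" column alone.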
Memory Guidelines
| EC2 Instance Type | Recommended Workers | Notes |
|---|---|---|
| t3.small (2GB) | 1 (serial) | Parallel execution will crash |
| t3.medium (4GB) | 1 (serial) | May work with -n 2 for unit tests |
| t3.large (8GB) | 2 | Safe for most tests |
| t3.xlarge (16GB) | 3-4 | Can handle full parallelization |
| t3.2xlarge (32GB) | auto | Full parallel execution safe |
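The table can also be expressed as a lookup, which may be handy in a wrapper script. This is a sketch under stated assumptions: the function name is hypothetical, sizes between rows are rounded down to the nearest row, and the 16GB row ("3-4") is represented by its lower bound.

```python
from typing import Optional


def recommended_workers(total_ram_gb: float) -> Optional[str]:
    """Map nominal instance RAM (GB) to a value for -n, or None for serial.

    Thresholds follow the guideline table; in-between sizes round down.
    """
    if total_ram_gb >= 32:
        return "auto"
    if total_ram_gb >= 16:
        return "3"  # conservative end of the table's "3-4" range
    if total_ram_gb >= 8:
        return "2"
    return None  # below 8GB: run serially, without -n


if __name__ == "__main__":
    for size_gb in (2, 4, 8, 16, 32):
        print(size_gb, recommended_workers(size_gb))
```

The memory an operating system reports is typically a little below the nominal instance size, so passing the nominal figure (8, 16, 32) keeps the result aligned with the table.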
Test Commands
Recommended Commands for EC2
# Check dependencies first
python scripts/test.py check
# Run unit tests only (fastest, safest)
python scripts/test.py unit
# Run integration tests
python scripts/test.py integration
# Run fast tests with 2 workers
python scripts/test.py fast
# Run full test suite serially (safe but slow)
python scripts/test.py full
# Generate coverage report (always serial)
python scripts/test.py coverage
Advanced Options
# Run specific domain tests
python scripts/test.py auth # Authentication tests
python scripts/test.py servers # Server management tests
python scripts/test.py search # Search and AI tests
python scripts/test.py health # Health monitoring tests
python scripts/test.py core # Core infrastructure tests
# Enable debug logging
python scripts/test.py unit --debug
# Run with custom worker count
python scripts/test.py unit -n 3
Direct pytest Usage
If using pytest directly, be aware of memory implications:
# DANGEROUS: May crash EC2 instance
pytest -n auto # Spawns workers = CPU cores
# SAFER: Limit workers
pytest -n 2
# SAFEST: Serial execution (no -n flag)
pytest
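When pytest is launched from a script instead of the shell, the same three cases can be captured in one helper that omits `-n` for serial runs. This is an illustrative sketch; `pytest_argv` is a hypothetical name and is not how scripts/test.py is known to be implemented.

```python
import sys
from typing import List, Optional


def pytest_argv(workers: Optional[int] = None,
                path: Optional[str] = None) -> List[str]:
    """Build a pytest command line; serial runs omit the -n flag."""
    argv = [sys.executable, "-m", "pytest"]
    if path:
        argv.append(path)
    if workers is not None and workers > 1:
        # -n is provided by the pytest-xdist plugin
        argv += ["-n", str(workers)]
    return argv


if __name__ == "__main__":
    print(pytest_argv())                 # serial
    print(pytest_argv(2, "tests/unit"))  # two workers, unit tests only
```

The resulting list can be passed to `subprocess.run`. Requiring an explicit integer, and never passing `auto`, keeps the worker count a deliberate choice.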
Optimizations
For Local Development
If running locally with sufficient RAM (16GB+):
# Fast parallel execution for unit tests
pytest tests/unit -n auto
# Fast parallel for specific domains
pytest tests/unit/auth -n auto
For CI/CD
GitHub Actions and other CI environments typically have limited memory. Use serial execution or a small, fixed worker count such as -n 2 rather than -n auto.
Future Improvements
To further reduce memory usage:
- Mock Heavy Dependencies: Mock sentence-transformers and FAISS in unit tests
- Test Fixtures Optimization: Share model loading across tests using session-scoped fixtures
- Test Categorization: Split heavy integration tests from lightweight unit tests
- Lazy Loading: Only load ML models when actually needed in tests
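Two of the ideas above, lazy loading and mocking heavy dependencies, can be sketched with the standard library alone. Everything here is hypothetical: `LazyModel`, `embed`, and `make_fake_model` are illustrative names, and the fake model only imitates an `encode` method instead of loading sentence-transformers.

```python
from typing import Any, Callable, List
from unittest import mock


class LazyModel:
    """Defer an expensive model load until the model is first used."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model = None

    def get(self) -> Any:
        # Load once on first access, then reuse the same instance.
        if self._model is None:
            self._model = self._loader()
        return self._model


def embed(model: LazyModel, texts: List[str]) -> Any:
    """Encode texts with whatever object the loader returns."""
    return model.get().encode(texts)


def make_fake_model() -> mock.Mock:
    """Lightweight stand-in for an embedding model in unit tests."""
    fake = mock.Mock()
    fake.encode.return_value = [[0.0, 0.0, 0.0]]
    return fake


def test_embed_loads_model_once() -> None:
    loader = mock.Mock(side_effect=make_fake_model)
    model = LazyModel(loader)
    loader.assert_not_called()   # nothing is loaded at construction
    assert embed(model, ["hello"]) == [[0.0, 0.0, 0.0]]
    embed(model, ["again"])
    loader.assert_called_once()  # the second call reused the cached model
```

In a pytest suite, a `LazyModel` could be provided by a session-scoped fixture (`@pytest.fixture(scope="session")`) so that each process loads at most one model. With pytest-xdist, session scope applies per worker, so this reduces repeated loads within a worker but does not share one model across workers.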
Troubleshooting
OOM Crash Symptoms
- EC2 instance becomes unresponsive
- SSH connection drops
- Test suite hangs indefinitely
- System logs show "Out of memory: Killed process"
Recovery Steps
- Reboot the EC2 instance if unresponsive
- Run tests serially: python scripts/test.py full
- Consider upgrading to a larger instance type
- Run tests in batches by domain, for example python scripts/test.py auth followed by python scripts/test.py servers
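Running in batches by domain can be automated with a short driver. The sketch below assumes the domain names listed under Advanced Options (auth, servers, search, health, core) and that it is invoked from the repository root. The function names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

# Domain names taken from the "Advanced Options" commands
DOMAINS = ("auth", "servers", "search", "health", "core")


def domain_commands(domains: Sequence[str] = DOMAINS) -> List[List[str]]:
    """Build one scripts/test.py command per domain."""
    return [[sys.executable, "scripts/test.py", domain] for domain in domains]


def run_in_batches(domains: Sequence[str] = DOMAINS) -> List[str]:
    """Run each domain in its own process; return the domains that failed."""
    failed = []
    for domain, command in zip(domains, domain_commands(domains)):
        # Each batch is a separate process, so its memory is released
        # before the next batch starts.
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(domain)
    return failed
```

Calling `run_in_batches()` runs the domains one after another and returns the names of any that exited with a non-zero status.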
Debugging Memory Issues
# Check which process is using memory during tests
watch -n 1 'ps aux --sort=-%mem | head -20'
# Check for OOM killer logs
dmesg | grep -i "out of memory"
sudo journalctl | grep -i "out of memory"
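The same log search can be done in Python when the output needs further processing. This is a minimal sketch that mirrors the case-insensitive grep above. The helper name is hypothetical, and the sample log lines in the demo are made up for illustration, not taken from a real system.

```python
import re
from typing import List

# Same case-insensitive match as: grep -i "out of memory"
OOM_PATTERN = re.compile(r"out of memory", re.IGNORECASE)


def find_oom_lines(log_text: str) -> List[str]:
    """Return the log lines that mention an out-of-memory event."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    # Illustrative sample only
    sample = (
        "kernel: eth0: link up\n"
        "kernel: Out of memory: Killed process 1234 (python)\n"
    )
    for line in find_oom_lines(sample):
        print(line)
```

On a live system the input could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, though reading the kernel log may require elevated privileges.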
Summary
- Default: Tests run serially to prevent OOM crashes
- Safe Parallel: Use -n 2 for faster execution on typical EC2 instances
- Full Parallel: Only use -n auto or higher worker counts on instances with 16GB+ RAM
- Monitor: Always monitor memory usage when experimenting with parallelization