# LeakGuard Documentation

# Project Overview

LeakGuard is a fast, lightweight secret scanner designed to detect accidentally committed credentials, tokens, and sensitive configuration values in codebases. Built in Rust with Python bindings, it provides both a command-line interface and a Python package for integration into development workflows.

## Key Features

- **104 built-in detection rules** covering cloud providers, LLM platforms, databases, HTTP authentication, observability tools, and SaaS ecosystems
- **High-performance scanning** optimized for local development and CI environments
- **Multiple output formats**: `pretty` (default), `json`, `sarif`, and `markdown`
- **GitHub Actions integration** with `--github-summary` flag for workflow summaries
- **False positive management**:
  - Inline ignore markers (`# leakguard:ignore`)
  - Rule-level disabling via configuration
- **Safe defaults**:
  - Binary files automatically skipped
  - `.env` files excluded by default
- **Cross-platform support** (Linux, macOS, Windows)
- **Python package distribution** via PyPI

## Use Cases

- Pre-commit scanning in local development
- CI/CD pipeline security checks
- Repository audits for exposed secrets
- Integration with security monitoring tools

---

# Architecture & Components

LeakGuard follows a modular architecture with clear separation between core scanning logic and interface layers.

## Core Components

### 1. Rule Engine

- Located in `src/rules/`
- Contains 104+ detection patterns for common secret formats
- Rules are defined as Rust structs with regex patterns and metadata
- Supports rule versioning and categorization

### 2. Scanner

- Implements file traversal using `walkdir`
- Handles file filtering (extensions, paths)
- Manages parallel scanning of files
- Applies detection rules to file contents

### 3. Output Formatters

- `pretty`: Human-readable colored terminal output
- `json`: Machine-readable output for tooling
- `sarif`: Static Analysis Results Interchange Format (for GitHub Advanced Security)
- `markdown`: GitHub-flavored markdown for reporting

### 4. Configuration System

- TOML-based configuration files
- Environment variable overrides
- Command-line argument parsing

## Technical Stack

| Component         | Technology          | Purpose                          |
|-------------------|---------------------|----------------------------------|
| Core              | Rust                | High-performance scanning engine |
| Python Bindings   | PyO3 + Maturin      | Python package distribution      |
| CLI               | Clap                | Command-line argument parsing    |
| Configuration     | TOML                | User settings                    |
| Output Formatting | Serde + Custom impl | Multiple output formats          |
| File Traversal    | Walkdir             | Directory scanning               |

## Build System

LeakGuard uses a hybrid build system:

1. **Rust Toolchain**:
   - Primary build system for the core scanner
   - `cargo build` for development
   - `cargo test` for unit/integration tests
2. **Python Packaging**:
   - Maturin for building Python wheels
   - `pyproject.toml` defines build requirements
   - Supports both pure Rust and Python extension modules

---

# Getting Started

## Prerequisites

### System Requirements

- Rust 1.70+ (for development)
- Python 3.8+ (for Python package)
- pip (for Python installation)

### Supported Platforms

- Linux (x86_64, aarch64)
- macOS (x86_64, arm64)
- Windows (x86_64)

## Installation

### From PyPI (Recommended)

```bash
pip install leakguard
```

### From Source

```bash
# Clone repository
git clone https://github.com/adrian-lorenz/leakguard.git
cd leakguard

# Install Rust toolchain if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build and install Python package
pip install maturin
maturin develop --release
```

### Pre-built Binaries

Download pre-built binaries from GitHub Releases:

```bash
# Example for Linux x86_64
wget https://github.com/adrian-lorenz/leakguard/releases/download/v1.0.4/leakguard-v1.0.4-x86_64-unknown-linux-gnu.tar.gz
tar -xzf leakguard-v1.0.4-x86_64-unknown-linux-gnu.tar.gz
./leakguard --version
```

## Running LeakGuard

### Basic Scan

```bash
leakguard scan /path/to/codebase
```

### Common Options

```bash
# Scan with JSON output
leakguard scan --output json /path/to/codebase

# Scan with SARIF output (for GitHub Advanced Security)
leakguard scan --output sarif /path/to/codebase

# Generate GitHub Actions summary
leakguard scan --github-summary /path/to/codebase

# Scan with custom configuration
leakguard scan --config leakguard.toml /path/to/codebase
```

### Python Usage

```python
from leakguard import scan

results = scan(
    path="/path/to/codebase",
    output_format="json",
    github_summary=True
)
print(results)
```

---

# Configuration

LeakGuard supports configuration through command-line arguments, environment variables, and configuration files.
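One way to picture how these three sources could combine is as layers, where a later layer overrides an earlier one. The sketch below is illustrative only and is not LeakGuard's implementation: the precedence order (defaults, then config file, then environment, then command line), the accepted boolean spellings, and the names `resolve_settings`, `DEFAULTS`, and `_parse_bool` are all assumptions made for this example. Only the variable names `LEAKGUARD_OUTPUT` and `LEAKGUARD_GITHUB` come from this documentation.

```python
from typing import Any, Dict, Mapping

# Hypothetical defaults for this sketch, mirroring the documented CLI defaults.
DEFAULTS: Dict[str, Any] = {"output": "pretty", "github_summary": False}


def _parse_bool(value: str) -> bool:
    """Interpret an environment string as a boolean (assumed spellings)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def resolve_settings(
    cli_args: Mapping[str, Any],
    environ: Mapping[str, str],
    file_settings: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge settings with an ASSUMED precedence: CLI > env > file > defaults."""
    settings = dict(DEFAULTS)

    # Layer 1: values read from a config file (already parsed into a dict).
    settings.update({k: v for k, v in file_settings.items() if k in DEFAULTS})

    # Layer 2: environment variable overrides.
    if "LEAKGUARD_OUTPUT" in environ:
        settings["output"] = environ["LEAKGUARD_OUTPUT"]
    if "LEAKGUARD_GITHUB" in environ:
        settings["github_summary"] = _parse_bool(environ["LEAKGUARD_GITHUB"])

    # Layer 3: command-line values, skipping options the user did not pass (None).
    settings.update(
        {k: v for k, v in cli_args.items() if k in DEFAULTS and v is not None}
    )
    return settings


if __name__ == "__main__":
    merged = resolve_settings(
        cli_args={"output": None},
        environ={"LEAKGUARD_OUTPUT": "sarif"},
        file_settings={"output": "json", "github_summary": True},
    )
    print(merged)
```

In this sketch the environment value wins over the file value for `output` because no command-line value was passed, while `github_summary` keeps the value from the file. Check the actual precedence against the tool's behavior before relying on it.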
## Configuration File

Create a `leakguard.toml` file in your project root or specify a custom path with `--config`:

```toml
# Example leakguard.toml
[scanner]
exclude = ["**/node_modules/**", "**/dist/**"]
include_extensions = [".py", ".js", ".go", ".rs"]
max_file_size = 1048576  # 1MB

[rules]
disable = ["aws-access-key-id", "slack-webhook"]
severity_threshold = "medium"

[output]
format = "json"
github_summary = true
```

## Environment Variables

| Variable           | Description                             | Default  |
|--------------------|-----------------------------------------|----------|
| `LEAKGUARD_CONFIG` | Path to config file                     | None     |
| `LEAKGUARD_OUTPUT` | Output format (`pretty`, `json`, etc.)  | `pretty` |
| `LEAKGUARD_GITHUB` | Enable GitHub summary                   | `false`  |

## Command-line Arguments

```bash
leakguard scan --help
```

Key arguments:

```
USAGE:
    leakguard scan [OPTIONS]

ARGS:
    Path to scan

OPTIONS:
    -c, --config           Path to config file
    -o, --output           Output format [default: pretty] [possible values: pretty, json, sarif, markdown]
        --github-summary   Generate GitHub Actions summary
    -e, --exclude ...      Glob patterns to exclude
    -i, --include ...      File extensions to include
        --max-file-size    Maximum file size to scan (bytes) [default: 1048576]
        --severity         Minimum severity to report [default: low] [possible values: low, medium, high, critical]
    -h, --help             Print help information
```

## Rule Configuration

Disable specific rules in your config file:

```toml
[rules]
disable = [
    "aws-access-key-id",
    "slack-webhook",
    "github-pat"
]
```

## Path Exclusion

Exclude paths using glob patterns:

```toml
[scanner]
exclude = [
    "**/node_modules/**",
    "**/vendor/**",
    "**/dist/**",
    "**/build/**",
    "**/.git/**",
    "**/.env",
    "**/*.min.js",
    "**/*.lock"
]
```

---

# API / Usage Reference

## Command-line Interface

### `scan` Command

Primary command for scanning directories:

```bash
leakguard scan [OPTIONS]
```

#### Options

| Option             | Description                                                      | Default          |
|--------------------|------------------------------------------------------------------|------------------|
| `--config`         | Path to configuration file                                       | `leakguard.toml` |
| `--output`         | Output format (`pretty`, `json`, `sarif`, `markdown`)            | `pretty`         |
| `--github-summary` | Generate GitHub Actions summary                                  | `false`          |
| `--exclude`        | Glob patterns to exclude (can be specified multiple times)       |                  |
| `--include`        | File extensions to include (e.g., `.py`, `.js`)                  | All text files   |
| `--max-file-size`  | Maximum file size to scan (bytes)                                | `1048576` (1MB)  |
| `--severity`       | Minimum severity to report (`low`, `medium`, `high`, `critical`) | `low`            |
| `--no-ignore`      | Don't respect `.gitignore` files                                 | `false`          |

### Example Commands

1. Basic scan with default settings:

   ```bash
   leakguard scan .
   ```

2. Scan with JSON output and GitHub summary:

   ```bash
   leakguard scan --output json --github-summary .
   ```

3. Scan with custom exclusions and severity threshold:

   ```bash
   leakguard scan --exclude "**/tests/**" --exclude "**/fixtures/**" --severity high .
   ```

## Python API

### `scan()` Function

Main function for Python integration:

```python
from typing import List, Optional, Union


def scan(
    path: str,
    *,
    config_path: Optional[str] = None,
    output_format: str = "json",
    github_summary: bool = False,
    exclude: Optional[List[str]] = None,
    include_extensions: Optional[List[str]] = None,
    max_file_size: int = 1048576,
    severity_threshold: str = "low",
    no_ignore: bool = False
) -> Union[dict, str]:
    """
    Scan a directory for secrets.

    Args:
        path: Path to scan
        config_path: Path to config file
        output_format: Output format ('json', 'sarif', 'markdown')
        github_summary: Generate GitHub Actions summary
        exclude: List of glob patterns to exclude
        include_extensions: List of file extensions to include
        max_file_size: Maximum file size in bytes
        severity_threshold: Minimum severity to report
        no_ignore: Don't respect .gitignore files

    Returns:
        Scan results in specified format (dict for JSON, str for others)
    """
    pass
```

### Example Usage

```python
from leakguard import scan

# Basic scan
results = scan("/path/to/codebase")
print(results)

# Advanced scan with configuration.
# JSON output returns a dict, so the results can be indexed below;
# the other formats return a string.
results = scan(
    path="/path/to/codebase",
    output_format="json",
    github_summary=True,
    exclude=["**/tests/**", "**/fixtures/**"],
    severity_threshold="medium"
)

# Process results
if results:
    print(f"Found {len(results['results'])} potential secrets")
```

## Output Formats

### Pretty (Default)

Human-readable terminal output with colors:

```
Found 2 potential secrets in 42 files:

[HIGH] AWS Access Key ID
  → /src/config/prod.py:42
  42 | AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"

[MEDIUM] Slack Webhook
  → /scripts/deploy.sh:15
  15 | SLACK_WEBHOOK="https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
```

### JSON

Machine-readable output:

```json
{
  "version": "1.0.4",
  "results": [
    {
      "rule_id": "aws-access-key-id",
      "rule_name": "AWS Access Key ID",
      "severity": "high",
      "file": "/src/config/prod.py",
      "line": 42,
      "match": "AKIAIOSFODNN7EXAMPLE",
      "context": "AWS_ACCESS_KEY_ID = \"AKIAIOSFODNN7EXAMPLE\""
    }
  ],
  "stats": {
    "files_scanned": 42,
    "files_skipped": 8,
    "secrets_found": 2,
    "duration_ms": 125
  }
}
```

### SARIF

Static Analysis Results Interchange Format (for GitHub Advanced Security):

```json
{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "leakguard",
          "version": "1.0.4",
          "informationUri": "https://github.com/adrian-lorenz/leakguard"
        }
      },
      "results": [
        {
          "ruleId": "aws-access-key-id",
          "level": "error",
          "message": { "text": "AWS Access Key ID detected" },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": { "uri": "file:///src/config/prod.py" },
                "region": {
                  "startLine": 42,
                  "snippet": { "text": "AWS_ACCESS_KEY_ID = \"AKIAIOSFODNN7EXAMPLE\"" }
                }
              }
            }
          ]
        }
      ]
    }
  ]
}
```

### Markdown

GitHub-flavored markdown for reporting:

````markdown
# LeakGuard Scan Results

**Version**: 1.0.4
**Files scanned**: 42
**Secrets found**: 2
**Duration**: 125ms

## Findings

### HIGH: AWS Access Key ID

**File**: `/src/config/prod.py:42`
**Rule ID**: aws-access-key-id

```python
42 | AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
```

### MEDIUM: Slack Webhook

**File**: `/scripts/deploy.sh:15`
**Rule ID**: slack-webhook

```bash
15 | SLACK_WEBHOOK="https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
```
````

## Ignoring Findings

### Inline Ignore

Add a comment to ignore specific findings:

```python
# leakguard:ignore aws-access-key-id
AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"  # This is a test key
```

### File-level Ignore

Add a `.leakguardignore` file to your project root:

```
# Ignore specific rules
aws-access-key-id
slack-webhook

# Ignore specific files
**/test_keys.py
**/fixtures/*
```

---

# Contributing

We welcome contributions to LeakGuard! Here's how you can help:

## Development Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/adrian-lorenz/leakguard.git
   cd leakguard
   ```

2. Install Rust toolchain:

   ```bash
   curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
   rustup update
   ```

3. Install Python dependencies:

   ```bash
   pip install maturin pytest
   ```

4. Build the project:

   ```bash
   maturin develop --release
   ```

## Code Structure

```
.
├── .github/          # GitHub workflows and issue templates
├── src/
│   ├── lib.rs        # Main library code
│   ├── main.rs       # CLI entry point
│   ├── rules/        # Detection rules
│   ├── scanner.rs    # Scanning logic
│   ├── output.rs     # Output formatters
│   └── config.rs     # Configuration handling
├── tests/            # Integration tests
├── Cargo.toml        # Rust manifest
└── pyproject.toml    # Python package configuration
```

## Testing

### Rust Tests

```bash
cargo test
```

### Python Tests

```bash
pytest
```

### Integration Tests

```bash
# Run with test fixtures
cargo test -- --ignored
```

## Adding New Rules

1. Create a new rule in `src/rules/`:

   ```rust
   use regex::Regex;

   // Example rule definition
   pub fn aws_access_key_id() -> Rule {
       Rule {
           id: "aws-access-key-id".to_string(),
           name: "AWS Access Key ID".to_string(),
           // Matches long-term access key IDs: "AKIA" followed by
           // 16 uppercase letters or digits.
           pattern: Regex::new(r"\bAKIA[0-9A-Z]{16}\b").unwrap(),
           severity: Severity::High,
           description: "Detects AWS Access Key IDs".to_string(),
           tags: vec!["aws".to_string(), "cloud".to_string()],
       }
   }
   ```

2. Register the rule in `src/rules/mod.rs`:

   ```rust
   pub fn all_rules() -> Vec<Rule> {
       vec![
           aws::aws_access_key_id(),
           // ... other rules
           your_new_rule(),
       ]
   }
   ```

3. Add test cases in `tests/rules.rs`:

   ```rust
   // Shown for the example rule above; mirror this for your new rule
   // with one input that must match and one that must not.
   #[test]
   fn test_aws_access_key_id() {
       let rule = aws_access_key_id();
       assert!(rule.pattern.is_match("AKIAIOSFODNN7EXAMPLE"));
       assert!(!rule.pattern.is_match("not-a-key"));
   }
   ```

## Code Style

### Rust

- Follow Rust's official style guidelines
- Use `cargo fmt` for formatting
- Use `cargo clippy` for linting

```bash
cargo fmt
cargo clippy -- -D warnings
```

### Python

- Follow PEP 8 guidelines
- Use `black` for formatting
- Use `flake8` for linting

```bash
black .
flake8
```

## Pull Request Process

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/your-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin feature/your-feature`)
5. Open a Pull Request

## Release Process

1. Update version in `Cargo.toml`
2. Update changelog
3. Create a tag (`git tag -a vX.Y.Z -m "Release X.Y.Z"`)
4. Push the tag (`git push origin vX.Y.Z`)
5. GitHub Actions will build and publish the release

## Community

- **Issue Tracker**: https://github.com/adrian-lorenz/leakguard/issues
- **Discussions**: https://github.com/adrian-lorenz/leakguard/discussions
- **Security Policy**: Report vulnerabilities to security@noa-x.de