diff --git a/docs/alert-quality-improvement-plan.md b/docs/alert-quality-improvement-plan.md
new file mode 100644
index 0000000..54d97b1
--- /dev/null
+++ b/docs/alert-quality-improvement-plan.md
@@ -0,0 +1,599 @@
+# Alert Quality Improvement Plan
+
+**Status:** Draft
+**Date:** 2026-02-05
+**Branch:** hackathon
+**Prerequisite:** Alert enrichment quick wins (commit `cfcee2f`): CWE/OWASP extraction, auto-generated references, and `detailedReport` markdown are already in place.
+
+---
+
+## Problem Statement
+
+Customers see alerts with descriptions like *"Generic ad-hoc alert, uploaded by user or produced by system diagnostics"* and a raw rule ID like `python-sql-injection` as the title. This happens because:
+
+1. **Alert `type` is always `'generic'`**: the Socket Dashboard renders a generic fallback when it doesn't recognize the alert type
+2. **Rule messages are sparse**: ~60% of our 499 rules have terse one-liner messages with no remediation context
+3. **No human-readable vulnerability name**: customers see `python-sql-injection`, not "SQL Injection"
+4. **Missing metadata**: `vulnerability_class`, `likelihood`, `impact`, OWASP (81% of rules lack it), `references` (99.8% lack it), and `fix` (99.6% lack it) are missing from most rule definitions
+
+The enrichment work in `cfcee2f` extracts everything the rules *already provide*, but the rules themselves need to provide more.
+
+---
+
+## Phase Overview
+
+| Phase | Scope | Effort | Files Changed | Impact |
+|-------|-------|--------|---------------|--------|
+| **1** | CWE lookup table + connector improvements | Small | 1-2 files | Alerts with a CWE (498/499) get a human-readable title and description |
+| **2** | Rule metadata enrichment (all 499 rules) | Medium | 15 YAML files | OWASP, references, vulnerability_class on every alert |
+| **3** | Rule message rewrite | Large | 15 YAML files | Every alert explains What/Why/How |
+| **4** | Dataflow traces + advanced enrichment | Medium | 1-2 files | Taint-mode alerts show source-to-sink flow |
+
+---
+
+## Phase 1: CWE Lookup Table + Connector Improvements
+
+**Goal:** Every alert with a CWE (498 of 499 rules) gets a human-readable vulnerability name and description, sourced from a CWE lookup table, with zero rule file changes. The one rule without a CWE (`js-react-missing-key`) will fall back to the raw rule message.
+
+**Files to change:**
+- `socket_basics/core/connector/opengrep/__init__.py` (alert construction block)
+- New: `socket_basics/core/connector/opengrep/cwe_catalog.py` (lookup table)
+
+### 1.1 Create CWE Catalog Lookup Table
+
+A Python dict mapping CWE IDs to human-readable names and descriptions. 498 of our 499 rules reference one of 90 unique CWEs. The top 20 CWEs by rule count cover 68% of rules (339/498):
+
+```python
+CWE_CATALOG = {
+    "CWE-327": {
+        "name": "Broken or Risky Cryptographic Algorithm",
+        "description": "The code uses a cryptographic algorithm that is known to be weak or insufficient. This may allow attackers to decrypt sensitive data or bypass integrity checks.",
+        "category": "Cryptographic Weakness",
+    },
+    "CWE-89": {
+        "name": "SQL Injection",
+        "description": "User-supplied input is included in a SQL query without proper sanitization, potentially allowing attackers to read, modify, or delete database contents.",
+        "category": "Injection Vulnerability",
+    },
+    "CWE-798": {
+        "name": "Hard-coded Credentials",
+        "description": "Credentials such as passwords, API keys, or cryptographic keys are embedded directly in source code, making them easily discoverable if the code is exposed.",
+        "category": "Authentication Weakness",
+    },
+    # ... remaining 87 CWEs ...
+}
+```
+
+Full catalog of all 90 CWEs is provided in [Appendix A](#appendix-a-cwe-catalog).
+
+### 1.2 Use CWE Catalog in Alert Construction
+
+In the alert construction block (after the existing enrichment code), add:
+
+```python
+from .cwe_catalog import CWE_CATALOG
+
+# After extracting _cwe from metadata:
+_cwe_info = CWE_CATALOG.get(_cwe, {})
+
+# Add human-readable fields to props
+if _cwe_info:
+    alert['props']['vulnerabilityName'] = _cwe_info.get('name', '')
+    alert['props']['vulnerabilityCategory'] = _cwe_info.get('category', '')
+    # Use CWE description as fallback when rule message is sparse
+    if len(message) < 60:  # sparse message threshold
+        alert['props']['enrichedDescription'] = _cwe_info.get('description', '')
+```
+
+### 1.3 Extract Additional Metadata Fields
+
+Extract fields that some rules already provide but the connector currently ignores:
+
+```python
+# Already extracted: cwe, owasp, subcategory, fix, references, confidence
+# Add these:
+_vulnerability_class = _metadata.get('vulnerability_class', '')
+_likelihood = _metadata.get('likelihood', '')
+_impact = _metadata.get('impact', '')
+_technology = _metadata.get('technology', '')
+_framework = _metadata.get('framework', '')
+
+if _vulnerability_class:
+    alert['props']['vulnerabilityClass'] = _vulnerability_class
+if _likelihood:
+    alert['props']['likelihood'] = _likelihood
+if _impact:
+    alert['props']['impact'] = _impact
+if _technology:
+    alert['props']['technology'] = _technology
+if _framework:
+    alert['props']['framework'] = _framework
+```
+
+### 1.4 Improve `detailedReport` Markdown
+
+Incorporate CWE catalog data into the markdown report:
+
+````markdown
+## SQL Injection
+
+**Description:** SQL injection vulnerability detected. User-controlled data flows into
+SQL query without proper sanitization. Use parameterized queries with placeholders
+(?, %s) to prevent SQL injection.
+
+**Location:** `app/models/user.py` (line 42)
+
+```python
+cursor.execute("SELECT * FROM users WHERE id = " + user_id)
+```
+
+**Severity:** critical | **Confidence:** high
+
+**What is CWE-89?** User-supplied input is included in a SQL query without proper
+sanitization, potentially allowing attackers to read, modify, or delete database contents.
+
+**References:** [CWE-89](https://cwe.mitre.org/data/definitions/89.html) | [OWASP Top 10 A03:2021](https://owasp.org/Top10/A03/)
+````
+
+The "What is CWE-X?" section is pulled from the CWE catalog and provides context even when the rule message is sparse.
+
+### 1.5 Acceptance Criteria
+
+- [ ] Every alert with a CWE has `vulnerabilityName` in props (e.g., "SQL Injection")
+- [ ] Every alert with a CWE has `vulnerabilityCategory` in props (e.g., "Injection Vulnerability")
+- [ ] Sparse messages (< 60 chars) get `enrichedDescription` from CWE catalog
+- [ ] `detailedReport` includes CWE explainer section
+- [ ] All 90 CWEs in our rules are covered in the catalog
+- [ ] No changes to rule YAML files
+
+---
+
+## Phase 2: Rule Metadata Enrichment
+
+**Goal:** Add `vulnerability_class`, OWASP mappings, and `references` to all 499 rules across 15 YAML files.
+
+**Files to change:** All files in `socket_basics/rules/*.yml`
+
+### 2.1 Add `vulnerability_class` to All Rules
+
+Map each rule to one of 20 standardized vulnerability class names derived from the existing `subcategory` values and CWE associations:
+
+| vulnerability_class | Maps From subcategory | Associated CWEs |
+|---|---|---|
+| Injection Vulnerability | `injection`, `process` | CWE-78, CWE-89, CWE-90, CWE-94, CWE-95, CWE-943 |
+| Cross-Site Scripting (XSS) | `xss` | CWE-79 |
+| Cryptographic Weakness | `crypto` | CWE-208, CWE-295, CWE-310, CWE-319, CWE-326, CWE-327, CWE-338 |
+| Authentication Weakness | `authentication` | CWE-287, CWE-347, CWE-384, CWE-521, CWE-798, CWE-916 |
+| Access Control Violation | `access-control` | CWE-22, CWE-601, CWE-639, CWE-862, CWE-863 |
+| Security Misconfiguration | `configuration`, `proxy` | CWE-16, CWE-200, CWE-209, CWE-489, CWE-614, CWE-693, CWE-732 |
+| Insecure Deserialization | `integrity` | CWE-502 |
+| Sensitive Data Exposure | `logging` | CWE-312, CWE-522, CWE-532 |
+| Server-Side Request Forgery | `ssrf` | CWE-918 |
+| Unrestricted File Upload | `upload` | CWE-434 |
+| Insecure File Operation | `file-operations` | CWE-73, CWE-377 |
+| XML External Entity (XXE) | (none) | CWE-611 |
+| Denial of Service | `dos` | CWE-400, CWE-1333, CWE-409 |
+| Improper Error Handling | `error-handling`, `async` | CWE-396, CWE-703, CWE-755 |
+| Type Safety Violation | `type-safety` | CWE-697, CWE-704 |
+| Insecure Design | `design` | CWE-20, CWE-307, CWE-330 |
+| Memory Safety Violation | `deprecated` (for C/C++) | CWE-119, CWE-120, CWE-131, CWE-190, CWE-415, CWE-416, CWE-476 |
+| Template Injection | (none) | CWE-1336 |
+| Prototype Pollution | (none) | CWE-1321 |
+| Unsafe Reflection | (none) | CWE-470 |
+
+For rules that currently lack `subcategory`, derive `vulnerability_class` from the CWE using the table above.
+
+Example rule change:
+
+```yaml
+# Before
+- id: java-sql-injection
+  message: "SQL injection vulnerability detected..."
+  metadata:
+    category: security
+    cwe: CWE-89
+    confidence: high
+
+# After
+- id: java-sql-injection
+  message: "SQL injection vulnerability detected..."
+  metadata:
+    category: security
+    cwe: CWE-89
+    confidence: high
+    subcategory: injection
+    vulnerability_class: Injection Vulnerability
+    owasp: "A03:2021"
+```
+
+### 2.2 Add OWASP Mappings
+
+Currently 93/499 rules (19%) have OWASP. Target: 100% of applicable rules.
+
+CWE-to-OWASP mapping table for bulk application:
+
+| OWASP Category | CWEs |
+|---|---|
+| A01:2021 (Broken Access Control) | CWE-22, CWE-73, CWE-601, CWE-639, CWE-862, CWE-863 |
+| A02:2021 (Cryptographic Failures) | CWE-208, CWE-259, CWE-295, CWE-310, CWE-312, CWE-319, CWE-322, CWE-326, CWE-327, CWE-338, CWE-522, CWE-798, CWE-916 |
+| A03:2021 (Injection) | CWE-74, CWE-78, CWE-79, CWE-89, CWE-90, CWE-91, CWE-94, CWE-95, CWE-117, CWE-134, CWE-611, CWE-943, CWE-1321, CWE-1336 |
+| A04:2021 (Insecure Design) | CWE-20, CWE-307, CWE-330, CWE-362, CWE-367 |
+| A05:2021 (Security Misconfiguration) | CWE-16, CWE-200, CWE-209, CWE-276, CWE-489, CWE-614, CWE-693, CWE-732, CWE-942 |
+| A06:2021 (Vulnerable Components) | CWE-477, CWE-1104 |
+| A07:2021 (Auth Failures) | CWE-287, CWE-347, CWE-384, CWE-521 |
+| A08:2021 (Data Integrity Failures) | CWE-345, CWE-353, CWE-434, CWE-494, CWE-502 |
+| A09:2021 (Logging Failures) | CWE-532, CWE-778 |
+| A10:2021 (SSRF) | CWE-918 |
+
+Not all CWEs have a natural OWASP mapping (e.g., CWE-190 Integer Overflow, CWE-416 Use After Free). Memory safety and low-level CWEs (~30 rules, primarily in `c_cpp.yml`) should be left without OWASP rather than forcing an inaccurate mapping.
+
+### 2.3 Add `subcategory` to Rules Missing It
+
+Currently 103/499 rules (21%) have `subcategory`. The remaining 396 rules need it.
+
+Derivation: Invert the subcategory-to-CWE table from Section 2.1. For each rule, look up its CWE and assign the corresponding subcategory.
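
A minimal sketch of this derivation, assuming the table in 2.1 is available as a plain dict. The names `SUBCATEGORY_CWES`, `build_cwe_to_subcategory`, and `derive_subcategory` are hypothetical, and only a few rows of the table are reproduced. Rules that already carry a `subcategory` keep it, and CWEs whose class has no subcategory in the table (for example CWE-611) come back as `None` so they can be reviewed by hand.

```python
# Hypothetical excerpt of the 2.1 table: subcategory -> associated CWEs.
SUBCATEGORY_CWES = {
    "injection": ["CWE-78", "CWE-89", "CWE-90", "CWE-94", "CWE-95", "CWE-943"],
    "xss": ["CWE-79"],
    "ssrf": ["CWE-918"],
    "upload": ["CWE-434"],
}


def build_cwe_to_subcategory(table):
    """Invert subcategory -> CWEs into CWE -> subcategory.

    If a CWE is listed under several subcategories, the first one seen wins.
    """
    reverse = {}
    for subcategory, cwes in table.items():
        for cwe in cwes:
            reverse.setdefault(cwe, subcategory)
    return reverse


CWE_TO_SUBCATEGORY = build_cwe_to_subcategory(SUBCATEGORY_CWES)


def derive_subcategory(metadata):
    """Return the rule's existing subcategory, or one derived from its CWE.

    Returns None when the CWE has no subcategory in the table.
    """
    existing = metadata.get("subcategory")
    if existing:
        return existing
    return CWE_TO_SUBCATEGORY.get(metadata.get("cwe", ""))
```

Returning `None` instead of guessing keeps unmapped rules visible, which matters for classes such as XXE or Template Injection that the table lists without a subcategory.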
+
+### 2.4 Add `references` to Rules
+
+Currently 1/499 rules has explicit `references`. While the connector auto-generates CWE/OWASP URLs, adding references directly to rules enables:
+- Framework-specific documentation links
+- Language-specific remediation guides
+- More targeted reference URLs than generic CWE pages
+
+Priority: Add references to all rules with `framework` metadata (53 rules) first, linking to the framework's security documentation.
+
+Example:
+
+```yaml
+- id: js-react-dangerous-html
+  metadata:
+    framework: react
+    references:
+      - https://react.dev/reference/react-dom/components/common#dangerously-setting-the-inner-html
+      - https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
+```
+
+### 2.5 Acceptance Criteria
+
+- [ ] All 499 rules have `subcategory` in metadata
+- [ ] All 499 rules have `vulnerability_class` in metadata
+- [ ] All applicable rules (~470) have `owasp` in metadata
+- [ ] All 53 framework-specific rules have `references` with framework docs
+- [ ] All existing tests pass (if any)
+
+---
+
+## Phase 3: Rule Message Rewrite
+
+**Goal:** Every rule message follows the "What/Why/How" pattern so customers immediately understand the finding and what to do about it.
+
+**Files to change:** All files in `socket_basics/rules/*.yml`
+
+### 3.1 Message Format Standard
+
+Every rule `message` should follow this three-part structure:
+
+```
+{What is wrong}. {Why it matters}. {How to fix it}.
+```
+
+**Example (before, sparse message):**
+```yaml
+message: "Route handler missing authentication/authorization check"
+```
+
+**Example (after, improved message):**
+```yaml
+message: >-
+  Route handler is missing an authentication or authorization check. Without
+  access control, any user can invoke this endpoint and access or modify
+  protected resources. Add an authentication decorator (e.g., @login_required)
+  or middleware check before processing the request.
+```
+
+### 3.2 Message Length Guidelines
+
+| Severity | Target Length | Rationale |
+|----------|-------------|-----------|
+| Critical | 2-4 sentences (150-300 chars) | Urgent, needs clear remediation |
+| High | 2-3 sentences (120-250 chars) | Important, needs fix guidance |
+| Medium | 1-3 sentences (80-200 chars) | Informational with context |
+| Low | 1-2 sentences (60-150 chars) | Awareness, minimal action |
+
+### 3.3 Prioritization
+
+Rewrite messages in this order:
+
+1. **Critical severity rules** (most customer-visible, highest urgency): 95 rules
+2. **High severity rules with sparse messages**: 184 rules
+3. **Medium severity rules**: 159 rules
+4. **Low severity rules**: 61 rules
+
+Within each severity tier, prioritize by CWE frequency (rules for CWE-89, CWE-78, CWE-79 first since they fire most often).
+
+### 3.4 Add `fix` Metadata
+
+Currently 2/499 rules have `fix`. The `fix` field provides a concise remediation instruction that appears separately from the message in the `detailedReport`.
+
+Target: Add `fix` to all critical and high severity rules (279 rules).
+
+Example:
+
+```yaml
+- id: python-sql-injection
+  metadata:
+    fix: >-
+      Use parameterized queries with placeholders (cursor.execute("SELECT * FROM
+      users WHERE id = %s", (user_id,))). For ORMs like SQLAlchemy or Django,
+      use the query builder API instead of raw SQL.
+```
+
+The `fix` field should be **language-specific** and **actionable**: not a repeat of the message, but a concrete code-level instruction.
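
The length targets in 3.2 and the `fix` requirement above lend themselves to a mechanical check. The sketch below is one possible lint pass, assuming each rule has already been loaded into a dict with `severity`, `message`, and `metadata` keys and that severities use the tier names from 3.2. `lint_rule` and its constants are hypothetical and not part of the repository.

```python
# Character ranges taken from the 3.2 table, keyed by severity tier.
LENGTH_TARGETS = {
    "critical": (150, 300),
    "high": (120, 250),
    "medium": (80, 200),
    "low": (60, 150),
}
# Tiers for which 3.4 targets a `fix` field.
FIX_REQUIRED = {"critical", "high"}
# Sparse-message threshold used elsewhere in this plan.
MIN_MESSAGE_LENGTH = 60


def lint_rule(rule):
    """Return a list of human-readable problems for one rule dict."""
    problems = []
    severity = str(rule.get("severity", "")).lower()
    # Collapse folded YAML whitespace so wrapped messages are measured fairly.
    message = " ".join(str(rule.get("message", "")).split())
    metadata = rule.get("metadata") or {}

    if len(message) < MIN_MESSAGE_LENGTH:
        problems.append(f"message shorter than {MIN_MESSAGE_LENGTH} chars")

    target = LENGTH_TARGETS.get(severity)
    if target and not (target[0] <= len(message) <= target[1]):
        problems.append(
            f"message length {len(message)} outside {target[0]}-{target[1]}"
        )

    if severity in FIX_REQUIRED and not metadata.get("fix"):
        problems.append("missing fix metadata")

    return problems
```

Character counts are only a proxy for the What/Why/How structure, so a pass like this would complement manual review of each rewritten message instead of replacing it.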
+
+### 3.5 Acceptance Criteria
+
+- [ ] All critical rules have 2-4 sentence messages with What/Why/How
+- [ ] All high rules have 2-3 sentence messages with What/Why/How
+- [ ] All critical + high rules have `fix` metadata
+- [ ] No rule has a message shorter than 60 characters
+- [ ] All existing tests pass
+
+---
+
+## Phase 4: Dataflow Traces + Advanced Enrichment
+
+**Goal:** For taint-mode rules (SQL injection, command injection, XSS, etc.), show the data flow path from source to sink in the alert.
+
+**Files to change:**
+- `socket_basics/core/connector/opengrep/__init__.py` (CLI invocation + alert construction)
+
+### 4.1 Enable `--dataflow-traces` Flag
+
+Add the flag to the OpenGrep CLI invocation:
+
+```python
+# In the command construction (around lines 162-174):
+cmd = ['opengrep', '--json', '--dataflow-traces', '--output', out_file]
+```
+
+This causes OpenGrep to include `extra.dataflow_trace` in results for taint-mode rules, with structure:
+
+```json
+{
+  "extra": {
+    "dataflow_trace": {
+      "taint_source": {
+        "location": { "path": "...", "start": {...}, "end": {...} },
+        "content": "request.args.get('id')"
+      },
+      "intermediate_vars": [
+        {
+          "location": { "path": "...", "start": {...}, "end": {...} },
+          "content": "user_id = request.args.get('id')"
+        }
+      ],
+      "taint_sink": {
+        "location": { "path": "...", "start": {...}, "end": {...} },
+        "content": "cursor.execute(query + user_id)"
+      }
+    }
+  }
+}
+```
+
+### 4.2 Extract and Format Dataflow Trace
+
+In the alert construction block, after existing enrichment:
+
+```python
+_dataflow = (r.get('extra') or {}).get('dataflow_trace', {})
+if _dataflow:
+    _source = _dataflow.get('taint_source', {})
+    _sink = _dataflow.get('taint_sink', {})
+    _intermediates = _dataflow.get('intermediate_vars', [])
+
+    alert['props']['dataflowTrace'] = {
+        'source': {
+            'content': _source.get('content', ''),
+            'location': _format_location(_source.get('location', {})),
+        },
+        'sink': {
+            'content': _sink.get('content', ''),
+            'location': _format_location(_sink.get('location', {})),
+        },
+        'intermediates': [
+            {
+                'content': v.get('content', ''),
+                'location': _format_location(v.get('location', {})),
+            }
+            for v in _intermediates
+        ],
+    }
+```
+
+### 4.3 Add Dataflow to `detailedReport`
+
+For alerts with a dataflow trace, append a "Data Flow" section to the markdown:
+
+````markdown
+### Data Flow
+
+1. **Source** (`app/routes.py:12`):
+   ```python
+   user_id = request.args.get('id')
+   ```
+
+2. **Intermediate** (`app/routes.py:15`):
+   ```python
+   query = "SELECT * FROM users WHERE id = " + user_id
+   ```
+
+3. **Sink** (`app/routes.py:16`):
+   ```python
+   cursor.execute(query)
+   ```
+````
+
+This turns an abstract "SQL injection detected" into a concrete story: *here is where the user input enters, here is where it flows, and here is where it reaches the dangerous operation*.
+
+### 4.4 Verify Performance Impact
+
+The `--dataflow-traces` flag adds overhead to OpenGrep's analysis. Measure:
+- Scan time on `app_tests/python` (baseline vs. with flag)
+- Scan time on a larger real-world codebase
+- Output JSON size increase
+
+If scan time increases by more than 20%, make the flag configurable via a config parameter (e.g., `opengrep_dataflow_traces: true/false`).
+
+### 4.5 Acceptance Criteria
+
+- [ ] `--dataflow-traces` is passed to OpenGrep
+- [ ] Taint-mode alerts include `dataflowTrace` in props
+- [ ] `detailedReport` includes "Data Flow" section for taint alerts
+- [ ] Performance impact measured and documented
+- [ ] Flag is configurable if overhead is significant
+
+---
+
+## Appendix A: CWE Catalog
+
+Complete lookup table for all 90 CWEs referenced in our rules, sorted by frequency.
+
+| CWE | Name | Customer Description | Category | Rules |
+|-----|------|---------------------|----------|-------|
+| CWE-327 | Broken or Risky Cryptographic Algorithm | The code uses a cryptographic algorithm that is known to be weak or insufficient. This may allow attackers to decrypt sensitive data or bypass integrity checks. | Cryptographic Weakness | 46 |
+| CWE-89 | SQL Injection | User-supplied input is included in a SQL query without proper sanitization, potentially allowing attackers to read, modify, or delete database contents. | Injection Vulnerability | 32 |
+| CWE-798 | Hard-coded Credentials | Credentials such as passwords, API keys, or cryptographic keys are embedded directly in source code, making them easily discoverable if the code is exposed. | Authentication Weakness | 32 |
+| CWE-79 | Cross-Site Scripting (XSS) | User-supplied data is rendered in a web page without proper escaping, potentially allowing attackers to inject malicious scripts that execute in other users' browsers. | Injection Vulnerability | 21 |
+| CWE-94 | Code Injection | User-controlled input is passed to a code evaluation function, potentially allowing attackers to execute arbitrary code on the server. | Injection Vulnerability | 20 |
+| CWE-78 | OS Command Injection | User-supplied input is incorporated into an operating system command without proper sanitization, potentially allowing attackers to execute arbitrary system commands. | Injection Vulnerability | 19 |
+| CWE-295 | Improper Certificate Validation | The application does not properly validate TLS/SSL certificates, which could allow attackers to intercept encrypted communications via man-in-the-middle attacks. | Cryptographic Weakness | 18 |
+| CWE-22 | Path Traversal | User input is used to construct a file path without proper validation, potentially allowing attackers to access files outside the intended directory. | Access Control Violation | 18 |
+| CWE-502 | Insecure Deserialization | The application deserializes data from an untrusted source without validation, which can lead to remote code execution or denial of service. | Insecure Deserialization | 17 |
+| CWE-338 | Weak PRNG | The code uses a non-cryptographic random number generator for security-sensitive operations, producing predictable values that an attacker could guess. | Cryptographic Weakness | 17 |
+| CWE-319 | Cleartext Transmission | Sensitive data is transmitted over an unencrypted channel, allowing network attackers to intercept and read the information. | Cryptographic Weakness | 16 |
+| CWE-532 | Sensitive Information in Logs | Sensitive data such as passwords or tokens is written to log files, where it may be accessible to unauthorized parties. | Sensitive Data Exposure | 14 |
+| CWE-601 | Open Redirect | The application redirects users to a URL from user input without validation, which can be exploited for phishing. | Access Control Violation | 13 |
+| CWE-489 | Active Debug Code | Debug code or development-only features are left enabled in production, potentially exposing sensitive information. | Security Misconfiguration | 10 |
+| CWE-862 | Missing Authorization | The application does not perform authorization checks before granting access to a resource, allowing unauthorized actions. | Access Control Violation | 9 |
+| CWE-434 | Unrestricted File Upload | The application allows file uploads without validating type or content, potentially enabling upload of malicious code. | Unrestricted File Upload | 9 |
+| CWE-732 | Incorrect Permission Assignment | Resources are created with overly permissive access rights, potentially exposing them to unauthorized access. | Security Misconfiguration | 7 |
+| CWE-614 | Sensitive Cookie Without 'Secure' Flag | A sensitive cookie may be transmitted over unencrypted HTTP connections, making it interceptable by attackers. | Security Misconfiguration | 7 |
+| CWE-352 | Cross-Site Request Forgery (CSRF) | The application does not verify that requests originated from its own interface, allowing forged requests from malicious sites. | Access Control Violation | 7 |
+| CWE-347 | Improper Cryptographic Signature Verification | The application does not properly verify digital signatures, potentially allowing attackers to tamper with data. | Cryptographic Weakness | 7 |
+| CWE-90 | LDAP Injection | User-supplied input is included in an LDAP query without sanitization, potentially allowing attackers to modify query logic. | Injection Vulnerability | 6 |
+| CWE-703 | Improper Exception Handling | The application does not properly handle errors, which may lead to unexpected behavior or information disclosure. | Improper Error Handling | 6 |
+| CWE-611 | XML External Entity (XXE) | The application parses XML that can reference external entities, potentially allowing attackers to read files or perform SSRF. | Injection Vulnerability | 6 |
+| CWE-200 | Information Exposure | The application exposes sensitive information such as internal paths or configuration details to unauthorized users. | Security Misconfiguration | 6 |
+| CWE-20 | Improper Input Validation | The application does not sufficiently validate user input, potentially allowing malformed data to trigger vulnerabilities. | Insecure Design | 5 |
+| CWE-943 | NoSQL Injection | User-supplied input is included in a NoSQL query without sanitization, potentially allowing attackers to manipulate query logic. | Injection Vulnerability | 4 |
+| CWE-755 | Improper Exception Handling | The application fails to properly handle unexpected situations, potentially leading to crashes or exploitable behavior. | Improper Error Handling | 4 |
+| CWE-400 | Uncontrolled Resource Consumption | The application does not limit resource usage, making it vulnerable to denial-of-service attacks. | Denial of Service | 4 |
+| CWE-377 | Insecure Temporary File | Temporary files are created insecurely, potentially allowing attackers to read or replace them. | Insecure File Operation | 4 |
+| CWE-16 | Insecure Configuration | The application uses an insecure configuration that may weaken its security posture. | Security Misconfiguration | 4 |
+| CWE-918 | Server-Side Request Forgery (SSRF) | The application fetches a remote resource using a user-controlled URL, allowing attackers to make requests to unintended destinations. | Server-Side Request Forgery | 3 |
+| CWE-778 | Insufficient Logging | The application does not adequately log security events, making incident detection difficult. | Sensitive Data Exposure | 3 |
+| CWE-639 | Insecure Direct Object Reference (IDOR) | A user-supplied identifier is used to look up resources without authorization checks, enabling unauthorized access. | Access Control Violation | 3 |
+| CWE-521 | Weak Password Requirements | The application does not enforce strong password policies, making accounts vulnerable to brute-force attacks. | Authentication Weakness | 3 |
+| CWE-416 | Use After Free | The application accesses memory after it has been freed, which can lead to crashes or code execution. | Memory Safety Violation | 3 |
+| CWE-415 | Double Free | The application frees memory more than once, which can corrupt memory and allow code execution. | Memory Safety Violation | 3 |
+| CWE-367 | TOCTOU Race Condition | A resource is checked and then used in separate operations, creating a window for attacker manipulation. | Insecure Design | 3 |
+| CWE-326 | Inadequate Encryption Strength | Encryption uses an insufficient key length, making brute-force decryption feasible. | Cryptographic Weakness | 3 |
+| CWE-310 | Cryptographic Issues | The application contains a general cryptographic weakness that may undermine data protection. | Cryptographic Weakness | 3 |
+| CWE-287 | Improper Authentication | The application does not properly verify user identity, potentially allowing unauthorized access. | Authentication Weakness | 3 |
+| CWE-250 | Execution with Unnecessary Privileges | The application runs with more permissions than required, increasing exploit impact. | Security Misconfiguration | 3 |
+| CWE-209 | Error Message Information Leak | Error messages include sensitive details that could help an attacker plan further attacks. | Security Misconfiguration | 3 |
+| CWE-190 | Integer Overflow | An arithmetic operation exceeds the integer range, potentially leading to buffer overflows or logic errors. | Memory Safety Violation | 3 |
+| CWE-134 | Format String Vulnerability | User-supplied input is used as a format string, potentially allowing attackers to read or write memory. | Injection Vulnerability | 3 |
+| CWE-1333 | ReDoS (Regular Expression Denial of Service) | A regular expression can be exploited with crafted input to cause catastrophic backtracking. | Denial of Service | 3 |
+| CWE-120 | Buffer Overflow | Data is copied into a fixed-size buffer without checking input length, potentially enabling code execution. | Memory Safety Violation | 3 |
+| CWE-119 | Buffer Overrun | Operations read or write beyond memory boundaries, which can cause crashes or enable code execution. | Memory Safety Violation | 3 |
+| CWE-74 | Injection | User-supplied input is passed to a downstream interpreter without sanitization. | Injection Vulnerability | 2 |
+| CWE-704 | Incorrect Type Conversion | An unsafe type conversion may lead to data truncation or memory corruption. | Type Safety Violation | 2 |
+| CWE-697 | Incorrect Comparison | A flawed comparison can lead to logic bypasses or security check circumvention. | Type Safety Violation | 2 |
+| CWE-693 | Protection Mechanism Failure | A security mechanism is absent or bypassable, reducing the application's security. | Security Misconfiguration | 2 |
+| CWE-494 | Download Without Integrity Check | Code or updates are downloaded without verifying integrity, allowing supply of malicious code. | Insecure Deserialization | 2 |
+| CWE-409 | Decompression Bomb | Compressed data is processed without size limits, potentially causing resource exhaustion. | Denial of Service | 2 |
+| CWE-401 | Memory Leak | Allocated memory is never released, potentially leading to resource exhaustion. | Memory Safety Violation | 2 |
+| CWE-384 | Session Fixation | Session identifiers are not regenerated after authentication, enabling session hijacking. | Authentication Weakness | 2 |
+| CWE-362 | Race Condition | Shared resources are accessed without synchronization, potentially leading to data corruption. | Insecure Design | 2 |
+| CWE-353 | Missing Integrity Check | Data is accepted without verifying integrity, allowing in-transit tampering. | Insecure Deserialization | 2 |
+| CWE-322 | Key Exchange Without Authentication | Cryptographic key exchange lacks entity authentication, enabling man-in-the-middle attacks. | Cryptographic Weakness | 2 |
+| CWE-312 | Cleartext Storage of Sensitive Information | Sensitive data is stored in plaintext, readable by anyone with storage access. | Sensitive Data Exposure | 2 |
+| CWE-307 | Excessive Authentication Attempts | Failed login attempts are not limited, enabling brute-force attacks. | Insecure Design | 2 |
+| CWE-276 | Incorrect Default Permissions | Resources are created with overly permissive defaults. | Security Misconfiguration | 2 |
+| CWE-248 | Uncaught Exception | Unhandled exceptions may cause crashes or information disclosure. | Improper Error Handling | 2 |
+| CWE-1336 | Template Injection | User input in template expressions can enable server-side code execution. | Template Injection | 2 |
+| CWE-98 | Remote File Inclusion | User input controls which file is loaded, potentially enabling remote code execution. | Injection Vulnerability | 1 |
+| CWE-95 | Eval Injection | User-supplied input is passed to eval(), allowing arbitrary code execution. | Injection Vulnerability | 1 |
+| CWE-942 | Permissive CORS Policy | Overly permissive CORS allows malicious sites to access sensitive data. | Security Misconfiguration | 1 |
+| CWE-926 | Improper Android Component Export | An Android component is exported without access restrictions. | Security Misconfiguration | 1 |
+| CWE-916 | Weak Password Hashing | Passwords are hashed with a fast or weak algorithm, making cracking feasible. | Authentication Weakness | 1 |
+| CWE-915 | Mass Assignment | Users can set arbitrary object attributes through input binding. | Access Control Violation | 1 |
+| CWE-91 | XML/XPath Injection | User input in XML or XPath queries can alter query logic. | Injection Vulnerability | 1 |
+| CWE-88 | Argument Injection | User input is passed as command arguments without delimiter neutralization. | Injection Vulnerability | 1 |
+| CWE-863 | Incorrect Authorization | Authorization checks are implemented incorrectly, allowing unauthorized access. | Access Control Violation | 1 |
+| CWE-73 | External File Path Control | User input determines which file to access, enabling arbitrary file operations. | Insecure File Operation | 1 |
+| CWE-667 | Improper Locking | Lock mismanagement may lead to deadlocks or race conditions. | Insecure Design | 1 |
+| CWE-522 | Insufficiently Protected Credentials | Credentials are stored or transmitted with inadequate protection. | Sensitive Data Exposure | 1 |
+| CWE-479 | Signal Handler Safety | A signal handler calls a non-reentrant function, causing undefined behavior. | Memory Safety Violation | 1 |
+| CWE-477 | Obsolete Function | A deprecated function with known weaknesses is used. | Security Misconfiguration | 1 |
+| CWE-476 | NULL Pointer Dereference | A NULL pointer is used, causing a crash or undefined behavior. | Memory Safety Violation | 1 |
+| CWE-470 | Unsafe Reflection | User input selects classes dynamically, enabling arbitrary code execution. | Injection Vulnerability | 1 |
+| CWE-396 | Generic Exception Catch | Catching broad exceptions may mask security-relevant failures. | Improper Error Handling | 1 |
+| CWE-345 | Insufficient Data Authenticity | Data is accepted without verifying its source or integrity. | Insecure Deserialization | 1 |
+| CWE-330 | Insufficient Randomness | Random values are not unpredictable enough for their security context. | Insecure Design | 1 |
+| CWE-259 | Hard-coded Password | A password is embedded in source code, easily discoverable and unchangeable. | Authentication Weakness | 1 |
+| CWE-242 | Inherently Dangerous Function | An inherently unsafe function is called that cannot be used securely. | Memory Safety Violation | 1 |
+| CWE-208 | Timing Side Channel | Timing differences in responses may allow attackers to extract secrets. | Cryptographic Weakness | 1 |
+| CWE-1321 | Prototype Pollution | User input modifies JavaScript object prototypes, altering application logic. | Prototype Pollution | 1 |
+| CWE-131 | Incorrect Buffer Size Calculation | Buffer size miscalculation can lead to overflow and code execution. | Memory Safety Violation | 1 |
+| CWE-117 | Log Injection | Unsanitized input in logs allows forged entries or malicious content. | Sensitive Data Exposure | 1 |
+| CWE-1104 | Unmaintained Third-Party Components | A dependency is no longer maintained, leaving vulnerabilities unpatched. | Security Misconfiguration | 1 |
+| CWE-1059 | Insufficient Documentation | Inadequate code documentation makes security issues harder to find and fix. | Security Misconfiguration | 1 |
+
+## Appendix B: Subcategory-to-Vulnerability-Class Mapping
+
+For rules that already have `subcategory`, this table drives `vulnerability_class` assignment:
+
+| subcategory | vulnerability_class | Rule Count |
+|---|---|---|
+| injection | Injection Vulnerability | 21 |
+| crypto | Cryptographic Weakness | 16 |
+| configuration | Security Misconfiguration | 11 |
+| access-control | Access Control Violation | 9 |
+| authentication | Authentication Weakness | 8 |
+| error-handling | Improper Error Handling | 5 |
+| integrity | Insecure Deserialization | 5 |
+| design | Insecure Design | 5 |
+| logging | Sensitive Data Exposure | 4 |
+| deprecated | Memory Safety Violation | 3 |
+| dos | Denial of Service | 3 |
+| file-operations | Insecure File Operation | 2 |
+| ssrf | Server-Side Request Forgery | 2 |
+| type-safety | Type Safety Violation | 2 |
+| upload | Unrestricted File Upload | 2 |
+| xss | Cross-Site Scripting (XSS) | 1 |
+| process | Injection Vulnerability | 1 |
+| async | Improper Error Handling | 1 |
+| proxy | Security Misconfiguration | 1 |
+| performance | *(exclude: not security)* | 1 |
+
+## Appendix C: Framework Metadata
+
+53 rules have `framework` metadata across 32 unique frameworks. These should receive framework-specific `references` URLs in Phase 2.4.
+ +| Framework | Rules | Documentation Link (for references) | +|---|---|---| +| phoenix | 8 | https://hexdocs.pm/phoenix/ | +| play | 4 | https://www.playframework.com/documentation/ | +| rails | 4 | https://guides.rubyonrails.org/security.html | +| aspnet | 2 | https://learn.microsoft.com/en-us/aspnet/core/security/ | +| express | 2 | https://expressjs.com/en/advanced/best-practice-security.html | +| jpa | 2 | https://docs.oracle.com/javaee/7/tutorial/persistence-intro.htm | +| react | 2 | https://react.dev/reference/react-dom/ | +| spring | 2 | https://docs.spring.io/spring-security/reference/ | +| otp | 2 | https://www.erlang.org/doc/design_principles/ | +| cowboy | 2 | https://ninenines.eu/docs/en/cowboy/ | +| coredata | 2 | https://developer.apple.com/documentation/coredata | +| *(21 others)* | 1 each | *(framework-specific docs)* | diff --git a/scripts/enrich_rules.py b/scripts/enrich_rules.py new file mode 100644 index 0000000..0cd3cda --- /dev/null +++ b/scripts/enrich_rules.py @@ -0,0 +1,400 @@ +#!/usr/bin/env python3 +""" +Phase 2: Enrich SAST rules with missing metadata fields. + +This script adds the following metadata fields to rules in socket_basics/rules/: + - subcategory (derived from CWE) + - vulnerability_class (derived from subcategory) + - owasp (derived from CWE) + - references (for framework-specific rules) + +It uses a string-based approach to insert lines into metadata blocks without +reformatting the entire YAML file, preserving comments and formatting. 
+ +Usage: + python scripts/enrich_rules.py +""" + +import os +import re +import sys +import yaml + +# --------------------------------------------------------------------------- +# Mapping tables +# --------------------------------------------------------------------------- + +CWE_TO_SUBCATEGORY = { + # injection + "CWE-78": "injection", "CWE-79": "xss", "CWE-89": "injection", + "CWE-90": "injection", "CWE-91": "injection", "CWE-94": "injection", + "CWE-95": "injection", "CWE-98": "injection", "CWE-74": "injection", + "CWE-88": "injection", "CWE-134": "injection", "CWE-943": "injection", + "CWE-1321": "injection", "CWE-1336": "injection", + "CWE-470": "injection", "CWE-611": "injection", "CWE-117": "injection", + # crypto + "CWE-208": "crypto", "CWE-295": "crypto", "CWE-310": "crypto", + "CWE-319": "crypto", "CWE-322": "crypto", "CWE-326": "crypto", + "CWE-327": "crypto", "CWE-330": "crypto", "CWE-338": "crypto", + "CWE-347": "crypto", + # authentication + "CWE-259": "authentication", "CWE-287": "authentication", + "CWE-384": "authentication", "CWE-521": "authentication", + "CWE-798": "authentication", "CWE-916": "authentication", + # access-control + "CWE-22": "access-control", "CWE-73": "access-control", + "CWE-601": "access-control", "CWE-639": "access-control", + "CWE-862": "access-control", "CWE-863": "access-control", + "CWE-915": "access-control", # mass assignment; listed once so the key is not silently overridden + # configuration + "CWE-16": "configuration", "CWE-200": "configuration", + "CWE-209": "configuration", "CWE-250": "configuration", + "CWE-276": "configuration", "CWE-489": "configuration", + "CWE-614": "configuration", "CWE-693": "configuration", + "CWE-732": "configuration", "CWE-926": "configuration", + "CWE-942": "configuration", "CWE-477": "configuration", + "CWE-1104": "configuration", "CWE-1059": "configuration", + # integrity + "CWE-345": "integrity", "CWE-353": "integrity", + "CWE-494": "integrity", "CWE-502": "integrity", + # logging + 
"CWE-532": "logging", "CWE-778": "logging", + # error-handling + "CWE-248": "error-handling", "CWE-396": "error-handling", + "CWE-703": "error-handling", "CWE-755": "error-handling", + # design + "CWE-20": "design", "CWE-307": "design", "CWE-362": "design", + "CWE-367": "design", "CWE-667": "design", "CWE-697": "design", + "CWE-704": "design", + # dos + "CWE-400": "dos", "CWE-409": "dos", "CWE-1333": "dos", + # file-operations + "CWE-377": "file-operations", + # upload + "CWE-434": "upload", + # ssrf + "CWE-918": "ssrf", + # deprecated (memory safety for C/C++) + "CWE-119": "deprecated", "CWE-120": "deprecated", "CWE-131": "deprecated", + "CWE-190": "deprecated", "CWE-242": "deprecated", "CWE-401": "deprecated", + "CWE-415": "deprecated", "CWE-416": "deprecated", "CWE-476": "deprecated", + "CWE-479": "deprecated", + # misc + "CWE-312": "crypto", "CWE-522": "crypto", + "CWE-352": "access-control", +} + +SUBCATEGORY_TO_VULN_CLASS = { + "injection": "Injection Vulnerability", + "xss": "Cross-Site Scripting (XSS)", + "crypto": "Cryptographic Weakness", + "authentication": "Authentication Weakness", + "access-control": "Access Control Violation", + "configuration": "Security Misconfiguration", + "integrity": "Insecure Deserialization", + "logging": "Sensitive Data Exposure", + "error-handling": "Improper Error Handling", + "design": "Insecure Design", + "dos": "Denial of Service", + "file-operations": "Insecure File Operation", + "upload": "Unrestricted File Upload", + "ssrf": "Server-Side Request Forgery", + "deprecated": "Memory Safety Violation", + "async": "Improper Error Handling", + "process": "Injection Vulnerability", + "proxy": "Security Misconfiguration", + "type-safety": "Insecure Design", + "performance": "Other", +} + +CWE_TO_OWASP = { + # A01:2021 — Broken Access Control + **{f"CWE-{n}": "A01:2021" for n in [22, 23, 35, 59, 200, 219, 264, 275, 276, 284, 285, 352, 359, 377, 402, 425, 441, 497, 538, 540, 548, 552, 566, 601, 639, 651, 668, 706, 862, 863, 913, 
922, 1275]}, + # A02:2021 — Cryptographic Failures + **{f"CWE-{n}": "A02:2021" for n in [261, 296, 310, 319, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 335, 336, 337, 338, 340, 347, 523, 720, 757, 759, 760, 780, 798, 916]}, + # A03:2021 — Injection + **{f"CWE-{n}": "A03:2021" for n in [20, 74, 75, 77, 78, 79, 80, 83, 87, 88, 89, 90, 91, 93, 94, 95, 96, 97, 98, 99, 100, 113, 116, 117, 134, 138, 184, 470, 471, 564, 610, 643, 644, 652, 917, 943, 1236, 1321, 1336]}, + # A04:2021 — Insecure Design + **{f"CWE-{n}": "A04:2021" for n in [73, 183, 209, 213, 235, 256, 257, 266, 269, 280, 311, 312, 313, 316, 419, 430, 434, 444, 451, 472, 501, 522, 525, 539, 579, 598, 602, 642, 646, 650, 653, 656, 657, 799, 807, 840, 841, 927, 1021, 1173]}, + # A05:2021 — Security Misconfiguration + **{f"CWE-{n}": "A05:2021" for n in [2, 11, 13, 15, 16, 260, 315, 489, 497, 520, 526, 537, 541, 547, 611, 614, 693, 732, 756, 776, 942, 1004, 1032, 1174]}, + # A06:2021 — Vulnerable and Outdated Components + **{f"CWE-{n}": "A06:2021" for n in [477, 1104, 1059]}, + # A07:2021 — Auth Failures + **{f"CWE-{n}": "A07:2021" for n in [255, 259, 287, 288, 290, 294, 295, 297, 300, 302, 304, 306, 307, 346, 384, 521, 613, 620, 640, 798, 940, 1216]}, + # A08:2021 — Data Integrity Failures + **{f"CWE-{n}": "A08:2021" for n in [345, 353, 426, 494, 502, 565, 784, 829, 830, 915]}, + # A09:2021 — Logging Failures + **{f"CWE-{n}": "A09:2021" for n in [117, 223, 532, 778]}, + # A10:2021 — SSRF + **{f"CWE-{n}": "A10:2021" for n in [918]}, +} + +FRAMEWORK_REFS = { + "phoenix": ["https://hexdocs.pm/phoenix/security.html"], + "play": ["https://www.playframework.com/documentation/latest/SecurityHeaders"], + "rails": ["https://guides.rubyonrails.org/security.html"], + "aspnet": ["https://learn.microsoft.com/en-us/aspnet/core/security/"], + "aspnetcore": ["https://learn.microsoft.com/en-us/aspnet/core/security/"], + "express": ["https://expressjs.com/en/advanced/best-practice-security.html"], + "react": 
["https://react.dev/reference/react-dom/components/common"], + "spring": ["https://docs.spring.io/spring-security/reference/"], + "django": ["https://docs.djangoproject.com/en/stable/topics/security/"], + "flask": ["https://flask.palletsprojects.com/en/stable/security/"], + "laravel": ["https://laravel.com/docs/master/security"], + "symfony": ["https://symfony.com/doc/current/security.html"], + "wordpress": ["https://developer.wordpress.org/advanced-administration/security/"], + "nextjs": ["https://nextjs.org/docs/app/building-your-application/authentication"], + "otp": ["https://www.erlang.org/doc/design_principles/"], + "cowboy": ["https://ninenines.eu/docs/en/cowboy/"], + "coredata": ["https://developer.apple.com/documentation/coredata"], + "swiftui": ["https://developer.apple.com/documentation/swiftui"], + "rocket": ["https://rocket.rs/guide/"], + "tokio": ["https://tokio.rs/tokio/tutorial"], + "actix": ["https://actix.rs/docs/"], + "warp": ["https://docs.rs/warp/"], + "diesel": ["https://diesel.rs/guides/"], +} + + +def parse_metadata_from_yaml(filepath): + """Parse YAML file and return list of rule metadata dicts with rule IDs.""" + with open(filepath) as f: + data = yaml.safe_load(f) + rules = [] + for rule in data.get("rules", []): + meta = rule.get("metadata", {}) + rules.append({ + "id": rule.get("id", ""), + "metadata": meta, + }) + return rules + + +def find_metadata_blocks(content): + """ + Find all metadata blocks in the YAML content. 
+ + Returns a list of (start, end, indent, fields_dict) tuples where: + - start: line index of the 'metadata:' line + - end: line index just past the last line of the metadata block + - indent: the indentation string of the metadata key (e.g., ' ') + - fields_dict: dict of field_name -> True for existing fields + """ + blocks = [] + lines = content.split("\n") + i = 0 + while i < len(lines): + line = lines[i] + # Match a metadata: line (must be at proper indentation level) + m = re.match(r"^(\s+)metadata:\s*$", line) + if m: + meta_indent = m.group(1) + field_indent = meta_indent + " " + start_line = i + fields = {} + j = i + 1 + # Collect all fields in this metadata block + while j < len(lines): + fline = lines[j] + # A blank line ends the block; a comment line is skipped + if fline.strip() == "" or fline.strip().startswith("#"): + # A blank line marks the end of the rule's metadata in these files. + # List items (e.g. under references) never reach this branch: + # they are consumed by the inner loop further down. + if fline.strip() == "": + # End of metadata block + break + j += 1 + continue + # Check if this line is a field in the metadata block + fm = re.match(r"^" + re.escape(field_indent) + r"(\w[\w_-]*):", fline) + if fm: + fields[fm.group(1)] = True + j += 1 + # If this is a list field (like references), skip list items + while j < len(lines): + list_line = lines[j] + if re.match(r"^" + re.escape(field_indent) + r" - ", list_line): + j += 1 + else: + break + continue + else: + # Line doesn't match field pattern -- end of metadata block + break + blocks.append((start_line, j, meta_indent, fields)) + i = j + else: + i += 1 + return blocks + + +def enrich_file(filepath, stats): + """Enrich a single YAML file with missing metadata fields.""" + with open(filepath) as f: + content = f.read() + + # Parse with PyYAML to get structured metadata + rules_meta = parse_metadata_from_yaml(filepath) + + # Find metadata blocks in the raw text + blocks = find_metadata_blocks(content) + + if len(blocks) 
!= len(rules_meta): + print(f"WARNING: {filepath}: found {len(blocks)} metadata blocks but " + f"{len(rules_meta)} rules. Skipping file.") + return content + + lines = content.split("\n") + insertions = [] # (line_number, lines_to_insert) + + for idx, (start_line, end_line, meta_indent, existing_fields) in enumerate(blocks): + rule = rules_meta[idx] + meta = rule["metadata"] + rule_id = rule["id"] + field_indent = meta_indent + " " + new_lines = [] + + cwe = str(meta.get("cwe", "")) + existing_subcategory = meta.get("subcategory") + existing_owasp = meta.get("owasp") + existing_vuln_class = meta.get("vulnerability_class") + existing_references = meta.get("references") + existing_framework = meta.get("framework") + + # 1. Add subcategory if missing + subcategory = existing_subcategory + if "subcategory" not in existing_fields and cwe in CWE_TO_SUBCATEGORY: + subcategory = CWE_TO_SUBCATEGORY[cwe] + new_lines.append(f"{field_indent}subcategory: {subcategory}") + stats["subcategory_added"] += 1 + + # 2. Add vulnerability_class if missing + if "vulnerability_class" not in existing_fields: + # Use the subcategory (existing or just computed) to look up vuln class + sc = subcategory or existing_subcategory + if sc and sc in SUBCATEGORY_TO_VULN_CLASS: + vc = SUBCATEGORY_TO_VULN_CLASS[sc] + new_lines.append(f"{field_indent}vulnerability_class: \"{vc}\"") + stats["vulnerability_class_added"] += 1 + + # 3. Add owasp if missing + if "owasp" not in existing_fields and cwe in CWE_TO_OWASP: + owasp = CWE_TO_OWASP[cwe] + new_lines.append(f"{field_indent}owasp: \"{owasp}\"") + stats["owasp_added"] += 1 + + # 4. 
Add references for framework rules if missing + if ("references" not in existing_fields and existing_framework + and str(existing_framework) in FRAMEWORK_REFS): + refs = FRAMEWORK_REFS[str(existing_framework)] + new_lines.append(f"{field_indent}references:") + for ref in refs: + new_lines.append(f"{field_indent} - {ref}") + stats["references_added"] += 1 + + if new_lines: + # Insert new lines just before the end of the metadata block + # (i.e., right before the blank line or next rule) + insertions.append((end_line, new_lines)) + + if not insertions: + return content + + # Apply insertions in reverse order to preserve line numbers + for insert_line, new_lines in reversed(insertions): + lines[insert_line:insert_line] = new_lines + + return "\n".join(lines) + + +def main(): + rules_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), + "socket_basics", "rules") + + if not os.path.isdir(rules_dir): + print(f"ERROR: Rules directory not found: {rules_dir}") + sys.exit(1) + + stats = { + "subcategory_added": 0, + "vulnerability_class_added": 0, + "owasp_added": 0, + "references_added": 0, + "files_processed": 0, + "rules_processed": 0, + } + + yml_files = sorted(f for f in os.listdir(rules_dir) if f.endswith(".yml")) + print(f"Found {len(yml_files)} YAML files in {rules_dir}") + + for fname in yml_files: + filepath = os.path.join(rules_dir, fname) + rules_meta = parse_metadata_from_yaml(filepath) + stats["rules_processed"] += len(rules_meta) + + enriched = enrich_file(filepath, stats) + with open(filepath, "w") as f: + f.write(enriched) + stats["files_processed"] += 1 + print(f" Processed {fname} ({len(rules_meta)} rules)") + + print(f"\n{'='*60}") + print(f"Enrichment complete!") + print(f"{'='*60}") + print(f"Files processed: {stats['files_processed']}") + print(f"Rules processed: {stats['rules_processed']}") + print(f"subcategory added: {stats['subcategory_added']}") + print(f"vulnerability_class added:{stats['vulnerability_class_added']}") 
+ print(f"owasp added: {stats['owasp_added']}") + print(f"references added: {stats['references_added']}") + + # Verify all files still parse + print("\nVerifying YAML syntax...") + errors = 0 + for fname in yml_files: + filepath = os.path.join(rules_dir, fname) + try: + with open(filepath) as f: + data = yaml.safe_load(f) + if not data or "rules" not in data: + print(f" WARNING: {fname} has no 'rules' key") + errors += 1 + except yaml.YAMLError as e: + print(f" ERROR: {fname} failed to parse: {e}") + errors += 1 + + if errors: + print(f"\n{errors} file(s) had issues!") + sys.exit(1) + else: + print(f"All {len(yml_files)} files parse successfully.") + + # Count final field coverage + print("\nFinal field coverage:") + total_rules = 0 + field_counts = { + "subcategory": 0, + "vulnerability_class": 0, + "owasp": 0, + "references": 0, + } + for fname in yml_files: + filepath = os.path.join(rules_dir, fname) + with open(filepath) as f: + data = yaml.safe_load(f) + for rule in data.get("rules", []): + total_rules += 1 + meta = rule.get("metadata", {}) + for field in field_counts: + if field in meta: + field_counts[field] += 1 + + print(f" Total rules: {total_rules}") + for field, count in field_counts.items(): + # Guard against ZeroDivisionError when no rules were found + print(f" {field}: {count}/{total_rules} ({count*100//total_rules if total_rules else 0}%)") + + +if __name__ == "__main__": + main() diff --git a/scripts/rewrite_messages.py b/scripts/rewrite_messages.py new file mode 100644 index 0000000..6523947 --- /dev/null +++ b/scripts/rewrite_messages.py @@ -0,0 +1,774 @@ +#!/usr/bin/env python3 +""" +Phase 3: Rewrite sparse rule messages and add fix metadata. + +This script: + 1. Rewrites messages shorter than 60 characters to follow a "What/Why/How" pattern + 2. Adds `fix` metadata to all CRITICAL and HIGH severity rules that lack one + +It uses a string-based approach to modify lines in-place without reformatting the +entire YAML file, preserving comments and formatting. 
+ +Usage: + python scripts/rewrite_messages.py +""" + +import os +import re +import sys +import yaml + +# --------------------------------------------------------------------------- +# CWE -> Expanded message (What/Why/How pattern, 120-250 chars) +# Only used when existing message < 60 characters. +# --------------------------------------------------------------------------- + +CWE_MESSAGES = { + "CWE-89": "SQL injection vulnerability detected. User-supplied input is included in a SQL query without sanitization, potentially allowing attackers to read, modify, or delete database contents. Use parameterized queries or an ORM instead of string concatenation.", + "CWE-78": "OS command injection vulnerability detected. User-controlled data flows into a system command without proper sanitization, allowing attackers to execute arbitrary commands. Use safe APIs with argument lists instead of shell execution.", + "CWE-79": "Cross-site scripting (XSS) vulnerability detected. User input is rendered in HTML output without proper escaping, allowing attackers to inject malicious scripts. Sanitize or escape all user input before rendering.", + "CWE-502": "Unsafe deserialization detected. Deserializing untrusted data can lead to remote code execution or denial of service. Use safe serialization formats like JSON or validate data before deserializing.", + "CWE-798": "Hard-coded credentials detected. Embedding secrets in source code makes them easily discoverable and impossible to rotate. Use environment variables or a secrets manager instead.", + "CWE-327": "Weak cryptographic algorithm detected. Using broken or outdated algorithms may allow attackers to decrypt data or forge signatures. Use modern algorithms like AES-256, SHA-256, or Ed25519.", + "CWE-338": "Insecure random number generator used. Non-cryptographic PRNGs produce predictable values that attackers can guess. 
Use a cryptographically secure random generator for security-sensitive operations.", + "CWE-295": "TLS/SSL certificate validation is disabled or bypassed. This allows man-in-the-middle attacks where attackers can intercept encrypted communications. Always validate certificates in production.", + "CWE-22": "Path traversal vulnerability detected. User input is used in file paths without validation, allowing attackers to access files outside the intended directory. Validate and canonicalize paths before use.", + "CWE-319": "Sensitive data transmitted over an unencrypted channel. Using HTTP instead of HTTPS allows network attackers to intercept data in transit. Use HTTPS for all communications containing sensitive data.", + "CWE-532": "Sensitive information written to log files. Passwords, tokens, or personal data in logs can be exposed to unauthorized parties. Redact sensitive values before logging.", + "CWE-601": "Open redirect vulnerability detected. The application redirects to a user-controlled URL without validation, enabling phishing attacks. Validate redirect targets against an allowlist.", + "CWE-434": "Unrestricted file upload detected. Accepting uploads without validating file type and content can allow execution of malicious code. Validate file type, size, and content before storing.", + "CWE-862": "Missing authorization check. The application does not verify user permissions before granting access to resources. Add authorization checks to all protected endpoints.", + "CWE-352": "Cross-site request forgery (CSRF) vulnerability. The application does not verify that requests originate from its own interface. Implement CSRF tokens on all state-changing operations.", + "CWE-611": "XML External Entity (XXE) vulnerability. The XML parser processes external entity references that can read local files or make network requests. Disable external entity processing in XML parsers.", + "CWE-918": "Server-side request forgery (SSRF) detected. 
The application fetches resources from user-controlled URLs, allowing attackers to access internal services. Validate and restrict URL targets.", + "CWE-94": "Code injection vulnerability detected. User-controlled input is passed to a code evaluation function, allowing arbitrary code execution. Avoid eval/exec with user input; use safe alternatives.", + "CWE-489": "Debug mode or debug code is enabled in a production context. Debug features expose sensitive information and increase attack surface. Disable debug mode before deploying to production.", + "CWE-400": "Uncontrolled resource consumption detected. The application does not limit resource usage, making it vulnerable to denial-of-service attacks. Implement rate limiting and resource caps.", + "CWE-200": "Information exposure detected. The application reveals sensitive internal details to unauthorized users. Remove or restrict access to sensitive information in responses.", + "CWE-209": "Error messages expose sensitive information. Stack traces or internal details in error responses help attackers plan attacks. Use generic error messages in production.", + "CWE-732": "Overly permissive file or resource permissions. Resources are accessible to unauthorized users due to incorrect permission settings. Apply the principle of least privilege.", + "CWE-614": "Sensitive cookie missing the Secure flag. The cookie may be transmitted over unencrypted HTTP, allowing interception. Set the Secure flag on all sensitive cookies.", + "CWE-347": "Cryptographic signature verification is missing or improper. Unverified signatures allow attackers to tamper with data. Always verify signatures before trusting signed data.", + "CWE-90": "LDAP injection vulnerability detected. User input in LDAP queries without sanitization allows attackers to modify query logic. Escape special characters in LDAP filters.", + "CWE-703": "Improper error handling detected. 
The application does not properly handle exceptions, which may cause crashes or information leaks. Catch specific exceptions and handle them gracefully.", + "CWE-943": "NoSQL injection vulnerability detected. User input in NoSQL queries without sanitization allows data theft or modification. Use parameterized queries and validate input types.", + "CWE-755": "Improper handling of exceptional conditions. The application may crash or behave unexpectedly on errors. Add proper exception handling for all expected failure modes.", + "CWE-377": "Insecure temporary file creation. Predictable temp file names or insecure permissions can be exploited via symlink attacks. Use secure temp file creation functions.", + "CWE-1333": "Regular expression denial of service (ReDoS) risk. A regex with catastrophic backtracking can freeze the application on crafted input. Simplify the regex or set a timeout.", + "CWE-409": "Decompression bomb risk detected. Processing compressed data without size limits can exhaust memory. Set maximum decompression size limits.", + "CWE-20": "Missing input validation. User input is not sufficiently validated, enabling injection or logic bypass attacks. Validate all inputs against expected format, type, and range.", + "CWE-307": "No limit on authentication attempts. The application allows unlimited login attempts, enabling brute-force password attacks. Implement account lockout or rate limiting.", + "CWE-384": "Session fixation vulnerability. Session tokens are not regenerated after authentication, allowing session hijacking. Regenerate session IDs on login.", + "CWE-521": "Weak password requirements. The application does not enforce strong password policies. Require minimum length, complexity, and check against breached password lists.", + "CWE-287": "Improper authentication detected. The application does not properly verify user identity before granting access. Implement proper authentication mechanisms.", + "CWE-916": "Weak password hashing algorithm. 
Fast hash algorithms make password cracking feasible. Use bcrypt, scrypt, or Argon2 for password hashing.", + "CWE-190": "Integer overflow risk detected. Arithmetic operations may exceed integer bounds, causing unexpected behavior. Validate arithmetic operands and use safe math functions.", + "CWE-120": "Buffer overflow vulnerability. Data is copied without checking buffer size, potentially allowing code execution. Always validate input length before buffer operations.", + "CWE-119": "Memory buffer boundary violation. Operations read or write beyond buffer bounds, causing crashes or code execution. Use bounds-checked functions and validate sizes.", + "CWE-416": "Use-after-free vulnerability. Accessing freed memory can cause crashes or arbitrary code execution. Nullify pointers after free and use smart pointers where possible.", + "CWE-415": "Double free vulnerability. Freeing memory twice corrupts memory management and may allow code execution. Track allocation state and nullify freed pointers.", + "CWE-476": "Null pointer dereference risk. Using a null pointer causes crashes. Check pointers for null before dereferencing.", + "CWE-134": "Format string vulnerability. User input in format strings allows memory read/write. Never pass user input as a format string argument.", + "CWE-242": "Use of inherently dangerous function. The called function cannot be used safely regardless of input handling. Replace with a safe alternative.", + "CWE-401": "Memory leak detected. Allocated memory is never freed, leading to resource exhaustion. Free all dynamically allocated memory when no longer needed.", + "CWE-362": "Race condition on shared resource. Concurrent access without synchronization can corrupt data. Use proper locking or atomic operations.", + "CWE-367": "Time-of-check time-of-use (TOCTOU) race condition. A resource can change between check and use, allowing bypass. Perform check and use atomically.", + "CWE-667": "Improper locking detected. 
Lock mismanagement may cause deadlocks or data corruption. Ensure locks are acquired and released in consistent order.", + "CWE-697": "Incorrect comparison detected. Flawed comparison logic can bypass security checks. Use strict equality and proper type checking.", + "CWE-704": "Incorrect type conversion. Unsafe type casting may cause data loss or memory corruption. Validate types before conversion and handle edge cases.", + "CWE-248": "Uncaught exception risk. Unhandled exceptions can crash the application or leak information. Catch and handle all expected exception types.", + "CWE-396": "Generic exception handler detected. Catching broad exceptions masks errors and may hide security issues. Catch specific exception types.", + "CWE-693": "Security protection mechanism bypassed or missing. A security feature is absent or improperly configured. Verify all security mechanisms are active.", + "CWE-494": "Code downloaded without integrity verification. Executing unverified code allows supply chain attacks. Verify checksums or signatures before execution.", + "CWE-345": "Insufficient data authenticity verification. Data is accepted without verifying its source. Validate data origin using signatures or MACs.", + "CWE-353": "Missing integrity check. Data modifications go undetected without integrity verification. Implement checksums or HMACs on sensitive data.", + "CWE-312": "Sensitive data stored in cleartext. Plaintext storage of secrets allows unauthorized access. Encrypt sensitive data at rest.", + "CWE-522": "Credentials transmitted or stored with insufficient protection. Weak protection makes credentials recoverable. Use strong encryption and secure channels.", + "CWE-276": "Incorrect default permissions. Resources are created with overly permissive access. Set restrictive permissions explicitly on creation.", + "CWE-250": "Code runs with unnecessary privileges. Excess permissions increase the impact of exploits. 
Apply the principle of least privilege.", + "CWE-639": "Insecure direct object reference (IDOR). User-supplied IDs access resources without authorization checks. Verify user permissions for every resource access.", + "CWE-477": "Use of obsolete or deprecated function. The function may have known security weaknesses. Replace with the recommended modern alternative.", + "CWE-1104": "Use of unmaintained third-party component. Unpatched dependencies contain known vulnerabilities. Update to a maintained version or find an alternative.", + "CWE-1059": "Insufficient code documentation. Missing documentation makes security issues harder to identify. Document security-sensitive code sections.", + "CWE-778": "Insufficient security logging. Critical events are not logged, hindering incident detection. Log authentication, authorization, and data access events.", + "CWE-926": "Improperly exported Android component. The component is accessible to other apps without restrictions. Set android:exported=false or add permission checks.", + "CWE-942": "Overly permissive CORS policy. Any origin can access the API, enabling cross-origin data theft. Restrict CORS to trusted origins only.", + "CWE-915": "Mass assignment vulnerability. Users can set object attributes they should not control. Use an allowlist of permitted attributes.", + "CWE-74": "Injection vulnerability. User input reaches a downstream interpreter without sanitization. Sanitize or parameterize all user-controlled data.", + "CWE-95": "Eval injection risk. User input is passed to an eval-like function, enabling code execution. Avoid eval with user input; use safe parsing.", + "CWE-98": "Remote file inclusion vulnerability. User input controls file include paths, enabling remote code execution. Validate file paths against an allowlist.", + "CWE-88": "Argument injection detected. User input is passed as command arguments without sanitization. 
Validate and escape all command arguments.", + "CWE-91": "XML/XPath injection detected. User input in XML queries can alter query logic. Use parameterized XPath queries.", + "CWE-863": "Incorrect authorization implementation. Authorization checks exist but are flawed. Review and test authorization logic for bypass conditions.", + "CWE-73": "External file path control. User input determines which file is accessed. Validate paths against an allowlist and canonicalize before use.", + "CWE-259": "Hard-coded password detected. Passwords embedded in code are easily discovered. Use environment variables or a secrets manager.", + "CWE-208": "Timing side-channel detected. Response time differences can leak sensitive information. Use constant-time comparison for secrets.", + "CWE-1321": "Prototype pollution vulnerability. User input modifies object prototypes, affecting application behavior. Freeze prototypes or validate input keys.", + "CWE-1336": "Server-side template injection detected. User input in template expressions enables code execution. Sandbox templates and validate user input.", + "CWE-470": "Unsafe reflection detected. User input selects classes or methods dynamically, enabling code execution. Use an allowlist of permitted class names.", + "CWE-131": "Incorrect buffer size calculation. Buffer size miscalculation can cause overflow. Double-check size arithmetic and use safe allocation wrappers.", + "CWE-117": "Log injection risk. Unsanitized input in logs allows forged entries. Sanitize newlines and special characters before logging.", + "CWE-479": "Signal handler uses non-reentrant function. This causes undefined behavior on signal delivery. Only call async-signal-safe functions in signal handlers.", + "CWE-310": "Cryptographic weakness detected. Misuse of cryptographic primitives undermines data protection. Use well-tested crypto libraries with recommended configurations.", + "CWE-322": "Key exchange without authentication. 
Unauthenticated key exchange enables man-in-the-middle attacks. Use authenticated key exchange protocols.", + "CWE-326": "Inadequate encryption strength. Short key lengths make brute-force attacks feasible. Use at least 128-bit symmetric keys or 2048-bit RSA keys.", + "CWE-330": "Insufficiently random values used. Predictable random values can be guessed by attackers. Use a cryptographically secure random generator.", + "CWE-16": "Insecure configuration detected. A misconfigured setting weakens the application's security posture. Review and harden security-relevant configuration.", +} + +# --------------------------------------------------------------------------- +# (CWE, language) -> language-specific fix text +# --------------------------------------------------------------------------- + +FIX_BY_CWE_LANG = { + # SQL Injection (CWE-89) + ("CWE-89", "python"): "Use parameterized queries: cursor.execute('SELECT * FROM t WHERE id = %s', (user_id,)). For Django, use the ORM or QuerySet API. For SQLAlchemy, use bound parameters.", + ("CWE-89", "java"): "Use PreparedStatement with parameter binding: ps = conn.prepareStatement('SELECT * FROM t WHERE id = ?'); ps.setString(1, userId);", + ("CWE-89", "javascript_typescript"): "Use parameterized queries with your database driver, e.g., db.query('SELECT * FROM t WHERE id = $1', [userId]) for pg, or use an ORM like Prisma or Sequelize.", + ("CWE-89", "go"): "Use parameterized queries: db.Query('SELECT * FROM t WHERE id = ?', userId). 
Never concatenate user input into SQL strings.", + ("CWE-89", "php"): "Use PDO prepared statements: $stmt = $pdo->prepare('SELECT * FROM t WHERE id = ?'); $stmt->execute([$userId]);", + ("CWE-89", "ruby"): "Use parameterized queries with ActiveRecord: User.where('id = ?', user_id) or use the ORM query interface.", + ("CWE-89", "dotnet"): "Use parameterized queries with SqlCommand: cmd.Parameters.AddWithValue(\"@id\", userId); or use Entity Framework LINQ queries.", + ("CWE-89", "kotlin"): "Use PreparedStatement with parameter binding or use an ORM like Exposed or Room with parameterized queries.", + ("CWE-89", "scala"): "Use Slick's type-safe query DSL or JDBC PreparedStatement with parameter binding.", + ("CWE-89", "elixir"): "Use Ecto parameterized queries: Repo.all(from u in User, where: u.id == ^user_id). Never interpolate user input into raw SQL.", + ("CWE-89", "erlang"): "Use parameterized queries with your database driver, e.g., epgsql:equery(C, \"SELECT * FROM t WHERE id = $1\", [UserId]).", + ("CWE-89", "rust"): "Use parameterized queries with sqlx: sqlx::query('SELECT * FROM t WHERE id = $1').bind(user_id). Never format user input into SQL strings.", + ("CWE-89", "c_cpp"): "Use parameterized queries with your database API (e.g., sqlite3_bind_text for SQLite, PQexecParams for PostgreSQL). Never use sprintf to build SQL.", + ("CWE-89", "objective-c"): "Use parameterized queries with sqlite3_bind_text() or NSPredicate with substitution variables. Never concatenate user input into SQL strings.", + ("CWE-89", "swift"): "Use parameterized queries with sqlite3_bind_text() or a Swift ORM like GRDB with parameterized statements.", + + # Command Injection (CWE-78) + ("CWE-78", "python"): "Use subprocess.run() with a list of arguments instead of shell=True: subprocess.run(['cmd', arg1, arg2]). Never pass user input to os.system() or shell commands.", + ("CWE-78", "java"): "Use ProcessBuilder with separate arguments: new ProcessBuilder('cmd', arg1, arg2). 
Never concatenate user input into Runtime.exec() strings.", + ("CWE-78", "javascript_typescript"): "Use child_process.execFile() or spawn() with argument arrays instead of exec(). Never interpolate user input into shell commands.", + ("CWE-78", "go"): "Use exec.Command() with separate arguments: exec.Command('cmd', arg1, arg2). Never pass user input to exec.Command('sh', '-c', userInput).", + ("CWE-78", "ruby"): "Use system() with separate arguments: system('cmd', arg1, arg2). Avoid backticks or %x{} with user input.", + ("CWE-78", "php"): "Use escapeshellarg() for arguments and escapeshellcmd() for commands. Prefer language-level APIs over shell execution.", + ("CWE-78", "dotnet"): "Use Process.Start() with Arguments set separately. Never concatenate user input into process arguments.", + ("CWE-78", "kotlin"): "Use ProcessBuilder with separate arguments: ProcessBuilder('cmd', arg1, arg2). Never concatenate user input into command strings.", + ("CWE-78", "scala"): "Use ProcessBuilder or scala.sys.process with separate arguments. Never interpolate user input into shell command strings.", + ("CWE-78", "elixir"): "Use System.cmd/2 with separate arguments: System.cmd(\"cmd\", [arg1, arg2]). Never pass user input to :os.cmd/1.", + ("CWE-78", "erlang"): "Use erlang:open_port with {spawn_executable, Cmd} and {args, Args} instead of os:cmd/1. Never interpolate user input into commands.", + ("CWE-78", "rust"): "Use std::process::Command with separate arguments: Command::new('cmd').arg(arg1).arg(arg2). Never pass user input to shell commands.", + ("CWE-78", "c_cpp"): "Use execve() or posix_spawn() with separate argument arrays. Never pass user input to system() or popen().", + ("CWE-78", "objective-c"): "Use NSTask with launchPath and arguments array. Never pass user input to system() or popen().", + ("CWE-78", "swift"): "Use Process (NSTask) with separate arguments array. 
Never pass user input to system() or shell commands.", + + # XSS (CWE-79) + ("CWE-79", "python"): "Use template engine auto-escaping (Jinja2 autoescape=True, Django default). Use markupsafe.escape() for manual escaping. Never render raw user input with |safe or Markup().", + ("CWE-79", "java"): "Use context-aware output encoding (OWASP Java Encoder: Encode.forHtml()). Enable auto-escaping in templates (Thymeleaf, JSP with JSTL).", + ("CWE-79", "javascript_typescript"): "Use textContent instead of innerHTML. Use DOMPurify.sanitize() if HTML must be rendered. Enable auto-escaping in template engines (React JSX, Handlebars).", + ("CWE-79", "php"): "Use htmlspecialchars($input, ENT_QUOTES, 'UTF-8') for output. Enable auto-escaping in Twig/Blade templates. Never echo raw user input.", + ("CWE-79", "ruby"): "Use ERB auto-escaping (<%= %>) or sanitize helper. Never use raw() or html_safe on user input without sanitization.", + ("CWE-79", "dotnet"): "Use Razor auto-encoding or HtmlEncoder.Default.Encode(). Never use Html.Raw() with user input. Validate input on both client and server.", + ("CWE-79", "kotlin"): "Use context-aware output encoding. Enable auto-escaping in template engines. Use OWASP Java Encoder for manual escaping.", + ("CWE-79", "scala"): "Use Play framework's Twirl templates (auto-escaped by default). Use Html() only for trusted content. Sanitize user input before rendering.", + ("CWE-79", "elixir"): "Phoenix templates auto-escape by default. Never use raw/1 or {:safe, ...} with user input. Use Phoenix.HTML.html_escape/1 for manual escaping.", + ("CWE-79", "erlang"): "Escape all user input before inserting into HTML output. Use a templating library with auto-escaping enabled.", + ("CWE-79", "objective-c"): "Escape user input before rendering in web views. Use NSString methods to encode HTML entities. Avoid loading untrusted HTML in WKWebView.", + ("CWE-79", "swift"): "Escape user input before rendering in web views. 
Use String extension to encode HTML entities. Set WKWebView configuration to restrict JavaScript.", + + # Path Traversal (CWE-22) + ("CWE-22", "python"): "Use os.path.abspath() and verify the result starts with your allowed base directory. Use pathlib.Path.resolve() for canonicalization.", + ("CWE-22", "java"): "Use File.getCanonicalPath() and verify the result starts with the allowed base directory. Use java.nio.file.Path.normalize() and resolve().", + ("CWE-22", "javascript_typescript"): "Use path.resolve() and verify the result starts with the allowed base directory using path.relative() to check for '..' traversal.", + ("CWE-22", "go"): "Use filepath.Clean() and filepath.Abs(), then verify the result is within the allowed base directory with strings.HasPrefix().", + ("CWE-22", "php"): "Use realpath() and verify the result starts with the allowed base directory. Reject paths containing '..' sequences.", + ("CWE-22", "ruby"): "Use File.expand_path() and verify the result starts with the allowed base directory. Use Pathname#cleanpath for normalization.", + ("CWE-22", "dotnet"): "Use Path.GetFullPath() and verify the result starts with the allowed base directory. Use Path.Combine() instead of string concatenation.", + ("CWE-22", "kotlin"): "Use File.canonicalPath and verify it starts with the allowed base directory. Use java.nio.file.Path.normalize().", + ("CWE-22", "scala"): "Use java.io.File.getCanonicalPath() and verify it starts with the allowed base directory. Use java.nio.file.Path.normalize().", + ("CWE-22", "elixir"): "Use Path.expand/1 and verify the result starts with the allowed base directory. Reject paths containing '..' components.", + ("CWE-22", "erlang"): "Use filename:absname/1 and verify the result starts with the allowed base directory. Reject paths containing '..' components.", + ("CWE-22", "rust"): "Use std::fs::canonicalize() and verify the result starts with the allowed base directory. 
Use Path::starts_with() for validation.", + ("CWE-22", "objective-c"): "Use -[NSString stringByStandardizingPath] and verify the result starts with the allowed base directory. Reject '..' path components.", + ("CWE-22", "swift"): "Use URL.standardizedFileURL or (path as NSString).standardizingPath and verify the result is within the allowed base directory.", + + # Unsafe Deserialization (CWE-502) + ("CWE-502", "python"): "Replace pickle/shelve with json.loads() for data interchange. If pickle is required, use hmac to verify data integrity before deserializing.", + ("CWE-502", "java"): "Use ObjectInputFilter (JEP 290) to restrict deserializable classes. Prefer JSON (Jackson/Gson) or Protocol Buffers for data interchange.", + ("CWE-502", "javascript_typescript"): "Avoid eval() or Function() for deserialization. Use JSON.parse() with a reviver function to validate types.", + ("CWE-502", "php"): "Replace unserialize() with json_decode(). If unserialize() is required, use the allowed_classes option to restrict types.", + ("CWE-502", "ruby"): "Replace Marshal.load/YAML.load with JSON.parse or YAML.safe_load. Never deserialize untrusted data with Marshal.", + ("CWE-502", "dotnet"): "Use System.Text.Json or Newtonsoft.Json instead of BinaryFormatter/SoapFormatter. Set TypeNameHandling.None in Newtonsoft.Json.", + ("CWE-502", "kotlin"): "Use kotlinx.serialization with JSON format. Avoid Java ObjectInputStream for untrusted data. Use ObjectInputFilter if needed.", + ("CWE-502", "scala"): "Use circe, play-json, or upickle for JSON deserialization. Avoid Java ObjectInputStream for untrusted data.", + ("CWE-502", "elixir"): "Use Jason.decode/1 or :erlang.binary_to_term/2 with [:safe] option. Never use :erlang.binary_to_term/1 on untrusted data.", + ("CWE-502", "erlang"): "Use binary_to_term/2 with [safe] option. Use jsx or jiffy for JSON deserialization. 
Never use binary_to_term/1 on untrusted data.", + ("CWE-502", "rust"): "Use serde with JSON/MessagePack instead of bincode for untrusted data. Validate deserialized data before use.", + ("CWE-502", "objective-c"): "Use NSJSONSerialization instead of NSKeyedUnarchiver for untrusted data. Use NSSecureCoding with allowedClasses for type validation.", + ("CWE-502", "swift"): "Use JSONDecoder with Codable instead of NSKeyedUnarchiver. Use NSSecureCoding with unarchivedObject(ofClass:from:) for type-safe unarchiving.", + + # Hard-coded credentials (CWE-798) + ("CWE-798", "python"): "Store secrets in environment variables (os.environ['KEY']) or use a secrets manager (AWS Secrets Manager, HashiCorp Vault, python-dotenv).", + ("CWE-798", "java"): "Store secrets in environment variables (System.getenv('KEY')), a secrets manager, or externalized configuration (Spring Vault, AWS Secrets Manager).", + ("CWE-798", "javascript_typescript"): "Store secrets in environment variables (process.env.KEY) or use a secrets manager. Use dotenv for local development.", + ("CWE-798", "go"): "Store secrets in environment variables (os.Getenv('KEY')) or use a secrets manager (HashiCorp Vault, AWS Secrets Manager).", + ("CWE-798", "php"): "Store secrets in environment variables (getenv('KEY')) or use a secrets manager. Use vlucas/phpdotenv for local development.", + ("CWE-798", "ruby"): "Store secrets in environment variables (ENV['KEY']) or use Rails credentials (config/credentials.yml.enc) or a secrets manager.", + ("CWE-798", "dotnet"): "Store secrets in environment variables, Azure Key Vault, or user-secrets for development. Use IConfiguration to access secrets.", + ("CWE-798", "kotlin"): "Store secrets in environment variables (System.getenv('KEY')) or use Android Keystore / a secrets manager. Never commit secrets to source control.", + ("CWE-798", "scala"): "Store secrets in environment variables or use a secrets manager. 
Use Typesafe Config with environment variable substitution.", + ("CWE-798", "elixir"): "Store secrets in environment variables (System.get_env/1) or use runtime configuration. Use config/runtime.exs for production secrets.", + ("CWE-798", "erlang"): "Store secrets in environment variables (os:getenv/1) or use a secrets manager. Load secrets from configuration files excluded from version control.", + ("CWE-798", "rust"): "Store secrets in environment variables (std::env::var('KEY')) or use a secrets manager. Use dotenvy for local development.", + ("CWE-798", "c_cpp"): "Store secrets in environment variables (getenv('KEY')) or read from a protected configuration file. Never embed secrets in source code.", + ("CWE-798", "objective-c"): "Store secrets in the iOS Keychain or environment variables. Never embed secrets in source code or property lists.", + ("CWE-798", "swift"): "Store secrets in the iOS Keychain or environment variables. Use a secrets manager for server-side Swift. Never embed secrets in source code.", + + # Weak crypto (CWE-327) + ("CWE-327", "python"): "Use hashlib.sha256() or hashlib.sha3_256() instead of md5/sha1. Use cryptography library with AES-GCM or ChaCha20-Poly1305 for encryption.", + ("CWE-327", "java"): "Use MessageDigest.getInstance('SHA-256') instead of MD5/SHA1. Use AES/GCM/NoPadding for encryption. Use Cipher from javax.crypto with strong algorithms.", + ("CWE-327", "javascript_typescript"): "Use crypto.createHash('sha256') instead of md5/sha1. Use crypto.createCipheriv('aes-256-gcm', ...) for encryption.", + ("CWE-327", "go"): "Use crypto/sha256 instead of crypto/md5 or crypto/sha1. Use crypto/aes with GCM mode for encryption.", + ("CWE-327", "ruby"): "Use OpenSSL::Digest::SHA256 instead of MD5/SHA1. Use OpenSSL::Cipher.new('aes-256-gcm') for encryption.", + ("CWE-327", "dotnet"): "Use SHA256.Create() instead of MD5/SHA1. 
Use Aes.Create() with CipherMode.CBC or AesGcm for encryption.", + ("CWE-327", "kotlin"): "Use MessageDigest.getInstance('SHA-256') instead of MD5/SHA1. Use Cipher with 'AES/GCM/NoPadding' for encryption.", + ("CWE-327", "scala"): "Use MessageDigest.getInstance('SHA-256') instead of MD5/SHA1. Use javax.crypto.Cipher with AES-GCM for encryption.", + ("CWE-327", "elixir"): "Use :crypto.hash(:sha256, data) instead of :md5/:sha. Use :crypto.crypto_one_time_aead for AES-GCM encryption.", + ("CWE-327", "erlang"): "Use crypto:hash(sha256, Data) instead of md5/sha. Use crypto:crypto_one_time_aead for AES-GCM encryption.", + ("CWE-327", "rust"): "Use sha2 crate (Sha256::digest) instead of md5. Use aes-gcm crate for authenticated encryption.", + ("CWE-327", "c_cpp"): "Use SHA-256 (e.g., EVP_sha256() from OpenSSL) instead of MD5/SHA-1. Use AES-GCM for authenticated encryption.", + ("CWE-327", "objective-c"): "Use CC_SHA256 from CommonCrypto instead of CC_MD5/CC_SHA1. Use CCCrypt with kCCAlgorithmAES for encryption.", + ("CWE-327", "swift"): "Use SHA256 from CryptoKit instead of Insecure.MD5/SHA1. Use AES.GCM.seal() for authenticated encryption.", + + # TLS/SSL bypass (CWE-295) + ("CWE-295", "python"): "Always use verify=True (the default) in requests. Set ssl_context properly. Never set CERT_NONE or disable hostname checking.", + ("CWE-295", "java"): "Never override TrustManager to accept all certificates. Use the default SSLContext or configure with trusted CA certificates only.", + ("CWE-295", "javascript_typescript"): "Never set rejectUnauthorized: false or NODE_TLS_REJECT_UNAUTHORIZED=0. Configure proper CA certificates for custom TLS needs.", + ("CWE-295", "go"): "Never set InsecureSkipVerify: true in tls.Config. Use the default TLS configuration which validates certificates properly.", + ("CWE-295", "ruby"): "Never set verify_mode = OpenSSL::SSL::VERIFY_NONE. 
Use the default certificate validation in Net::HTTP and OpenSSL.", + ("CWE-295", "dotnet"): "Never return true from ServerCertificateCustomValidationCallback. Use the default certificate validation from ServicePointManager.", + ("CWE-295", "kotlin"): "Never override TrustManager to accept all certificates. Use the default SSLContext or OkHttp's CertificatePinner for pinning.", + ("CWE-295", "scala"): "Never override TrustManager to accept all certificates. Use the default SSLContext or configure trusted CA certificates.", + ("CWE-295", "rust"): "Never use danger_accept_invalid_certs(true) in reqwest or a permissive custom certificate verifier in rustls. Use the default TLS verification with proper CA certificates.", + ("CWE-295", "c_cpp"): "Always call SSL_CTX_set_verify with SSL_VERIFY_PEER. Never skip certificate or hostname verification in OpenSSL/BoringSSL.", + ("CWE-295", "objective-c"): "Never override NSURLSession delegate to accept invalid certificates. Use App Transport Security (ATS) defaults.", + ("CWE-295", "swift"): "Never override URLSession delegate to accept invalid certificates. Use App Transport Security (ATS) defaults.", + + # Code injection (CWE-94) + ("CWE-94", "python"): "Remove eval()/exec() calls with user input. Use ast.literal_eval() for safe parsing of Python literals. Use a sandboxed environment if dynamic execution is required.", + ("CWE-94", "java"): "Avoid ScriptEngine.eval() with user input. Use a sandboxed interpreter or template engine. Restrict which classes scripts can load (SecurityManager is deprecated for removal since Java 17, so do not rely on it alone).", + ("CWE-94", "javascript_typescript"): "Remove eval(), Function(), and setTimeout/setInterval with string arguments. Use JSON.parse() for data parsing. If dynamic execution is needed, run it in an isolated sandbox such as a separate process; do not rely on vm2, which has had critical sandbox escapes.", + ("CWE-94", "php"): "Remove eval(), assert(), and preg_replace with /e flag. Use proper parsing functions for data. Never include user-controlled file paths.", + ("CWE-94", "ruby"): "Remove eval(), instance_eval(), and send() with user input. Use safe parsing methods.
Use a sandboxed environment if dynamic execution is required.", + ("CWE-94", "dotnet"): "Avoid CSharpScript.EvaluateAsync() or Roslyn compilation with user input. Use expression parsers or sandboxed environments.", + ("CWE-94", "kotlin"): "Avoid ScriptEngine.eval() or javax.tools.JavaCompiler with user input. Use a sandboxed interpreter or expression parser.", + ("CWE-94", "scala"): "Avoid scala.tools.reflect.ToolBox.eval() with user input. Use a sandboxed interpreter or safe expression parser.", + ("CWE-94", "elixir"): "Remove Code.eval_string/1 calls with user input. Use pattern matching and safe parsing instead. Never evaluate untrusted Elixir code.", + ("CWE-94", "erlang"): "Avoid erl_eval:expr/2 and erl_scan with user input. Use pattern matching and safe parsing instead. Never evaluate untrusted Erlang terms.", + + # XXE (CWE-611) + ("CWE-611", "python"): "Use defusedxml instead of xml.etree or lxml. If lxml is required, set resolve_entities=False and no_network=True on the parser.", + ("CWE-611", "java"): "Set XMLConstants.FEATURE_SECURE_PROCESSING and disable DOCTYPE declarations: factory.setFeature('http://apache.org/xml/features/disallow-doctype-decl', true).", + ("CWE-611", "javascript_typescript"): "Use a safe XML parser. In libxmljs, set noent: false and nonet: true. In xml2js, external entities are disabled by default.", + ("CWE-611", "php"): "Avoid the LIBXML_NOENT flag, which enables entity substitution. On PHP versions before 8.0, call libxml_disable_entity_loader(true) before parsing XML (deprecated in PHP 8.0, where external entity loading is disabled by default). Use json_decode() if possible.", + ("CWE-611", "dotnet"): "Set XmlReaderSettings.DtdProcessing = DtdProcessing.Prohibit and XmlReaderSettings.XmlResolver = null.", + ("CWE-611", "kotlin"): "Set XMLConstants.FEATURE_SECURE_PROCESSING and disable DOCTYPE: factory.setFeature('http://apache.org/xml/features/disallow-doctype-decl', true).", + + # SSRF (CWE-918) + ("CWE-918", "python"): "Validate URLs against an allowlist of permitted hosts and schemes. Block private, loopback, and link-local IP ranges (10.x, 172.16-31.x, 192.168.x, 127.x, 169.254.x).
Use urllib.parse to validate before fetching.", + ("CWE-918", "javascript_typescript"): "Validate URLs against an allowlist of permitted hosts. Block private IP ranges and localhost. Use the URL constructor to parse and validate before fetching.", + ("CWE-918", "dotnet"): "Validate URLs against an allowlist of permitted hosts. Block private IP ranges and localhost. Resolve DNS and check the IP before making requests.", + + # File upload (CWE-434) + ("CWE-434", "python"): "Validate file extension, MIME type, and content. Use werkzeug.utils.secure_filename(). Store uploads outside the web root with randomized names.", + ("CWE-434", "java"): "Validate file extension, MIME type, and content. Store uploads outside the web root. Use Apache Tika for content-type detection.", + ("CWE-434", "javascript_typescript"): "Validate file extension, MIME type, and content. Use multer with file filter. Store uploads outside the web root with randomized names.", + ("CWE-434", "php"): "Validate file extension, MIME type with finfo_file(), and content. Store uploads outside the web root. Never trust $_FILES['type'].", + ("CWE-434", "kotlin"): "Validate file extension, MIME type, and content. Store uploads outside the web root. Use Apache Tika for content-type detection.", + ("CWE-434", "scala"): "Validate file extension, MIME type, and content. Store uploads outside the web root with randomized names.", + ("CWE-434", "elixir"): "Validate file extension, MIME type, and content. Store uploads outside the web root. Use Plug.Upload metadata for validation.", + ("CWE-434", "erlang"): "Validate file extension, MIME type, and content. Store uploads outside the web root with randomized filenames.", + + # LDAP injection (CWE-90) + ("CWE-90", "python"): "Use ldap3 library with safe filter escaping: ldap3.utils.conv.escape_filter_chars(user_input). Never concatenate user input into LDAP filters.", + ("CWE-90", "java"): "Use javax.naming.ldap with properly escaped filter values. 
Use LdapEncoder.filterEncode() from Spring LDAP for escaping.", + ("CWE-90", "javascript_typescript"): "Use ldapjs with properly escaped filter values. Use ldapEscape.filter() to escape special characters in LDAP filters.", + ("CWE-90", "php"): "Use ldap_escape() to sanitize filter values: ldap_escape($input, '', LDAP_ESCAPE_FILTER). Never concatenate user input into LDAP filters.", + ("CWE-90", "dotnet"): "Use System.DirectoryServices with parameterized searches. Escape special LDAP characters before inserting into filters.", + ("CWE-90", "kotlin"): "Use javax.naming.ldap with properly escaped filter values. Use Spring LDAP's LdapEncoder.filterEncode() for escaping.", + + # Insecure random (CWE-338) + ("CWE-338", "python"): "Use secrets.token_bytes(), secrets.token_hex(), or secrets.choice() for security-sensitive operations.", + ("CWE-338", "java"): "Use java.security.SecureRandom instead of java.util.Random for security-sensitive operations.", + ("CWE-338", "javascript_typescript"): "Use crypto.randomBytes() or crypto.getRandomValues() for security-sensitive operations.", + ("CWE-338", "dotnet"): "Use System.Security.Cryptography.RandomNumberGenerator instead of System.Random for security-sensitive operations.", + ("CWE-338", "kotlin"): "Use java.security.SecureRandom instead of kotlin.random.Random or java.util.Random for security-sensitive operations.", + + # CSRF (CWE-352) + ("CWE-352", "java"): "Enable Spring Security CSRF protection or use a custom CSRF token filter. Include CSRF tokens in all HTML forms and AJAX requests.", + ("CWE-352", "kotlin"): "Enable Spring Security CSRF protection or use a framework-provided CSRF middleware. Include CSRF tokens in all state-changing requests.", + + # JWT/signature verification (CWE-347) + ("CWE-347", "python"): "Always verify JWT signatures: jwt.decode(token, key, algorithms=['HS256']). 
Never use options={'verify_signature': False}.", + ("CWE-347", "javascript_typescript"): "Always verify JWT signatures with jsonwebtoken: jwt.verify(token, secret). Never use jwt.decode() without verification for authorization.", + ("CWE-347", "objective-c"): "Always verify cryptographic signatures before trusting signed data. Use Security.framework's SecKeyVerifySignature for RSA/EC verification.", + ("CWE-347", "swift"): "Always verify cryptographic signatures before trusting signed data. Use CryptoKit's isValidSignature or Security.framework's SecKeyVerifySignature.", + + # NoSQL injection (CWE-943) + ("CWE-943", "python"): "Use parameterized queries with PyMongo. Validate input types (reject dicts/lists where strings expected). Never pass raw request data to MongoDB queries.", + ("CWE-943", "javascript_typescript"): "Validate input types before MongoDB queries. Use mongoose schema validation. Replace $where with aggregation pipeline. Sanitize with mongo-sanitize.", + ("CWE-943", "objective-c"): "Validate all user input before including in NoSQL queries. Use parameterized queries and type-check inputs to reject injection payloads.", + ("CWE-943", "swift"): "Validate all user input before including in NoSQL queries. Use parameterized queries and type-check inputs to reject injection payloads.", + + # Prototype pollution (CWE-1321) + ("CWE-1321", "javascript_typescript"): "Use Object.create(null) for lookup objects. Validate keys against a denylist (__proto__, constructor, prototype). Use Map instead of plain objects.", + + # Template injection (CWE-1336) + ("CWE-1336", "python"): "Never pass user input to Jinja2 Template() or Mako Template(). Use render_template() with variables. Enable sandboxed environment for dynamic templates.", + ("CWE-1336", "javascript_typescript"): "Never pass user input to template compilation functions. Use pre-compiled templates with data binding. 
Enable strict mode in template engines.", + + # Debug mode (CWE-489) + ("CWE-489", "python"): "Set DEBUG=False in production. Use environment variables to control debug mode: DEBUG = os.environ.get('DEBUG', 'False').lower() == 'true'.", + ("CWE-489", "javascript_typescript"): "Set NODE_ENV=production in deployment. Remove console.log/debug statements. Use a logging library with configurable log levels.", + + # Code integrity (CWE-494) + ("CWE-494", "javascript_typescript"): "Use Subresource Integrity (SRI) for CDN scripts: