fix: sanitise Marked.js output with DOMPurify (closes #11) by timon0305 · Pull Request #12 · cppalliance/cppa-cursor-browser

timon0305 · 2026-05-04T22:09:29Z

Problem

marked.parse(...) doesn't sanitise. Two call sites — templates/workspace.html (→ innerHTML) and static/js/download.js (→ download blob) — were piping its output straight into the DOM, exposing XSS via javascript: links, data: URIs, raw <script> tags, and event-handler attributes. A hit reads the entire local chat store.

Change

templates/base.html — load DOMPurify 3.2.4 (after Marked.js, before app.js).
static/js/app.js — new renderMarkdownSafe(text) helper: marked.parse → DOMPurify.sanitize. Falls back to escapeHtml if either lib is missing — never a bare Marked call.
templates/workspace.html and static/js/download.js — both call sites now use the helper.

Test plan

145/145 unit tests pass (8 new in tests/test_xss_sanitization.py, including a source-level guard that fails if any future frontend file outside the helper contains a bare marked.parse(...)).
Live smoke: home returns 200, served HTML loads both Marked + DOMPurify, app.js exposes renderMarkdownSafe.
Browser DevTools verification (run on a workspace page):
- renderMarkdownSafe('[click](javascript:alert(1))') → href stripped
- renderMarkdownSafe('<script>alert(1)</script>') → <script> stripped
- renderMarkdownSafe('# H\n\n**bold**') → full HTML preserved (sanity)

Closes #11.

`marked.parse(...)` does not sanitise. Two call sites were piping its return value straight into the DOM (workspace.html → main.innerHTML) or into a downloadable HTML blob (download.js), which exposed XSS via javascript: links, dangerous data: URIs, raw <script> blocks, and event-handler attributes — all of which Marked passes through. A successful XSS in the dashboard origin reads every chat in the local store and exfiltrates it. Changes: - templates/base.html: add DOMPurify 3.2.4 CDN script after Marked.js and before app.js (load order matters — app.js references DOMPurify). - static/js/app.js: add `renderMarkdownSafe(text)` helper. Wraps `marked.parse(...)` with `DOMPurify.sanitize(...)`. If either lib failed to load (CDN failure, ad blocker, tests), falls back to `escapeHtml(text)` rather than emitting raw markdown HTML — never fall through to a bare Marked call. - templates/workspace.html: `renderMarkdown(text)` now delegates to the shared helper. - static/js/download.js: HTML download blob uses the helper too. The downloaded file is opened in a browser and any payload would execute in the file:// origin, so the same threat model applies. Regression coverage in tests/test_xss_sanitization.py adds 8 cases: - DOMPurify CDN present in base.html, loaded before app.js - renderMarkdownSafe defined, invokes DOMPurify.sanitize, falls back to escapeHtml - both call sites use the helper - a source-level guard that fails if any frontend file outside the helper itself contains a bare `marked.parse(...)` call Test plan: 145/145 unit tests pass (8 new + 137 existing). Live smoke on `python3 app.py --port 3001` confirmed both Marked.js and DOMPurify reach the page and `renderMarkdownSafe` is exposed in the served app.js bundle.

coderabbitai · 2026-05-04T22:09:40Z

📝 Walkthrough

Walkthrough

This PR adds XSS sanitization to Marked.js markdown rendering across the frontend. DOMPurify is loaded via CDN, a new renderMarkdownSafe() helper wraps marked.parse() with DOMPurify.sanitize(), both unsanitized call sites (workspace HTML and download export) are refactored to use the helper, and comprehensive regression tests prevent reintroduction of bare markdown parsing patterns.

Changes

XSS Sanitization via DOMPurify

Layer / File(s)	Summary
Library Dependency `templates/base.html`	DOMPurify v3.2.4 is loaded from CDN in `<head>`, positioned to initialize before `static/js/app.js`.
Safe Rendering Helper `static/js/app.js`	New `renderMarkdownSafe(text)` function wraps `marked.parse(text, { breaks: true, gfm: true })` with `DOMPurify.sanitize(...)`. Falls back to `escapeHtml(text)` if either library is missing or if parsing/sanitization throws; returns empty string for falsy input.
Call Site Updates `templates/workspace.html`, `static/js/download.js`	`renderMarkdown(text)` in workspace.html now delegates to `renderMarkdownSafe(text)` instead of inline `marked.parse()`. HTML export in download.js calls `renderMarkdownSafe(md)` instead of bare `marked.parse(md)` to produce sanitized content.
Regression Tests `tests/test_xss_sanitization.py`	Four test classes verify DOMPurify is loaded and ordered correctly, `renderMarkdownSafe` exists and sanitizes, both call sites use the safe helper, and `marked.parse(` appears exactly once across all listed frontend files (only within the helper).

Sequence Diagram

sequenceDiagram
    participant Browser
    participant MarkdownContent as Markdown<br/>Content
    participant RenderSafe as renderMarkdownSafe()
    participant MarkedLib as marked.parse()
    participant DOMPurify as DOMPurify<br/>.sanitize()
    participant DOM as DOM /<br/>Download

    Browser->>RenderSafe: text
    RenderSafe->>MarkedLib: raw markdown
    MarkedLib-->>RenderSafe: HTML (unsanitized)
    RenderSafe->>DOMPurify: unsanitized HTML
    DOMPurify-->>RenderSafe: sanitized HTML
    RenderSafe-->>DOM: safe to insert /<br/>download
    
    Note over RenderSafe: Fallback to escapeHtml()<br/>if marked or DOMPurify missing

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

The core refactoring is straightforward (two call sites → new helper), but the security implications require careful verification of the sanitization logic, DOMPurify usage, and fallback paths. The comprehensive test suite covers multiple attack vectors and static invariants, which is thorough but adds review scope. Changes span five files with mixed purposes (library, logic, tests), and the heterogeneity of file types demands separate reasoning for each layer.

Poem

🐰 A sanitizer's tale, oh what a sight!
DOMPurify shields us from XSS blight,
Marked's output tamed with a helper so neat,
No bare parse today—our chat's secure and sweet!
Tests keep the dragons from creeping back in, 🛡️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly describes the main change: sanitizing Marked.js output with DOMPurify, and indicates it closes issue `#11`.
Linked Issues check	✅ Passed	All objectives from issue `#11` are implemented: DOMPurify loaded in base.html with correct load order, renderMarkdownSafe helper created in app.js with sanitization and fallback, call sites in workspace.html and download.js updated, and regression tests added.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with the security objective to prevent XSS via unsanitized Marked.js output; no out-of-scope modifications detected.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

tests/test_xss_sanitization.py (2)

103-137: ⚡ Quick win

Scan the frontend directories instead of a fixed allowlist.

FRONTEND_FILES only covers today's files, so a future templates/*.html or static/js/*.js file can reintroduce a bare marked.parse( and this guard will still pass. Walking the frontend directories keeps the regression test aligned with the security goal.

Proposed fix

+from pathlib import Path
 import os
 import re
 import unittest
@@
-    FRONTEND_FILES = [
-        "templates/workspace.html",
-        "templates/index.html",
-        "templates/search.html",
-        "templates/config.html",
-        "static/js/app.js",
-        "static/js/download.js",
-    ]
+    FRONTEND_DIRS = ("templates", "static/js")
@@
-        for rel in self.FRONTEND_FILES:
-            full = os.path.join(REPO_ROOT, rel)
-            if not os.path.exists(full):
-                continue
+        frontend_files = []
+        for root in self.FRONTEND_DIRS:
+            base = Path(REPO_ROOT, root)
+            frontend_files.extend(
+                path.relative_to(REPO_ROOT).as_posix()
+                for path in base.rglob("*")
+                if path.suffix in {".html", ".js"}
+            )
+
+        for rel in frontend_files:
+            full = os.path.join(REPO_ROOT, rel)
             with open(full, "r", encoding="utf-8") as f:
                 src = f.read()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_xss_sanitization.py` around lines 103 - 137, The test
test_marked_parse_appears_only_inside_safe_helper currently uses a fixed
FRONTEND_FILES list which can miss new frontend files; replace that static list
with a dynamic directory walk/glob under REPO_ROOT to collect all
templates/*.html and static/js/*.js (e.g., using os.walk or glob.glob) so every
current and future frontend file is scanned for the marked.parse regex. Keep the
existing logic and assertions (including the special-case check that
static/js/app.js has exactly one occurrence) but build per_file by iterating the
discovered file paths instead of the hardcoded FRONTEND_FILES; ensure you still
skip non-existent paths and preserve encoding="utf-8" when opening files.

40-52: ⚡ Quick win

Assert the Marked.js-before-DOMPurify part of the contract.

This test name says "after_marked", but today it only proves DOMPurify comes before app.js. If those two CDN tags are swapped, the test still passes. Add the missing marked_pos < purify_pos assertion so the regression guard matches its stated invariant.

Proposed fix

         self.assertGreater(marked_pos, 0, "Marked.js must be loaded")
         self.assertGreater(purify_pos, 0, "DOMPurify must be loaded")
         self.assertGreater(app_js_pos, 0, "app.js must be loaded")
+        self.assertLess(marked_pos, purify_pos,
+                        "DOMPurify should load after Marked.js")
         self.assertLess(purify_pos, app_js_pos,
                         "DOMPurify must load before app.js so renderMarkdownSafe can use it")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_xss_sanitization.py` around lines 40 - 52, The test
test_base_html_loads_dompurify_after_marked currently only asserts DOMPurify
loads before app.js but doesn't verify that Marked.js appears before DOMPurify;
add an assertion comparing marked_pos and purify_pos (e.g.,
assertLess(marked_pos, purify_pos, "Marked.js must load before DOMPurify so
renderMarkdownSafe can use it")) to ensure marked_pos < purify_pos, updating the
test function where marked_pos and purify_pos are computed.

static/js/app.js (1)

62-71: ⚡ Quick win

Preserve line breaks in the fallback path.

When this drops to escapeHtml(text), both current call sites inject the result into normal HTML flow, so multiline chats collapse into a single paragraph exactly in the CDN-failure path this helper is meant to keep usable. Converting newlines to <br> here would keep the safe fallback readable in both templates/workspace.html and static/js/download.js.

Proposed fix

 function renderMarkdownSafe(text) {
   if (!text) return '';
+  const escaped = escapeHtml(text).replace(/\r?\n/g, '<br>');
   if (typeof marked === 'undefined' || typeof DOMPurify === 'undefined') {
-    return escapeHtml(text);
+    return escaped;
   }
   try {
     const html = marked.parse(text, { breaks: true, gfm: true });
     return DOMPurify.sanitize(html);
   } catch (e) {
-    return escapeHtml(text);
+    return escaped;
   }
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@static/js/app.js` around lines 62 - 71, The fallback paths in
renderMarkdownSafe currently return escapeHtml(text) which collapses newlines;
update both the undefined-CDN branch (the if checking typeof marked/DOMPurify)
and the catch block in renderMarkdownSafe to convert escaped newlines into <br>
tags before returning (e.g. take the escaped string and replace /\r?\n/g with
'<br>'). This preserves line breaks safely when marked/DOMPurify are unavailable
while still returning escaped HTML via escapeHtml in both locations.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@static/js/app.js`:
- Around line 62-71: The fallback paths in renderMarkdownSafe currently return
escapeHtml(text) which collapses newlines; update both the undefined-CDN branch
(the if checking typeof marked/DOMPurify) and the catch block in
renderMarkdownSafe to convert escaped newlines into <br> tags before returning
(e.g. take the escaped string and replace /\r?\n/g with '<br>'). This preserves
line breaks safely when marked/DOMPurify are unavailable while still returning
escaped HTML via escapeHtml in both locations.

In `@tests/test_xss_sanitization.py`:
- Around line 103-137: The test
test_marked_parse_appears_only_inside_safe_helper currently uses a fixed
FRONTEND_FILES list which can miss new frontend files; replace that static list
with a dynamic directory walk/glob under REPO_ROOT to collect all
templates/*.html and static/js/*.js (e.g., using os.walk or glob.glob) so every
current and future frontend file is scanned for the marked.parse regex. Keep the
existing logic and assertions (including the special-case check that
static/js/app.js has exactly one occurrence) but build per_file by iterating the
discovered file paths instead of the hardcoded FRONTEND_FILES; ensure you still
skip non-existent paths and preserve encoding="utf-8" when opening files.
- Around line 40-52: The test test_base_html_loads_dompurify_after_marked
currently only asserts DOMPurify loads before app.js but doesn't verify that
Marked.js appears before DOMPurify; add an assertion comparing marked_pos and
purify_pos (e.g., assertLess(marked_pos, purify_pos, "Marked.js must load before
DOMPurify so renderMarkdownSafe can use it")) to ensure marked_pos < purify_pos,
updating the test function where marked_pos and purify_pos are computed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc9c713f-1a1d-4000-9e8e-4d01f3c985d7

📥 Commits

Reviewing files that changed from the base of the PR and between f8b3cb3 and 951ae44.

📒 Files selected for processing (5)

static/js/app.js
static/js/download.js
templates/base.html
templates/workspace.html
tests/test_xss_sanitization.py

timon0305 · 2026-05-07T21:57:06Z

Closed in favour of the in-repo equivalent so CI can run.

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

timon0305 closed this May 7, 2026

clean6378-max-it mentioned this pull request Jun 23, 2026

cppa-cursor-browser: Search UX polish and distinct /api/search error codes #117

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: sanitise Marked.js output with DOMPurify (closes #11)#12

fix: sanitise Marked.js output with DOMPurify (closes #11)#12
timon0305 wants to merge 1 commit into
cppalliance:masterfrom
timon0305:fix/xss-dompurify-marked-11

timon0305 commented May 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 4, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

timon0305 commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

timon0305 commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Change

Test plan

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

timon0305 commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

timon0305 commented May 4, 2026 •

edited

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading