Skip to content

fix: sanitise Marked.js output with DOMPurify (closes #11)#12

Closed
timon0305 wants to merge 1 commit into
cppalliance:masterfrom
timon0305:fix/xss-dompurify-marked-11
Closed

fix: sanitise Marked.js output with DOMPurify (closes #11)#12
timon0305 wants to merge 1 commit into
cppalliance:masterfrom
timon0305:fix/xss-dompurify-marked-11

Conversation

@timon0305

@timon0305 timon0305 commented May 4, 2026

Copy link
Copy Markdown
Collaborator

Problem

marked.parse(...) doesn't sanitise. Two call sites — templates/workspace.html (→ innerHTML) and static/js/download.js (→ download blob) — were piping its output straight into the DOM, exposing XSS via javascript: links, data: URIs, raw <script> tags, and event-handler attributes. A hit reads the entire local chat store.

Change

  • templates/base.html — load DOMPurify 3.2.4 (after Marked.js, before app.js).
  • static/js/app.js — new renderMarkdownSafe(text) helper: marked.parse → DOMPurify.sanitize. Falls back to escapeHtml if either lib is missing — never a bare Marked call.
  • templates/workspace.html and static/js/download.js — both call sites now use the helper.

Test plan

  • 145/145 unit tests pass (8 new in tests/test_xss_sanitization.py, including a source-level guard that fails if any future frontend file outside the helper contains a bare marked.parse(...)).
  • Live smoke: home returns 200, served HTML loads both Marked + DOMPurify, app.js exposes renderMarkdownSafe.
  • Browser DevTools verification (run on a workspace page):
    • renderMarkdownSafe('[click](javascript:alert(1))')href stripped
    • renderMarkdownSafe('<script>alert(1)</script>')<script> stripped
    • renderMarkdownSafe('# H\n\n**bold**') → full HTML preserved (sanity)

Closes #11.

`marked.parse(...)` does not sanitise. Two call sites were piping its
return value straight into the DOM (workspace.html → main.innerHTML)
or into a downloadable HTML blob (download.js), which exposed XSS via
javascript: links, dangerous data: URIs, raw <script> blocks, and
event-handler attributes — all of which Marked passes through.

A successful XSS in the dashboard origin reads every chat in the
local store and exfiltrates it.

Changes:

- templates/base.html: add DOMPurify 3.2.4 CDN script after Marked.js
  and before app.js (load order matters — app.js references DOMPurify).
- static/js/app.js: add `renderMarkdownSafe(text)` helper. Wraps
  `marked.parse(...)` with `DOMPurify.sanitize(...)`. If either lib
  failed to load (CDN failure, ad blocker, tests), falls back to
  `escapeHtml(text)` rather than emitting raw markdown HTML — never
  fall through to a bare Marked call.
- templates/workspace.html: `renderMarkdown(text)` now delegates to
  the shared helper.
- static/js/download.js: HTML download blob uses the helper too.
  The downloaded file is opened in a browser and any payload would
  execute in the file:// origin, so the same threat model applies.

Regression coverage in tests/test_xss_sanitization.py adds 8 cases:

- DOMPurify CDN present in base.html, loaded before app.js
- renderMarkdownSafe defined, invokes DOMPurify.sanitize, falls back
  to escapeHtml
- both call sites use the helper
- a source-level guard that fails if any frontend file outside the
  helper itself contains a bare `marked.parse(...)` call

Test plan: 145/145 unit tests pass (8 new + 137 existing). Live smoke
on `python3 app.py --port 3001` confirmed both Marked.js and
DOMPurify reach the page and `renderMarkdownSafe` is exposed in
the served app.js bundle.
@coderabbitai

coderabbitai Bot commented May 4, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR adds XSS sanitization to Marked.js markdown rendering across the frontend. DOMPurify is loaded via CDN, a new renderMarkdownSafe() helper wraps marked.parse() with DOMPurify.sanitize(), both unsanitized call sites (workspace HTML and download export) are refactored to use the helper, and comprehensive regression tests prevent reintroduction of bare markdown parsing patterns.

Changes

XSS Sanitization via DOMPurify

Layer / File(s) Summary
Library Dependency
templates/base.html
DOMPurify v3.2.4 is loaded from CDN in <head>, positioned to initialize before static/js/app.js.
Safe Rendering Helper
static/js/app.js
New renderMarkdownSafe(text) function wraps marked.parse(text, { breaks: true, gfm: true }) with DOMPurify.sanitize(...). Falls back to escapeHtml(text) if either library is missing or if parsing/sanitization throws; returns empty string for falsy input.
Call Site Updates
templates/workspace.html, static/js/download.js
renderMarkdown(text) in workspace.html now delegates to renderMarkdownSafe(text) instead of inline marked.parse(). HTML export in download.js calls renderMarkdownSafe(md) instead of bare marked.parse(md) to produce sanitized content.
Regression Tests
tests/test_xss_sanitization.py
Four test classes verify DOMPurify is loaded and ordered correctly, renderMarkdownSafe exists and sanitizes, both call sites use the safe helper, and marked.parse( appears exactly once across all listed frontend files (only within the helper).

Sequence Diagram

sequenceDiagram
    participant Browser
    participant MarkdownContent as Markdown<br/>Content
    participant RenderSafe as renderMarkdownSafe()
    participant MarkedLib as marked.parse()
    participant DOMPurify as DOMPurify<br/>.sanitize()
    participant DOM as DOM /<br/>Download

    Browser->>RenderSafe: text
    RenderSafe->>MarkedLib: raw markdown
    MarkedLib-->>RenderSafe: HTML (unsanitized)
    RenderSafe->>DOMPurify: unsanitized HTML
    DOMPurify-->>RenderSafe: sanitized HTML
    RenderSafe-->>DOM: safe to insert /<br/>download
    
    Note over RenderSafe: Fallback to escapeHtml()<br/>if marked or DOMPurify missing
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

The core refactoring is straightforward (two call sites → new helper), but the security implications require careful verification of the sanitization logic, DOMPurify usage, and fallback paths. The comprehensive test suite covers multiple attack vectors and static invariants, which is thorough but adds review scope. Changes span five files with mixed purposes (library, logic, tests), and the heterogeneity of file types demands separate reasoning for each layer.

Poem

🐰 A sanitizer's tale, oh what a sight!
DOMPurify shields us from XSS blight,
Marked's output tamed with a helper so neat,
No bare parse today—our chat's secure and sweet!
Tests keep the dragons from creeping back in, 🛡️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly describes the main change: sanitizing Marked.js output with DOMPurify, and indicates it closes issue #11.
Linked Issues check ✅ Passed All objectives from issue #11 are implemented: DOMPurify loaded in base.html with correct load order, renderMarkdownSafe helper created in app.js with sanitization and fallback, call sites in workspace.html and download.js updated, and regression tests added.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the security objective to prevent XSS via unsanitized Marked.js output; no out-of-scope modifications detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
tests/test_xss_sanitization.py (2)

103-137: ⚡ Quick win

Scan the frontend directories instead of a fixed allowlist.

FRONTEND_FILES only covers today's files, so a future templates/*.html or static/js/*.js file can reintroduce a bare marked.parse( and this guard will still pass. Walking the frontend directories keeps the regression test aligned with the security goal.

Proposed fix
+from pathlib import Path
 import os
 import re
 import unittest
@@
-    FRONTEND_FILES = [
-        "templates/workspace.html",
-        "templates/index.html",
-        "templates/search.html",
-        "templates/config.html",
-        "static/js/app.js",
-        "static/js/download.js",
-    ]
+    FRONTEND_DIRS = ("templates", "static/js")
@@
-        for rel in self.FRONTEND_FILES:
-            full = os.path.join(REPO_ROOT, rel)
-            if not os.path.exists(full):
-                continue
+        frontend_files = []
+        for root in self.FRONTEND_DIRS:
+            base = Path(REPO_ROOT, root)
+            frontend_files.extend(
+                path.relative_to(REPO_ROOT).as_posix()
+                for path in base.rglob("*")
+                if path.suffix in {".html", ".js"}
+            )
+
+        for rel in frontend_files:
+            full = os.path.join(REPO_ROOT, rel)
             with open(full, "r", encoding="utf-8") as f:
                 src = f.read()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_xss_sanitization.py` around lines 103 - 137, The test
test_marked_parse_appears_only_inside_safe_helper currently uses a fixed
FRONTEND_FILES list which can miss new frontend files; replace that static list
with a dynamic directory walk/glob under REPO_ROOT to collect all
templates/*.html and static/js/*.js (e.g., using os.walk or glob.glob) so every
current and future frontend file is scanned for the marked.parse regex. Keep the
existing logic and assertions (including the special-case check that
static/js/app.js has exactly one occurrence) but build per_file by iterating the
discovered file paths instead of the hardcoded FRONTEND_FILES; ensure you still
skip non-existent paths and preserve encoding="utf-8" when opening files.

40-52: ⚡ Quick win

Assert the Marked.js-before-DOMPurify part of the contract.

This test name says "after_marked", but today it only proves DOMPurify comes before app.js. If those two CDN tags are swapped, the test still passes. Add the missing marked_pos < purify_pos assertion so the regression guard matches its stated invariant.

Proposed fix
         self.assertGreater(marked_pos, 0, "Marked.js must be loaded")
         self.assertGreater(purify_pos, 0, "DOMPurify must be loaded")
         self.assertGreater(app_js_pos, 0, "app.js must be loaded")
+        self.assertLess(marked_pos, purify_pos,
+                        "DOMPurify should load after Marked.js")
         self.assertLess(purify_pos, app_js_pos,
                         "DOMPurify must load before app.js so renderMarkdownSafe can use it")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_xss_sanitization.py` around lines 40 - 52, The test
test_base_html_loads_dompurify_after_marked currently only asserts DOMPurify
loads before app.js but doesn't verify that Marked.js appears before DOMPurify;
add an assertion comparing marked_pos and purify_pos (e.g.,
assertLess(marked_pos, purify_pos, "Marked.js must load before DOMPurify so
renderMarkdownSafe can use it")) to ensure marked_pos < purify_pos, updating the
test function where marked_pos and purify_pos are computed.
static/js/app.js (1)

62-71: ⚡ Quick win

Preserve line breaks in the fallback path.

When this drops to escapeHtml(text), both current call sites inject the result into normal HTML flow, so multiline chats collapse into a single paragraph exactly in the CDN-failure path this helper is meant to keep usable. Converting newlines to <br> here would keep the safe fallback readable in both templates/workspace.html and static/js/download.js.

Proposed fix
 function renderMarkdownSafe(text) {
   if (!text) return '';
+  const escaped = escapeHtml(text).replace(/\r?\n/g, '<br>');
   if (typeof marked === 'undefined' || typeof DOMPurify === 'undefined') {
-    return escapeHtml(text);
+    return escaped;
   }
   try {
     const html = marked.parse(text, { breaks: true, gfm: true });
     return DOMPurify.sanitize(html);
   } catch (e) {
-    return escapeHtml(text);
+    return escaped;
   }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@static/js/app.js` around lines 62 - 71, The fallback paths in
renderMarkdownSafe currently return escapeHtml(text) which collapses newlines;
update both the undefined-CDN branch (the if checking typeof marked/DOMPurify)
and the catch block in renderMarkdownSafe to convert escaped newlines into <br>
tags before returning (e.g. take the escaped string and replace /\r?\n/g with
'<br>'). This preserves line breaks safely when marked/DOMPurify are unavailable
while still returning escaped HTML via escapeHtml in both locations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@static/js/app.js`:
- Around line 62-71: The fallback paths in renderMarkdownSafe currently return
escapeHtml(text) which collapses newlines; update both the undefined-CDN branch
(the if checking typeof marked/DOMPurify) and the catch block in
renderMarkdownSafe to convert escaped newlines into <br> tags before returning
(e.g. take the escaped string and replace /\r?\n/g with '<br>'). This preserves
line breaks safely when marked/DOMPurify are unavailable while still returning
escaped HTML via escapeHtml in both locations.

In `@tests/test_xss_sanitization.py`:
- Around line 103-137: The test
test_marked_parse_appears_only_inside_safe_helper currently uses a fixed
FRONTEND_FILES list which can miss new frontend files; replace that static list
with a dynamic directory walk/glob under REPO_ROOT to collect all
templates/*.html and static/js/*.js (e.g., using os.walk or glob.glob) so every
current and future frontend file is scanned for the marked.parse regex. Keep the
existing logic and assertions (including the special-case check that
static/js/app.js has exactly one occurrence) but build per_file by iterating the
discovered file paths instead of the hardcoded FRONTEND_FILES; ensure you still
skip non-existent paths and preserve encoding="utf-8" when opening files.
- Around line 40-52: The test test_base_html_loads_dompurify_after_marked
currently only asserts DOMPurify loads before app.js but doesn't verify that
Marked.js appears before DOMPurify; add an assertion comparing marked_pos and
purify_pos (e.g., assertLess(marked_pos, purify_pos, "Marked.js must load before
DOMPurify so renderMarkdownSafe can use it")) to ensure marked_pos < purify_pos,
updating the test function where marked_pos and purify_pos are computed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc9c713f-1a1d-4000-9e8e-4d01f3c985d7

📥 Commits

Reviewing files that changed from the base of the PR and between f8b3cb3 and 951ae44.

📒 Files selected for processing (5)
  • static/js/app.js
  • static/js/download.js
  • templates/base.html
  • templates/workspace.html
  • tests/test_xss_sanitization.py

@timon0305

Copy link
Copy Markdown
Collaborator Author

Closed in favour of the in-repo equivalent so CI can run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: sanitise Marked.js output with DOMPurify (XSS via javascript: links / data: URIs)

1 participant