fix: sanitise Marked.js output with DOMPurify (closes #11)#12
Conversation
`marked.parse(...)` does not sanitise. Two call sites were piping its return value straight into the DOM (workspace.html → main.innerHTML) or into a downloadable HTML blob (download.js), which exposed XSS via javascript: links, dangerous data: URIs, raw <script> blocks, and event-handler attributes — all of which Marked passes through. A successful XSS in the dashboard origin reads every chat in the local store and exfiltrates it. Changes: - templates/base.html: add DOMPurify 3.2.4 CDN script after Marked.js and before app.js (load order matters — app.js references DOMPurify). - static/js/app.js: add `renderMarkdownSafe(text)` helper. Wraps `marked.parse(...)` with `DOMPurify.sanitize(...)`. If either lib failed to load (CDN failure, ad blocker, tests), falls back to `escapeHtml(text)` rather than emitting raw markdown HTML — never fall through to a bare Marked call. - templates/workspace.html: `renderMarkdown(text)` now delegates to the shared helper. - static/js/download.js: HTML download blob uses the helper too. The downloaded file is opened in a browser and any payload would execute in the file:// origin, so the same threat model applies. Regression coverage in tests/test_xss_sanitization.py adds 8 cases: - DOMPurify CDN present in base.html, loaded before app.js - renderMarkdownSafe defined, invokes DOMPurify.sanitize, falls back to escapeHtml - both call sites use the helper - a source-level guard that fails if any frontend file outside the helper itself contains a bare `marked.parse(...)` call Test plan: 145/145 unit tests pass (8 new + 137 existing). Live smoke on `python3 app.py --port 3001` confirmed both Marked.js and DOMPurify reach the page and `renderMarkdownSafe` is exposed in the served app.js bundle.
📝 WalkthroughWalkthroughThis PR adds XSS sanitization to Marked.js markdown rendering across the frontend. DOMPurify is loaded via CDN, a new ChangesXSS Sanitization via DOMPurify
Sequence DiagramsequenceDiagram
participant Browser
participant MarkdownContent as Markdown<br/>Content
participant RenderSafe as renderMarkdownSafe()
participant MarkedLib as marked.parse()
participant DOMPurify as DOMPurify<br/>.sanitize()
participant DOM as DOM /<br/>Download
Browser->>RenderSafe: text
RenderSafe->>MarkedLib: raw markdown
MarkedLib-->>RenderSafe: HTML (unsanitized)
RenderSafe->>DOMPurify: unsanitized HTML
DOMPurify-->>RenderSafe: sanitized HTML
RenderSafe-->>DOM: safe to insert /<br/>download
Note over RenderSafe: Fallback to escapeHtml()<br/>if marked or DOMPurify missing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes The core refactoring is straightforward (two call sites → new helper), but the security implications require careful verification of the sanitization logic, DOMPurify usage, and fallback paths. The comprehensive test suite covers multiple attack vectors and static invariants, which is thorough but adds review scope. Changes span five files with mixed purposes (library, logic, tests), and the heterogeneity of file types demands separate reasoning for each layer. Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
tests/test_xss_sanitization.py (2)
103-137: ⚡ Quick winScan the frontend directories instead of a fixed allowlist.
FRONTEND_FILESonly covers today's files, so a futuretemplates/*.htmlorstatic/js/*.jsfile can reintroduce a baremarked.parse(and this guard will still pass. Walking the frontend directories keeps the regression test aligned with the security goal.Proposed fix
+from pathlib import Path import os import re import unittest @@ - FRONTEND_FILES = [ - "templates/workspace.html", - "templates/index.html", - "templates/search.html", - "templates/config.html", - "static/js/app.js", - "static/js/download.js", - ] + FRONTEND_DIRS = ("templates", "static/js") @@ - for rel in self.FRONTEND_FILES: - full = os.path.join(REPO_ROOT, rel) - if not os.path.exists(full): - continue + frontend_files = [] + for root in self.FRONTEND_DIRS: + base = Path(REPO_ROOT, root) + frontend_files.extend( + path.relative_to(REPO_ROOT).as_posix() + for path in base.rglob("*") + if path.suffix in {".html", ".js"} + ) + + for rel in frontend_files: + full = os.path.join(REPO_ROOT, rel) with open(full, "r", encoding="utf-8") as f: src = f.read()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_xss_sanitization.py` around lines 103 - 137, The test test_marked_parse_appears_only_inside_safe_helper currently uses a fixed FRONTEND_FILES list which can miss new frontend files; replace that static list with a dynamic directory walk/glob under REPO_ROOT to collect all templates/*.html and static/js/*.js (e.g., using os.walk or glob.glob) so every current and future frontend file is scanned for the marked.parse regex. Keep the existing logic and assertions (including the special-case check that static/js/app.js has exactly one occurrence) but build per_file by iterating the discovered file paths instead of the hardcoded FRONTEND_FILES; ensure you still skip non-existent paths and preserve encoding="utf-8" when opening files.
40-52: ⚡ Quick winAssert the Marked.js-before-DOMPurify part of the contract.
This test name says "after_marked", but today it only proves DOMPurify comes before
app.js. If those two CDN tags are swapped, the test still passes. Add the missingmarked_pos < purify_posassertion so the regression guard matches its stated invariant.Proposed fix
self.assertGreater(marked_pos, 0, "Marked.js must be loaded") self.assertGreater(purify_pos, 0, "DOMPurify must be loaded") self.assertGreater(app_js_pos, 0, "app.js must be loaded") + self.assertLess(marked_pos, purify_pos, + "DOMPurify should load after Marked.js") self.assertLess(purify_pos, app_js_pos, "DOMPurify must load before app.js so renderMarkdownSafe can use it")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_xss_sanitization.py` around lines 40 - 52, The test test_base_html_loads_dompurify_after_marked currently only asserts DOMPurify loads before app.js but doesn't verify that Marked.js appears before DOMPurify; add an assertion comparing marked_pos and purify_pos (e.g., assertLess(marked_pos, purify_pos, "Marked.js must load before DOMPurify so renderMarkdownSafe can use it")) to ensure marked_pos < purify_pos, updating the test function where marked_pos and purify_pos are computed.static/js/app.js (1)
62-71: ⚡ Quick winPreserve line breaks in the fallback path.
When this drops to
escapeHtml(text), both current call sites inject the result into normal HTML flow, so multiline chats collapse into a single paragraph exactly in the CDN-failure path this helper is meant to keep usable. Converting newlines to<br>here would keep the safe fallback readable in bothtemplates/workspace.htmlandstatic/js/download.js.Proposed fix
function renderMarkdownSafe(text) { if (!text) return ''; + const escaped = escapeHtml(text).replace(/\r?\n/g, '<br>'); if (typeof marked === 'undefined' || typeof DOMPurify === 'undefined') { - return escapeHtml(text); + return escaped; } try { const html = marked.parse(text, { breaks: true, gfm: true }); return DOMPurify.sanitize(html); } catch (e) { - return escapeHtml(text); + return escaped; } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@static/js/app.js` around lines 62 - 71, The fallback paths in renderMarkdownSafe currently return escapeHtml(text) which collapses newlines; update both the undefined-CDN branch (the if checking typeof marked/DOMPurify) and the catch block in renderMarkdownSafe to convert escaped newlines into <br> tags before returning (e.g. take the escaped string and replace /\r?\n/g with '<br>'). This preserves line breaks safely when marked/DOMPurify are unavailable while still returning escaped HTML via escapeHtml in both locations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@static/js/app.js`:
- Around line 62-71: The fallback paths in renderMarkdownSafe currently return
escapeHtml(text) which collapses newlines; update both the undefined-CDN branch
(the if checking typeof marked/DOMPurify) and the catch block in
renderMarkdownSafe to convert escaped newlines into <br> tags before returning
(e.g. take the escaped string and replace /\r?\n/g with '<br>'). This preserves
line breaks safely when marked/DOMPurify are unavailable while still returning
escaped HTML via escapeHtml in both locations.
In `@tests/test_xss_sanitization.py`:
- Around line 103-137: The test
test_marked_parse_appears_only_inside_safe_helper currently uses a fixed
FRONTEND_FILES list which can miss new frontend files; replace that static list
with a dynamic directory walk/glob under REPO_ROOT to collect all
templates/*.html and static/js/*.js (e.g., using os.walk or glob.glob) so every
current and future frontend file is scanned for the marked.parse regex. Keep the
existing logic and assertions (including the special-case check that
static/js/app.js has exactly one occurrence) but build per_file by iterating the
discovered file paths instead of the hardcoded FRONTEND_FILES; ensure you still
skip non-existent paths and preserve encoding="utf-8" when opening files.
- Around line 40-52: The test test_base_html_loads_dompurify_after_marked
currently only asserts DOMPurify loads before app.js but doesn't verify that
Marked.js appears before DOMPurify; add an assertion comparing marked_pos and
purify_pos (e.g., assertLess(marked_pos, purify_pos, "Marked.js must load before
DOMPurify so renderMarkdownSafe can use it")) to ensure marked_pos < purify_pos,
updating the test function where marked_pos and purify_pos are computed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc9c713f-1a1d-4000-9e8e-4d01f3c985d7
📒 Files selected for processing (5)
static/js/app.jsstatic/js/download.jstemplates/base.htmltemplates/workspace.htmltests/test_xss_sanitization.py
|
Closed in favour of the in-repo equivalent so CI can run. |
Problem
marked.parse(...)doesn't sanitise. Two call sites —templates/workspace.html(→innerHTML) andstatic/js/download.js(→ download blob) — were piping its output straight into the DOM, exposing XSS viajavascript:links,data:URIs, raw<script>tags, and event-handler attributes. A hit reads the entire local chat store.Change
templates/base.html— load DOMPurify 3.2.4 (after Marked.js, beforeapp.js).static/js/app.js— newrenderMarkdownSafe(text)helper:marked.parse → DOMPurify.sanitize. Falls back toescapeHtmlif either lib is missing — never a bare Marked call.templates/workspace.htmlandstatic/js/download.js— both call sites now use the helper.Test plan
tests/test_xss_sanitization.py, including a source-level guard that fails if any future frontend file outside the helper contains a baremarked.parse(...)).app.jsexposesrenderMarkdownSafe.renderMarkdownSafe('[click](javascript:alert(1))')→hrefstrippedrenderMarkdownSafe('<script>alert(1)</script>')→<script>strippedrenderMarkdownSafe('# H\n\n**bold**')→ full HTML preserved (sanity)Closes #11.