
performance: further performance optimizations for large documents #1

Merged
JackByrne merged 8 commits into develop from performance-optimizations on May 18, 2026

@JackByrne
Member

Summary

This pull request refactors and optimizes the docxtpl/template.py module with a strong focus on rendering performance.

The primary motivation for these changes was improving rendering performance for large documents containing complex tables. In a real-world case, a document containing a table with 2,164 rows previously required approximately one hour to render. With these optimizations applied, rendering time was reduced to under 10 seconds.


Key Improvements

Performance Optimizations

  • Pre-compiled regular expressions

    • Regular expressions used for XML tag stripping and Jinja syntax detection are now pre-compiled and reused.
    • This avoids rebuilding and recompiling the same patterns on every render call.
  • Early exit in resolve_listing()

    • Added a fast-path return when none of the Listing special characters (\t, \n, \a, \f) are present.
    • This skips the heavier resolution logic in the common case without changing the output.
  • Optimized XML tree manipulation

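    • Refactored map_tree() to replace the entire <w:body> element with a single remove/insert on the document root, instead of one lxml operation per body child (constant work with respect to the number of children).
    • Falls back to the previous child-by-child replacement if <w:body> is not a direct child of the root or the remove/insert fails.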
    • This significantly improves performance for large documents and large table structures.
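The body-swap idea behind the map_tree() change can be illustrated with a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of lxml (which docxtpl uses), because both expose the same remove()/insert() element API; the helper name swap_body and its exact fallback condition are hypothetical, not the code merged in this PR.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
BODY_TAG = f"{{{W_NS}}}body"


def swap_body(root: ET.Element, new_body: ET.Element) -> None:
    """Replace the <w:body> under root with new_body (hypothetical helper)."""
    # Locate the existing body anywhere in the tree.
    old_body = next(root.iter(BODY_TAG), None)
    if old_body is None:
        raise ValueError("template has no <w:body> element")
    try:
        # Fast path: body is a direct child of the root. One remove() and
        # one insert() at the same index, so sibling order is unchanged.
        index = list(root).index(old_body)
        root.remove(old_body)
        root.insert(index, new_body)
    except ValueError:
        # Fallback: body is not a direct child; copy children one by one
        # into the existing body (cost grows with the number of children).
        for child in list(old_body):
            old_body.remove(child)
        for child in list(new_body):
            old_body.append(child)
```

The fast path touches the root a constant number of times no matter how many paragraphs or table rows the body holds, which is where the gain for a 2,164-row table would come from; the fallback keeps unusual templates working at the old per-child cost.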

Conditional Header/Footer Rendering

The rendering pipeline now skips header and footer processing when no Jinja tags are detected.

Changes include:

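  • Pre-compiled Jinja detection patterns that check for both intact tags and opening-tag fragments split across Word XML runs.
  • Conditional rendering logic to avoid unnecessary processing of static headers and footers.
  • A safe fallback to unconditional header/footer rendering if the detection check itself fails (e.g. a part's blob is None or expected attributes are missing).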

This reduces overhead for documents where headers and footers are static.
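A minimal sketch of the detection step follows. The names _JINJA_PATTERN and _RE_JINJA_OPEN come from the commit messages, but the regex bodies here are assumptions; needs_rendering, should_render, and the assumption that each part exposes its XML as bytes on a .blob attribute are illustrative, not docxtpl's API.

```python
import re
from typing import Iterable

# Assumed definitions. _JINJA_PATTERN matches a complete tag whose text
# stays inside one XML text node ([^<] stops at the next XML tag).
# _RE_JINJA_OPEN matches just an opening delimiter, which catches tags
# that Word has split across several <w:r> runs.
_JINJA_PATTERN = re.compile(r"\{\{[^<]*?\}\}|\{%[^<]*?%\}|\{#[^<]*?#\}")
_RE_JINJA_OPEN = re.compile(r"\{[{%#]")


def needs_rendering(part_xmls: Iterable[str]) -> bool:
    """True if any header/footer XML string may contain Jinja syntax."""
    return any(
        _JINJA_PATTERN.search(xml) or _RE_JINJA_OPEN.search(xml)
        for xml in part_xmls
    )


def should_render(parts) -> bool:
    """Decide whether to render header/footer parts (hypothetical wrapper).

    Each part is assumed to expose its raw XML as bytes on a .blob
    attribute. If that does not hold (blob is None, attribute missing,
    undecodable bytes), fall back to rendering unconditionally.
    """
    try:
        return needs_rendering(part.blob.decode("utf-8") for part in parts)
    except (AttributeError, UnicodeDecodeError):
        return True
```

The check is deliberately biased toward false positives: a stray "{%" in static text only costs an unnecessary render, whereas a missed tag would leave unrendered Jinja in the output. A delimiter that is itself split across runs (e.g. "{" and "%" in separate runs) would not be caught by either pattern in this sketch.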


Real-World Performance Impact

Scenario                               Before     After
Large document with 2,164-row table    ~1 hour    <10 seconds

These changes provide substantial improvements for large and complex templates while maintaining compatibility with existing rendering behavior.

bonggo-pras and others added 6 commits May 12, 2026 16:15
Delete the module-level logger and several logger.warning calls in docxtpl/template.py. These were added while debugging and should be removed.
Improve documentation in map_tree to explain the optimization: the code swaps the entire <w:body> via root.remove() + root.insert() to avoid O(n) per-child lxml operations, which is effectively O(1) on the document root. Clarify that the body's index is preserved so element order (body before sectPr) remains intact, and spell out the fallback behavior (child-by-child copy) if the body isn't a direct child or if remove/insert fails. Add additional safety and explanatory comments.
Enhance header/footer processing by detecting Jinja tags split across Word XML runs: check both intact tags (_JINJA_PATTERN) and open-tag fragments (_RE_JINJA_OPEN) when scanning part XML. Use a generator to iterate part XML strings once, and keep the existing exception fallback to unconditionally render headers/footers if the fast-path check fails (e.g. malformed XML). Also add clarifying comments about properties and footnotes skipping behaviour and make minor comment style fixes.
Add a fast-path to DocxTemplate.resolve_listing that returns the input XML unchanged when no Listing special characters are present. The check looks for tab, newline, bell and form-feed ("\t", "\n", "\a", "\f") and avoids running the heavier resolution logic in the common case, improving performance without changing behavior.
Introduce pre-compiled regex patterns (_RE_TAG_STRIP and _RE_COMMENT_STRIP) to strip surrounding <w:y> tags from template tags like {%y ...%}, {{y ...}} and comments {#y ...#}. Replace repeated re.sub loops with iteration over these patterns to avoid recompiling the same regexes on every call, reduce code duplication, and improve performance/maintainability.
Clean up docxtpl/template.py by removing unused imports: functools, logging, and Template from jinja2. Keeps Environment and meta from jinja2 and does not change runtime behavior; this reduces linter warnings and unnecessary dependencies.
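The pre-compilation described in the _RE_TAG_STRIP/_RE_COMMENT_STRIP commit can be sketched as follows. The list names come from the commit message, but the pattern bodies are simplified assumptions modeled on docxtpl's {%p ...%}, {%tr ...%}, {%tc ...%} and {{r ...}} tag convention, and strip_block_tags is a hypothetical helper name.

```python
import re

# Simplified, assumed templates: match the innermost <w:y> element that
# contains a {%y ...%} / {{y ...}} (or {#y ...#}) tag, capturing the opening
# delimiter (group 1) and the tag body up to its closing delimiter (group 2).
_TAG_TEMPLATE = (
    r"<w:%(y)s[ >](?:(?!<w:%(y)s[ >]).)*({%%|{{)%(y)s ([^}%%]*(?:%%}|}})).*?</w:%(y)s>"
)
_COMMENT_TEMPLATE = r"<w:%(y)s[ >](?:(?!<w:%(y)s[ >]).)*({#)%(y)s ([^}#]*(?:#})).*?</w:%(y)s>"

# Built and compiled once at import time instead of on every call.
_RE_TAG_STRIP = [
    re.compile(_TAG_TEMPLATE % {"y": y}, re.DOTALL) for y in ("tr", "tc", "p", "r")
]
# Comments are not stripped for runs ("r"), mirroring the block-only rule.
_RE_COMMENT_STRIP = [
    re.compile(_COMMENT_TEMPLATE % {"y": y}, re.DOTALL) for y in ("tr", "tc", "p")
]


def strip_block_tags(src_xml: str) -> str:
    """Replace each matched <w:y> element with its bare Jinja tag."""
    for pattern in _RE_TAG_STRIP:
        src_xml = pattern.sub(r"\1 \2", src_xml)
    for pattern in _RE_COMMENT_STRIP:
        src_xml = pattern.sub(r"\1 \2", src_xml)
    return src_xml
```

Python's re module also keeps an internal cache of recently compiled patterns, so module-level compilation mainly removes the per-call string formatting and cache lookups; the larger benefit is a single, reusable definition of each pattern.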

Copilot AI left a comment


Pull request overview

Optimizes docxtpl/template.py to significantly reduce rendering time for large/complex DOCX templates (notably large tables) by reducing repeated regex work, avoiding unnecessary processing, and improving XML tree replacement efficiency.

Changes:

  • Pre-compile regexes used during XML patching and add a fast-path in resolve_listing() when no special characters are present.
  • Refactor map_tree() to replace the <w:body> element via root remove/insert (with a fallback path).
  • Skip header/footer rendering work when no Jinja syntax is detected, with a fallback to unconditional processing on errors.
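The resolve_listing() fast path can be sketched as a standalone function (in docxtpl it is a DocxTemplate method). The four control characters come from the commit message; the slow path here is a hypothetical, greatly simplified stand-in that assumes the characters only occur inside <w:t> text and ignores the run and paragraph properties the real method has to preserve.

```python
# The four Listing control characters named in the commit message.
_LISTING_CHARS = ("\t", "\n", "\a", "\f")

# Hypothetical, simplified replacements: tab, line break, page break,
# and a new paragraph, each closing and reopening the surrounding text node.
_LISTING_XML = str.maketrans({
    "\t": '</w:t><w:tab/><w:t xml:space="preserve">',
    "\n": '</w:t><w:br/><w:t xml:space="preserve">',
    "\f": '</w:t><w:br w:type="page"/><w:t xml:space="preserve">',
    "\a": '</w:t></w:r></w:p><w:p><w:r><w:t xml:space="preserve">',
})


def resolve_listing(xml: str) -> str:
    # Fast path: nothing to resolve, so return the input object unchanged.
    if not any(ch in xml for ch in _LISTING_CHARS):
        return xml
    # Slow path: only reached when at least one special character exists.
    return xml.translate(_LISTING_XML)
```

Each `in` test is a C-level substring scan, so the common case costs at most four linear passes over the string and skips the more expensive resolution work entirely.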

Update comment in docxtpl/template.py to clarify the fallback behavior when processing headers and footers. The comment now explains the fallback guards against unexpected part structure (e.g. blob is None or missing attributes) rather than implying it handles malformed XML; malformed XML would still fail in build_headers_footers_xml. No functional change.
@JackByrne JackByrne merged commit a0564e1 into develop May 18, 2026
@JackByrne JackByrne deleted the performance-optimizations branch May 18, 2026 12:49

Labels

enhancement New feature or request


3 participants