gh-145742: Manually emit _LOAD_FAST_BORROW to reduce stencil bloat #148217

Open
corona10 wants to merge 3 commits into python:main from corona10:gh-145742-impl

@corona10
Member

corona10 commented Apr 7, 2026

  • Manually emit _LOAD_FAST_BORROW at JIT compile time, encoding the operand offset directly into the instruction instead of loading it from the GOT at runtime.
  • This shrinks the generic case (oparg ≥ 8) from 28 bytes to 8 bytes and eliminates 27 stencil functions.
  • I've compared machine code through godbolt:
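To make the first bullet concrete, here is a minimal sketch (not taken from the PR or from the godbolt listings) of baking a local-variable offset into an instruction word at emit time. It uses the AArch64 `LDR Xt, [Xn, #imm]` encoding as an example target. The helper names, the 8-byte slot size, and the assumption that a base register points at local slot 0 are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Encode AArch64 `LDR Xt, [Xn, #byte_offset]` (64-bit load, unsigned
   immediate offset). The offset is stored in the instruction as a 12-bit
   field scaled by 8, so it must be a multiple of 8 and below 4096 * 8. */
static uint32_t encode_ldr_x_imm(unsigned rt, unsigned rn, uint32_t byte_offset)
{
    assert(rt < 32 && rn < 32);
    assert(byte_offset % 8 == 0);
    uint32_t imm12 = byte_offset / 8;
    assert(imm12 < 4096);
    return 0xF9400000u | (imm12 << 10) | ((uint32_t)rn << 5) | (uint32_t)rt;
}

/* Hypothetical helper: build the instruction that loads local slot `oparg`,
   assuming register `base` holds the address of slot 0 and every slot is
   8 bytes wide. The real frame layout and register choice may differ. */
static uint32_t encode_load_local(unsigned rt, unsigned base, unsigned oparg)
{
    return encode_ldr_x_imm(rt, base, (uint32_t)oparg * 8u);
}
```

For example, `encode_load_local(8, 20, 9)` yields `0xF9402688`, which is `ldr x8, [x20, #72]`. No GOT entry is involved because the offset is part of the instruction itself.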

@corona10
Member Author

corona10 commented Apr 8, 2026

For i686: https://godbolt.org/z/cdjdzev5Y

@diegorusso
Contributor

Some initial feedback on this:

  • we should not pollute jit.c with uop implementations. They should live in separate compilation units and have the same signature as the other ones (e.g.: void emit__UOP_NAME(unsigned char *code, unsigned char *data, _PyExecutorObject *executor, const _PyUOpInstruction *instruction, jit_state *state))
  • ifdefs can select the right architecture for the custom implementation
  • in bytecodes.c we should have a way to tell the JIT machinery not to generate any code for a specific uop, but the uop implementation should still be accounted for in the table in jit-stencils-*.h (static const StencilGroup stencil_groups[MAX_UOP_REGS_ID + 1])
  • The linker will later pick up our own version of the uop implementation.
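The layout suggested in these points could look roughly like the sketch below: one hand-written emitter in its own file, using the quoted signature, with `#if` blocks selecting the per-architecture encoding. The three types are stand-in stubs so the sketch compiles on its own. The registers, the 8-byte slot size, the `LOAD_FAST_BORROW_SIZE` macro, and the `encode_load` helper are all assumptions. Only the load half of the uop is emitted; the store onto the evaluation stack is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in stubs; the real definitions live in CPython's internal headers. */
typedef struct { uint16_t oparg; } _PyUOpInstruction;  /* stub: oparg only */
typedef struct _PyExecutorObject _PyExecutorObject;    /* opaque stub */
typedef struct jit_state jit_state;                    /* opaque stub */

#if defined(__aarch64__)
#  define LOAD_FAST_BORROW_SIZE 4
static void encode_load(unsigned char *code, uint32_t byte_offset)
{
    /* ldr x8, [x20, #byte_offset]; register choice is arbitrary here. */
    uint32_t imm12 = byte_offset / 8;
    assert(imm12 < 4096);
    uint32_t insn = 0xF9400000u | (imm12 << 10) | (20u << 5) | 8u;
    memcpy(code, &insn, sizeof insn);
}
#elif defined(__x86_64__)
#  define LOAD_FAST_BORROW_SIZE 7
static void encode_load(unsigned char *code, uint32_t byte_offset)
{
    /* mov rax, [rdi + disp32]; register choice is arbitrary here. */
    code[0] = 0x48;  /* REX.W */
    code[1] = 0x8B;  /* MOV r64, r/m64 */
    code[2] = 0x87;  /* ModRM: mod=10, reg=rax, rm=rdi */
    memcpy(code + 3, &byte_offset, sizeof byte_offset);
}
#else
#  define LOAD_FAST_BORROW_SIZE 0
static void encode_load(unsigned char *code, uint32_t byte_offset)
{
    /* No hand-written version for this target in the sketch. */
    (void)code;
    (void)byte_offset;
}
#endif

/* Same shape as the generated emit__* functions quoted above. */
void emit__LOAD_FAST_BORROW(unsigned char *code, unsigned char *data,
                            _PyExecutorObject *executor,
                            const _PyUOpInstruction *instruction,
                            jit_state *state)
{
    (void)data;
    (void)executor;
    (void)state;
    /* The operand is known at JIT compile time, so it goes straight into
       the instruction bytes instead of a data slot. */
    encode_load(code, (uint32_t)instruction->oparg * 8u);
}
```

Keeping the signature identical means the stencil table can reference this function like any generated emitter, which is the point of the feedback above.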

@corona10
Copy link
Copy Markdown
Member Author

corona10 commented Apr 8, 2026

Thanks, @diegorusso. I’ll keep working on this based on your feedback.
