Optimize ripper bounds, and other small perf tweaks#4083

Open
Earlopain wants to merge 3 commits intoruby:mainfrom
Earlopain:ripper-bounds-perf

Conversation

@Earlopain
Collaborator

Basically a port of ruby/ruby@c45f781 into ruby.

It's quite effective, with a ~97% hit rate for me. It speeds things up from ~6.77x slower to only ~4.07x slower.

For the lexer's `on_sp` it also gives a bit of an improvement: from 1.04x slower to 1.10x faster.

I guess the class may be universally useful, but for now I just marked it nodoc. I also didn't look much into tuning the window size; it seemed good enough.
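For readers unfamiliar with the ported commit: the idea is a cache for translating byte offsets into line/column pairs. A rough sketch of the technique (class and method names here are hypothetical, not the actual implementation): line start offsets are precomputed once, and because ripper events mostly arrive in source order, a small window around the last hit is checked before falling back to binary search.

```ruby
# Hypothetical sketch of an offset -> [line, column] cache.
class OffsetCache
  def initialize(source)
    @line_starts = [0]
    source.each_char.with_index { |c, i| @line_starts << i + 1 if c == "\n" }
    @last = 0
  end

  def line_and_column(offset)
    line = find_line(offset)
    @last = line
    [line + 1, offset - @line_starts[line]]
  end

  private

  def find_line(offset)
    # Fast path: check a small window around the previous hit.
    ([@last - 1, 0].max..[@last + 1, @line_starts.length - 1].min).each do |i|
      next_start = @line_starts[i + 1] || Float::INFINITY
      return i if @line_starts[i] <= offset && offset < next_start
    end
    # Slow path: binary search over the sorted line start offsets.
    idx = @line_starts.bsearch_index { |s| s > offset }
    idx ? idx - 1 : @line_starts.length - 1
  end
end
```

The ~97% hit rate quoted above would correspond to how often the fast-path window catches the lookup.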


The other commit is a small hack to skip redundant packing/unpacking of locations. By default, Prism packs each location into a 64-bit integer and later extracts the start offset and length to create the location object. But since we access nearly all of the locations anyway, we can request a frozen result to compute them eagerly. That gives about a 4% speed boost.
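For illustration, the pack/unpack round-trip looks roughly like this (a simplified sketch of the concept, not prism's actual internals):

```ruby
# Simplified sketch: a location packed as (start_offset << 32) | length,
# unpacked lazily on first access and memoized.
class PackedLocation
  def initialize(packed)
    @packed = packed
  end

  def start_offset
    @start_offset ||= @packed >> 32
  end

  def length
    @length ||= @packed & 0xFFFFFFFF
  end
end

packed = (10 << 32) | 5
loc = PackedLocation.new(packed)
loc.start_offset # => 10
loc.length       # => 5
```

Computing these eagerly at parse time (as a frozen result does) skips the repeated shifting and masking when nearly every location gets accessed anyway.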


Benchmark from 2ea8139

@kddnewton
Collaborator

I don't think I want that generally in parse result. Can you put that into the ripper translator? (Just not sure if we truly want to support that for everyone going forward.)

I see you must have determined that `unshift` wasn't as fast as reassigning and splatting the array. That really surprises me; are you sure that's the case?

@Earlopain force-pushed the ripper-bounds-perf branch from 3e8c264 to ee99eee on April 20, 2026 19:22
@Earlopain
Collaborator Author

I don't think I want that generally in parse result. Can you put that into the ripper translator?

Yeah, I can move it. I had it nodoc but that's fine too.

I see you must have determined unshift wasn't as fast as reassigning and splatting the array. That really surprises me, are you sure that's the case?

It's probably not. But it also only happens for what ripper considers void statements. The change is only to make 3e8c264 work, and overall it comes out positive.

@Earlopain force-pushed the ripper-bounds-perf branch from ee99eee to c22a6fd on April 20, 2026 19:47
It's a small, somewhat hacky performance boost. Locations are lazy; by freezing the result they don't have to be packed and unpacked redundantly. This gives about a 4% speed boost.

The other changes are to not modify the frozen AST.
@Earlopain force-pushed the ripper-bounds-perf branch from c22a6fd to 9e93bd6 on April 20, 2026 20:11
@Earlopain changed the title from "Optimize ripper bounds, and one other small perf tweak" to "Optimize ripper bounds, and other small perf tweaks" on Apr 21, 2026
It was showing up in profiles.

So:
* Don't splat `KEYWORDS` (also did the same for `BINARY_OPERATORS`)
* Use `start_with?` if possible

Overall this gives a ~5% speed boost.
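The two tweaks above look roughly like this (an illustrative before/after sketch; the method names and the abbreviated keyword list are assumptions, not the actual call sites in the diff):

```ruby
KEYWORDS = %w[def end if unless].freeze # abbreviated list for illustration

# Before: `when *KEYWORDS` splats the array into the case on every call.
def keyword_via_splat?(token)
  case token
  when *KEYWORDS then true
  else false
  end
end

# After: a direct membership check, no splat.
def keyword_via_include?(token)
  KEYWORDS.include?(token)
end

# Before/after for a prefix check: String#start_with? beats an anchored
# regexp when only a literal prefix needs testing.
def ivar_via_regexp?(token)
  token.match?(/\A@/)
end

def ivar_via_start_with?(token)
  token.start_with?("@")
end
```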
```ruby
on_op(token)
else
  on_ident(token)
end
```
Collaborator


If you want to speed this up, I think you should take advantage of `opt_case_dispatch` by making every key that's possible hashable. That would look something like:

Suggested change

```ruby
case token[0]
when "."
  on_period(token)
when "`"
  on_backtick(token)
when "_"
  on_ident(token)
when "@"
  token[1] == "@" ? on_cvar(token) : on_ivar(token)
when "$"
  on_gvar(token)
else
  if allow_keywords && KEYWORDS.include?(token)
    on_kw(token)
  elsif token.match?(/^[[:upper:]]\w*$/)
    on_const(token)
  elsif token.match?(/^[[:punct:]]/)
    on_op(token)
  else
    on_ident(token)
  end
end
```

That should help because the `case` will become a jump table, with the other arms falling through to the linear check.
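As an aside, the jump-table behavior is observable on CRuby by disassembling a `case` over string literals (a quick check for illustration, not part of the PR; it uses `RubyVM`, so it is CRuby-specific):

```ruby
# On CRuby, a case whose when arms are all string literals compiles to an
# opt_case_dispatch instruction, which jumps through a hash (CDHASH)
# instead of testing each arm linearly.
iseq = RubyVM::InstructionSequence.compile(<<~RUBY)
  case tok
  when "." then :period
  when "`" then :backtick
  when "_" then :ident
  else :other
  end
RUBY

puts iseq.disasm[/opt_case_dispatch/]
```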

Collaborator Author


I tried that, but it doesn't work out to be faster (maybe because `token[0]` allocates a string?). Good idea regardless; I didn't know about that trick.

Out of curiosity I gave `token.getbyte(0)` a try (it should be safe, I think, although the code becomes less readable), but there's not much difference there either. So I think I'll just leave this as is.
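For context on the allocation point mentioned above (a quick illustration, not code from the PR): `String#[]` with an index builds a new one-character String on each call, while `String#getbyte` returns an Integer without allocating.

```ruby
token = "@foo"

token[0]          # => "@"  (allocates a fresh String each call)
token.getbyte(0)  # => 64   (the byte value of "@"; no allocation)

# The byte comparison reads less clearly but avoids the allocation:
token.getbyte(0) == "@".ord # => true
```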
