perf(parser): byte_search macro always unroll main loop (#2439)

Refactor `byte_search!` macro to move logic out of the main loop. This ensures the compiler unrolls the loop.

This speeds up lexing single-line comments by 20%-25% on the benchmarks which contain enough comments for the change to register. Presumably the loop wasn't unrolled previously.

The code required to do this is a little odd. It adds an extra `loop {}` which always exits on its first iteration, so it is not really a loop. It is only there so that `break` can exit that "loop" with a value, which gives 2 different paths: (1) a matching byte was found and (2) the `for` loop completed without finding any match.

This is the only way I could find to produce this behavior without using a macro. Is there a more "normal" way to get the same logic?
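
The control flow described above can be sketched outside the macro. This is a minimal illustration, not the real `byte_search!` code: `BATCH_SIZE`, `is_candidate`, `should_continue` and `find_match` are hypothetical stand-ins for `SEARCH_BATCH_SIZE`, `$table.matches(..)`, `$should_continue` and the macro body, and it uses safe slice indexing where the macro uses unsafe pointer reads.

```rust
/// Hypothetical batch size (the macro uses `SEARCH_BATCH_SIZE`).
const BATCH_SIZE: usize = 32;

/// Hypothetical stand-in for `$table.matches(byte)`.
fn is_candidate(byte: u8) -> bool {
    matches!(byte, b'\n' | b'\r' | 0xE2)
}

/// Hypothetical stand-in for `$should_continue`:
/// here `0xE2` is always treated as a false positive.
fn should_continue(byte: u8) -> bool {
    byte == 0xE2
}

/// Returns the index of the first candidate byte that is a real match.
fn find_match(source: &[u8]) -> Option<usize> {
    let mut pos = 0;
    'outer: loop {
        if pos + BATCH_SIZE <= source.len() {
            // `'inner: loop {}` never runs a second iteration.
            // It exists only so `break 'inner byte` can leave it with a value.
            let byte = 'inner: loop {
                // Minimal `for` body: one read, one test, one increment.
                for _ in 0..BATCH_SIZE {
                    let byte = source[pos];
                    if is_candidate(byte) {
                        break 'inner byte; // path (1): candidate found
                    }
                    pos += 1;
                }
                // Path (2): batch exhausted without a candidate.
                continue 'outer;
            };
            // The heavier logic lives outside the `for` loop.
            if should_continue(byte) {
                pos += 1;
                continue;
            }
            return Some(pos);
        } else {
            // Fewer than `BATCH_SIZE` bytes left: plain byte-by-byte search.
            while pos < source.len() {
                let byte = source[pos];
                if is_candidate(byte) && !should_continue(byte) {
                    return Some(pos);
                }
                pos += 1;
            }
            return None;
        }
    }
}
```

On the question of a more "normal" form: since Rust 1.65, a labeled block (`'inner: { ... break 'inner byte; ... }`) accepts `break` with a value, so it may be able to express the same two paths without a never-repeating `loop`. Whether the compiler unrolls the `for` loop equally well in that form is not something this sketch checks.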
overlookmotel 2024-02-20 02:39:52 +00:00 committed by GitHub
parent 27b2c212c4
commit 996a9d27eb


@@ -495,45 +495,56 @@ macro_rules! byte_search {
 let mut $pos = $start;
 #[allow(unused_unsafe)] // Silence warnings if macro called in unsafe code
-loop {
+'outer: loop {
+    #[allow(clippy::redundant_else)]
     if $pos.addr() <= $lexer.source.end_for_batch_search_addr() {
         // Search a batch of `SEARCH_BATCH_SIZE` bytes.
-        // The compiler unrolls this loop.
+        //
+        // `'inner: loop {}` is not a real loop - it always exits on first turn.
+        // Only using `loop {}` so that can use `break 'inner` to get out of it.
+        // This allows complex logic of `$should_continue` and `$match_handler` to be
+        // outside the `for` loop, keeping it as minimal as possible, to encourage
+        // compiler to unroll it.
+        //
         // SAFETY:
-        // `$pos.addr() > lexer.source.end_for_batch_search_addr()` check above ensures there are
-        // at least `SEARCH_BATCH_SIZE` bytes remaining in `lexer.source`.
+        // `$pos.addr() <= lexer.source.end_for_batch_search_addr()` check above ensures
+        // there are at least `SEARCH_BATCH_SIZE` bytes remaining in `lexer.source`.
         // So calls to `$pos.read()` and `$pos.add(1)` in this loop cannot go out of bounds.
-        for _i in 0..crate::lexer::search::SEARCH_BATCH_SIZE {
-            // SAFETY: `$pos` cannot go out of bounds in this loop (see above).
-            let $match_byte = unsafe { $pos.read() };
-            if $table.matches($match_byte) {
-                // Found match.
-                // Check if should continue.
-                {
-                    let $continue_byte = $match_byte;
-                    if $should_continue {
-                        // Not a match after all - continue searching.
-                        // SAFETY: `pos` is not at end of source, so safe to advance 1 byte.
-                        // See above about UTF-8 character boundaries invariant.
-                        $pos = unsafe { $pos.add(1) };
-                        continue;
-                    }
-                }
-                // Advance `lexer.source`'s position up to `$pos`, consuming unmatched bytes.
-                // SAFETY: See above about UTF-8 character boundaries invariant.
-                $lexer.source.set_position($pos);
-                let $match_start = $start;
-                return $match_handler;
-            }
-            // No match - continue searching
-            // SAFETY: `$pos` cannot go out of bounds in this loop (see above).
-            // Also see above about UTF-8 character boundaries invariant.
-            $pos = unsafe { $pos.add(1) };
-        }
-        // No match in batch - loop round and searching next batch
+        let $match_byte = 'inner: loop {
+            for _i in 0..crate::lexer::search::SEARCH_BATCH_SIZE {
+                // SAFETY: `$pos` cannot go out of bounds in this loop (see above)
+                let byte = unsafe { $pos.read() };
+                if $table.matches(byte) {
+                    break 'inner byte;
+                }
+                // No match - continue searching batch.
+                // SAFETY: `$pos` cannot go out of bounds in this loop (see above).
+                // Also see above about UTF-8 character boundaries invariant.
+                $pos = unsafe { $pos.add(1) };
+            }
+            // No match in batch - search next batch
+            continue 'outer;
+        };
+        // Found match. Check if should continue.
+        {
+            let $continue_byte = $match_byte;
+            if $should_continue {
+                // Not a match after all - continue searching.
+                // SAFETY: `pos` is not at end of source, so safe to advance 1 byte.
+                // See above about UTF-8 character boundaries invariant.
+                $pos = unsafe { $pos.add(1) };
+                continue;
+            }
+        }
+        // Advance `lexer.source`'s position up to `$pos`, consuming unmatched bytes.
+        // SAFETY: See above about UTF-8 character boundaries invariant.
+        $lexer.source.set_position($pos);
+        let $match_start = $start;
+        return $match_handler;
     } else {
         // Not enough bytes remaining to process as a batch.
         // This branch marked `#[cold]` as should be very uncommon in normal-length JS files.