oxc/crates
overlookmotel d3a59f27f7
perf(parser): lex identifiers as bytes not chars (#2352)
This PR re-implements identifier lexing with a fast path for the most common case: identifiers consisting purely of ASCII characters. It uses the new `Source` / `SourcePosition` APIs.

Lexing identifiers is a hot path, and accounts for the majority of the Lexer's running time. The performance bump from this change is (if I do say so myself!) quite decent.

I've spent a lot of time tuning the implementation, which gained a further 10-15% on the Lexer benchmarks compared to my first, simpler attempt. Some of the design decisions, if they look odd, are likely motivated by gains in performance.

### Techniques

This implementation uses a few different strategies for performance:

* Search byte-by-byte, not char-by-char.
* Process batches of 32 bytes at a time to reduce bounds checks.
* Mark uncommon paths `#[cold]`.
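
The three techniques above can be combined roughly as follows. This is a minimal sketch, not the code from this PR: `ascii_identifier_len`, `scan_tail` and `is_ascii_id_continue` are hypothetical names, and the scan works on a plain `&[u8]` rather than the `Source` / `SourcePosition` APIs.

```rust
/// Number of bytes examined per batch (32, as described above).
const BATCH_SIZE: usize = 32;

/// Hypothetical helper: can `b` continue an ASCII identifier?
#[inline]
fn is_ascii_id_continue(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'$'
}

/// Returns the length of the leading run of ASCII identifier bytes in `src`.
/// Any other byte (including the first byte of a non-ASCII char) ends the run.
pub fn ascii_identifier_len(src: &[u8]) -> usize {
    let mut pos = 0;
    // Fast path: while a full batch remains, a single slice bounds check
    // covers the next 32 bytes.
    while src.len() - pos >= BATCH_SIZE {
        let batch = &src[pos..pos + BATCH_SIZE];
        for (i, &b) in batch.iter().enumerate() {
            if !is_ascii_id_continue(b) {
                return pos + i;
            }
        }
        pos += BATCH_SIZE;
    }
    // Fewer than 32 bytes remain: only possible near the end of the source.
    scan_tail(src, pos)
}

/// Uncommon path, kept out of the hot loop: checks bounds on every byte.
#[cold]
fn scan_tail(src: &[u8], mut pos: usize) -> usize {
    while pos < src.len() && is_ascii_id_continue(src[pos]) {
        pos += 1;
    }
    pos
}
```

For example, `ascii_identifier_len(b"foo = 1")` stops at the space and returns 3.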

### Structure

The implementation is built in 3 layers:

1. ASCII characters only.
2. ASCII and Unicode characters.
3. `\` escape sequences (and all the above).

`identifier_name_handler` starts at the top layer, and is optimized for consuming ASCII as fast as possible. Each layer is assumed to be rarer than the one before it, and dropping down a layer is a de-opt.

I'm assuming that 95%+ of JavaScript code does not include either Unicode characters or escapes in identifiers, so the speed of the fast path is prioritised.

That said, once a Unicode character is encountered, the next layer does expect to find further Unicode characters, rather than de-opting over and over again. If an identifier *starts* with a Unicode character, it enters the code directly on the 2nd layer, so it is not penalised by going through a `#[cold]` boundary. Lexing Unicode is never going to be as fast as ASCII, but I still felt it was important not to penalise it unnecessarily, so as not to be Anglo-centric.
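
The layering described above might look something like this. It is a simplified sketch under several assumptions: all function names and the `Layer` enum are hypothetical, `char::is_alphanumeric` stands in for the real Unicode identifier rules, and only the `\uXXXX` escape form is recognised.

```rust
/// Deepest layer that was needed to lex an identifier (illustrative only).
#[derive(Debug, PartialEq)]
pub enum Layer {
    Ascii,
    Unicode,
    Escape,
}

fn is_ascii_id_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'$'
}

/// Assumption: `is_alphanumeric` approximates the Unicode identifier rules.
fn is_id_char(c: char) -> bool {
    if c.is_ascii() { is_ascii_id_byte(c as u8) } else { c.is_alphanumeric() }
}

/// Returns the byte length of the identifier at the start of `src`,
/// and the deepest layer reached while lexing it.
pub fn lex_identifier(src: &str) -> (usize, Layer) {
    let bytes = src.as_bytes();
    // An identifier starting with a non-ASCII char enters layer 2 directly,
    // without crossing a `#[cold]` boundary.
    if bytes.first().map_or(false, |b| !b.is_ascii()) {
        return unicode_layer(src, 0);
    }
    // Layer 1: ASCII bytes only.
    let mut pos = 0;
    while pos < bytes.len() && is_ascii_id_byte(bytes[pos]) {
        pos += 1;
    }
    match bytes.get(pos).copied() {
        Some(b) if !b.is_ascii() => deopt_to_unicode(src, pos),
        Some(b'\\') => deopt_to_escape(src, pos),
        _ => (pos, Layer::Ascii),
    }
}

#[cold]
fn deopt_to_unicode(src: &str, pos: usize) -> (usize, Layer) {
    unicode_layer(src, pos)
}

#[cold]
fn deopt_to_escape(src: &str, pos: usize) -> (usize, Layer) {
    escape_layer(src, pos)
}

/// Layer 2: ASCII and Unicode chars. Stays here for further Unicode chars.
fn unicode_layer(src: &str, mut pos: usize) -> (usize, Layer) {
    for c in src[pos..].chars() {
        if c == '\\' {
            return deopt_to_escape(src, pos);
        }
        if !is_id_char(c) {
            break;
        }
        pos += c.len_utf8();
    }
    (pos, Layer::Unicode)
}

/// Layer 3: `\uXXXX` escapes, plus everything the layers above accept.
fn escape_layer(src: &str, mut pos: usize) -> (usize, Layer) {
    let bytes = src.as_bytes();
    loop {
        let rest = &src[pos..];
        if rest.starts_with("\\u") {
            match bytes.get(pos + 2..pos + 6) {
                Some(hex) if hex.iter().all(|b| b.is_ascii_hexdigit()) => pos += 6,
                _ => break,
            }
        } else {
            match rest.chars().next() {
                Some(c) if is_id_char(c) => pos += c.len_utf8(),
                _ => break,
            }
        }
    }
    (pos, Layer::Escape)
}
```

In this sketch, `foo` never leaves layer 1, `café` de-opts once to layer 2 and stays there, and `été` starts on layer 2 without passing through a `#[cold]` function.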

### ASCII search macro

The main ASCII search is implemented as a macro. I found that, for reasons I don't understand, it's significantly faster to have all the code in a single function, even compared to multiple functions marked `#[inline]` or `#[inline(always)]`. The fastest implementation also requires some code to be repeated, which is nicer to do with a macro.

This macro, and the `ByteMatchTable` types that go with it, are designed to be re-usable. The next step will be to apply them to whitespace and strings, which should be fairly simple.

Searching in batches of 32 bytes is also designed to be forward-compatible with SIMD.
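
To illustrate the general idea of a byte lookup table driven by a search macro, here is a small sketch. It is not the `ByteMatchTable` API from this PR: `ByteTable`, `skip_while_matches!`, `WHITESPACE` and `skip_whitespace` are all hypothetical names, and the batching is left out for brevity.

```rust
/// Hypothetical 256-entry lookup table: `true` for bytes that match.
pub struct ByteTable([bool; 256]);

impl ByteTable {
    /// Builds the table at compile time from a list of matching bytes.
    pub const fn new(bytes: &[u8]) -> Self {
        let mut table = [false; 256];
        let mut i = 0;
        while i < bytes.len() {
            table[bytes[i] as usize] = true;
            i += 1;
        }
        Self(table)
    }

    /// A `u8` index into a 256-entry array can never be out of bounds.
    #[inline]
    pub const fn matches(&self, b: u8) -> bool {
        self.0[b as usize]
    }
}

/// Example table for ASCII whitespace bytes.
static WHITESPACE: ByteTable = ByteTable::new(b" \t\n\r");

/// The search loop is a macro, so it expands into the body of the calling
/// function instead of relying on the compiler to inline a helper.
macro_rules! skip_while_matches {
    ($table:expr, $src:expr, $pos:ident) => {
        while $pos < $src.len() && $table.matches($src[$pos]) {
            $pos += 1;
        }
    };
}

/// Returns the number of leading whitespace bytes in `src`.
pub fn skip_whitespace(src: &[u8]) -> usize {
    let mut pos = 0;
    skip_while_matches!(WHITESPACE, src, pos);
    pos
}
```

The same macro could be reused with a different table, for example one matching string delimiters, without changing the loop itself.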

### Bye bye `AutoCow`

`AutoCow` is removed. Instead, a string builder is only created if it's needed, when a `\` escape is first encountered. The string builder is also more efficient than `AutoCow` was, as it copies bytes in chunks, rather than one at a time.

This won't make much difference for identifiers, as escapes are so rare anyway, but this same technique can be used for strings, where they're more common.
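
A rough sketch of the lazy string-builder idea follows. It is not the implementation from this PR: `unescape` and `build_unescaped` are hypothetical names, the result is expressed with the standard library's `Cow`, and only the `\uXXXX` escape form is decoded.

```rust
use std::borrow::Cow;

/// Decodes `\uXXXX` escapes in `src`. Returns `None` for a malformed escape.
/// Assumption: only the 4-hex-digit escape form is supported in this sketch.
pub fn unescape(src: &str) -> Option<Cow<'_, str>> {
    // Common case: no escape, so borrow the source and allocate nothing.
    let first = match src.find('\\') {
        None => return Some(Cow::Borrowed(src)),
        Some(i) => i,
    };
    build_unescaped(src, first).map(Cow::Owned)
}

/// Rare path: the string builder is only created once an escape is found.
#[cold]
fn build_unescaped(src: &str, first: usize) -> Option<String> {
    let mut out = String::with_capacity(src.len());
    // Copy everything before the first escape as a single chunk.
    out.push_str(&src[..first]);
    let mut rest = &src[first..];
    while !rest.is_empty() {
        // Invariant: `rest` starts with a backslash here.
        let hex = rest.strip_prefix("\\u")?.get(..4)?;
        if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        let code = u32::from_str_radix(hex, 16).ok()?;
        out.push(char::from_u32(code)?);
        rest = &rest[6..];
        // Copy the run up to the next escape in one go, not byte by byte.
        let chunk_len = rest.find('\\').unwrap_or(rest.len());
        out.push_str(&rest[..chunk_len]);
        rest = &rest[chunk_len..];
    }
    Some(out)
}
```

With this shape, `unescape("foo")` borrows the input unchanged, while an input containing the escape `\u0062` between `a` and `c` allocates once and yields `abc`.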
2024-02-09 12:01:30 +08:00
oxc Publish crates v0.6.0 2024-02-03 22:35:30 +08:00
oxc_allocator Publish crates v0.6.0 2024-02-03 22:35:30 +08:00
oxc_ast feat(ast): enter AstKind::ExportDefaultDeclaration, AstKind::ExportNamedDeclaration and AstKind::ExportAllDeclaration (#2317) 2024-02-05 17:43:30 +08:00
oxc_cli feat(cli): add --version (#2182) 2024-01-26 19:13:17 +08:00
oxc_codegen feat(codegen): avoid printing comma in ArrayAssignmentTarget if the elements is empty (#2331) 2024-02-06 22:45:19 +08:00
oxc_diagnostics Publish crates v0.6.0 2024-02-03 22:35:30 +08:00
oxc_index Publish crates v0.6.0 2024-02-03 22:35:30 +08:00
oxc_js_regex
oxc_language_server chore(deps): update cargo (#2191) 2024-01-29 11:38:47 +08:00
oxc_linter refactor(linter/config): Use serde::Deserialize for config parsing (#2325) 2024-02-08 16:48:38 +08:00
oxc_macros feat(linter): remove the --timings feature (#2049) 2024-01-16 14:21:04 +08:00
oxc_minifier refactor(ast): fix BigInt memory leak by removing it (#2293) 2024-02-04 16:47:00 +08:00
oxc_parser perf(parser): lex identifiers as bytes not chars (#2352) 2024-02-09 12:01:30 +08:00
oxc_prettier refactor(prettier): s/nodes/stack (#2347) 2024-02-08 23:22:44 +08:00
oxc_semantic feat(semantic): add export binding for ExportDefaultDeclarations in module record (#2329) 2024-02-06 22:01:16 +08:00
oxc_span feat(span): fix memory leak by implementing inlineable string for oxc_allocator (#2294) 2024-02-04 19:28:23 +08:00
oxc_syntax fix(semantic): remove unnecessary SymbolFlags::Import (#2311) 2024-02-05 14:16:29 +08:00
oxc_transformer Publish crates v0.6.0 2024-02-03 22:35:30 +08:00
oxc_wasm chore(wasm): remove console_error_panic_hook 2024-02-02 17:02:01 +08:00