oxc/crates at d3a59f27f7d8ba925146fda992fda868661a7954 - dan/oxc

mirror of https://github.com/danbulant/oxc synced 2026-05-24 20:32:10 +00:00

overlookmotel d3a59f27f7 perf(parser): lex identifiers as bytes not chars (#2352)	2024-02-09 12:01:30 +08:00

This PR re-implements lexing identifiers with a fast path for the most common case - identifiers which are pure ASCII characters, using the new `Source` / `SourcePosition` APIs.

Lexing identifiers is a hot path, and accounts for the majority of the time the Lexer spends. The performance bump from this change is (if I do say so myself!) quite decent.

I've spent a lot of time tuning the implementation, which gained a further 10-15% on the Lexer benchmarks compared to my first, simpler attempt. Some of the design decisions, if they look odd, are likely motivated by gains in performance.

### Techniques

This implementation uses a few different strategies for performance:

* Search byte-by-byte, not char-by-char.
* Process batches of 32 bytes at a time to reduce bounds checks.
* Mark uncommon paths `#[cold]`.

### Structure

The implementation is built in 3 layers:

1. ASCII characters only.
2. ASCII and Unicode characters.
3. `\` escape sequences (and all the above).

`identifier_name_handler` starts at the top layer, and is optimized for consuming ASCII as fast as possible.

Each "layer" is considered more uncommon than the previous, and dropping down a layer is a de-opt. I'm assuming that 95%+ of JavaScript code does not include either Unicode characters or escapes in identifiers, so the speed of the fast path is prioritised.

That said, once a Unicode character is encountered, the next layer does expect to find further Unicode characters, rather than de-opting over and over again. If an identifier starts with a Unicode character, it enters the code straight on the 2nd layer, so is not penalised by going through a `#[cold]` boundary. Lexing Unicode is never going to be as fast as ASCII, but still I felt it was important not to penalise it unnecessarily, so as not to be Anglo-centric.

### ASCII search macro

The main ASCII search is implemented as a macro. I found that, for reasons I don't understand, it's significantly faster to have all the code in a single function, even compared to multiple functions marked `#[inline]` or `#[inline(always)]`. The fastest implementation also requires some code to be repeated twice, which is nicer to do with a macro.

This macro, and the `ByteMatchTable` types that go with it, are designed to be re-usable. Next step will be to apply them for whitespace and strings, which should be fairly simple.

Searching in batches of 32 bytes is also designed to be forward-compatible with SIMD.

### Bye bye `AutoCow`

`AutoCow` is removed. Instead, a string-builder is only created if it's needed, when a `\` escape is first encountered. The string builder is also more efficient than `AutoCow` was, as it copies bytes in chunks, rather than 1-by-1. This won't make much difference for identifiers, as escapes are so rare anyway, but this same technique can be used for strings, where they're more common.
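
### Illustrative sketch (not the PR's code)

The techniques listed in the commit message (a byte lookup table, 32-byte batches, and a `#[cold]` de-opt) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the implementation from #2352: the names `IDENT_TABLE`, `Stop`, `scan_ascii_identifier`, `stop_at` and `slow_path` are hypothetical, plain functions are used where the PR uses a macro, and the table only models ASCII identifier-continue bytes (letters, digits, `_`, `$`).

```rust
/// Hypothetical 256-entry table: `true` for bytes that can continue an
/// ASCII identifier. A stand-in for the idea behind `ByteMatchTable`,
/// not its real API.
const fn build_ident_table() -> [bool; 256] {
    let mut table = [false; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        table[i] = b.is_ascii_alphanumeric() || b == b'_' || b == b'$';
        i += 1;
    }
    table
}

static IDENT_TABLE: [bool; 256] = build_ident_table();

/// Batch size taken from the commit message.
const BATCH: usize = 32;

/// Why the fast ASCII scan stopped (hypothetical type).
#[derive(Debug, PartialEq)]
enum Stop {
    /// End of input, or an ASCII byte that cannot be part of an identifier.
    End,
    /// A non-ASCII byte or a `\` was found: a slower layer must take over.
    NeedsSlowPath,
}

/// Returns the number of leading ASCII identifier bytes in `src`,
/// plus the reason the scan stopped.
fn scan_ascii_identifier(src: &[u8]) -> (usize, Stop) {
    let mut pos = 0;
    // Fast path: take 32 bytes at a time, so the slice bounds check
    // happens once per batch rather than once per byte.
    while let Some(batch) = src.get(pos..pos + BATCH) {
        for (i, &b) in batch.iter().enumerate() {
            if !IDENT_TABLE[b as usize] {
                return stop_at(pos + i, b);
            }
        }
        pos += BATCH;
    }
    // Tail: fewer than 32 bytes remain, so go byte by byte.
    for (i, &b) in src[pos..].iter().enumerate() {
        if !IDENT_TABLE[b as usize] {
            return stop_at(pos + i, b);
        }
    }
    (src.len(), Stop::End)
}

/// Decide whether the byte that ended the scan needs the slower layers.
fn stop_at(len: usize, b: u8) -> (usize, Stop) {
    if b >= 0x80 || b == b'\\' {
        slow_path(len)
    } else {
        (len, Stop::End)
    }
}

/// Marked `#[cold]` to hint that this branch is rarely taken.
#[cold]
fn slow_path(len: usize) -> (usize, Stop) {
    (len, Stop::NeedsSlowPath)
}
```

For example, `scan_ascii_identifier(b"foo_bar = 1")` stops at the space after seven bytes and reports `Stop::End`, while an input containing a UTF-8 multi-byte character or a `\` reports `Stop::NeedsSlowPath` at the offending byte. A real lexer would also apply identifier-start rules to the first byte; the sketch skips that for brevity.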
oxc	Publish crates v0.6.0	2024-02-03 22:35:30 +08:00
oxc_allocator	Publish crates v0.6.0	2024-02-03 22:35:30 +08:00
oxc_ast	feat(ast): enter AstKind::ExportDefaultDeclaration, AstKind::ExportNamedDeclaration and AstKind::ExportAllDeclaration (#2317)	2024-02-05 17:43:30 +08:00
oxc_cli	feat(cli): add --version (#2182)	2024-01-26 19:13:17 +08:00
oxc_codegen	feat(codegen): avoid printing comma in ArrayAssignmentTarget if the elements is empty (#2331)	2024-02-06 22:45:19 +08:00
oxc_diagnostics	Publish crates v0.6.0	2024-02-03 22:35:30 +08:00
oxc_index	Publish crates v0.6.0	2024-02-03 22:35:30 +08:00
oxc_js_regex
oxc_language_server	chore(deps): update cargo (#2191)	2024-01-29 11:38:47 +08:00
oxc_linter	refactor(linter/config): Use serde::Deserialize for config parsing (#2325)	2024-02-08 16:48:38 +08:00
oxc_macros	feat(linter): remove the `--timings` feature (#2049)	2024-01-16 14:21:04 +08:00
oxc_minifier	refactor(ast): fix BigInt memory leak by removing it (#2293)	2024-02-04 16:47:00 +08:00
oxc_parser	perf(parser): lex identifiers as bytes not chars (#2352)	2024-02-09 12:01:30 +08:00
oxc_prettier	refactor(prettier): s/nodes/stack (#2347)	2024-02-08 23:22:44 +08:00
oxc_semantic	feat(semantic): add export binding for ExportDefaultDeclarations in module record (#2329)	2024-02-06 22:01:16 +08:00
oxc_span	feat(span): fix memory leak by implementing inlineable string for oxc_allocator (#2294)	2024-02-04 19:28:23 +08:00
oxc_syntax	fix(semantic): remove unnecessary SymbolFlags::Import (#2311)	2024-02-05 14:16:29 +08:00
oxc_transformer	Publish crates v0.6.0	2024-02-03 22:35:30 +08:00
oxc_wasm	chore(wasm): remove `console_error_panic_hook`	2024-02-02 17:02:01 +08:00