dan/oxc

mirror of https://github.com/danbulant/oxc synced 2026-05-25 04:42:10 +00:00

Author	SHA1	Message	Date
Boshen	9479865d9b	feat(napi/parser): expose `preserveParans` option (#2582 ) closes #2576	2024-03-03 15:18:47 +08:00
Arnaud Barré	637cd1dea4	fix(ast): support TSIndexSignature.readonly (#2579 ) [playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIDKgICAgICAgIC0GwpuZs97oWDqPM4xvCuoRB73mPOSrYb%2BTQEZf3b8RF0G%2B60jF5tYXUE9Me2%2FmMqVEwVy%2FiBIlyIMX6PqBpqsSmIXTJcsRqi4f3%2Bj6ICA)	2024-03-03 14:58:57 +08:00
Arnaud Barré	258b9b1c14	fix(ast): support FormalParameter.override (#2577 ) This [code](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC1gICAgICAgICxG4jI43W9aqTWr3WzyA0TqSOjtB34F78iblvTQruFcqR6BUbbiLtWhj5rEL0NnFkDs4pF3dHiw39X7YCA) can't be represented in the current OXC AST: ```ts class Foo { constructor(override bar: string) {} } ```	2024-03-03 14:41:42 +08:00
Arnaud Barré	78f30bc2db	fix(ast): change TSMappedType.type_annotation from TSTypeAnnotation to TSType (#2571 ) In ESTree, in that special case, there is no TSTypeAnnotation wrapper: (See `type X` in each) - [oxc playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIAWgICAgICAgICyHorESipoToAAwTlix58geR2%2Beeu9rZHQZOqK%2B%2BX85ZQ9ldchOoVw2oAm2qi9okF3bJ9o4l78ENP3f%2Bc%2B8cIK6Itp%2B3SIInU72Vk0%2FSqawy1VNV5zTgBr7gOpGtUZsvkc12Yp8MC2shel9fbpgDySpYsWdgDhf3jVlIA%3D) - [astexplorer TSESLint parser](`9fc767f3a5`) - [astexplorer Babel parser](`9a4b02fae1`) --------- Co-authored-by: Boshen <boshenc@gmail.com>	2024-03-03 14:38:45 +08:00
Arnaud Barré	32028eb1c5	fix(parser): TSConditionalType span start (#2570 ) Span start should be the checkType.start (as with all my PRs, I try to make it work; don't hesitate to close and do it in a better way) [playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIDFgICAgICAgIC6nsrEgtelB%2FCnUFVHa8WBImPvKP4Ye3U5jBKASUfm8OtkXZASTLptdPlvM%2Fult4BgRbjIq3Yts9L2pZ%2FhVs8hMF%2Bwpqd%2FfdHggA%3D%3D)	2024-03-03 06:25:55 +00:00
Arnaud Barré	670081050f	fix(parser): set span end for TSEnumDeclaration (#2573 ) [playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIDHgICAgICAgICyHorESipoTXPdvBaE9wxyPnC9nb7Q6xEpIf3AzkuhOU2arZOLF1u08q1G2hs5klxiUYA6%2BBkL693d0iAZC%2BUFyne3yIKPv32k8IA%3D) (Tell me if you prefer that I group this kind of small fixes together)	2024-03-03 13:54:43 +08:00
Arnaud Barré	8a81851bf3	fix(parser): don't parse null as a literal type (#2572 ) See playgrounds: - [oxc](https://oxc-project.github.io/oxc/playground/?code=3YCAAIDbgICAgICAgIC6nsrEgteLFrCnQnPuEizmC%2BDQ8C8bP9fXPj%2B7%2FjjmRZPvpAH3N7PfIPDu7RDOlrl79cHiork8WA08r39%2FqpCAgA%3D%3D) - [Babel](`3a263be55b`)	2024-03-03 13:54:16 +08:00
overlookmotel	78f8c2ce7f	perf(parser): lex JSXText with memchr (#2558 ) Lexing JSXText only requires searching for 2 possible characters (`<` and `{`), so can use `memchr`.	2024-03-01 22:26:53 +08:00
overlookmotel	dd31c6453a	refactor(parser): `byte_search` macro evaluate to matched byte (#2555 ) Change behavior of `byte_search!` macro, to make it easier to understand and use: 1. `handle_match` removed. Macro instead evaluates to the first matching byte. 2. `handle_eof` does not return from enclosing function. 3. Alter syntax to make clear that `continue_if` and `handle_eof` are not closures, so can use `return` statements in them. These changes enabled by #2552.	2024-03-01 21:28:39 +08:00
overlookmotel	c579620701	refactor(parser): small efficiencies in `byte_search` macro usage (#2554 ) A few small efficiencies in usage of `byte_search` macro for lexing comments.	2024-03-01 21:23:34 +08:00
overlookmotel	18cff6aab8	refactor(parser): remove start params for `byte_search` macro arms (#2553 ) Simplify `byte_search` macro a bit more.	2024-03-01 21:15:27 +08:00
overlookmotel	34ecdd58d8	refactor(parser): simplify `byte_search` macro (#2552 ) This PR greatly simplifies the `byte_search!` macro. Mainly removing `cold_branch()` from the "not enough bytes remaining for a batch" branch, which allows refactoring so that `handle_match` and `continue_if` don't need to be repeated twice. Result for performance is inconsistent - a little better on some benchmarks, a little worse on others. But not by significant amounts either way. In my view, the benefit of making the macro simpler outweighs a small speed loss anyway.	2024-03-01 21:07:39 +08:00
overlookmotel	ddccaa1af9	refactor(parser): remove unsafe code in lexer (#2549 ) Same as #2527. Just remove some unnecessary unsafe code, no substantive changes.	2024-02-29 15:00:08 +00:00
overlookmotel	5a13714a18	perf(parser): faster lexing template strings (#2541 ) Speed up lexing template strings. This was the last use of `AutoCow` remaining in the lexer, and it's now removed. Implementation is quite complex, to avoid repeatedly branching on whether an unescaped string is required or not (the way `AutoCow` did). I tried to simplify it down to a single function, but this hurt performance significantly. Benchmarks do not show much movement, but I believe that's because there aren't many template strings in the benchmarks. Where there are template strings, I believe this speeds up lexing them significantly.	2024-02-29 13:28:30 +08:00
overlookmotel	9d7ea6b3f0	refactor(parser): single function for all string slicing (#2540 ) Pure refactor. Move all string-slicing in `lexer::Source` into a single function.	2024-02-29 13:22:55 +08:00
Boshen	3efbbb2e1f	feat(ast): add "abstract" type to `MethodDefinition` and `PropertyDefinition` (#2536 ) closes #2532 ``` pub enum PropertyDefinitionType { PropertyDefinition, TSAbstractPropertyDefinition, } pub enum MethodDefinitionType { MethodDefinition, TSAbstractMethodDefinition, } ```	2024-02-28 17:33:11 +08:00
overlookmotel	24ded3cb15	perf(parser): lex JSX strings with `memchr` (#2528 ) Simplify lexing JSX string attributes. As the search is purely for 1 byte value (the closing quote), and so doesn't require a byte table, use `memchr`. This change doesn't really register on benchmarks, but it's one step closer to removing `AutoCow`, and transitioning all the searches in the lexer to byte-by-byte.	2024-02-28 14:39:23 +08:00
overlookmotel	0ddfc856d2	refactor(parser): remove unsafe code (#2527 ) Remove some unnecessary unsafe code.	2024-02-27 20:28:21 +08:00
Boshen	46e779194a	chore: fix clippy warnings (#2519 )	2024-02-26 23:55:18 +08:00
Boshen	be6b8b7ce6	[BREAKING CHANGE] Change `Atom` to `Atom<'a>` to make it safe (#2497 ) Part of #2295 This PR splits the `Atom` type into `Atom<'a>` and `CompactString`. All the AST node strings now use `Atom<'a>` instead of `Atom` to signify it belongs to the arena. It is now up to the user to select which form of the string to use. This PR essentially removes the really unsafe code `93742f89e9/crates/oxc_span/src/atom.rs (L98-L107)` which can lead to ![image](https://github.com/oxc-project/oxc/assets/1430279/8c513c4f-19b0-4b63-b61c-e07c187c95b5)	2024-02-26 19:34:40 +08:00
Dunqing	70295a5552	feat(ast): update arrow_expression to arrow_function_expression (#2496 )	2024-02-25 14:39:34 +00:00
Boshen	7a796c4b5f	feat(ast): add `TSModuleDeclaration.kind` (#2487 ) closes #2395	2024-02-24 17:09:31 +08:00
Boshen	5212f7b51e	fix(parser): fix missing end span from `TSTypeAliasDeclaration` (#2485 ) closes #2483	2024-02-24 16:51:00 +08:00
Boshen	1634586934	refactor(ast): s/TSTypeOperatorType/TSTypeOperator to align with estree	2024-02-21 22:25:04 +08:00
Boshen	9087f71765	refactor(ast): s/TSThisKeyword/TSThisType to align with estree	2024-02-21 22:25:04 +08:00
Boshen	d08abc638e	refactor(ast): s/NumberLiteral/NumericLiteral to align with estree	2024-02-21 21:41:08 +08:00
Boshen	35608c8eb1	chore: fix all docs	2024-02-21 18:06:37 +08:00
Andrew McClenaghan	6b3b260dcc	feat(Codegen): Improve codegen (#2460 ) This gets all the new TS types working to the same level TS output was before and fixes a bunch of other codegen --------- Co-authored-by: Boshen <boshenc@gmail.com>	2024-02-21 14:41:57 +08:00
Dunqing	197fa16613	feat(semantic): add check for duplicate class elements in checker (#2455 ) 1. Remove the check implementation from the parser 2. Implement it in the semantic checker 3. Support TypeScript's check for duplicate class elements Checking for duplicate class elements in the semantic checker makes it easier to support TypeScript's checking rules.	2024-02-21 14:10:19 +08:00
overlookmotel	a78303d5a6	refactor(parser): `continue_if` in `byte_search` macro not unsafe (#2440 ) #2439 made using `continue_if` in `byte_search!` macro safe, as it no longer continues the main loop after a match, so no danger of reading out of bounds if `continue_if` code fast-forwards the current position. This follow-on PR removes the unsafe blocks, and uses that fast-forward ability in a couple of places.	2024-02-20 10:45:31 +08:00
overlookmotel	a5a3c695f7	refactor(parser): correct comment (#2441 ) Just correcting a typo in a comment, and moving comment to a better place.	2024-02-20 10:43:12 +08:00
overlookmotel	996a9d27eb	perf(parser): `byte_search` macro always unroll main loop (#2439 ) Refactor `byte_search!` macro to move logic out of the main loop. This ensures the compiler unrolls the loop. This speeds up lexing single-line comments by 20%-25% on the benchmarks which contain enough comments for the change to register. Presumably the loop wasn't unrolled previously. The code required to do this is a little odd. It adds an extra `loop {}` which always exits on the first turn (so not really a useful loop), but is required to be able to use `break` to exit that "loop", making 2 different paths for (1) matching byte found and (2) `for` loop completed without finding any match. This is the only way I could find to produce this behavior without using a macro. Is there a more "normal" way to get the same logic?	2024-02-20 10:39:52 +08:00
Dunqing	60db720fa6	feat(parser): parse import attributes in TSImportType (#2436 ) close: #2394 `64d2eeea7b/src/compiler/types.ts (L2177-L2185)` The corresponding test cases were skipped, so I manually added some cases to misc `f5db48237f/tasks/coverage/src/typescript.rs (L118-L121)`	2024-02-19 12:26:42 +08:00
Dunqing	3cbe786b18	refactor(ast): update TSImportType parameter to argument (#2429 ) In typescript it's named argument, so we should keep it consistent `64d2eeea7b/src/compiler/types.ts (L2180)`	2024-02-19 10:29:24 +08:00
overlookmotel	90f9266d00	chore(deps): update `bumpalo` crate (#2417 ) Latest version of `bumpalo` includes a couple of performance fixes for `String` (e.g. https://github.com/fitzgen/bumpalo/pull/229) which may help the parser a little.	2024-02-18 11:49:31 +08:00
overlookmotel	cc2ddbee77	refactor(parser): catch all illegal UTF-8 bytes (#2415 ) Catch all illegal UTF-8 bytes with the `UER` byte handler. From https://datatracker.ietf.org/doc/html/rfc3629: > The octet values C0, C1, F5 to FF never appear. This change should make no difference at all, as a valid `&str` may not contain any of these byte values anyway. But it's possible if the user has e.g. created the string with `str::from_utf8_unchecked` and not obeyed the safety constraints. This will at least contain the damage if that's happened, and panic rather than lead to UB. And since we're already catching other error conditions, may as well catch them all.	2024-02-16 20:49:01 +08:00
Dunqing	73e116e8a1	fix(parser): incorrect parsing of class accessor property name (#2386 )	2024-02-11 22:57:13 +08:00
overlookmotel	383f5b3081	perf(parser): consume multi-line comments faster (#2377 ) Consume multi-line comments faster. * Initially search for `/`, `\r`, `\n` or `0xE2` (first byte of irregular line breaks). Once a line break is found, switch to faster search which only looks for `*/`, as it's not relevant whether there are more line breaks or not. Using `memchr` for the 2nd simpler search, as it's efficient for a search with only one "needle". Initializing `memchr::memmem::Finder` is fairly expensive, and I tried numerous ways to handle it. This is the most performant way I could find. Any ideas how to avoid re-creating it for each Lexer pass? (it can't be a `static` as `Finder::new` is not a const function, and `lazy_static!` is too costly)	2024-02-11 12:43:14 +08:00
Boshen	ef336cb66b	feat(parser): recover from `async x [newline] => x` (#2375 ) ```javascript async x => x ``` Babel recovers and displays "No line break is allowed before '=>'"	2024-02-10 11:19:08 +08:00
overlookmotel	c4fa738312	perf(parser): consume single-line comments faster (#2374 ) Use `byte_search!` macro to consume single-line comments. Would be a lot simpler if didn't have to deal with irregular line breaks. Damn you Unicode!	2024-02-10 11:02:30 +08:00
overlookmotel	b29719d2df	refactor(parser): add methods to `Source` + `SourcePosition` (#2373 ) Preparatory step for #2374.	2024-02-10 10:57:33 +08:00
overlookmotel	79ae9a9b2c	refactor(parser): extend `byte_search` macro (#2372 ) Preparatory step for #2374.	2024-02-10 10:52:59 +08:00
overlookmotel	0be8397c77	perf(parser): optimize lexing strings (#2366 ) Optimize lexing strings a bit.	2024-02-09 23:52:45 +08:00
overlookmotel	c0d1d6b08a	perf(parser): lex strings as bytes (#2357 ) Lex string literals as bytes, using same techniques as for identifiers. Handling escapes could be optimized a bit more, and maybe I'll return to that, but as escapes are fairly rare, it wouldn't be the biggest gain.	2024-02-09 21:00:27 +08:00
overlookmotel	2f6cf73d51	fix(parser): remove erroneous debug assertion (#2356 ) This was a bit of a whoopsie in last batch of PRs. This assertion shouldn't be there, because all reads are now via `source.position().read()`, so this assertion says "you can only read some byte values". Only reason it didn't blow up conformance tests is that they run in release mode. Sorry. Please merge soon as you can and cover my shame!	2024-02-09 20:55:12 +08:00
overlookmotel	8376f15b9a	perf(parser): eat whitespace after line break (#2353 ) Uses the `byte_search!` macro introduced in #2352 to consume whitespace after a line break.	2024-02-09 12:02:51 +08:00
overlookmotel	d3a59f27f7	perf(parser): lex identifiers as bytes not chars (#2352 ) This PR re-implements lexing identifiers with a fast path for the most common case - identifiers which are pure ASCII characters, using the new `Source` / `SourcePosition` APIs. Lexing identifiers is a hot path, and accounts for the majority of the time the Lexer spends. The performance bump from this change is (if I do say so myself!) quite decent. I've spent a lot of time tuning the implementation, which gained a further 10-15% on the Lexer benchmarks compared to my first, simpler attempt. Some of the design decisions, if they look odd, are likely motivated by gains in performance. ### Techniques This implementation uses a few different strategies for performance: * Search byte-by-byte, not char-by-char. * Process batches of 32 bytes at a time to reduce bounds checks. * Mark uncommon paths `#[cold]`. ### Structure The implementation is built in 3 layers: 1. ASCII characters only. 2. ASCII and Unicode characters. 3. `\` escape sequences (and all the above). `identifier_name_handler` starts at the top layer, and is optimized for consuming ASCII as fast as possible. Each "layer" is considered more uncommon than the previous, and dropping down a layer is a de-opt. I'm assuming that 95%+ of JavaScript code does not include either Unicode characters or escapes in identifiers, so the speed of the fast path is prioritised. That said, once a Unicode character is encountered, the next layer does expect to find further Unicode characters, rather than de-opting over and over again. If an identifier starts with a Unicode character, it enters the code straight on the 2nd layer, so is not penalised by going through a `#[cold]` boundary. Lexing Unicode is never going to be as fast as ASCII, but still I felt it was important not to penalise it unnecessarily, so as not to be Anglo-centric. ### ASCII search macro The main ASCII search is implemented as a macro. I found that, for reasons I don't understand, it's significantly faster to have all the code in a single function, even compared to multiple functions marked `#[inline]` or `#[inline(always)]`. The fastest implementation also requires some code to be repeated twice, which is nicer to do with a macro. This macro, and the `ByteMatchTable` types that go with it, are designed to be re-usable. Next step will be to apply them for whitespace and strings, which should be fairly simple. Searching in batches of 32 bytes is also designed to be forward-compatible with SIMD. ### Bye bye `AutoCow` `AutoCow` is removed. Instead, a string-builder is only created if it's needed, when a `\` escape is first encountered. The string builder is also more efficient than `AutoCow` was, as it copies bytes in chunks, rather than 1-by-1. This won't make much difference for identifiers, as escapes are so rare anyway, but this same technique can be used for strings, where they're more common.	2024-02-09 12:01:30 +08:00
overlookmotel	6910e4f71b	refactor(parser): macro for ASCII identifier byte handlers (#2351 ) Add a macro for ASCII identifier byte handlers. This is a preparatory step towards #2352.	2024-02-09 11:55:35 +08:00
overlookmotel	6f597b18bc	refactor(parser): all pointer manipulation through `SourcePosition` (#2350 ) A safer and faster interface for reading source text using pointers than `*ptr`.	2024-02-09 10:26:51 +08:00
overlookmotel	185b3dbcc3	refactor(parser): fix outdated comment (#2344 ) Just fixes an outdated comment.	2024-02-08 19:47:33 +08:00
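
Several entries above (notably d3a59f27f7, #2352) describe searching source text byte-by-byte against a lookup table, in batches of 32 bytes. The following is a minimal sketch of that general idea only, not the actual `byte_search!` macro or `ByteMatchTable` from oxc. The names `IDENT_TABLE`, `build_ident_table`, `BATCH_SIZE` and `ascii_ident_len` are hypothetical, and the sketch uses safe slice indexing rather than the pointer-based `SourcePosition` API the commits mention.

```rust
/// Hypothetical helper: builds a 256-entry table that is `true` for bytes
/// which may appear in an ASCII identifier (letters, digits, `_`, `$`).
const fn build_ident_table() -> [bool; 256] {
    let mut table = [false; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        table[i] = matches!(b, b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b'_' | b'$');
        i += 1;
    }
    table
}

/// Hypothetical lookup table, computed at compile time.
static IDENT_TABLE: [bool; 256] = build_ident_table();

/// Batch size taken from the commit message; the value is otherwise arbitrary here.
const BATCH_SIZE: usize = 32;

/// Returns the length of the leading run of ASCII identifier bytes in `bytes`.
/// Any byte not in the table (whitespace, punctuation, non-ASCII) ends the run.
fn ascii_ident_len(bytes: &[u8]) -> usize {
    let mut pos = 0;

    // Main path: while at least one full batch remains, scan a fixed-size
    // window so that the remaining length is checked once per batch.
    while bytes.len() - pos >= BATCH_SIZE {
        let batch = &bytes[pos..pos + BATCH_SIZE];
        match batch.iter().position(|&b| !IDENT_TABLE[b as usize]) {
            Some(offset) => return pos + offset,
            None => pos += BATCH_SIZE,
        }
    }

    // Tail path: fewer than `BATCH_SIZE` bytes remain.
    match bytes[pos..].iter().position(|&b| !IDENT_TABLE[b as usize]) {
        Some(offset) => pos + offset,
        None => bytes.len(),
    }
}
```

Calling `ascii_ident_len(b"foo = 1")` yields `3`, since the space is the first byte outside the table. A real lexer would then hand off to slower paths for Unicode characters and `\` escapes, as the commit message outlines.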

288 commits