dan/oxc - BGit

mirror of https://github.com/danbulant/oxc synced 2026-05-25 12:51:57 +00:00

Author	SHA1	Message	Date
overlookmotel	71898ffdd5	refactor(parser): move source length check into lexer (#2206 ) This change makes little difference in itself, but moving the check into the lexer will allow some optimizations in lexer using unsafe code which depend on this invariant.	2024-01-29 22:29:02 +08:00
overlookmotel	e123be0a00	fix(parser): correct MAX_LEN for 32-bit systems (#2204 ) Maximum length of source parser can accept is limited on 32-bit systems to `isize::MAX` (i.e. `i32::MAX` not `u32::MAX`) because Rust [limits the size of allocations](https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.from_size_align) to `isize::MAX`. This PR takes that constraint into account when calculating `Parser::MAX_LEN`. It also speeds up the `overlong_source` test so it runs in under 500ms (previously it took ~4 secs on an M1 MacBook Pro).	2024-01-29 21:45:45 +08:00
Dunqing	ea8cc98c34	fix(ast): AccessorProperty is missing decorators (#2176 )	2024-01-26 15:43:05 +08:00
overlookmotel	bc7ea0bedb	refactor(parser): make `is_identifier` methods consistent	2024-01-23 11:05:17 +08:00
Dunqing	766ca63aa0	refactor(ast): rename RestElement to BindingRestElement (#2116 ) close: #2115	2024-01-22 14:28:35 +08:00
overlookmotel	36c718ee82	feat(tasks): benchmarks for lexer (#2101 ) This PR adds benchmarks for the lexer. I'm doing some work on optimizing the lexer and I thought it'd be useful to see the effects of changes in isolation, separate from the parser. These benchmarks may not be ideal to keep long-term, but for now they'd be useful. In order to do so, it's necessary for the `oxc_parser` crate to expose the lexer, but I have done that without adding it to the docs, and using an alias `__lexer`.	2024-01-21 14:32:50 +00:00
Boshen	59e29f286a	chore(parser): explain the reason for omitting "}" and ">" in jsx text lexer (#2097 ) closes #2094	2024-01-20 23:03:44 +08:00
Boshen	3f2b48f1a9	refactor(parser): remove useless string builder from jsx text lexer (#2096 ) relates #2094	2024-01-20 22:34:57 +08:00
Boshen	2f5afff9bd	fix(parser): fix crash on TSTemplateLiteralType in function return position (#2089 ) ``` interface Helpers { inspect(): `~~~~\n${string}\n~~~~`; } ```	2024-01-19 23:14:05 +08:00
overlookmotel	0e32618664	refactor(parser): combine token kinds for skipped tokens (#2072 ) Small optimization to the lexer. Whitespace, line breaks, and comments are all skipped by `read_next_token()`. At present there's a different `Kind` for each, and `read_next_token()` decides whether to skip with `matches!(kind, Kind::WhiteSpace | Kind::NewLine | Kind::Comment | Kind::MultiLineComment)`. These `Kind`s are used for no other purpose, so there seems little reason to differentiate them. This PR combines them all into `Kind::Skip`, so then the test of whether to skip is reduced to `kind == Kind::Skip`. Only produces ~0.3% performance bump on parser benchmarks. But, why not?...	2024-01-18 21:14:12 +08:00
overlookmotel	8d5f5b8a49	refactor(parser): macro for ASCII byte handlers (#2066 ) As discussed on #2046, it wasn't ideal to have `unsafe { lexer.consume_ascii_char() }` in every byte handler. It also wasn't great to have a safe function `consume_ascii_char()` which could cause UB if called incorrectly (so wasn't really safe at all). This PR achieves the same objective of #2046, but using a macro to define byte handlers for ASCII chars, which builds in the assertion that next char is guaranteed to be ASCII. Before #2046: ```rs const SPS: ByteHandler = |lexer| { lexer.consume_char(); Kind::WhiteSpace }; ``` After this PR: ```rs ascii_byte_handler!(SPS(lexer) { lexer.consume_char(); Kind::WhiteSpace }); ``` i.e. The body of the handlers are unchanged from how they were before https://github.com/oxc-project/oxc/pull/2046. This expands to: ```rs const SPS: ByteHandler = |lexer| { unsafe { let s = lexer.current.chars.as_str(); assert_unchecked!(!s.is_empty()); assert_unchecked!(s.as_bytes()[0] < 128); } lexer.consume_char(); Kind::WhiteSpace }; ``` But due to the assertions the macro inserts, `consume_char()` is now optimized for ASCII characters, and reduces to a single instruction. So the `consume_ascii_char()` function introduced by #2046 is unnecessary, and can be removed again. The "boundary of unsafe" is moved to a new function `handle_byte()` which `read_next_token()` calls. `read_next_token()` is responsible for upholding the safety invariants, which include ensuring that `ascii_byte_handler!()` macro is not being misused (that last part is strictly speaking a bit of a cheat, but...). I am not a fan of macros, as they're not great for readability. But in this case I don't think it's too bad, because: 1. The macro is well-documented. 2. It's not too clever (only one syntax is accepted). 3. It's used repetitively in a clear pattern, and once you've understood one, you understand them all. What do you think? Does this strike a reasonable balance between readability and safety?	2024-01-17 15:29:15 +08:00
overlookmotel	408acb90e6	refactor(parser): lexer handle unicode without branch (#2039 ) As suggested by @strager in https://github.com/oxc-project/oxc/pull/2025#pullrequestreview-1820273832, this PR adds `BYTE_HANDLERS` for first bytes of unicode characters. This removes a branch from `read_next_token()` and produces a +1% speed-up on parser benchmarks.	2024-01-16 13:14:22 +08:00
overlookmotel	66a7a68f9f	perf(parser): lexer byte handlers consume ASCII chars faster (#2046 ) In the lexer, most `BYTE_HANDLER`s immediately consume the current char with `lexer.consume_char()`. Byte handlers are only called if there's a certain value (or range of values) for the next char. This is their entire purpose. So in all cases we know for sure that we're not at EOF, and that the next char is a single-byte ASCII character. The compiler, however, doesn't seem to be able to "see through" the `BYTE_HANDLERS[byte](self)` call and understand these invariants. So it produces very verbose ASM for `lexer.consume_char()`. This PR replaces `lexer.consume_char()` in the byte handlers with an unsafe `lexer.consume_ascii_char()` which skips on to next char with a single `inc` instruction. The difference in codegen can be seen here: https://godbolt.org/z/1ha3cr9W5 (compare the 2 x `core::ops::function::FnOnce::call_once` handlers). Downside is that this does introduce a lot of unsafe blocks, but in my opinion they're all pretty trivial to validate. --------- Co-authored-by: Boshen <boshenc@gmail.com>	2024-01-16 12:31:45 +08:00
Boshen	09c7570560	ci: use miri to detect memory leak for the parser (#2037 ) We'll merge this and then eventually turn it on as a nightly check, it's a manual run for now.	2024-01-15 15:11:02 +00:00
overlookmotel	b4d76f0b0d	refactor(parser): remove noop code (#2028 ) This PR removes some code from the lexer which doesn't do anything.	2024-01-14 23:48:35 +08:00
overlookmotel	60a927d8f5	perf(parser): lexer match byte not char (#2025 ) 2 related changes to lexer's `read_next_token()`: 1. Hint to branch predictor that unicode identifiers and non-standard whitespace are rare by marking that branch `#[cold]`. 2. The branch is on whether next character is ASCII or not. This check only requires reading 1 byte, as ASCII characters are always single byte in UTF8. So only do the work of getting a `char` in the cold path, once it's established that character is not ASCII and this work is required.	2024-01-14 18:50:11 +08:00
Boshen	1886a5b838	perf(parser): reduce `Token` size from 16 to 12 bytes (#2010 ) I also had to change how the string for private identifiers are built, otherwise they will always be allocated.	2024-01-13 12:42:39 +08:00
overlookmotel	6996948825	refactor(parser): remove extraneous code from regex parsing (#2008 ) This PR removes some code in parsing regexp flags which is extraneous: ```rs if !ch.is_ascii_lowercase() { self.error(diagnostics::RegExpFlag(ch, self.current_offset())); continue; } ``` Which is followed by: ```rs let flag = if let Ok(flag) = RegExpFlags::try_from(ch) { flag } else { self.error(diagnostics::RegExpFlag(ch, self.current_offset())); continue; }; ``` `!ch.is_ascii_lowercase()` is equivalent to `ch < 'a' || ch > 'z'`. The compiler implements `RegExpFlags::try_from(ch)` as `ch < 'd' || ch > 'y'` and then a jump table. So `ch.is_ascii_lowercase()` does nothing that `RegExpFlags::try_from(ch)` doesn't do already. https://godbolt.org/z/51GPPY9nx (this PR built on top of #2007 for ease)	2024-01-13 02:34:05 +00:00
overlookmotel	712e99cf9b	fix(parser): restore regex flag parsing (#2007 ) As discussed in https://github.com/oxc-project/oxc/pull/1999#issuecomment-1888916383, this PR restores some of regex parsing behavior to as it was prior to #1926.	2024-01-13 03:19:33 +08:00
Boshen	aa91fde1d9	refactor(parser): only allocate for escaped template strings (#2005 )	2024-01-12 18:56:36 +08:00
Boshen	38f86b0cac	refactor(parser): remove string builder from number parsing (#2002 ) The builder was used to build an allocated string for numbers with underscores, this is no longer required because it is now allocated on demand. `0d77e1e788/crates/oxc_parser/src/lexer/number.rs (L32)`	2024-01-12 17:01:51 +08:00
overlookmotel	c7316856db	refactor(parser): reduce work parsing regexps (#1999 ) #1926 produced a small performance regression because when parsing a regexp, some work is repeated.	2024-01-12 11:36:30 +08:00
Boshen	4706765d2a	refactor(parser): reduce `Token` size from 32 to 16 bytes (#1962 ) Part of #1880 `Token` size is reduced from 32 to 16 bytes by changing the previous token value `Option<&'a str>` to a u32 index handle. It would be nice if this handle were eliminated entirely because the normal case for a string is always `&source_text[token.span.start..token.span.end]`. Unfortunately, JavaScript allows escaped characters to appear in identifiers, strings and templates. These strings need to be unescaped for equality checks, i.e. `"\a" === "a"`. This leads us to adding an `escaped_strings[]` vec for storing these unescaped and allocated strings. Performance regression for adding this vec should be minimal because escaped strings are rare. Background Reading: * https://floooh.github.io/2018/06/17/handles-vs-pointers.html	2024-01-09 15:17:02 +08:00
Boshen	6e0bd52af1	refactor(parser): remove TokenValue::Number from Token (#1945 ) This PR is part of #1880. Token size is reduced from 40 to 32 bytes.	2024-01-08 16:29:03 +08:00
Dunqing	b50c5ec623	fix(parser): unexpected ts type annotation in get/set (#1942 ) fix: https://github.com/oxc-project/oxc/issues/1939	2024-01-08 15:07:43 +08:00
Boshen	08438e04ba	refactor(parser): remove TokenValue::RegExp from `Token` (#1926 ) This PR is part of #1880. `Token` size is reduced from 48 to 40 bytes. To reconstruct the regex pattern and flags within the parser, the regex string is re-parsed from the end by reading all valid flags. In order to make things work nicely, the lexer will no longer recover from an invalid regex.	2024-01-08 13:48:52 +08:00
Boshen	7eb2573178	refactor(parser): parse BigInt lazily (#1924 ) This PR partially fixes #1803 and is part of #1880. BigInt is removed from the `Token` value, so that the token size can be reduced once we removed all the variants. `Token` is now also `Copy`, which removes all the `clone` and `drop` calls. This yields 5% performance improvement for the parser.	2024-01-08 12:37:20 +08:00
overlookmotel	eb2966c512	fix(parser): fix incorrectly identified directives (#1885 ) Parser incorrectly identifies string literals as directives if they follow after `import`s, `export`s, or decorators. In all of these cases, `'use strict'` produces a directive in the AST, where it should be parsed as an `ExpressionStatement` containing a `StringLiteral`: ```js import x from 'foo'; 'use strict'; ``` ```js export {x}; 'use strict'; ``` ```js @foo 'use strict'; ``` [Playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC0gICAgICAgIC0G8rnONK89ITJ3zrK%2FUP7OmSZPgHQzStr3yMtwFTU%2BD1WPt09JgqZJLoYooydbGsM5vGcf34BnIA%3D) This PR should fix that. I'm not sure about the decorator case, though. I assume it's not a directive. But is prefixing a string literal with a decorator even legal syntax anyway? And a side nit: If I'm reading it right, I don't think the `continue` statement in the decorator arm of the match does anything. Do I have that right? Last question: Where does one go about putting a test? I guess these silly cases aren't covered by Babel etc's tests. --------- Co-authored-by: Boshen <boshenc@gmail.com>	2024-01-04 13:39:15 +00:00
Dunqing	c3090c2c70	fix(parser): terminate parsing if an EmptyParenthesizedExpression error occurs (#1874 ) close: https://github.com/oxc-project/oxc/issues/1870#issue-2061901976	2024-01-03 11:34:14 +08:00
overlookmotel	62bc8c5cea	fix(parser): error on source larger than 4 GiB (#1860 ) `Token` and `Span` both represent `start` and `end` as `u32`. This limits size of source which can be parsed to `u32::MAX`. `19577709db/crates/oxc_span/src/span.rs (L14-L20)` However, this constraint is currently not enforced. In a release build, code will not panic on arithmetic overflow, so `start`/`end` could wrap around back to zero if source is 4 GiB or more. That'd produce nonsense spans. But worse, the lexer relies in some places on `self.current.token.start` being correct, so if the value wrapped around, possibly it'd keep rewinding to the start of the source and lexing it again, causing an infinite loop. In worst case, if for some reason an application's public API used OXC's parser with user-supplied source code (parser-as-a-service!), this could be exploited for denial of service. This PR adds an assertion to catch this at the start of parsing instead. This does add an extra instruction, but I imagine the effect will be negligible compared to the work required to parse the code.	2024-01-02 11:05:28 +08:00
Deivid Almeida	c1cfd1759e	feat(linter): no-irregular-whitespace rule (#1835 ) Parser, trivias and trivias_builder were edited to collect all whitespace. The Trivias struct now stores comment and whitespace Vecs. After that, I will implement the no-irregular-whitespace rule. P.S.: There isn't a way to implement this feature without losing a little bit of performance. Compared with my last PR #1819, to minimize this cost the irregular whitespace is stored as u32 instead of Span, and I removed a map iterator and an unused function. If you have a suggestion about it, please give me feedback.	2023-12-31 12:05:38 +08:00
IWANABETHATGUY	4bbc977971	chore: upgrade rustc toolchain to stable 1.75.0 (#1853 ) ref: https://blog.rust-lang.org/2023/12/28/Rust-1.75.0.html	2023-12-29 12:20:51 +08:00
overlookmotel	19577709db	Remove redundant code from lexer (#1850 ) Just removes a couple of lines of redundant code from the lexer. A note on the 2nd one: ```rs let mut builder = AutoCow::new(lexer); let c = lexer.consume_char(); builder.push_matching(c); ``` `push_matching()` is a no-op unless `force_allocation_without_current_ascii_char()` has already been called. Here the `AutoCow` has just been freshly created, so we know it hasn't.	2023-12-29 10:07:21 +08:00
overlookmotel	1feec95a94	fix(parser) fix typo in `expecting_directives` variable name (#1801 ) Renames `expecting_diretives` to `expecting_directives` to fix spelling	2023-12-24 16:51:02 +00:00
magic-akari	5b2696b711	refactor(parser): report `this` parameter error (#1788 ) - follow up: #1728	2023-12-23 22:09:14 +08:00
Boshen	2b4d1bf142	fix(parser): await in jsx expression closes #1740	2023-12-19 20:23:16 +08:00
magic-akari	a2858ed452	refactor(ast): introduce `ThisParameter` (#1728 ) Most TypeScript types can be eliminated during the code generation phase by not printing the corresponding AST nodes. The changes in this PR enable applying a similar technique to the `this` parameter.	2023-12-19 13:20:33 +08:00
Boshen	19e77b0af3	fix(parser): false positive for "Missing initializer in const declaration" in declare + namespace (#1724 ) closes #1723	2023-12-18 17:03:42 +08:00
Boshen	8edcab82f2	chore(lexer): document the `accessor` keyword	2023-12-14 12:55:55 +08:00
Boshen	1554f7c0d2	feat(parser): parse `let.a = 1` with error recovery (#1587 )	2023-11-29 23:21:39 +08:00
Boshen	9842be4461	refactor(parser): remove duplicated code	2023-11-29 18:23:32 +08:00
Boshen	6670d94708	chore(rust): remove unnecessary clippy::non_upper_case_globals (#1557 )	2023-11-27 14:31:38 +08:00
magic-akari	9ff0ffcc6f	feat(ast): implement new proposal-import-attributes (#1476 ) - [Import Attributes](https://tc39.es/proposal-import-attributes)	2023-11-25 15:56:09 +08:00
Boshen	567c6ed757	feat(prettier): print directives (#1497 )	2023-11-22 19:39:25 +08:00
JonaAnders	08164b0e18	refactor(parser) Updated comments mentioning the ecma specification section 12.x (#1496 ) The ECMA specification seems to have added the "Tokens" section as 12.6. This pushed all the other sections down, resulting in e.g. the former 12.6 now being 12.7. Comments in the parser mention this part of the specification. All the mentions of section 12.6+ are therefore outdated now. This pull request tries to fix that by updating all the comments.	2023-11-22 19:29:04 +08:00
Boshen	07b010912a	feat(parser): add `preserve_parens` option (default: true) (#1474 ) closes #1461	2023-11-21 11:16:30 +08:00
magic-akari	a7e0706dbc	fix(parser): correct `import_kind` of `TSImportEqualsDeclaration` (#1449 )	2023-11-20 16:57:38 +08:00
Boshen	0218ae8641	feat(prettier): print leading comments with newlines (#1434 )	2023-11-19 22:46:55 +08:00
Jon Surrell	cb804d3cd2	Add base to AST BigintLiteral (#1416 )	2023-11-19 11:11:19 +08:00
magic-akari	445352991f	fix(parser): Fix type import (#1291 ) - fix: #1288 - fix: #1289	2023-11-14 15:17:58 +08:00
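
Several of the lexer commits listed above (`60a927d8f5`, `408acb90e6`, `0e32618664`) describe dispatching on the next byte through a `BYTE_HANDLERS` table and skipping whitespace and comments via a single `Kind::Skip`. The following is a minimal sketch of that idea, not the actual oxc code: the `Lexer` struct, the handler functions, `build_table` and the `kinds` helper are hypothetical names chosen for illustration, and non-ASCII bytes are simply treated as unknown one byte at a time.

```rust
/// Token kinds for this sketch. `Skip` stands in for whitespace,
/// line breaks and comments, which the caller never sees.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Skip,
    Ident,
    Number,
    Unknown,
    Eof,
}

/// Hypothetical minimal lexer: the source bytes plus a cursor.
struct Lexer<'a> {
    bytes: &'a [u8],
    pos: usize,
}

/// A handler is called only when the next byte has a known value,
/// so it can consume that byte without re-checking for EOF.
type ByteHandler = for<'a> fn(&mut Lexer<'a>) -> Kind;

fn skip(lexer: &mut Lexer<'_>) -> Kind {
    lexer.pos += 1;
    Kind::Skip
}

fn ident(lexer: &mut Lexer<'_>) -> Kind {
    while let Some(&b) = lexer.bytes.get(lexer.pos) {
        if b.is_ascii_alphanumeric() || b == b'_' || b == b'$' {
            lexer.pos += 1;
        } else {
            break;
        }
    }
    Kind::Ident
}

fn number(lexer: &mut Lexer<'_>) -> Kind {
    while let Some(&b) = lexer.bytes.get(lexer.pos) {
        if b.is_ascii_digit() {
            lexer.pos += 1;
        } else {
            break;
        }
    }
    Kind::Number
}

fn unknown(lexer: &mut Lexer<'_>) -> Kind {
    lexer.pos += 1;
    Kind::Unknown
}

/// Build the 256-entry table at compile time: one handler per byte value.
const fn build_table() -> [ByteHandler; 256] {
    let mut table: [ByteHandler; 256] = [unknown as ByteHandler; 256];
    let mut i = 0;
    while i < 256 {
        let handler: ByteHandler = match i as u8 {
            b' ' | b'\t' | b'\n' | b'\r' => skip,
            b'a'..=b'z' | b'A'..=b'Z' | b'_' | b'$' => ident,
            b'0'..=b'9' => number,
            _ => unknown,
        };
        table[i] = handler;
        i += 1;
    }
    table
}

static BYTE_HANDLERS: [ByteHandler; 256] = build_table();

impl<'a> Lexer<'a> {
    fn new(source: &'a str) -> Self {
        Self { bytes: source.as_bytes(), pos: 0 }
    }

    /// Dispatch on the next byte; loop while the handler reports `Skip`.
    fn read_next_token(&mut self) -> Kind {
        loop {
            let Some(&byte) = self.bytes.get(self.pos) else {
                return Kind::Eof;
            };
            let kind = BYTE_HANDLERS[byte as usize](self);
            if kind != Kind::Skip {
                return kind;
            }
        }
    }
}

/// Collect all token kinds up to (not including) `Eof`.
fn kinds(source: &str) -> Vec<Kind> {
    let mut lexer = Lexer::new(source);
    let mut out = Vec::new();
    loop {
        let kind = lexer.read_next_token();
        if kind == Kind::Eof {
            return out;
        }
        out.push(kind);
    }
}
```

With this sketch, `kinds("let x1 = 42")` yields `Ident, Ident, Unknown, Number`: the three spaces are consumed by the `skip` handler and filtered out by the single `kind != Kind::Skip` comparison, which is the simplification commit `0e32618664` describes.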

220 commits
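
Commits `62bc8c5cea` and `e123be0a00` in this log both concern the maximum source length: spans store `start` and `end` as `u32`, and Rust limits allocations to `isize::MAX` bytes. A rough sketch of how such a limit could be computed and checked follows. It is an assumption-laden illustration rather than the oxc implementation: `SourceTooLong` and `check_source_len` are hypothetical names, and the sketch returns a `Result` where the commit message describes an assertion.

```rust
/// Span with `u32` offsets, as described in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: u32,
    end: u32,
}

/// Largest source length this sketch accepts: the smaller of what a `u32`
/// offset can address and what a single Rust allocation can hold.
/// On 64-bit targets this is `u32::MAX`; on 32-bit targets it is `isize::MAX`.
const MAX_LEN: usize = {
    let by_span = u32::MAX as u64;
    let by_alloc = isize::MAX as u64;
    (if by_span < by_alloc { by_span } else { by_alloc }) as usize
};

/// Hypothetical error type for an over-long source.
#[derive(Debug, PartialEq, Eq)]
struct SourceTooLong {
    len: usize,
}

/// Reject the source up front, before any offset arithmetic can wrap.
fn check_source_len(len: usize) -> Result<(), SourceTooLong> {
    if len > MAX_LEN {
        Err(SourceTooLong { len })
    } else {
        Ok(())
    }
}

/// Build a span only if both offsets fit in `u32`.
fn span_for(start: usize, end: usize) -> Option<Span> {
    Some(Span {
        start: u32::try_from(start).ok()?,
        end: u32::try_from(end).ok()?,
    })
}
```

The check takes a length rather than a string so it can be exercised without allocating gigabytes. The failure it guards against is the silent wrap described in `62bc8c5cea`: `u32::MAX.wrapping_add(1)` is `0`, so an unchecked offset past the limit would point back at the start of the source.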