dan/oxc

mirror of https://github.com/danbulant/oxc synced 2026-05-25 04:42:10 +00:00

Author	SHA1	Message	Date
Boshen	28eeee0f71	fix(parser): fix asi error diagnostic pointing at invalid text causing crash (#4163 )	2024-07-10 14:45:10 +00:00
rzvxa	b936162093	refactor(ast/ast_builder)!: shorter allocator utility method names. (#4122 )

This PR serves two purposes. First, it lowers the number of characters we have to type for a simple operation such as wrapping an expression in a vector. Second, it follows the generated names more closely: nowhere else in the builder do we have `new_xxx`; we always say `xxx`, since a builder always constructs something.

```
new_vec -> vec
new_vec_single -> vec1*
new_vec_from_iter -> vec_from_iter
new_vec_with_capacity -> vec_with_capacity
new_str -> str
new_atom -> atom
```

`*` This one is the main motivation behind this PR. It saves 10 characters!	2024-07-09 12:16:38 +00:00
Boshen	243c9f35b0	refactor(parser): use function instead of trait to parse list with rest element (#4028 ) closes #3887	2024-07-02 13:43:14 +00:00
Boshen	1dacb1fc5b	refactor(parser): use function instead of trait to parse delimited lists (#4014 ) relates #3887 The rest of the list parsing trait implementations involves ... parsing `rest`, which I'll refactor in another PR.	2024-07-02 14:47:56 +08:00
Boshen	d0eac46fc8	refactor(parser): use function instead of trait to parse normal lists (#4003 ) To reduce boilerplate and code noise. relates #3887	2024-07-01 15:57:36 +00:00
Boshen	a471e62e2d	refactor(parser): clean up `try_parse` (#3925 )	2024-06-26 11:18:02 +00:00
Boshen	3db2553dc2	refactor(parser): improve parsing of TypeScript type arguments (#3923 )	2024-06-26 07:16:18 +00:00
Boshen	4bf405ddfc	perf(parser): add a few more inline hints to cursor functions (#3894 )	2024-06-25 06:00:46 +00:00
Boshen	1e802c71d5	refactor(parser): clean up `ParserState` (#3345 )	2024-05-19 01:30:16 +08:00
Boshen	b27a905958	refactor(parser): simplify `Context` passing (#3266 )	2024-05-14 12:22:27 +08:00
Boshen	2064ae9e0a	refactor(parser,diagnostic): one diagnostic struct to eliminate monomorphization of generic types (#3214 )

part of #3213

We should only have one diagnostic struct instead of 353 copies of them, so we don't end up choking LLVM with 50k lines of the same code due to monomorphization.

If the proposed approach is good, then I'll start writing a codemod to turn all the existing structs to plain functions.

---

Background:

Using `--timings`, we see `oxc_linter` is slow on codegen (the purple part).

![image](https://github.com/zkat/miette/assets/1430279/c1df4f7d-90ef-4c0f-9956-2ec3194db7ca)

The crate currently contains 353 miette errors.

[cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) displays

```
cargo llvm-lines -p oxc_linter --lib --release

  Lines                 Copies               Function name
  -----                 ------               -------------
  830350                33438                (TOTAL)
   29252 (3.5%,  3.5%)    808 (2.4%,  2.4%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
   23298 (2.8%,  6.3%)    353 (1.1%,  3.5%)  miette::eyreish::error::object_downcast
   19062 (2.3%,  8.6%)    706 (2.1%,  5.6%)  core::error::Error::type_id
   12610 (1.5%, 10.1%)     65 (0.2%,  5.8%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   12002 (1.4%, 11.6%)    706 (2.1%,  7.9%)  miette::eyreish::ptr::Own<T>::boxed
    9215 (1.1%, 12.7%)    115 (0.3%,  8.2%)  core::iter::traits::iterator::Iterator::try_fold
    9150 (1.1%, 13.8%)      1 (0.0%,  8.2%)  oxc_linter::rules::RuleEnum::read_json
    8825 (1.1%, 14.9%)    353 (1.1%,  9.3%)  <miette::eyreish::error::ErrorImpl<E> as core::error::Error>::source
    8822 (1.1%, 15.9%)    353 (1.1%, 10.3%)  miette::eyreish::error::<impl miette::eyreish::Report>::construct
    8119 (1.0%, 16.9%)    353 (1.1%, 11.4%)  miette::eyreish::error::object_ref
    8119 (1.0%, 17.9%)    353 (1.1%, 12.5%)  miette::eyreish::error::object_ref_stderr
    7413 (0.9%, 18.8%)    353 (1.1%, 13.5%)  <miette::eyreish::error::ErrorImpl<E> as core::fmt::Display>::fmt
    7413 (0.9%, 19.7%)    353 (1.1%, 14.6%)  miette::eyreish::ptr::Own<T>::new
    6669 (0.8%, 20.5%)     39 (0.1%, 14.7%)  alloc::raw_vec::RawVec<T,A>::try_allocate_in
    6173 (0.7%, 21.2%)    353 (1.1%, 15.7%)  miette::eyreish::error::<impl miette::eyreish::Report>::from_std
    6027 (0.7%, 21.9%)     70 (0.2%, 16.0%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    6001 (0.7%, 22.7%)    353 (1.1%, 17.0%)  miette::eyreish::error::object_drop
    6001 (0.7%, 23.4%)    353 (1.1%, 18.1%)  miette::eyreish::error::object_drop_front
    5648 (0.7%, 24.1%)    353 (1.1%, 19.1%)  <miette::eyreish::error::ErrorImpl<E> as core::fmt::Debug>::fmt
```

It's totalling more than 50k llvm lines, and is putting pressure on rustc codegen (the purple part on `oxc_linter` in the image above).

---

It's pretty obvious by looking at https://github.com/zkat/miette/blob/main/src/eyreish/error.rs, the generics can expand out to lots of code.	2024-05-11 04:56:22 +00:00
Dunqing	0ba7778e5e	fix(parser): correctly parse cls.fn<C> = x (#3208 ) close: #3206	2024-05-09 10:23:45 +08:00
Boshen	504698ab4a	chore: guard against unsafe code as much as possible.	2024-04-03 19:35:07 +08:00
Boshen	cda9c93436	fix(parser): improve lexing of jsx identifier to fix duplicated comments after jsx name (#2687 )	2024-03-12 15:51:51 +08:00
Boshen	8a73d18fcf	chore(parser): make sure all span.end >= span.start (#2681 ) closes #2679	2024-03-11 19:49:51 +08:00
Boshen	bf42158ad7	perf(parser): inline `end_span` and `parse_identifier_kind` which are on the hot path (#2612 )	2024-03-05 15:39:53 +08:00
overlookmotel	f3470163d9	refactor(parser): make `Source::set_position` safe (#2341 ) Make `Source::set_position` a safe function. This addresses a shortcoming of #2288. Instead of requiring caller of `Source::set_position` to guarantee that the `SourcePosition` is created from this `Source`, the preceding PRs enforce this guarantee at the type level. `Source::set_position` is going to be a central API for transitioning the lexer to processing the source as bytes, rather than `char`s (and the anticipated speed-ups that will produce). So making this method safe will remove the need for a lot of unsafe code blocks, and boilerplate comments promising "SAFETY: There's only one `Source`", when to the developer, this is blindingly obvious anyway. So, while splitting the parser into `Parser` and `ParserImpl` (#2339) is an annoying change to have to make, I believe the benefit of this PR justifies it.	2024-02-08 14:56:26 +08:00
overlookmotel	0bdecb5043	refactor(parser): wrapper type for parser (#2339 ) Split parser into public interface `Parser` and internal implementation `ParserImpl`. This involves no changes to public API. This change is a bit annoying, but justification is that it's required for #2341, which I believe to be very worthwhile. The `ParserOptions` type also makes it a bit clearer what the defaults for `allow_return_outside_function` and `preserve_parens` are. It came as a surprise to me that `preserve_parens` defaults to `true`, and this refactor makes that a bit more obvious when reading the code. All the real changes are in [oxc_parser/src/lib.rs](https://github.com/oxc-project/oxc/pull/2339/files#diff-8e59dfd35fc50b6ac9a9ccd991e25c8b5d30826e006d565a2e01f3d15dc5f7cb). The rest of the diff is basically replacing `Parser` with `ParserImpl` everywhere else.	2024-02-07 23:22:08 +08:00
overlookmotel	cdef41d552	refactor(parser): lexer replace `Chars` with `Source` (#2288 )

This PR replaces the `Chars` iterator in the lexer with a new structure `Source`.

## What it does

`Source` holds the source text, and allows:

* Iterating through source text char-by-char (same as `Chars` did).
* Iterating byte-by-byte.
* Getting a `SourcePosition` for current position, which can be used later to rewind to that position, without having to clone the entire `Source` struct.

`Source` has the same invariants as `Chars` - cursor must always be positioned on a UTF-8 character boundary (i.e. not in the middle of a multi-byte Unicode character).

However, unsafe APIs are provided to allow a caller to temporarily break that invariant, as long as they satisfy it again before they pass control back to safe code. This will be useful for processing batches of bytes.

## Why

I envisage most of the Lexer migrating to byte-by-byte iteration, and I believe it'll make a significant impact on performance.

It will allow efficiently processing batches of bytes (e.g. consuming identifiers or whitespace) without the overhead of calculating code points for every character. It should also make all the many `peek()`, `next_char()` and `next_eq()` calls faster.

`Source` is also more performant than `Chars` in itself. This wasn't my intent, but seems to be a pleasant side-effect of it being less opaque to the compiler than `Chars`, so it can apply more optimizations.

In addition, because checkpoints don't need to store the entire `Source` struct, but only a `SourcePosition` (8 bytes), I was able to reduce the size of `LexerCheckpoint` and `ParserCheckpoint`, and make them both `Copy`.

## Notes on implementation

`Source` is heavily based on Rust's `std::str::Chars` and `std::slice::Iter` iterators and I've copied the code/concepts from them as much as possible.

As it's a low-level primitive, it uses raw pointers and contains a lot of unsafe code. I think I've crossed the T's and dotted the I's, and I've commented the code extensively, but I'd appreciate a close review if anyone has time.

I've split it into 2 commits.

* First commit is all the substantive changes.
* 2nd commit just does away with `lexer.current` which is no longer needed, and replaces `lexer.current.token` with `lexer.token` everywhere.

Hopefully looking just at the 1st commit will reduce the noise and make it easier to review.

### `SourcePosition`

There is one annoyance with the API which I haven't been able to solve:

`SourcePosition` is a wrapper around a pointer, which can only be created from the current position of `Source`. Due to the invariant mentioned above, `SourcePosition` is therefore always in bounds of the source text, and points to a UTF-8 character boundary. So `Source` can be rewound to a `SourcePosition` cheaply, without any checks.

I had originally envisaged `Source::set_position` being a safe function, as `SourcePosition` enforces the necessary invariants itself.

The fly in the ointment is that a `SourcePosition` could theoretically have been created from another `Source`. If that was the case, it would be out of bounds, and it would be instant UB. Consequently, `Source::set_position` has to be an unsafe function.

This feels rather ridiculous. Of course the parser won't create 2 Lexers at the same time. But still it's possible, so I think it's better to take the strict approach and make it unsafe until I can find a way to statically prove the safety by some other means. Any ideas?

## Oddity in the benchmarks

There's something really odd going on with the semantic benchmark for `pdf.mjs`.

While I was developing this, small and seemingly irrelevant changes would flip that benchmark from +0.5% or so to -4%, and then another small change would flip it back.

What I don't understand is that parsing happens outside of the measurement loop in the semantic benchmark, so the parser shouldn't have any effect either way on semantic's benchmarks.

If CodSpeed's flame graph is to be believed, most of the negative effect appears to be a large Vec reallocation happening somewhere in semantic.

I've ruled out a few things: The AST produced by the parser for `pdf.mjs` after this PR is identical to what it was before. And semantic's `nodes` and `scopes` Vecs are the same length as they were before. Nothing seems to have changed!

I really am at a loss to explain it. Have you seen anything like this before?

One possibility is a fault in my unsafe code which is manifesting only with `pdf.mjs`, and it's triggering UB, which I guess could explain the weird effects. I'm running the parser on `pdf.mjs` in Miri now and will see if it finds anything (Miri doesn't find any problem running the tests). It's been running for over an hour now. Hopefully it'll be done by morning!

I feel like this shouldn't be merged until that question is resolved, so marking this as draft in the meantime.	2024-02-05 13:51:46 +00:00
Boshen	aa91fde1d9	refactor(parser): only allocate for escaped template strings (#2005 )	2024-01-12 18:56:36 +08:00
overlookmotel	c7316856db	refactor(parser): reduce work parsing regexps (#1999 ) #1926 produced a small performance regression because when parsing a regexp, some work is repeated.	2024-01-12 11:36:30 +08:00
Boshen	4706765d2a	refactor(parser): reduce `Token` size from 32 to 16 bytes (#1962 ) Part of #1880 `Token` size is reduced from 32 to 16 bytes by changing the previous token value `Option<&'a str>` to a u32 index handle. It would be nice if this handle were eliminated entirely, because the normal case for a string is always `&source_text[token.span.start..token.span.end]`. Unfortunately, JavaScript allows escaped characters to appear in identifiers, strings and templates. These strings need to be unescaped for equality checks, i.e. `"\a" === "a"`. This leads us to adding an `escaped_strings[]` vec for storing these unescaped and allocated strings. Performance regression for adding this vec should be minimal because escaped strings are rare. Background Reading: * https://floooh.github.io/2018/06/17/handles-vs-pointers.html	2024-01-09 15:17:02 +08:00
Boshen	7eb2573178	refactor(parser): parse BigInt lazily (#1924 ) This PR partially fixes #1803 and is part of #1880. BigInt is removed from the `Token` value, so that the token size can be reduced once we removed all the variants. `Token` is now also `Copy`, which removes all the `clone` and `drop` calls. This yields 5% performance improvement for the parser.	2024-01-08 12:37:20 +08:00
Boshen	4886d408eb	chore(clippy): enable undocumented_unsafe_blocks	2023-10-16 15:18:14 +08:00
Boshen	6428139b76	fix(parser): fix `re_lex_jsx_identifier` not omitting whitespaces closes #518	2023-07-05 12:53:21 +08:00
Boshen	ad2835f11b	chore(rustfmt): run `cargo fmt`	2023-05-21 11:52:26 +08:00
Boshen	7f93e58f10	chore: remove all #[must_use]	2023-05-11 21:08:00 +08:00
Boshen	cd276c2850	feat: add `oxc_span` crate (#323 )	2023-04-27 21:51:15 +08:00
Boshen	ca0e80691c	refactor(oxc_parser): remove unused re_lex_as_typescript_r_angle	2023-04-16 12:15:49 +08:00
Boshen	b11f774c41	refactor(oxc_parser): clean up doc	2023-04-01 19:03:33 +08:00
Boshen	d917348f9b	refactor(ast,parser): move parsing context from ast to parser	2023-04-01 18:01:33 +08:00
Boshen	d4ff0bb40e	refactor(oxc_parser): parser and lexer do not need to share the errors vec	2023-04-01 15:59:42 +08:00
Boshen	174330561c	fix(parser): fix panic on multi-byte characters (#233 ) * fix(oxc_parser): fix panic when EOF on a multi-byte character relates #232 * fix(parser): fix panic on multi-byte char in private identifier relates #232	2023-04-01 13:34:18 +08:00
Boshen	2fe8fba5b6	refactor(lexer): make TokenValue 8 bytes smaller by changing RegExp.pattern to &'a str (#175 )	2023-03-13 23:20:52 +08:00
Boshen	f36e3301fd	refactor(lexer): change `TokenValue::String(Atom)` to `TokenValue::String(&str)` (#174 )	2023-03-13 09:33:44 +08:00
Boshen	605684f4c0	fix: fix clippy warnings	2023-03-12 21:53:08 +08:00
Boshen	66207e74a4	refactor(lexer): remove `LexerContext::JsxChild` (#172 )	2023-03-12 20:19:51 +08:00
Boshen	4d32bfb55e	refactor: remove all declarations of `const fn`, which is useless for us	2023-03-07 21:29:47 +08:00
Ye Yangchen	0bf8f817f5	feat(oxc_parser): Port isStartOfDeclaration from tsc	2023-02-27 12:27:44 +08:00
Boshen	4f4a9802b7	refactor(diagnostics,parser): move diagnostics to parser	2023-02-22 19:23:01 +08:00
Boshen	5390d3e6b4	refactor(diagnostic): change Err type to miette::Error This is the prerequisite for breaking up the large Diagnostic enum.	2023-02-22 11:08:21 +08:00
Boshen	4c6407b152	refactor(ast): s/node/span This corrects the jargon for span. The term `node` came from `estree`, which is a bit misleading here in Rust. closes #9	2023-02-21 19:17:49 +08:00
Boshen	a733856536	refactor(ast,parser): use u32 for node spans The next PR will fix the jargon where Node = Span. relates to #9	2023-02-21 16:02:23 +08:00
Boshen	d57ab2f088	refactor(ast,parser): remove Node::ctx This is adding too many bytes to the AST	2023-02-21 13:11:58 +08:00
Boshen	0bbbc7768f	perf(oxc_parser): use u8 for offset	2023-02-21 13:11:58 +08:00
Boshen	85955d7147	refactor(parser): clean up some lexer code	2023-02-12 21:34:19 +08:00
Boshen	1fdc635638	feat(parser): add parser	2023-02-11 05:26:49 -08:00

47 commits