dan/oxc - BGit

mirror of https://github.com/danbulant/oxc synced 2026-05-25 12:51:57 +00:00

Author	SHA1	Message	Date
Song Gao	cf3415b0e4	chore(doc): replace main/master to tag/commit to make the url always accessible (#7298 )	2024-11-16 21:00:30 +08:00
overlookmotel	9e85b104e2	refactor(parser): add `ParserImpl::alloc` method (#7063 ) Pure refactor. Introduce `ParserImpl::alloc` method. Shorten `self.ast.alloc(...)` to `self.alloc(...)`. Also reduce `alloc` calls by using `AstBuilder` methods which already allocate where possible.	2024-11-01 17:09:06 +00:00
Boshen	4d8bc8c8af	perf(parser): precompute `is_typescript` (#6443 )	2024-10-11 03:39:38 +00:00
Boshen	a4b55bf00e	refactor(parser): use AstBuilder (#5743 ) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>	2024-09-13 22:39:48 +08:00
Boshen	603817bef9	feat(oxc)!: add `SourceType::Unambiguous`; parse `.js` as unambiguous (#5557 ) See https://babel.dev/docs/options#misc-options for background on `unambiguous` Once `SourceType::Unambiguous` is parsed, it will correctly set the returned `Program::source_type` to either `module` or `script`.	2024-09-07 10:48:58 +00:00
Kevin Deng 三咲智子	234a24c14d	fix(ast)!: merge `UsingDeclaration` into `VariableDeclaration` (#5270 ) relate #2854	2024-08-28 11:26:05 +08:00
rzvxa	b936162093	refactor(ast/ast_builder)!: shorter allocator utility method names. (#4122) This PR serves two purposes. First, it reduces the number of characters we have to type for a simple operation such as wrapping an expression in a vector. Second, it follows the generated names more closely: nowhere else in the builder do we have `new_xxx`; we always say `xxx`, since a builder always constructs something. ``` new_vec -> vec new_vec_single -> vec1* new_vec_from_iter -> vec_from_iter new_vec_with_capacity -> vec_with_capacity new_str -> str new_atom -> atom ``` `*` This one is the main motivation behind this PR: it saves 10 characters!	2024-07-09 12:16:38 +00:00
rzvxa	d347aedfda	feat(ast)!: generate `ast_builder.rs`. (#3890 ) ### Every structure has 2 builder methods: 1. `xxx` e.g. `block_statement` ```rust #[inline] pub fn block_statement(self, span: Span, body: Vec<'a, Statement<'a>>) -> BlockStatement<'a> { BlockStatement { span, body, scope_id: Default::default() } } ``` 2. `alloc_xxx` e.g. `alloc_block_statement` ```rust #[inline] pub fn alloc_block_statement( self, span: Span, body: Vec<'a, Statement<'a>>, ) -> Box<'a, BlockStatement<'a>> { self.block_statement(span, body).into_in(self.allocator) } ``` ### We generate 3 types of methods for enums: 1. `yyy_xxx` e.g. `statement_block` ```rust #[inline] pub fn statement_block(self, span: Span, body: Vec<'a, Statement<'a>>) -> Statement<'a> { Statement::BlockStatement(self.alloc(self.block_statement(span, body))) } ``` 2. `yyy_from_xxx` e.g. `statement_from_block` ```rust #[inline] pub fn statement_from_block<T>(self, inner: T) -> Statement<'a> where T: IntoIn<'a, Box<'a, BlockStatement<'a>>>, { Statement::BlockStatement(inner.into_in(self.allocator)) } ``` 3. `yyy_xxx` where `xxx` is inherited e.g. `statement_declaration` ```rust #[inline] pub fn statement_declaration(self, inner: Declaration<'a>) -> Statement<'a> { Statement::from(inner) } ``` ------------ ### Generic parameters: We no longer accept `Box<'a, ADT>`, `Atom` or `&'a str`, Instead we use `IntoIn<'a, Box<'a, ADT>>`, `IntoIn<'a, Atom<'a>>` and `IntoIn<'a, &'a str>` respectively. It allows us to rewrite things like this: ```rust let ident = IdentifierReference::new(SPAN, Atom::from("require")); let number_literal_expr = self.ast.expression_numeric_literal( right_expr.span(), num, raw, self.ast.new_str(num.to_string().as_str()), NumberBase::Decimal, ); ``` As this: ```rust let ident = IdentifierReference::new(SPAN, "require"); let number_literal_expr = self.ast.expression_numeric_literal( right_expr.span(), num, raw, num.to_string(), NumberBase::Decimal, ); ```	2024-07-09 00:57:26 +00:00
Boshen	d0eac46fc8	refactor(parser): use function instead of trait to parse normal lists (#4003 ) To reduce boilerplate and code noise. relates #3887	2024-07-01 15:57:36 +00:00
Boshen	97d59fc2f3	refactor(parser): move code around for parsing `Modifiers` (#3849 )	2024-06-23 12:46:42 +00:00
Boshen	9b38119ec9	refactor(ast)!: replace `Modifiers` with `declare` on `VariableDeclaration` (#3839 ) part of #2958	2024-06-23 10:34:52 +00:00
Boshen	cfcef241db	feat(ast)!: add `directives` field to `TSModuleBlock` (#3830 ) closes #3564	2024-06-22 18:14:08 +00:00
Boshen	dd540c8f0f	feat(minifier): add skeleton for ReplaceGlobalDefines ast pass (#3803 )	2024-06-21 13:53:59 +00:00
Boshen	6b3d019631	refactor(parser): move some structs to js module (#3341)	2024-05-18 14:41:32 +00:00
Boshen	9ced605487	refactor(parser): start porting arrow function parsing from tsc (#3340 ) relates #3320	2024-05-18 22:35:29 +08:00
Boshen	b27a905958	refactor(parser): simplify `Context` passing (#3266 )	2024-05-14 12:22:27 +08:00
Boshen	2064ae9e0a	refactor(parser,diagnostic): one diagnostic struct to eliminate monomorphization of generic types (#3214) part of #3213 We should only have one diagnostic struct instead of 353 copies of them, so we don't end up choking LLVM with 50k lines of the same code due to monomorphization. If the proposed approach is good, then I'll start writing a codemod to turn all the existing structs to plain functions. --- Background: Using `--timings`, we see `oxc_linter` is slow on codegen (the purple part). ![image](https://github.com/zkat/miette/assets/1430279/c1df4f7d-90ef-4c0f-9956-2ec3194db7ca) The crate currently contains 353 miette errors. [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) displays ``` cargo llvm-lines -p oxc_linter --lib --release Lines Copies Function name ----- ------ ------------- 830350 33438 (TOTAL) 29252 (3.5%, 3.5%) 808 (2.4%, 2.4%) <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop 23298 (2.8%, 6.3%) 353 (1.1%, 3.5%) miette::eyreish::error::object_downcast 19062 (2.3%, 8.6%) 706 (2.1%, 5.6%) core::error::Error::type_id 12610 (1.5%, 10.1%) 65 (0.2%, 5.8%) alloc::raw_vec::RawVec<T,A>::grow_amortized 12002 (1.4%, 11.6%) 706 (2.1%, 7.9%) miette::eyreish::ptr::Own<T>::boxed 9215 (1.1%, 12.7%) 115 (0.3%, 8.2%) core::iter::traits::iterator::Iterator::try_fold 9150 (1.1%, 13.8%) 1 (0.0%, 8.2%) oxc_linter::rules::RuleEnum::read_json 8825 (1.1%, 14.9%) 353 (1.1%, 9.3%) <miette::eyreish::error::ErrorImpl<E> as core::error::Error>::source 8822 (1.1%, 15.9%) 353 (1.1%, 10.3%) miette::eyreish::error::<impl miette::eyreish::Report>::construct 8119 (1.0%, 16.9%) 353 (1.1%, 11.4%) miette::eyreish::error::object_ref 8119 (1.0%, 17.9%) 353 (1.1%, 12.5%) miette::eyreish::error::object_ref_stderr 7413 (0.9%, 18.8%) 353 (1.1%, 13.5%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Display>::fmt 7413 (0.9%, 19.7%) 353 (1.1%, 14.6%) miette::eyreish::ptr::Own<T>::new 6669 (0.8%, 20.5%) 39 (0.1%, 14.7%) alloc::raw_vec::RawVec<T,A>::try_allocate_in 6173 (0.7%, 21.2%) 353 (1.1%, 15.7%) miette::eyreish::error::<impl miette::eyreish::Report>::from_std 6027 (0.7%, 21.9%) 70 (0.2%, 16.0%) <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter 6001 (0.7%, 22.7%) 353 (1.1%, 17.0%) miette::eyreish::error::object_drop 6001 (0.7%, 23.4%) 353 (1.1%, 18.1%) miette::eyreish::error::object_drop_front 5648 (0.7%, 24.1%) 353 (1.1%, 19.1%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Debug>::fmt ``` It's totalling more than 50k llvm lines, and is putting pressure on rustc codegen (the purple part on `oxc_linter` in the image above). --- It's pretty obvious by looking at https://github.com/zkat/miette/blob/main/src/eyreish/error.rs, the generics can expand out to lots of code.	2024-05-11 04:56:22 +00:00
overlookmotel	7e1fe36c68	refactor(ast): squash nested enums (#3115) OK, this is a big one... I have done this as part of work on Traversable AST, but I believe it has wider benefits, so thought better to spin it off into its own PR. ## What this PR does This PR squashes all nested AST enum types (#2685). e.g.: Previously: ```rs pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ Declaration(Declaration<'a>), } pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>), /* ...other Declaration variants... */ } ``` After this PR: ```rs #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>) = 0, /* ...other Statement variants... */ VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } #[repr(C, u8)] pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } ``` All `Declaration`'s variants are combined into `Statement`, but `Declaration` type still exists. As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a `Declaration` can be transmuted to a `Statement` at zero cost. This is the same thing as #2847, but here applied to all* nested enums in the AST, and with improved helper methods. No enums increase in size, and a few get smaller. Indirection is reduced for some types (this removes multiple levels of boxing). ## Why? 1. It is a prerequisite for Traversable AST (#2987). 2. It would help a lot with AST Transfer (#2409) - it solves the only remaining blocker for this. 3. It is a step closer to making the whole AST `#[repr(C)]`. ## Why is it a good thing for the AST to be `#[repr(C)]`? Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for `Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction. Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as pure data (just bytes). That data can be moved around and manipulated beyond what Rust natively allows. However, to enable that, the types need to be well-specified, with completely stable layouts. `#[repr(C)]` is the only tool Rust provides to do this. Once the types are `#[repr(C)]`, various features become possible: 1. Cheap transfer of the AST across boundaries without ser/deser - the property used by AST Transfer. 2. Having multiple versions of the AST (standard, read-only, traversable), and these AST representations can be converted to one another at zero cost via transmute - the property used by Traversable AST scheme. 3. Caching AST data on disk (#3079) or transferring across network. 4. Stuff we haven't thought of yet! Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind). ## The problem with `#[repr(C)]` It's not required to squash nested enums to make the AST `#[repr(C)]`. But the problem with `#[repr(C)]` is that it disables some compiler optimizations. Without `#[repr(C)]`, the compiler squashes enums itself in some cases (which is how `Statement` is currently 16 bytes). But making the types `#[repr(C)]` as they are currently disables this optimization. So this PR essentially makes explicit what the compiler is already doing - and in fact goes a bit further with the optimization than the compiler is able to, in squashing 3 or 4 layers of nested enums (the compiler only does up to 2 layers). ## Implementation One enum "inheriting" variants from another is implemented with `inherit_variants!` macro. ```rs inherit_variants! { #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ // `Declaration` variants added here by `inherit_variants!` macro @inherit Declaration // `ModuleDeclaration` variants added here by `inherit_variants!` macro @inherit ModuleDeclaration } } ``` The macro is fairly* lightweight, and I think the above is quite easy to understand. No proc macros. The macro also implements utility methods for converting between enums e.g. `Statement::as_declaration`. These methods are all zero-cost (essentially transmutes). New patterns for dealing with nested enums are introduced: Creation: ```rs // Old let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl)); // New let stmt = Statement::VariableDeclaration(var_decl); ``` Conversion: ```rs // Old let stmt = Statement::Declaration(decl); // New let stmt = Statement::from(decl); ``` Testing: ```rs // Old if matches!(stmt, Statement::Declaration(_)) { } if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { } // New if stmt.is_declaration() { } if matches!(stmt, Statement::ImportDeclaration(_)) { } ``` Branching: ```rs // Old if let Statement::Declaration(decl) = &stmt { decl.do_stuff() }; // New if let Some(decl) = stmt.as_declaration() { decl.do_stuff() }; ``` Matching: ```rs // Old match stmt { Statement::Declaration(decl) => visitor.visit(decl), } // New (exhaustive match) match stmt { match_declaration!(Statement) => visitor.visit(stmt.to_declaration()), } // New (alternative) match stmt { _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()), } ``` New syntax has pluses and minuses vs the old. `match` syntax is worse, but when working with a deeply nested enum, the code is much nicer - it's shorter and easier to read. This PR removes 200 lines from the linter with changes like this: https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95 ```diff - let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) = - &assignment_expr.left - else { - return; - }; - let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) = - simple_assignment_target + let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left else { return; }; ```	2024-04-28 20:40:37 +08:00
Boshen	92d709bf21	feat(ast): add `CatchParameter` node (#3049 )	2024-04-21 23:43:39 +08:00
Boshen	240ff19675	refactor(parser): improve parsing of `BindingPattern` in TypeScript (#2624 ) closes #2622	2024-03-06 16:16:03 +08:00
Boshen	be6b8b7ce6	[BREAKING CHANGE] Change `Atom` to `Atom<'a>` to make it safe (#2497 ) Part of #2295 This PR splits the `Atom` type into `Atom<'a>` and `CompactString`. All the AST node strings now use `Atom<'a>` instead of `Atom` to signify it belongs to the arena. It is now up to the user to select which form of the string to use. This PR essentially removes the really unsafe code `93742f89e9/crates/oxc_span/src/atom.rs (L98-L107)` which can lead to ![image](https://github.com/oxc-project/oxc/assets/1430279/8c513c4f-19b0-4b63-b61c-e07c187c95b5)	2024-02-26 19:34:40 +08:00
overlookmotel	0bdecb5043	refactor(parser): wrapper type for parser (#2339 ) Split parser into public interface `Parser` and internal implementation `ParserImpl`. This involves no changes to public API. This change is a bit annoying, but justification is that it's required for #2341, which I believe to be very worthwhile. The `ParserOptions` type also makes it a bit clearer what the defaults for `allow_return_outside_function` and `preserve_parens` are. It came as a surprise to me that `preserve_parens` defaults to `true`, and this refactor makes that a bit more obvious when reading the code. All the real changes are in [oxc_parser/src/lib.rs](https://github.com/oxc-project/oxc/pull/2339/files#diff-8e59dfd35fc50b6ac9a9ccd991e25c8b5d30826e006d565a2e01f3d15dc5f7cb). The rest of the diff is basically replacing `Parser` with `ParserImpl` everywhere else.	2024-02-07 23:22:08 +08:00
Boshen	4706765d2a	refactor(parser): reduce `Token` size from 32 to 16 bytes (#1962) Part of #1880 `Token` size is reduced from 32 to 16 bytes by changing the previous token value `Option<&'a str>` to a u32 index handle. It would be nice if this handle were eliminated entirely, because the normal case for a string is always `&source_text[token.span.start..token.span.end]`. Unfortunately, JavaScript allows escaped characters to appear in identifiers, strings and templates. These strings need to be unescaped for equality checks, i.e. `"\a" === "a"`. This leads us to adding an `escaped_strings[]` vec for storing these unescaped and allocated strings. Performance regression for adding this vec should be minimal because escaped strings are rare. Background Reading: * https://floooh.github.io/2018/06/17/handles-vs-pointers.html	2024-01-09 15:17:02 +08:00
overlookmotel	eb2966c512	fix(parser): fix incorrectly identified directives (#1885 ) Parser incorrectly identifies string literals as directives if they follow after `import`s, `export`s, or decorators. In all of these cases, `'use strict'` produces a directive in the AST, where it should be parsed as an `ExpressionStatement` containing a `StringLiteral`: ```js import x from 'foo'; 'use strict'; ``` ```js export {x}; 'use strict'; ``` ```js @foo 'use strict'; ``` [Playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC0gICAgICAgIC0G8rnONK89ITJ3zrK%2FUP7OmSZPgHQzStr3yMtwFTU%2BD1WPt09JgqZJLoYooydbGsM5vGcf34BnIA%3D) This PR should fix that. I'm not sure about the decorator case, though. I assume it's not a directive. But is prefixing a string literal with a decorator even legal syntax anyway? And a side nit: If I'm reading it right, I don't think the `continue` statement in the decorator arm of the match does anything. Do I have that right? Last question: Where does one go about putting a test? I guess these silly cases aren't covered by Babel etc's tests. --------- Co-authored-by: Boshen <boshenc@gmail.com>	2024-01-04 13:39:15 +00:00
overlookmotel	1feec95a94	fix(parser) fix typo in `expecting_directives` variable name (#1801) Renames `expecting_diretives` to `expecting_directives` to fix spelling	2023-12-24 16:51:02 +00:00
Boshen	567c6ed757	feat(prettier): print directives (#1497 )	2023-11-22 19:39:25 +08:00
Cameron	5b1e1e5408	feat(parser): TypeScript 5.2 (#811 ) - adds support for [Using Declarations](https://devblogs.microsoft.com/typescript/announcing-typescript-5-2/#using-declarations-and-explicit-resource-management) Closes #786	2023-10-05 12:52:14 +13:00
Yunfei He	35167599bc	refactor(ast): use `atom` for `Directive` and `Hashbang` (#701 ) The main reason is using Atom to remove the lifetime for convenience. And after removing the lifetime of these nodes, the `Program<'a>` doesn't rely on `&'a source` anymore, which allows us to [specify more accurate lifetimes](https://github.com/web-infra-dev/oxc/discussions/700).	2023-08-09 13:52:56 +08:00
Carter Snook	985b8f21d9	feat: support hashbang interpreter comments (#431 )	2023-06-11 23:55:58 +08:00
Boshen	a0b09a3f27	refactor(ast): remove `RestElement` from `BindingPattern`	2023-05-16 22:25:52 +08:00
Boshen	cd276c2850	feat: add `oxc_span` crate (#323 )	2023-04-27 21:51:15 +08:00
Boshen	fec5aafbf1	refactor(oxc_parser): remove a few unused diagnostics	2023-04-15 18:13:15 +08:00
Boshen	6360bdad31	fix(parser): fix [+in] context in `CallArguments` (#265 ) relates #255	2023-04-06 21:02:30 +08:00
Boshen	1c2acd121c	refactor(parser): clean up parsing of ForStatement (#251 ) closes #176	2023-04-04 22:36:35 +08:00
Boshen	236d53ad9d	fix(parser): fix panic on multi-byte char in ExpectCatchFinally error	2023-04-02 20:09:38 +08:00
Boshen	b11f774c41	refactor(oxc_parser): clean up doc	2023-04-01 19:03:33 +08:00
Boshen	d917348f9b	refactor(ast,parser): move parsing context from ast to parser	2023-04-01 18:01:33 +08:00
Boshen	f2fcbb30c3	refactor(oxc_parser): removed not needed generic from `unexpected` function	2023-04-01 15:59:42 +08:00
Boshen	174330561c	fix(parser): fix panic on multi-byte characters (#233) * fix(oxc_parser): fix panic when EOF on a multi-byte character relates #232 * fix(parser): fix panic on multi-byte char in private identifier relates #232	2023-04-01 13:34:18 +08:00
Boshen	d6e8c6fb2f	feat(parser): check `ReturnStatement` in return context	2023-03-12 23:30:32 +08:00
Boshen	c2f760f1ed	chore: run `typos -w` to fix all typos	2023-03-11 23:37:19 +08:00
Fnll	81760da7cc	feat(parser): better diagnostic for missing semicolon in for loop statement (#133 ) feat(parser): better diagnostic Co-authored-by: kerui.lian <kerui.lian@bytedance.com>	2023-03-05 04:13:23 -08:00
Ye Yangchen	b06ab627bf	fix(oxc_parser) correct span for decorators	2023-03-02 21:34:24 -08:00
Ye Yangchen	d8c6caf57f	feat(oxc_parser): Parse modifiers before declaration	2023-03-01 22:50:23 -08:00
Ye Yangchen	0bf8f817f5	feat(oxc_parser): Port isStartOfDeclaration from tsc	2023-02-27 12:27:44 +08:00
Boshen	4f4a9802b7	refactor(diagnostics,parser): move diagnostics to parser	2023-02-22 19:23:01 +08:00
Boshen	4c6407b152	refactor(ast): s/node/span This corrects the jargon for span. The term `node` came from `estree`, which is a bit misleading here in Rust. closes #9	2023-02-21 19:17:49 +08:00
Boshen	a733856536	refactor(ast,parser): use u32 for node spans The next PR will fix the jargon where Node = Span. relates to #9	2023-02-21 16:02:23 +08:00
Boshen	1fdc635638	feat(parser): add parser	2023-02-11 05:26:49 -08:00

49 commits
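
Note on commit 7e1fe36c68 (squash nested enums): the message describes two `#[repr(C, u8)]` enums whose shared variants carry identical discriminants and payloads, so that one can be reinterpreted as the other at no cost. The following is a minimal sketch of that idea only. The `Box<String>` payloads, the reduced variant lists and the hand-written helper bodies are assumptions made for illustration; they are not Oxc's real definitions, and the real code generates the equivalent through the `inherit_variants!` macro.

```rust
use std::mem::{size_of, transmute};

// Hypothetical, heavily simplified stand-ins for the AST types.
// `Box<String>` replaces the arena-allocated nodes to keep the sketch small.
#[derive(Debug, PartialEq)]
#[repr(C, u8)]
pub enum Declaration {
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

#[derive(Debug, PartialEq)]
#[repr(C, u8)]
pub enum Statement {
    BlockStatement(Box<String>) = 0,
    // "Inherited" variants: same discriminants and payload types as `Declaration`.
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

// Compile-time check that both enums have the same size.
const _: () = assert!(size_of::<Declaration>() == size_of::<Statement>());

impl From<Declaration> for Statement {
    fn from(decl: Declaration) -> Self {
        // SAFETY: both enums are `#[repr(C, u8)]` and have the same size, and every
        // `Declaration` discriminant exists in `Statement` with the same payload type.
        unsafe { transmute::<Declaration, Statement>(decl) }
    }
}

impl Statement {
    /// True if this statement holds one of the variants shared with `Declaration`.
    pub fn is_declaration(&self) -> bool {
        matches!(self, Self::VariableDeclaration(_) | Self::FunctionDeclaration(_))
    }

    /// Borrow the statement as a `Declaration` if it holds a shared variant.
    pub fn as_declaration(&self) -> Option<&Declaration> {
        if self.is_declaration() {
            // SAFETY: the discriminant was checked above, and the layouts of the
            // shared variants match in both enums.
            Some(unsafe { &*(self as *const Statement).cast::<Declaration>() })
        } else {
            None
        }
    }
}
```

With these definitions, `Statement::from(decl)` and `stmt.as_declaration()` correspond to the "Conversion" and "Branching" patterns listed in the commit message, under the assumptions stated above.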