Commit graph

399 commits

Author SHA1 Message Date
Boshen
a8af5de8f5
refactor(syntax): move number related functions to number module (#3130) 2024-04-29 18:54:35 +08:00
overlookmotel
7e1fe36c68
refactor(ast): squash nested enums (#3115)
OK, this is a big one...

I did this as part of the work on Traversable AST, but I believe it
has wider benefits, so I thought it better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (#2685).

For example, previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */

    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All of `Declaration`'s variants are combined into `Statement`, but the
`Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.
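Here is a minimal sketch of why the transmute is sound. It assumes simplified stand-ins: the node structs and their `u32`/`String` fields are hypothetical, and real AST nodes are arena-allocated rather than held in `std::boxed::Box`. Every variant payload is a single pointer and the shared variants use the same discriminants, so both enums have the same size and identical layout for the shared variants:

```rust
use std::mem::{size_of, transmute};

// Hypothetical stand-ins for AST node types.
#[derive(Debug, PartialEq)]
pub struct BlockStatement(pub u32);
#[derive(Debug, PartialEq)]
pub struct VariableDeclaration(pub String);
#[derive(Debug, PartialEq)]
pub struct FunctionDeclaration(pub String);

#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Statement {
    BlockStatement(Box<BlockStatement>) = 0,
    // Variants "inherited" from `Declaration`, with identical discriminants.
    VariableDeclaration(Box<VariableDeclaration>) = 32,
    FunctionDeclaration(Box<FunctionDeclaration>) = 33,
}

#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Declaration {
    VariableDeclaration(Box<VariableDeclaration>) = 32,
    FunctionDeclaration(Box<FunctionDeclaration>) = 33,
}

// Compile-time check that both enums have the same size.
const _: () = assert!(size_of::<Statement>() == size_of::<Declaration>());

impl From<Declaration> for Statement {
    fn from(decl: Declaration) -> Self {
        // SAFETY: both enums are `#[repr(C, u8)]`, and every `Declaration`
        // variant exists in `Statement` with the same discriminant and the
        // same payload type, so any valid `Declaration` is a valid `Statement`.
        unsafe { transmute::<Declaration, Statement>(decl) }
    }
}
```

The conversion needs no `match`: the tag byte and the payload pointer are reused as-is.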

This is the same thing as #2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (#2987).
2. It would help a lot with AST Transfer (#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction increasingly appears to be building up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations of
`Box` and `Vec`, and our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in the same direction.

Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable) that can be converted to one another at zero cost via
transmute - the property used by the Traversable AST scheme.
3. Caching AST data on disk (#3079) or transferring across network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`.

But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the current types `#[repr(C)]` without changing their structure
disables this optimization.

So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).
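A small sketch makes the cost visible. It assumes hypothetical enums with `Box<u32>` payloads standing in for AST nodes. The sizes follow from the `#[repr(C, u8)]` layout rules (a `u8` tag followed by a `#[repr(C)]` union of the variant payloads), so on a 64-bit target the nested form takes 24 bytes and the squashed form 16:

```rust
use std::mem::size_of;

// Hypothetical nested layout: both enums keep their own tag.
#[repr(C, u8)]
pub enum NestedDeclaration {
    Variable(Box<u32>),
    Function(Box<u32>),
}

#[repr(C, u8)]
pub enum NestedStatement {
    Block(Box<u32>),
    // This payload is a whole 16-byte tagged enum (on 64-bit).
    Declaration(NestedDeclaration),
}

// Hypothetical squashed layout: the same variants in one flat enum.
#[repr(C, u8)]
pub enum FlatStatement {
    Block(Box<u32>) = 0,
    Variable(Box<u32>) = 32,
    Function(Box<u32>) = 33,
}

/// Returns (nested size, flat size) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<NestedStatement>(), size_of::<FlatStatement>())
}
```

In general the nested form costs three pointer-widths and the flat form two, because the flat enum needs only one tag.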

## Implementation

One enum "inheriting" variants from another is implemented with
`inherit_variants!` macro.

```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    
    // `Declaration` variants added here by `inherit_variants!` macro
    @inherit Declaration
    // `ModuleDeclaration` variants added here by `inherit_variants!` macro
    @inherit ModuleDeclaration
}
}
```

The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.
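The general idea (declare the shared variants once, expand them into both enums with matching discriminants) can be sketched with a plain `macro_rules!` macro. This is not the real `inherit_variants!` macro: the macro name `define_inheriting_enums!`, its input syntax, and the payload types are hypothetical:

```rust
// Hypothetical macro: emits an outer enum containing its own variants plus
// the inner enum's variants, and the inner enum with the same discriminants.
macro_rules! define_inheriting_enums {
    (
        $outer:ident { $($ov:ident($ot:ty) = $od:literal),* $(,)? }
        inherits $inner:ident { $($iv:ident($it:ty) = $id:literal),* $(,)? }
    ) => {
        #[repr(C, u8)]
        #[derive(Debug, PartialEq)]
        pub enum $outer {
            $($ov($ot) = $od,)*
            $($iv($it) = $id,)*
        }

        #[repr(C, u8)]
        #[derive(Debug, PartialEq)]
        pub enum $inner {
            $($iv($it) = $id,)*
        }
    };
}

define_inheriting_enums! {
    Statement { BlockStatement(Box<u32>) = 0, ExpressionStatement(Box<u32>) = 1 }
    inherits Declaration { VariableDeclaration(Box<String>) = 32, FunctionDeclaration(Box<String>) = 33 }
}
```

Unlike this sketch, the macro described in the PR refers to the inherited enum by name (`@inherit Declaration`) rather than taking both variant lists in one invocation.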

The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).
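Here is a sketch of how zero-cost helpers like `is_declaration` and `as_declaration` could be written. It assumes stand-in enums shaped like the `Statement`/`Declaration` example above; the variants, the `Box<u32>`/`Box<String>` payloads, and the discriminant values 32 and 33 are illustrative. For a primitive-representation enum the `u8` tag can be read through a pointer cast, and a `&Statement` holding an inherited variant can be reinterpreted as `&Declaration`:

```rust
#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Statement {
    BlockStatement(Box<u32>) = 0,
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Declaration {
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

impl Statement {
    /// Reads the `u8` tag, which `#[repr(C, u8)]` places at offset 0.
    fn discriminant(&self) -> u8 {
        // SAFETY: for a primitive-representation enum the tag is the first byte.
        unsafe { *(self as *const Self).cast::<u8>() }
    }

    /// `true` if this statement holds one of the inherited `Declaration` variants.
    pub fn is_declaration(&self) -> bool {
        matches!(self.discriminant(), 32 | 33)
    }

    /// Zero-cost view of the statement as a `Declaration`, if it is one.
    pub fn as_declaration(&self) -> Option<&Declaration> {
        if self.is_declaration() {
            // SAFETY: the tag is a `Declaration` discriminant, and both enums
            // lay out these variants identically, so the pointer cast is valid.
            Some(unsafe { &*(self as *const Self).cast::<Declaration>() })
        } else {
            None
        }
    }
}
```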

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));

// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);

// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }

// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };

// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}

// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}

// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```
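A `match_declaration!`-style macro could be an ordinary `macro_rules!` macro used in pattern position, expanding to an or-pattern over the inherited variants. The enums, the macro body, and `to_declaration` below are hypothetical simplifications, not the PR's code. Note that `describe` needs no wildcard arm, because the macro's or-pattern plus `Statement::BlockStatement` covers every variant:

```rust
#[repr(C, u8)]
#[derive(Debug)]
pub enum Statement {
    BlockStatement(Box<u32>) = 0,
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

#[repr(C, u8)]
#[derive(Debug)]
pub enum Declaration {
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

// Expands to an or-pattern covering every inherited `Declaration` variant of `$ty`.
macro_rules! match_declaration {
    ($ty:ident) => {
        $ty::VariableDeclaration(_) | $ty::FunctionDeclaration(_)
    };
}

impl Statement {
    /// Reinterprets `self` as a `Declaration`. Panics if it is not one.
    pub fn to_declaration(&self) -> &Declaration {
        match self {
            match_declaration!(Statement) => {
                // SAFETY: the tag is a `Declaration` discriminant; layouts match.
                unsafe { &*(self as *const Self).cast::<Declaration>() }
            }
            _ => panic!("not a declaration"),
        }
    }
}

pub fn describe(stmt: &Statement) -> String {
    match stmt {
        match_declaration!(Statement) => match stmt.to_declaration() {
            Declaration::VariableDeclaration(name) => format!("var {name}"),
            Declaration::FunctionDeclaration(name) => format!("function {name}"),
        },
        Statement::BlockStatement(_) => "block".to_string(),
    }
}
```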

New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this:


https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-     &assignment_expr.left
- else {
-     return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-     simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
  else {
      return;
  };
```
2024-04-28 20:40:37 +08:00
overlookmotel
0185eb2edc
refactor(ast): remove duplicate TSNamedTupleMember representation (#3101)
Removes duplicate representation of a `TSTupleElement` which is a
`TSNamedTupleMember`.

Closes #3100.
2024-04-25 19:16:24 +08:00
Ali Rezvani
ac72d08592
chore: cleanup the dependencies on static_assertions and oxc_index. (#3095)
We used to export `static_assertions` as part of `oxc_index`. That
made sense back when `oxc_index` was only a vessel for re-exporting
other crates (although even then the only real benefit was
convenience). Now that it is turning into a port of `index_vec`, and may
grow as a result of the project's specific needs, it makes much more
sense to stop exporting it from `oxc_index` and to use the crate
directly in the places that relied on `oxc_index`'s export.


PS: we may want to follow this up with an `oxc_asset` crate containing
our own set of assertion tools, which would also export
`static_assertions`.
2024-04-25 16:56:23 +08:00
overlookmotel
942b2ba084
refactor(ast): add array element Elision type (#3074)
Pure refactor. This change does nothing except make it more consistent
with other types which are also just a wrapper around `Span`, e.g.
`NullLiteral` and `TSThisType`.
2024-04-23 02:05:11 +08:00
Boshen
559bca86c5
Release crates v0.12.5 2024-04-22 12:52:17 +08:00
Ali Rezvani
6c8296164e
perf(ast): box typescript enum variants. (#3065)
Like #3058 and #3061, this is a continuation of #3047.

Handles these enum types:

> TSEnumMemberName
> Variant sizes: 16, 24, 24, 40
> Unboxed variants: IdentifierName (struct), StringLiteral (struct), NumericLiteral (struct)
> Dependents: TSEnumMember (struct)
> => Box all variants.
>
> TSModuleReference
> Variant sizes: 16, 32
> Unboxed variants: TSExternalModuleReference (struct)
> Dependents: Box<TSModuleReference> in TSImportEqualsDeclaration
> => Box all variants. Replace Box<TSModuleReference> with TSModuleReference in TSImportEqualsDeclaration.
>
> TSTypePredicateName
> Variant sizes: 8, 24
> Unboxed variants: IdentifierName (struct), TSThisType (struct)
> Dependents: TSTypePredicate (struct)
> => Box Identifier variant. Do not box This variant, as it is only 8 bytes (just contains Span).
>
> TSTypeQueryExprName
> Variant sizes: 16, 88
> Unboxed variants: TSImportType (struct)
> Dependents: TSTypeQuery (struct)
> => Box TSImportType variant. Do not box TSTypeName variant, as it is another enum.
2024-04-22 09:54:53 +08:00
overlookmotel
48e20880d4
perf(ast): box enum variants (#3058)
Box all enum variants for JSX types (`JSXAttributeName`,
`JSXAttributeValue`, `JSXChild`, `JSXElementName`,
`JSXMemberExpressionObject`). Part of #3047.

I'm not sure how to interpret the benchmark results. As I said on #3047:

> I imagine it may cost a little in performance in the parser due to
> extra calls to `alloc`, but in return traversing the AST should be
> cheaper, as the data is more compact, so there are fewer cache misses.

Sure enough, there is a small impact (1%) on the 2 parser benchmarks for
JSX files. However, the other benchmarks have too much noise in them to
see whether this is repaid by a speed-up in the transformer etc.,
especially as the transformer benchmarks also include parsing.

What do you think @Boshen?
2024-04-22 09:09:30 +08:00
overlookmotel
383b449d4e
perf(ast): box ImportDeclarationSpecifier enum variants (#3061)
Part of #3047.

As with #3058, it's hard to interpret the benchmark results here. But in
this case I think it's easier to see from "first principles" that this
should be an improvement - `ImportSpecifier` is pretty massive (80
bytes) vs `ImportDefaultSpecifier` (40 bytes), and the latter (e.g.
`import React from 'react'`) is common in JS code.
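The size side of this trade-off can be sketched with hypothetical stand-in structs sized to match the 80-byte and 40-byte figures above. Without boxing, every enum value is at least as large as its largest variant; with boxing, every value is a tag plus one pointer, at the cost of an allocation per node:

```rust
use std::mem::size_of;

// Hypothetical stand-ins sized like the specifiers mentioned above.
pub struct ImportSpecifier(pub [u64; 10]); // 80 bytes
pub struct ImportDefaultSpecifier(pub [u64; 5]); // 40 bytes

// Unboxed: every value (even a default specifier) is sized for the largest variant.
pub enum UnboxedSpecifier {
    ImportSpecifier(ImportSpecifier),
    ImportDefaultSpecifier(ImportDefaultSpecifier),
}

// Boxed: every value is a tag plus one pointer; the payload lives elsewhere.
pub enum BoxedSpecifier {
    ImportSpecifier(Box<ImportSpecifier>),
    ImportDefaultSpecifier(Box<ImportDefaultSpecifier>),
}

/// Returns (unboxed size, boxed size) in bytes.
pub fn specifier_sizes() -> (usize, usize) {
    (size_of::<UnboxedSpecifier>(), size_of::<BoxedSpecifier>())
}
```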
2024-04-22 09:06:39 +08:00
overlookmotel
2804e7dbf6
perf(ast): reduce indirection in AST types (#3051)
Fixes #3048.

No apparent change on benchmarks. Likely these TS features are not much
used in the benchmark fixtures.
2024-04-22 09:05:25 +08:00
Boshen
92d709bf21
feat(ast): add CatchParameter node (#3049) 2024-04-21 23:43:39 +08:00
overlookmotel
d44301c871
fix(parser): fix comment typos (#3036)
Fix 2 typos in comments.
2024-04-20 13:13:25 +03:30
Boshen
a05c4e39b8
Release crates v0.12.4 2024-04-19 16:40:05 +08:00
Boshen
93ce5a919a
chore: fix internal doc warnings 2024-04-13 15:59:24 +08:00
branchseer
f159f60084
Make ast types covariant over the allocator lifetime. (#2943)
## Why

Due to the usage of `&'alloc mut T` in `oxc_allocator::Box`, and
`bumpalo::collections::Vec` in `oxc_allocator::Vec`, AST types are
currently invariant over their allocator lifetime `'a`. This prevents
`ouroboros` from generating `borrow_*` on AST type fields, leading to
the unfriendly `with_*` API:
c250b288ef/crates/oxc_parser/examples/multi-thread.rs (L82-L84)

## How

- For `oxc_allocator::Vec`, switch to `allocator_api2::vec::Vec`, which
has a covariant relationship with the allocator lifetime.
- For `oxc_allocator::Box`, use `std::ptr::NonNull` which is
specifically designed to be covariant. I don't use
`allocator_api2::boxed::Box` because it holds the allocator for
dropping, so the size is bigger.
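The effect of the `NonNull` approach on variance can be sketched as follows. `ArenaBox`, `Node`, and `shorten` are hypothetical names, and unlike `oxc_allocator::Box` this sketch only exposes shared access. `shorten` compiles only because `ArenaBox` is covariant in both its lifetime and its type parameter; with a `&'a mut T` field it would be rejected, since `&mut T` is invariant in `T`:

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical arena box: a non-null pointer to a value that lives for `'a`.
/// `NonNull<T>` and `PhantomData<&'a T>` are both covariant, so this type is too.
pub struct ArenaBox<'a, T> {
    ptr: NonNull<T>,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> ArenaBox<'a, T> {
    /// Takes unique access to a value that stays alive for `'a`.
    pub fn new_in(value: &'a mut T) -> Self {
        ArenaBox { ptr: NonNull::from(value), _marker: PhantomData }
    }

    pub fn get(&self) -> &T {
        // SAFETY: `ptr` came from a `&'a mut T`, which stays valid for `'a`.
        unsafe { self.ptr.as_ref() }
    }
}

/// Hypothetical AST node that borrows source text for `'a`.
pub struct Node<'a> {
    pub name: &'a str,
}

/// Shrinks both lifetimes at once; accepted only because `ArenaBox` is covariant.
pub fn shorten<'long: 'short, 'short>(
    b: ArenaBox<'long, Node<'long>>,
) -> ArenaBox<'short, Node<'short>> {
    b
}
```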

## Downside

Now that `oxc_allocator::Box` uses the unsafe `NonNull`, it has to be a
private field to be safe. This makes it impossible to do `Box(....)`
pattern matching.
2024-04-12 18:12:18 +08:00
Boshen
614f73b66c
Release crates v0.12.3 2024-04-11 16:18:17 +08:00
Boshen
59748199da
refactor(ast): clean up the ts type visit methods 2024-04-11 15:26:24 +08:00
Boshen
09452659e2
Release crates v0.12.2 2024-04-08 11:13:13 +08:00
Boshen
fb2ebf462e
chore: fix clippy on unsafe comment 2024-04-03 19:57:21 +08:00
Boshen
feb3c90098
chore(parser): allow unsafe in examples 2024-04-03 19:40:02 +08:00
Boshen
366a7fb0d4
Release crates v0.11.2 2024-04-03 19:36:54 +08:00
Boshen
504698ab4a
chore: guard against unsafe code as much as possible. 2024-04-03 19:35:07 +08:00
Boshen
54f7cd3978
Release crates v0.11.1 2024-04-03 16:57:52 +08:00
Boshen
23d3c4e0a4
chore: add changelogs via git cliff (#2878)
This is generated alongside https://github.com/oxc-project/release-oxc
2024-04-01 20:04:48 +08:00
Boshen
31ed532b79
Release crates v0.11.0 2024-03-30 13:54:53 +08:00
Ali Rezvani
b76b02d019
fix(parser): add support for empty module declaration (#2834)
Should be merged after #2829. I tried a few times to get it done with
Graphite stacking but had no success. I guess it either doesn't work
with forks or it's just a skill issue, since I'm not familiar with it.

closes: #2829
closes: #2830

---------

Co-authored-by: Dmytro Maretskyi <maretskii@gmail.com>
2024-03-27 13:48:03 +08:00
Boshen
95fc28168c
chore: apply cargo autoinherit (#2826)
See https://github.com/mainmatter/cargo-autoinherit
2024-03-26 23:57:50 +08:00
Ali Rezvani
fc3878350f
refactor(ast): add walk_mut functions (#2776)
* move `visit` and `visit_mut` modules to a super module called `visit`
* add `walk_mut` module containing walk functions
* update `enter_node` and `leave_node` events to not pass a reference in the `VisitMut` trait
* add `AstType`, a non-referencing version of `AstKind` to use with `VisitMut` trait
* update the `VisitMut` trait's usages.
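The visitor/walker split can be sketched on a toy AST. The `Expr` type, the method names, and the `Doubler` visitor are hypothetical, and the sketch leaves out the `enter_node`/`leave_node` events and `AstType`. Each `visit_*` method defaults to calling a free `walk_*` function, so an implementor can override one method and still recurse into children through the walker:

```rust
/// Hypothetical toy AST.
pub enum Expr {
    Num(f64),
    Add(Box<Expr>, Box<Expr>),
}

pub trait VisitMut {
    fn visit_expr(&mut self, expr: &mut Expr) {
        walk_mut::walk_expr_mut(self, expr);
    }

    fn visit_number(&mut self, _value: &mut f64) {}
}

pub mod walk_mut {
    use super::{Expr, VisitMut};

    /// Visits the children of `expr`, dispatching back into the visitor.
    pub fn walk_expr_mut<V: VisitMut + ?Sized>(visitor: &mut V, expr: &mut Expr) {
        match expr {
            Expr::Num(value) => visitor.visit_number(value),
            Expr::Add(left, right) => {
                visitor.visit_expr(left);
                visitor.visit_expr(right);
            }
        }
    }
}

/// Example visitor: doubles every numeric literal in place.
pub struct Doubler;

impl VisitMut for Doubler {
    fn visit_number(&mut self, value: &mut f64) {
        *value *= 2.0;
    }
}

/// Evaluates the toy expression (used to check the visitor's effect).
pub fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Num(value) => *value,
        Expr::Add(left, right) => eval(left) + eval(right),
    }
}
```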
2024-03-25 20:40:13 +03:30
Boshen
e32a3b3783
ci: use cargo-shear (#2810) 2024-03-26 00:43:10 +08:00
Ali Rezvani
198eea0bce
refactor(ast): add walk functions to Visit trait. (#2791)
closes #2442
2024-03-25 10:44:29 +08:00
Boshen
ef1108a749
chore: Rust v1.77.0 (#2781) 2024-03-21 17:21:57 +00:00
overlookmotel
e793063f75
perf(parser): faster lexing JSX identifiers (#2557)
Speed up lexing JSX identifier continuations (i.e. after `-`) by
searching for the end of the identifier byte-by-byte.

The change does not register on benchmarks, only because the benchmarks
don't contain any `<Foo-Bar />` identifiers, so they don't exercise this
code path.
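A byte-by-byte scan for the end of a JSX identifier might look like the sketch below. It is a simplification: `jsx_identifier_end` is a hypothetical helper that accepts only ASCII identifier bytes plus `-`, whereas a real lexer must also handle non-ASCII identifier characters (this sketch simply stops at the first non-ASCII byte):

```rust
/// Returns the index one past the end of the JSX identifier starting at `start`.
/// Hypothetical helper: accepts ASCII letters, digits, `_`, `$` and `-`, and
/// stops at any other byte (including the first byte of a non-ASCII character).
pub fn jsx_identifier_end(source: &str, start: usize) -> usize {
    let bytes = source.as_bytes();
    let mut pos = start;
    while let Some(&byte) = bytes.get(pos) {
        if byte.is_ascii_alphanumeric() || matches!(byte, b'_' | b'$' | b'-') {
            pos += 1;
        } else {
            break;
        }
    }
    pos
}
```

For `<Foo-Bar />` starting at index 1, the scan stops at the space, so `&source[1..end]` is `Foo-Bar`.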
2024-03-18 12:06:27 +00:00
Boshen
798a1fde09
fix(parser): fix failed to parse JSXChild after JSXEmptyExpression (#2726)
fixes #2723
2024-03-15 13:39:20 +08:00
Boshen
a5ddb5b452
Release crates v0.10.0 2024-03-14 18:23:34 +08:00
Boshen
697b6b70c0
feat: merge features serde and wasm to serialize (#2716)
This PR merges the two confusing features `serde` and `wasm` into a
single `serialize` feature.

We'll eventually do serialize + type information for both wasm and napi
targets.

`oxc_macros` is removed from `oxc_ast`'s dependencies because it requires
`syn` and friends, which goes against our policy ["Third-party
dependencies should be
minimal."](https://oxc-project.github.io/docs/contribute/rules.html#development-policy)
2024-03-14 17:13:12 +08:00
Boshen
0f86333437
refactor(ast): refactor Trivias API - have less noise around it (#2692) 2024-03-12 20:16:36 +08:00
Boshen
86ee074678
fix(parser): remove all duplicated comments in trivia builder (#2689) 2024-03-12 17:51:22 +08:00
Boshen
cda9c93436
fix(parser): improve lexing of jsx identifier to fix duplicated comments after jsx name (#2687) 2024-03-12 15:51:51 +08:00
Boshen
6c6adb46d1
fix(ast): parse rest parameter with the correct optional and type annotation syntax (#2686)
closes #2653
2024-03-12 15:47:22 +08:00
Boshen
8a73d18fcf
chore(parser): make sure all span.end >= span.start (#2681)
closes #2679
2024-03-11 19:49:51 +08:00
Arnaud Barré
b378e7ecc9
fix(parser): fix span for JSXEmptyExpression with comment (#2673)
[playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAICVgICAgICAgICejwtjmCpbllbPawdM2eEFKwhGb62iFlQWu39yrLCA)

---------

Co-authored-by: Boshen <boshenc@gmail.com>
2024-03-11 10:50:33 +00:00
Arnaud Barré
82260318a9
fix(parser): fix span start for return type in function type (#2660)
This matters for code like the following. The fix matches the behavior
of both Babel and TSESLint.

```ts
export type Plugin = (
  a: string
) => // Comment
number
```


[TSESLint](https://ast.sxzz.moe/#eNo9jMEKwjAMQH8l5KQw2X0wL/6AB/HUS61hVNq0pKkoY/9u52C35L3kzRhwwJd92+LEZ8UOcwP6zbSBE5XgeeWucfrkJAqrhmuok2cY4WAYwA5QVDxPho8wnqHv4ZJiJFbDXOODpCVSS8zrtcGSqji6tZDBoe0xPWtoc7dpctHeSYpPvPlglYruXixP/0+VSoYXXH53+0Kk)

[OXC](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC5gICAgICAgICyHorESipoTp3admelrvvzLVu5WllVkMM9n7p1s27YYhddDchOGSC6foF%2BGw%2B1Mfo7DYhiNueGpuc27%2F3gf2tToIA%3D)
2024-03-10 13:32:25 +08:00
Arnaud Barré
c3477de64e
fix(ast)!: rename BigintLiteral to BigIntLiteral (#2659)
This matches the casing of the name in Babel (in ESTree, everything is a
`Literal`).
2024-03-10 13:31:51 +08:00
Arnaud Barré
b453a072cc
fix(parser): parse named rest element in type tuple (#2655)
This fixes the parser for `type X = [...args: string[]];`.

In TSESLint, `TSNamedTupleMember` is part of the `TSType` union, so I did
the same.
2024-03-10 13:25:15 +08:00
Arnaud Barré
776812315d
fix(parser)!: drop TSImportEqualsDeclaration.is_export (#2654)
This is one point where Babel and TSESLint diverge. For linter purposes,
the TSESLint structure makes more sense, and that is the reason for
https://github.com/typescript-eslint/typescript-eslint/issues/4130

The remaining `is_export` was redundant and made prettier (and the WIP
oxc/prettier) print the AST of `export import X = Y` as
`export export import X = Y`.
2024-03-10 13:22:18 +08:00
Boshen
32303b20fb
New tool: oxc_module_lexer (#2650)
# Oxc Module Lexer

This is not a lexer. The name "lexer" is used for easier recognition.

## [es-module-lexer](https://github.com/guybedford/es-module-lexer)

Outputs the list of exports and locations of import specifiers,
including dynamic import and import meta handling.

Does not have any of the
[limitations](https://github.com/guybedford/es-module-lexer?tab=readme-ov-file#limitations)
mentioned in `es-module-lexer`.

I'll also work on the following cases to make this feature complete.

- [ ] get imported variables
https://github.com/guybedford/es-module-lexer/issues/163
- [ ] track star exports as imports as well
https://github.com/guybedford/es-module-lexer/issues/76
- [ ] TypeScript specific syntax
- [ ] TypeScript `type` import / export keyword

## [cjs-module-lexer](https://github.com/nodejs/cjs-module-lexer)

- [ ] TODO

## Benchmark

This is 2 times slower than `es-module-lexer`, but will be significantly
faster when TypeScript is processed.

The difference is around 10ms vs 20ms on a large file (700k).
2024-03-09 23:23:55 +08:00
Boshen
265b2fb640
feat: miette v7 (#2465) 2024-03-08 15:50:00 +08:00
magic-akari
2a235d3b8c
fix(ast): parse with_clause in re-export declaration (#2634) 2024-03-07 14:09:31 +08:00
Boshen
240ff19675
refactor(parser): improve parsing of BindingPattern in TypeScript (#2624)
closes #2622
2024-03-06 16:16:03 +08:00
overlookmotel
0646bf34fa
refactor: rename CompactString to CompactStr (#2619)
Preparatory step for #2620.

This PR purely changes names of types and methods:

* `CompactString` -> `CompactStr`
* `Atom::to_compact_string` -> `to_compact_str`
* `Atom::into_compact_string` -> `into_compact_str`

Have split this into a separate PR as the diff is large, but it does absolutely nothing but renaming (I've checked the whole diff twice, so feel free not to check it again!). This should make it easier to see the content of the substantive change in #2620.
2024-03-06 12:24:23 +08:00