overlookmotel 7e1fe36c68
refactor(ast): squash nested enums (#3115)
OK, this is a big one...

I have done this as part of work on Traversable AST, but I believe it
has wider benefits, so thought better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (#2685).

e.g.: Previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */

    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All of `Declaration`'s variants are combined into `Statement`, but the
`Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.
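
As a rough illustration of that claim, here is a minimal sketch. It uses simplified stand-in types (plain `Box<String>` payloads instead of the arena `Box<'a, T>`, and a hypothetical `FunctionDeclaration` variant with discriminant 33), so it is an assumption about the general technique, not the code in this PR.

```rs
use std::mem;

// Simplified stand-ins for the AST types (assumed for illustration only).
#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Declaration {
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Statement {
    BlockStatement(Box<String>) = 0,
    // Shared with `Declaration`: same discriminants, same payload types.
    VariableDeclaration(Box<String>) = 32,
    FunctionDeclaration(Box<String>) = 33,
}

// Compile-time checks that the two layouts agree in size and alignment.
const _: () = assert!(mem::size_of::<Declaration>() == mem::size_of::<Statement>());
const _: () = assert!(mem::align_of::<Declaration>() == mem::align_of::<Statement>());

impl From<Declaration> for Statement {
    fn from(decl: Declaration) -> Self {
        // SAFETY: both enums are `#[repr(C, u8)]`, and every `Declaration`
        // variant has a `Statement` variant with the same discriminant and
        // the same payload type, so the bytes are a valid `Statement`.
        unsafe { mem::transmute::<Declaration, Statement>(decl) }
    }
}
```

The conversion moves the value without inspecting or rewriting the tag, which is why it costs nothing at runtime.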

This is the same thing as #2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (#2987).
2. It would help a lot with AST Transfer (#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction appears to be increasingly to build up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations for
`Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable), where these AST representations can be converted to one
another at zero cost via transmute - the property used by the
Traversable AST scheme.
3. Caching AST data on disk (#3079) or transferring across network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`.

But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the types `#[repr(C)]` as they are currently disables this
optimization.

So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).
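
To make the size cost concrete, here is a small sketch with hypothetical toy enums (not the real AST types). Under `#[repr(C, u8)]` the layout is fully specified: a nested enum pays for two tags, while the squashed form pays for one.

```rs
use std::mem::size_of;

// Hypothetical toy types, assumed for illustration only.
#[repr(C, u8)]
pub enum Inner {
    A(Box<u64>) = 0,
    B(Box<u64>) = 1,
}

// Nested form: the outer tag sits in front of a payload that carries
// its own tag, and each tag is padded to pointer alignment.
#[repr(C, u8)]
pub enum Nested {
    X(Box<u64>) = 0,
    Inner(Inner) = 1,
}

// Squashed form: `Inner`'s variants are lifted into the outer enum,
// so there is a single tag followed by a pointer-sized payload.
#[repr(C, u8)]
pub enum Squashed {
    X(Box<u64>) = 0,
    A(Box<u64>) = 1,
    B(Box<u64>) = 2,
}

/// Returns `(size of Nested, size of Squashed)` in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<Nested>(), size_of::<Squashed>())
}
```

On a 64-bit target this works out to three pointer widths for `Nested` and two for `Squashed`. Without `#[repr(C)]` the compiler is free to pick a tighter layout for the nested form, but that layout is unspecified and cannot be relied on.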

## Implementation

One enum "inheriting" variants from another is implemented with the
`inherit_variants!` macro.

```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    
    // `Declaration` variants added here by `inherit_variants!` macro
    @inherit Declaration
    // `ModuleDeclaration` variants added here by `inherit_variants!` macro
    @inherit ModuleDeclaration
}
}
```

The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.

The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).
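
For a sense of what such a helper might look like when written by hand, here is a sketch on simplified stand-in types (plain `u32` payloads, a hypothetical `FunctionDeclaration` variant). The method names follow the ones mentioned above, but the bodies are an assumption, not the macro's actual output.

```rs
// Simplified stand-ins for the AST types (assumed for illustration only).
#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Declaration {
    VariableDeclaration(u32) = 32,
    FunctionDeclaration(u32) = 33,
}

#[repr(C, u8)]
#[derive(Debug, PartialEq)]
pub enum Statement {
    BlockStatement(u32) = 0,
    VariableDeclaration(u32) = 32,
    FunctionDeclaration(u32) = 33,
}

impl Statement {
    /// True if this statement is one of the variants shared with `Declaration`.
    pub fn is_declaration(&self) -> bool {
        matches!(self, Self::VariableDeclaration(_) | Self::FunctionDeclaration(_))
    }

    /// Reinterprets `&Statement` as `&Declaration` when the variant is shared.
    pub fn as_declaration(&self) -> Option<&Declaration> {
        if self.is_declaration() {
            // SAFETY: both enums are `#[repr(C, u8)]` with matching
            // discriminants and payload types for the shared variants, and
            // we have just checked that `self` is one of those variants.
            Some(unsafe { &*(self as *const Statement).cast::<Declaration>() })
        } else {
            None
        }
    }
}
```

The only runtime work is the check on the tag; the reference itself is reused as-is.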

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));

// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);

// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }

// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };

// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}

// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}

// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```
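
A macro like `match_declaration!` can be sketched as a `macro_rules!` macro that expands to an or-pattern over the inherited variants. The version below is a guess at the shape on a hypothetical toy `Statement`; the real macro in this PR may differ.

```rs
// Hypothetical toy enum, assumed for illustration only.
#[derive(Debug)]
pub enum Statement {
    BlockStatement(u32),
    ExpressionStatement(u32),
    VariableDeclaration(u32),
    FunctionDeclaration(u32),
}

// Expands, in pattern position, to an or-pattern covering every
// declaration variant of the enum named by `$ty`.
macro_rules! match_declaration {
    ($ty:ident) => {
        $ty::VariableDeclaration(_) | $ty::FunctionDeclaration(_)
    };
}

pub fn describe(stmt: &Statement) -> &'static str {
    // The compiler still checks exhaustiveness: dropping the second arm
    // would be a compile error, because the macro expands to real patterns.
    match stmt {
        match_declaration!(Statement) => "declaration",
        Statement::BlockStatement(_) | Statement::ExpressionStatement(_) => "other statement",
    }
}
```

Because the expansion is an ordinary pattern, this form keeps exhaustiveness checking, which the `_ if stmt.is_declaration()` guard form does not.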

New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this:


https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-     &assignment_expr.left
- else {
-     return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-     simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
  else {
      return;
  };
```
2024-04-28 20:40:37 +08:00
.cargo ci: run cargo check with --all-features 2024-04-03 16:43:50 +08:00
.github chore: fix renovate error "Use matchDepNames instead of matchPackageNames" 2024-04-28 14:49:23 +08:00
.vscode chore: add some useful informantion log (#1912) 2024-01-06 22:30:01 +08:00
crates refactor(ast): squash nested enums (#3115) 2024-04-28 20:40:37 +08:00
editors/vscode Release oxlint and vscode extension v0.3.1 2024-04-22 16:00:17 +08:00
fuzz chore(fuzz): add a timeout command 2024-02-05 14:41:14 +08:00
napi/parser feat(napi/parser): remove experimental flexbuffer api (#2957) 2024-04-13 14:59:31 +08:00
npm fix(cli): update --format documentation (#3118) 2024-04-28 11:56:10 +08:00
tasks refactor(ast): squash nested enums (#3115) 2024-04-28 20:40:37 +08:00
wasm/parser Release @oxc-parser/wasm v0.1.0 2024-04-08 15:47:51 +08:00
website chore(deps): update pnpm to v9 (#3043) 2024-04-21 15:49:49 +08:00
.git-blame-ignore-revs chore: update .git-blame-ignore-revs 2023-07-28 13:57:29 +08:00
.gitignore feat(tasks): shard benchmarks in CI (#2751) 2024-03-18 10:45:44 +08:00
.ignore chore: add just watch command for overcoming cargo-watch being slow 2023-05-16 13:22:42 +08:00
.taplo.toml chore: format wasm/parser/Cargo.toml 2024-03-14 17:25:53 +08:00
.typos.toml chore: fix typos by ignoring no_unknown_property.rs 2024-04-02 16:27:38 +08:00
Cargo.lock chore: cleanup the dependencies on static_assertions and oxc_index. (#3095) 2024-04-25 16:56:23 +08:00
Cargo.toml chore(index): fork index_vec crate. (#3092) 2024-04-25 06:09:53 +00:00
CHANGELOG.md chore: add CHANGELOG.md 2024-03-26 21:03:22 +08:00
cliff.toml chore: add changelogs via git cliff (#2878) 2024-04-01 20:04:48 +08:00
CONTRIBUTING.md chore(CONTRIBUTING): use the website content 2023-12-16 21:02:52 +08:00
deny.toml wip 2024-03-05 16:25:14 +08:00
justfile chore: update MAINTENANCE.md 2024-04-20 16:59:47 +08:00
LICENSE Change license holder to @boshen 2023-11-10 14:26:11 +08:00
MAINTENANCE.md chore: update MAINTENANCE.md 2024-04-20 16:59:47 +08:00
README.md chore: update README 2024-04-19 23:19:10 +08:00
rust-toolchain.toml chore(deps): update dependency rust to v1.77.2 (#2956) 2024-04-13 06:19:38 +00:00
rustfmt.toml chore(rustfmt): disable all unstable format options 2023-07-27 13:11:46 +08:00
THIRD-PARTY-LICENSE chore(index): fork index_vec crate. (#3092) 2024-04-25 06:09:53 +00:00

Oxc

The Oxidation Compiler is creating a collection of high-performance tools for JavaScript and TypeScript.

Oxc is building a parser, linter, formatter, transpiler, minifier, resolver ... all written in Rust.

🙋Who's using Oxc?

Linter Quick Start

The linter is ready to catch mistakes for you. It comes with 91 rules turned on by default (out of 300 in total) and no configuration is required.

To get started, run oxlint directly or via npx:

npx oxlint@latest

To give you an idea of its capabilities, here is an example from the vscode repository, which finishes linting 4800+ files in 0.7 seconds.

Performance

  • The parser aims to be the fastest production-ready parser written in Rust.
  • The linter is more than 50 times faster than ESLint, and scales with the number of CPU cores.

⌨️ Programming Usage

Rust

Individual crates are published, and you may use them to build your own JavaScript tools.

  • The umbrella crate oxc exports all public crates from this repository.
  • The AST and parser crates oxc_ast and oxc_parser are production ready.
  • The resolver crate oxc_resolver for module resolution is also production ready.
  • Example usages of these crates can be found in their respective crates/*/examples directory.

While Rust has gained a reputation for its comparatively slower compilation speed, we have dedicated significant effort to fine-tune the Rust compilation speed. Our aim is to minimize any impact on your development workflow, ensuring that developing your own Oxc based tools remains a smooth and efficient experience.

This is demonstrated by our CI runs, where warm runs complete in 3 minutes.

Node.js

Wasm


🎯 Tools

🔸 AST and Parser

Oxc maintains its own AST and parser, which is by far the fastest and most conformant JavaScript and TypeScript (including JSX and TSX) parser written in Rust.

As the parser often represents a key performance bottleneck in JavaScript tooling, any minor improvements can have a cascading effect on our downstream tools. By developing our parser, we have the opportunity to explore and implement well-researched performance techniques.

While many existing JavaScript tools rely on estree as their AST specification, a notable drawback is its abundance of ambiguous nodes. This ambiguity often leads to confusion during development with estree.

The Oxc AST differs slightly from the estree AST by removing ambiguous nodes and introducing distinct types. For example, instead of using a generic estree Identifier, the Oxc AST provides specific types such as BindingIdentifier, IdentifierReference, and IdentifierName. This clear distinction greatly enhances the development experience by aligning more closely with the ECMAScript specification.

🏆 Parser Performance

Our benchmark reveals that the Oxc parser surpasses the speed of the swc parser by approximately 3 times and the Biome parser by 5 times.

How is it so fast?
  • AST is allocated in a memory arena (bumpalo) for fast AST memory allocation and deallocation.
  • Short strings are inlined by CompactString.
  • No other heap allocations are done except the above two.
  • Scope binding, symbol resolution and some syntax errors are not done in the parser, they are delegated to the semantic analyzer.

🔸 Linter

The linter embraces convention over configuration, eliminating the need for extensive configuration and plugin setup. Unlike other linters like ESLint, which often require intricate configurations and plugin installations (e.g. @typescript-eslint), our linter only requires a single command that you can immediately run on your codebase:

npx oxlint@latest

🏆 Linter Performance

The linter is 50 - 100 times faster than ESLint depending on the number of rules and number of CPU cores used. It completes in less than a second for most codebases with a few hundred files and completes in a few seconds for larger monorepos. See bench-javascript-linter for details.

As an upside, the binary is approximately 5MB, whereas ESLint and its associated plugin dependencies can easily exceed 100MB.

You may also download the linter from the latest release tag as a standalone binary. This lets you run the linter in your CI without a Node.js installation.

How is it so fast?
  • Oxc parser is used.
  • AST visit is a fast operation due to linear memory scan from the memory arena.
  • Files are linted in a multi-threaded environment, so linting scales with the total number of CPU cores.
  • Every single lint rule is tuned for performance.

🔸 Resolver

Module resolution plays a crucial role in JavaScript tooling, especially for tasks like multi-file analysis or bundling. However, it can often become a performance bottleneck. To address this, we developed oxc_resolver.

The resolver is production-ready and is currently being used in Rspack and Rolldown. Usage and examples can be found in its own repository.

🔸 Transformer (Transpiler)

A transformer is responsible for turning higher versions of ECMAScript to a lower version that can be used in older browsers. We are currently focusing on the architecture. See Milestone 1 for details.

🔸 Minifier

JavaScript minification plays a crucial role in optimizing website performance as it reduces the amount of data sent to users, resulting in faster page loads. This holds tremendous economic value, particularly for e-commerce websites, where every second can equate to millions of dollars.

However, existing minifiers typically require a trade-off between compression quality and speed. You have to choose between the slowest for the best compression or the fastest for less compression. But what if we could develop a faster minifier without compromising on compression?

We are actively working on a prototype that aims to achieve this goal, by porting all test cases from well-known minifiers such as google-closure-compiler, terser, esbuild, and tdewolff-minify.

Preliminary results indicate that we are on track to achieve our objectives. With the Oxc minifier, you can expect faster minification times without sacrificing compression quality.

🔸 Formatter

While prettier has established itself as the de facto code formatter for JavaScript, there is a significant demand in the developer community for a less opinionated alternative. Recognizing this need, our ambition is to undertake research and development to create a new JavaScript formatter that offers increased flexibility and customization options.

The prototype is currently a work in progress.


✍️ Contribute

See CONTRIBUTING.md for guidance.

Check out some of the good first issues or ask us on Discord.

If you are unable to contribute by code, you can still participate by:

📚 Learning Resources

🤝 Credits

This project was incubated with the assistance of these exceptional mentors and their projects:

📖 License

Oxc is free and open-source software licensed under the MIT License.

Oxc ports or copies code from other open source projects. Their licenses are listed in Third-party library licenses.