Commit graph

49 commits

Author SHA1 Message Date
Song Gao
cf3415b0e4
chore(doc): replace main/master to tag/commit to make the url always accessible (#7298) 2024-11-16 21:00:30 +08:00
overlookmotel
9e85b104e2 refactor(parser): add ParserImpl::alloc method (#7063)
Pure refactor. Introduce `ParserImpl::alloc` method. Shorten `self.ast.alloc(...)` to `self.alloc(...)`.

Also reduce `alloc` calls by using `AstBuilder` methods which already allocate where possible.
2024-11-01 17:09:06 +00:00
Boshen
4d8bc8c8af perf(parser): precompute is_typescript (#6443) 2024-10-11 03:39:38 +00:00
Boshen
a4b55bf00e
refactor(parser): use AstBuilder (#5743)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2024-09-13 22:39:48 +08:00
Boshen
603817bef9 feat(oxc)!: add SourceType::Unambiguous; parse .js as unambiguous (#5557)
See https://babel.dev/docs/options#misc-options for background on `unambiguous`

Once `SourceType::Unambiguous` is parsed, it will correctly set the returned `Program::source_type` to either `module` or `script`.
2024-09-07 10:48:58 +00:00
Kevin Deng 三咲智子
234a24c14d
fix(ast)!: merge UsingDeclaration into VariableDeclaration (#5270)
relate #2854
2024-08-28 11:26:05 +08:00
rzvxa
b936162093 refactor(ast/ast_builder)!: shorter allocator utility method names. (#4122)
This PR serves two purposes, First off it would lower the amount of characters we have to type in for a simple operation such as wrapping an expression in a vector. Secondly, it would follow the generated names more closely since nowhere else in the builder we do have `new_xxx`, We always say `xxx` since a builder always constructs something.

```
new_vec -> vec
new_vec_single -> vec1*
new_vec_from_iter -> vec_from_iter
new_vec_with_capacity -> vec_with_capacity
new_str -> str
new_atom -> atom
```

`*` This one is the main motivation behind this PR, It saves 10 characters!
2024-07-09 12:16:38 +00:00
rzvxa
d347aedfda feat(ast)!: generate ast_builder.rs. (#3890)
### Every structure has 2 builder methods:

1. `xxx` e.g. `block_statement`
```rust
    #[inline]
    pub fn block_statement(self, span: Span, body: Vec<'a, Statement<'a>>) -> BlockStatement<'a> {
        BlockStatement { span, body, scope_id: Default::default() }
    }
```
2. `alloc_xxx` e.g. `alloc_block_statement`
```rust
    #[inline]
    pub fn alloc_block_statement(
        self,
        span: Span,
        body: Vec<'a, Statement<'a>>,
    ) -> Box<'a, BlockStatement<'a>> {
        self.block_statement(span, body).into_in(self.allocator)
    }
```

### We generate 3 types of methods for enums:

1. `yyy_xxx` e.g. `statement_block`
```rust
    #[inline]
    pub fn statement_block(self, span: Span, body: Vec<'a, Statement<'a>>) -> Statement<'a> {
        Statement::BlockStatement(self.alloc(self.block_statement(span, body)))
    }
```
2. `yyy_from_xxx` e.g. `statement_from_block`
```rust
    #[inline]
    pub fn statement_from_block<T>(self, inner: T) -> Statement<'a>
    where
        T: IntoIn<'a, Box<'a, BlockStatement<'a>>>,
    {
        Statement::BlockStatement(inner.into_in(self.allocator))
    }
```
3. `yyy_xxx` where `xxx` is inherited e.g. `statement_declaration`
```rust
    #[inline]
    pub fn statement_declaration(self, inner: Declaration<'a>) -> Statement<'a> {
        Statement::from(inner)
    }
```

------------

### Generic parameters:

We no longer accept `Box<'a, ADT>`, `Atom` or `&'a str`, Instead we use `IntoIn<'a, Box<'a, ADT>>`, `IntoIn<'a, Atom<'a>>` and `IntoIn<'a, &'a str>` respectively.
It allows us to rewrite things like this:
```rust
let ident = IdentifierReference::new(SPAN, Atom::from("require"));
let number_literal_expr = self.ast.expression_numeric_literal(
    right_expr.span(),
    num,
    raw,
    self.ast.new_str(num.to_string().as_str()),
    NumberBase::Decimal,
);
```
As this:
```rust
let ident = IdentifierReference::new(SPAN, "require");
let number_literal_expr = self.ast.expression_numeric_literal(
    right_expr.span(),
    num,
    raw,
    num.to_string(),
    NumberBase::Decimal,
);
```
2024-07-09 00:57:26 +00:00
Boshen
d0eac46fc8 refactor(parser): use function instead of trait to parse normal lists (#4003)
To reduce boilerplate and code noise.

relates #3887
2024-07-01 15:57:36 +00:00
Boshen
97d59fc2f3 refactor(parser): move code around for parsing Modifiers (#3849) 2024-06-23 12:46:42 +00:00
Boshen
9b38119ec9 refactor(ast)!: replace Modifiers with declare on VariableDeclaration (#3839)
part of #2958
2024-06-23 10:34:52 +00:00
Boshen
cfcef241db feat(ast)!: add directives field to TSModuleBlock (#3830)
closes #3564
2024-06-22 18:14:08 +00:00
Boshen
dd540c8f0f feat(minifier): add skeleton for ReplaceGlobalDefines ast pass (#3803) 2024-06-21 13:53:59 +00:00
Boshen
6b3d019631 refactor(paresr): move some structs to js module (#3341) 2024-05-18 14:41:32 +00:00
Boshen
9ced605487
refactor(parser): start porting arrow function parsing from tsc (#3340)
relates #3320
2024-05-18 22:35:29 +08:00
Boshen
b27a905958
refactor(parser): simplify Context passing (#3266) 2024-05-14 12:22:27 +08:00
Boshen
2064ae9e0a refactor(parser,diagnostic): one diagnostic struct to eliminate monomorphization of generic types (#3214)
part of #3213

We should only have one diagnostic struct instead 353 copies of them, so we don't end up choking LLVM with 50k lines of the same code due to monomorphization.

If the proposed approach is good, then I'll start writing a codemod to turn all the existing structs to plain functions.

---

Background:

Using `--timings`, we see `oxc_linter` is slow on codegen (the purple part).

![image](https://github.com/zkat/miette/assets/1430279/c1df4f7d-90ef-4c0f-9956-2ec3194db7ca)

The crate currently contains 353 miette errors. [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) displays

```
cargo llvm-lines -p oxc_linter --lib --release

  Lines                 Copies               Function name
  -----                 ------               -------------
  830350                33438                (TOTAL)
   29252 (3.5%,  3.5%)    808 (2.4%,  2.4%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
   23298 (2.8%,  6.3%)    353 (1.1%,  3.5%)  miette::eyreish::error::object_downcast
   19062 (2.3%,  8.6%)    706 (2.1%,  5.6%)  core::error::Error::type_id
   12610 (1.5%, 10.1%)     65 (0.2%,  5.8%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   12002 (1.4%, 11.6%)    706 (2.1%,  7.9%)  miette::eyreish::ptr::Own<T>::boxed
    9215 (1.1%, 12.7%)    115 (0.3%,  8.2%)  core::iter::traits::iterator::Iterator::try_fold
    9150 (1.1%, 13.8%)      1 (0.0%,  8.2%)  oxc_linter::rules::RuleEnum::read_json
    8825 (1.1%, 14.9%)    353 (1.1%,  9.3%)  <miette::eyreish::error::ErrorImpl<E> as core::error::Error>::source
    8822 (1.1%, 15.9%)    353 (1.1%, 10.3%)  miette::eyreish::error::<impl miette::eyreish::Report>::construct
    8119 (1.0%, 16.9%)    353 (1.1%, 11.4%)  miette::eyreish::error::object_ref
    8119 (1.0%, 17.9%)    353 (1.1%, 12.5%)  miette::eyreish::error::object_ref_stderr
    7413 (0.9%, 18.8%)    353 (1.1%, 13.5%)  <miette::eyreish::error::ErrorImpl<E> as core::fmt::Display>::fmt
    7413 (0.9%, 19.7%)    353 (1.1%, 14.6%)  miette::eyreish::ptr::Own<T>::new
    6669 (0.8%, 20.5%)     39 (0.1%, 14.7%)  alloc::raw_vec::RawVec<T,A>::try_allocate_in
    6173 (0.7%, 21.2%)    353 (1.1%, 15.7%)  miette::eyreish::error::<impl miette::eyreish::Report>::from_std
    6027 (0.7%, 21.9%)     70 (0.2%, 16.0%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    6001 (0.7%, 22.7%)    353 (1.1%, 17.0%)  miette::eyreish::error::object_drop
    6001 (0.7%, 23.4%)    353 (1.1%, 18.1%)  miette::eyreish::error::object_drop_front
    5648 (0.7%, 24.1%)    353 (1.1%, 19.1%)  <miette::eyreish::error::ErrorImpl<E> as core::fmt::Debug>::fmt
```

It's totalling more than 50k llvm lines, and is putting pressure on rustc codegen (the purple part on `oxc_linter` in the image above.

---

It's pretty obvious by looking at https://github.com/zkat/miette/blob/main/src/eyreish/error.rs, the generics can expand out to lots of code.
2024-05-11 04:56:22 +00:00
overlookmotel
7e1fe36c68
refactor(ast): squash nested enums (#3115)
OK, this is a big one...

I have done this as part of work on Traversable AST, but I believe it
has wider benefits, so thought better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (#2685).

e.g.: Previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */

    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All `Declaration`'s variants are combined into `Statement`, but
`Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.

This is the same thing as #2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (#2987).
2. It would help a lot with AST Transfer (#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction appears to be increasingly to build up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations for
`Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable), and these AST representations can be converted to one
other at zero cost via transmute - the property used by Traversable AST
scheme.
3. Caching AST data on disk (#3079) or transferring across network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`.

But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the types `#[repr(C)]` as they are currently disables this
optimization.

So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).

## Implementation

One enum "inheriting" variants from another is implemented with
`inherit_variants!` macro.

```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    
    // `Declaration` variants added here by `inherit_variants!` macro
    @inherit Declaration
    // `ModuleDeclaration` variants added here by `inherit_variants!` macro
    @inherit ModuleDeclaration
}
}
```

The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.

The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));

// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);

// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }

// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };

// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}

// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}

// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```

New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this:


https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-     &assignment_expr.left
- else {
-     return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-     simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
else {
    return;
};
```
2024-04-28 20:40:37 +08:00
Boshen
92d709bf21
feat(ast): add CatchParameter node (#3049) 2024-04-21 23:43:39 +08:00
Boshen
240ff19675
refactor(parser): improve parsing of BindingPattern in TypeScript (#2624)
closes #2622
2024-03-06 16:16:03 +08:00
Boshen
be6b8b7ce6
[BREAKING CHANGE] Change Atom to Atom<'a> to make it safe (#2497)
Part of #2295

This PR splits the `Atom` type into `Atom<'a>` and `CompactString`.

All the AST node strings now use `Atom<'a>` instead of `Atom` to signify
it belongs to the arena.

It is now up to the user to select which form of the string to use.

This PR essentially removes the really unsafe code 


93742f89e9/crates/oxc_span/src/atom.rs (L98-L107)

which can lead to 

![image](https://github.com/oxc-project/oxc/assets/1430279/8c513c4f-19b0-4b63-b61c-e07c187c95b5)
2024-02-26 19:34:40 +08:00
overlookmotel
0bdecb5043
refactor(parser): wrapper type for parser (#2339)
Split parser into public interface `Parser` and internal implementation `ParserImpl`.

This involves no changes to public API.

This change is a bit annoying, but justification is that it's required for #2341, which I believe to be very worthwhile.

The `ParserOptions` type also makes it a bit clearer what the defaults for `allow_return_outside_function` and `preserve_parens` are. It came as a surprise to me that `preserve_parens` defaults to `true`, and this refactor makes that a bit more obvious when reading the code.

All the real changes are in [oxc_parser/src/lib.rs](https://github.com/oxc-project/oxc/pull/2339/files#diff-8e59dfd35fc50b6ac9a9ccd991e25c8b5d30826e006d565a2e01f3d15dc5f7cb). The rest of the diff is basically replacing `Parser` with `ParserImpl` everywhere else.
2024-02-07 23:22:08 +08:00
Boshen
4706765d2a
refactor(parser): reduce Token size from 32 to 16 bytes (#1962)
Part of #1880

`Token` size is reduced from 32 to 16 bytes by changing the previous
token value `Option<&'a str>` to a u32 index handle.

It would be nice if this handle is eliminated entirely because
the normal case for a string is always
`&source_text[token.span.start.token.span.end]`

Unfortunately, JavaScript allows escaped characters to appear in
identifiers, strings and templates. These strings need to be unescaped
for equality checks, i.e. `"\a"  === "a"`.

This leads us to adding a `escaped_strings[]` vec for storing these
unescaped and allocated
strings.

Performance regression for adding this vec should be minimal because
escaped strings are rare.

Background Reading:

* https://floooh.github.io/2018/06/17/handles-vs-pointers.html
2024-01-09 15:17:02 +08:00
overlookmotel
eb2966c512
fix(parser): fix incorrectly identified directives (#1885)
Parser incorrectly identifies string literals as directives if they
follow after `import`s, `export`s, or decorators.

In all of these cases, `'use strict'` produces a directive in the AST,
where it should be parsed as an `ExpressionStatement` containing a
`StringLiteral`:

```js
import x from 'foo';
'use strict';
```

```js
export {x};
'use strict';
```

```js
@foo
'use strict';
```


[Playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC0gICAgICAgIC0G8rnONK89ITJ3zrK%2FUP7OmSZPgHQzStr3yMtwFTU%2BD1WPt09JgqZJLoYooydbGsM5vGcf34BnIA%3D)

This PR should fix that.

I'm not sure about the decorator case, though. I assume it's not a
directive. But is prefixing a string literal with a decorator even legal
syntax anyway?

And a side nit: If I'm reading it right, I don't think the `continue`
statement in the decorator arm of the match does anything. Do I have
that right?

Last question: Where does one go about putting a test? I guess these
silly cases aren't covered by Babel etc's tests.

---------

Co-authored-by: Boshen <boshenc@gmail.com>
2024-01-04 13:39:15 +00:00
overlookmotel
1feec95a94
fix(parser) fix typo in expecting_directives variable name (#1801)
Renamves `expecting_diretives ` to `expecting_directives` to fix spelling
2023-12-24 16:51:02 +00:00
Boshen
567c6ed757
feat(prettier): print directives (#1497) 2023-11-22 19:39:25 +08:00
Cameron
5b1e1e5408
feat(parser): TypeScript 5.2 (#811)
- adds support for [Using
Declarations](https://devblogs.microsoft.com/typescript/announcing-typescript-5-2/#using-declarations-and-explicit-resource-management)

Closes #786
2023-10-05 12:52:14 +13:00
Yunfei He
35167599bc
refactor(ast): use atom for Directive and Hashbang (#701)
The main reason is using Atom to remove the lifetime for convenience.

And after removing the lifetime of these nodes, the `Program<'a>`
doesn't rely on `&'a source` anymore, which allows us to [specify more
accurate
lifetimes](https://github.com/web-infra-dev/oxc/discussions/700).
2023-08-09 13:52:56 +08:00
Carter Snook
985b8f21d9
feat: support hashbang interpreter comments (#431) 2023-06-11 23:55:58 +08:00
Boshen
a0b09a3f27
refactor(ast): remove RestElement from BindingPattern 2023-05-16 22:25:52 +08:00
Boshen
cd276c2850
feat: add oxc_span crate (#323) 2023-04-27 21:51:15 +08:00
Boshen
fec5aafbf1
refactor(oxc_parser): remove a few unused diagnostics 2023-04-15 18:13:15 +08:00
Boshen
6360bdad31
fix(parser): fix [+in] context in CallArguments (#265)
relates #255
2023-04-06 21:02:30 +08:00
Boshen
1c2acd121c
refactor(parser): clean up parsing of ForStatement (#251)
closes #176
2023-04-04 22:36:35 +08:00
Boshen
236d53ad9d
fix(parser): fix panic on multi-byte char in ExpectCatchFinally error 2023-04-02 20:09:38 +08:00
Boshen
b11f774c41 refactor(oxc_parser): clean up doc 2023-04-01 19:03:33 +08:00
Boshen
d917348f9b refactor(ast,parser): move parsing context from ast to parser 2023-04-01 18:01:33 +08:00
Boshen
f2fcbb30c3 refactor(oxc_parser): removed not needed generic from unexpected function 2023-04-01 15:59:42 +08:00
Boshen
174330561c
fix(parser): fix panic on multi-byte characaters (#233)
* fix(oxc_parser): fix panic when EOF on a multi-byte character

relates #232

* fix(parser): fix panic on multi-byte char in private identifer

relates #232
2023-04-01 13:34:18 +08:00
Boshen
d6e8c6fb2f
feat(parser): check ReturnStatement in return context 2023-03-12 23:30:32 +08:00
Boshen
c2f760f1ed
chore: run `types -w" to fix all typos 2023-03-11 23:37:19 +08:00
Fnll
81760da7cc
feat(parser): better diagnostic for missing semicolon in for loop statement (#133)
feat(parser): better diagnostic

Co-authored-by: kerui.lian <kerui.lian@bytedance.com>
2023-03-05 04:13:23 -08:00
Ye Yangchen
b06ab627bf fix(oxc_parser) correct span for decorators 2023-03-02 21:34:24 -08:00
Ye Yangchen
d8c6caf57f feat(oxc_parser): Parse modifiers before declaration 2023-03-01 22:50:23 -08:00
Ye Yangchen
0bf8f817f5 feat(oxc_parser): Port isStartOfDeclaration form tsc 2023-02-27 12:27:44 +08:00
Boshen
4f4a9802b7 refactor(diagnostics,parser): move diagnostics to parser 2023-02-22 19:23:01 +08:00
Boshen
4c6407b152 refactor(ast): s/node/span
This corrects the jargon for span. The term `node` came from `estree`,
which is a bit misleading here in Rust.

closes #9
2023-02-21 19:17:49 +08:00
Boshen
a733856536 refactor(ast,parser): use u32 for node spans
The next PR will fix the jargon where Node = Span.

relates to #9
2023-02-21 16:02:23 +08:00
Boshen
1fdc635638 feat(parser): add parser 2023-02-11 05:26:49 -08:00