This PR adds a new edge type called `Jump` to distinguish between normal edges and jumps.
There is also a control flow context which is used to keep track of cfg scopes and labels. It replaces the old `preserve_state` and `restore_state`.
It fixes some mistakes in the old control flow, notably labeled blocks and especially labeled `continue`, which wasn't easy to implement with the old approach. Other than that, it is mostly a refactor towards a more declarative API instead of a procedural one.
I've replaced the `BasicBlockElement` with an `Instruction` type which keeps both the instruction kind and its associated `AstNodeId`.
I also removed the register scheme in the control flow in favor of a simpler approach using explicit enums.
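A rough sketch of the shapes described above, to make the idea concrete. The variant names, the `InstructionKind` enum, and the rule that `break`/`continue` produce `Jump` edges are assumptions for illustration only; they are not taken from the actual implementation.

```rust
/// Hypothetical edge kinds: `Jump` marks an explicit control transfer,
/// `Normal` is ordinary fall-through flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeType {
    Normal,
    Jump,
}

/// Hypothetical instruction kinds (the real set is not shown in this PR text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstructionKind {
    Statement,
    Break,
    Continue,
}

/// Stand-in for the AST node ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AstNodeId(usize);

/// An instruction keeps its kind together with the AST node it came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Instruction {
    kind: InstructionKind,
    node_id: AstNodeId,
}

/// Assumed rule: `break` and `continue` leave a block via a `Jump` edge,
/// everything else uses a `Normal` edge.
fn edge_for(instruction: &Instruction) -> EdgeType {
    match instruction.kind {
        InstructionKind::Break | InstructionKind::Continue => EdgeType::Jump,
        InstructionKind::Statement => EdgeType::Normal,
    }
}
```

Using plain enums like this means a consumer can `match` on the edge or instruction kind directly instead of decoding a register scheme.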
https://github.com/oxc-project/oxc/pull/3381#issuecomment-2126622774
Set `reference_id` for references to new imported bindings. e.g. `_jsx`
in `_jsx(Foo, {})` where JSX transform has inserted `import {jsx as
_jsx} from "react/jsx-runtime";`.
Use the `TraverseCtx::generate_uid` method introduced in #3395 to fix
some of the TS namespace test cases.
But... I honestly have no idea what I'm doing here!
I am running up against a combination of 3 different areas where I know
very little:
1. I am unfamiliar with `Semantic`.
2. I am unfamiliar with Oxc's conventions for writing transforms.
3. I don't "speak" TypeScript!
This PR should not be merged as is. I'm pretty sure it's wrong. I've
done it mostly just to test out `generate_uid`.
In particular:
1. Is `SymbolFlags::FunctionScopedVariable` the right flag to use in
all cases here?
2. `generate_uid` creates a symbol, and registers a binding for it in
the scope tree. So if there are any branches in this logic where `name`
doesn't actually get used after `generate_uid` is called, then it's not
right.
3. Identifiers which are added to AST should also have corresponding
`ReferenceId`s created for them (but I imagine this is also missing in
many other places in the transformer).
On point 2: We could split up `generate_uid` into 2 functions. One
function would find a valid UID name *but not register a binding for
it*. If you want to actually use that name, you'd call a 2nd function to
generate the binding. Would that be a better API?
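To make the two-function idea concrete, here is a minimal sketch. `Scope`, `find_uid_name` and `register_binding` are hypothetical names, and a flat `HashSet` stands in for the real scope tree and symbol table.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a scope: just the set of bound names.
#[derive(Default)]
struct Scope {
    bindings: HashSet<String>,
}

impl Scope {
    /// Step 1: find a free name (`_name`, `_name2`, `_name3`, ...)
    /// without registering anything in the scope.
    fn find_uid_name(&self, base: &str) -> String {
        let mut candidate = format!("_{base}");
        let mut i = 2u32;
        while self.bindings.contains(&candidate) {
            candidate = format!("_{base}{i}");
            i += 1;
        }
        candidate
    }

    /// Step 2: actually create the binding, only once the caller knows
    /// the name will be used. Returns `false` if the name was already bound.
    fn register_binding(&mut self, name: &str) -> bool {
        self.bindings.insert(name.to_string())
    }
}
```

With this split, a branch that ends up not using the name simply never calls `register_binding`, so no stray binding is left behind. The trade-off is that the name is not reserved between the two calls.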
Could someone help me out to progress this please?
(Sorry my lack of knowledge is a bit useless here. I will learn all
these things in time, but just trying to be honest about where I'm at
right now. I'm sure I could figure it out myself, but it would take me
hours, whereas others will probably look at it and know what to do in
about 5 mins.)
Add `TraverseCtx::generate_uid` method.
This is modelled on Babel's `scope.generateUid()` method. As discussed
in
https://github.com/oxc-project/oxc/discussions/3251#discussioncomment-9416826,
this is required to fix most of the remaining failing tests in
transformer Milestone 1.
I have implemented this to work as closely as possible to Babel, so that
it will generate the same output as Babel for our tests. However, as
mentioned in the code comments, this means it's a pretty expensive
function to call. Those code comments suggest 2 ways in which we could
make it much more efficient, but we'd need to decide how we're going to
handle divergence from Babel before we can decide which route to go.
I've left it as a `TODO(improve-on-babel)` for now.
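For reference, a simplified sketch of the naming scheme as I understand Babel's `scope.generateUid()`: strip leading underscores and trailing digits, then try `_name`, `_name2`, `_name3`, ... until a free name is found. The free function and the `used_names` set are assumptions for illustration; this is not the `TraverseCtx::generate_uid` code, and it skips Babel's identifier sanitisation.

```rust
use std::collections::HashSet;

/// Simplified Babel-style UID generation over a flat set of used names.
fn generate_uid(used_names: &HashSet<String>, name: &str) -> String {
    // Normalise the base name: `__foo12` becomes `foo`.
    let base = name
        .trim_start_matches('_')
        .trim_end_matches(|c: char| c.is_ascii_digit());

    // Try `_foo`, then `_foo2`, `_foo3`, ... until one is unused.
    let mut uid = format!("_{base}");
    let mut i = 2u32;
    while used_names.contains(&uid) {
        uid = format!("_{base}{i}");
        i += 1;
    }
    uid
}
```

The cost comes from the `used_names` check: matching Babel means the candidate has to be tested against every binding and reference name that could clash, not just the current scope.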
It is a simple change. Before this, we had a lot of fields in our
control flow graph that were only used during the build process. This PR
aims to simplify the `ControlFlowGraph`.
PS: Sorry for the long branch name. Apparently Graphite gets confused
when you write the commit body using the `gt create` command.
`oxc_semantic` populates `scope_id` fields in AST nodes as it walks the tree.
This does produce some duplication - scope IDs are stored both in the AST itself, and in `AstNode`. Will clean this up later on.
OK, this is a big one...
I have done this as part of work on Traversable AST, but I believe it
has wider benefits, so thought better to spin it off into its own PR.
## What this PR does
This PR squashes all nested AST enum types (#2685).
For example, previously:
```rs
pub enum Statement<'a> {
BlockStatement(Box<'a, BlockStatement<'a>>),
/* ...other Statement variants... */
Declaration(Declaration<'a>),
}
pub enum Declaration<'a> {
VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
/* ...other Declaration variants... */
}
```
After this PR:
```rs
#[repr(C, u8)]
pub enum Statement<'a> {
BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
/* ...other Statement variants... */
VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
/* ...other Declaration variants... */
}
#[repr(C, u8)]
pub enum Declaration<'a> {
VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
/* ...other Declaration variants... */
}
```
All `Declaration`'s variants are combined into `Statement`, but the
`Declaration` type still exists.
As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.
This is the same thing as #2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.
No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).
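A self-contained sketch of why the transmute is sound, using cut-down stand-in enums with std `Box` rather than the real AST types. The variant names and the `as_declaration` body here are illustrative assumptions, not the code in this PR.

```rust
use std::mem;

#[repr(C, u8)]
#[derive(Debug, PartialEq)]
enum Declaration {
    Variable(Box<u32>) = 32,
    Function(Box<u64>) = 33,
}

// Same discriminants and payload types for the shared variants.
#[repr(C, u8)]
#[derive(Debug, PartialEq)]
enum Statement {
    Block(Box<u8>) = 0,
    Variable(Box<u32>) = 32,
    Function(Box<u64>) = 33,
}

impl From<Declaration> for Statement {
    fn from(decl: Declaration) -> Self {
        // SAFETY: both enums are `#[repr(C, u8)]`, so their layout is a `u8`
        // tag followed by a union of payloads. Every `Declaration` variant
        // exists in `Statement` with the same discriminant and payload type.
        unsafe { mem::transmute::<Declaration, Statement>(decl) }
    }
}

impl Statement {
    fn is_declaration(&self) -> bool {
        matches!(self, Self::Variable(_) | Self::Function(_))
    }

    fn as_declaration(&self) -> Option<&Declaration> {
        if self.is_declaration() {
            // SAFETY: the variant was checked above, and shared variants
            // have identical layout in both enums.
            Some(unsafe { &*(self as *const Statement).cast::<Declaration>() })
        } else {
            None
        }
    }
}
```

`mem::transmute` only compiles when both types have the same size, so the compiler also checks that squashing did not make `Statement` larger than `Declaration` in this sketch.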
## Why?
1. It is a prerequisite for Traversable AST (#2987).
2. It would help a lot with AST Transfer (#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.
## Why is it a good thing for the AST to be `#[repr(C)]`?
Oxc's direction appears to be increasingly to build up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations for
`Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in this same direction.
Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.
However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.
Once the types are `#[repr(C)]`, various features become possible:
1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable), where these AST representations can be converted to one
another at zero cost via transmute - the property used by the Traversable
AST scheme.
3. Caching AST data on disk (#3079) or transferring across network.
4. Stuff we haven't thought of yet!
Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).
## The problem with `#[repr(C)]`
It's not *required* to squash nested enums to make the AST `#[repr(C)]`.
But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the types `#[repr(C)]` as they are currently disables this
optimization.
So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).
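The size effect can be observed with a small experiment. The enums below are toy stand-ins, not AST types. Only the `#[repr(C, u8)]` sizes are defined by the language; the default-layout sizes are whatever the compiler chooses, so the code makes no promise about them.

```rust
use std::mem::size_of;

// Default (unspecified) layout: the compiler is free to squash the nesting.
#[allow(dead_code)]
enum Inner {
    A(Box<u8>),
    B(Box<u8>),
}
#[allow(dead_code)]
enum Outer {
    X(Box<u8>),
    Nested(Inner),
}

// Same shapes with a fixed layout: tag byte, padding, then the payload union.
#[allow(dead_code)]
#[repr(C, u8)]
enum InnerC {
    A(Box<u8>),
    B(Box<u8>),
}
#[allow(dead_code)]
#[repr(C, u8)]
enum OuterC {
    X(Box<u8>),
    Nested(InnerC),
}

/// Returns the sizes of `Inner`, `Outer`, `InnerC` and `OuterC`, in that order.
fn sizes() -> [usize; 4] {
    [size_of::<Inner>(), size_of::<Outer>(), size_of::<InnerC>(), size_of::<OuterC>()]
}
```

With `#[repr(C, u8)]`, the nested `OuterC` has to store its own tag in front of a full `InnerC`, so it is strictly larger than `InnerC`. Flattening the variants by hand avoids that extra tag and padding.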
## Implementation
One enum "inheriting" variants from another is implemented with the
`inherit_variants!` macro.
```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
BlockStatement(Box<'a, BlockStatement<'a>>),
/* ...other Statement variants... */
// `Declaration` variants added here by `inherit_variants!` macro
@inherit Declaration
// `ModuleDeclaration` variants added here by `inherit_variants!` macro
@inherit ModuleDeclaration
}
}
```
The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.
The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).
New patterns for dealing with nested enums are introduced:
Creation:
```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));
// New
let stmt = Statement::VariableDeclaration(var_decl);
```
Conversion:
```rs
// Old
let stmt = Statement::Declaration(decl);
// New
let stmt = Statement::from(decl);
```
Testing:
```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }
// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```
Branching:
```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };
// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```
Matching:
```rs
// Old
match stmt {
Statement::Declaration(decl) => visitor.visit(decl),
}
// New (exhaustive match)
match stmt {
match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}
// New (alternative)
match stmt {
_ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```
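For anyone wondering how a macro can appear in pattern position, here is a minimal sketch of a `match_declaration!`-style macro on a toy enum. The variant list is an assumption for illustration. The idea is that the macro expands to an or-pattern covering all inherited variants, so the `match` stays exhaustive.

```rust
#[allow(dead_code)]
enum Statement {
    Block(u8),
    Variable(u32),
    Function(u64),
}

/// Expands to a pattern matching every "declaration" variant of `$ty`.
macro_rules! match_declaration {
    ($ty:ident) => {
        $ty::Variable(_) | $ty::Function(_)
    };
}

fn describe(stmt: &Statement) -> &'static str {
    match stmt {
        // One arm covers all declaration variants.
        match_declaration!(Statement) => "declaration",
        Statement::Block(_) => "other",
    }
}
```

Because the macro takes the enum name as a parameter, the same pattern list can be reused for any enum that inherits those variants.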
New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.
This PR removes 200 lines from the linter with changes like this:
https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95
```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
- &assignment_expr.left
- else {
- return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
- simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
else {
return;
};
```
Adds a way to fetch the root node without iterating over all ancestors,
which takes a variable amount of time: best case O(1), worst case O(n).
This field can only be set in the semantic builder.
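A minimal sketch of the difference, assuming a simplified node store where each node only records its parent. `AstNodes`, `add_node` and `root_by_walking` are hypothetical names for this illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(usize);

#[derive(Default)]
struct AstNodes {
    parent_ids: Vec<Option<NodeId>>,
    // Set once while building; read-only afterwards.
    root: Option<NodeId>,
}

impl AstNodes {
    /// Builder-side API: the first parentless node is recorded as the root.
    fn add_node(&mut self, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.parent_ids.len());
        self.parent_ids.push(parent);
        if parent.is_none() && self.root.is_none() {
            self.root = Some(id);
        }
        id
    }

    /// Constant-time lookup of the stored root.
    fn root(&self) -> Option<NodeId> {
        self.root
    }

    /// The old way: walk up the ancestor chain, linear in the node's depth.
    fn root_by_walking(&self, mut id: NodeId) -> NodeId {
        while let Some(parent) = self.parent_ids[id.0] {
            id = parent;
        }
        id
    }
}
```

Both lookups return the same node; the stored field just avoids the walk.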
This PR aims to support these cases.
````js
/**
* This is normal comment, `@xxx` should not parsed as tag.
*
* @example ```ts
// @comment
@decoratorInComment
class Foo { }
```
*/
````
Only `@example` should be parsed as a tag.
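One way to get this behaviour, sketched under simplifying assumptions: treat `@` as a tag start only at the beginning of a line (after the decorative `*`), and ignore everything while inside a triple-backtick fence. `find_tags` is a hypothetical helper, not the parser's real entry point, and the actual implementation may work differently.

```rust
/// Collects tag names from a JSDoc comment body, skipping `@` that appears
/// mid-line or inside a fenced code block.
fn find_tags(comment: &str) -> Vec<String> {
    let mut tags = Vec::new();
    let mut in_fence = false;

    for line in comment.lines() {
        // Drop indentation and the leading `*` of a JSDoc line.
        let text = line.trim_start().trim_start_matches('*').trim_start();

        if !in_fence {
            if let Some(rest) = text.strip_prefix('@') {
                let name: String = rest
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() {
                    tags.push(name);
                }
            }
        }

        // An odd number of fence markers on a line toggles the fence state,
        // so a fence opened on the `@example` line is handled too.
        if text.matches("```").count() % 2 == 1 {
            in_fence = !in_fence;
        }
    }
    tags
}
```

Checking for the tag before updating the fence state matters: the `@example` line both starts a tag and opens the fence.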
## Why
Due to the usage of `&'alloc mut T` in `oxc_allocator::Box`, and
`bumpalo::collections::Vec` in `oxc_allocator::Vec`, AST types are
currently invariant over their allocator lifetime `'a`. This prevents
`ouroboros` from generating `borrow_*` methods for AST type fields, leading to
the unfriendly `with_*` API:
c250b288ef/crates/oxc_parser/examples/multi-thread.rs (L82-L84)
## How
- For `oxc_allocator::Vec`, switch to `allocator_api2::vec::Vec`, which
has a covariant relationship with the allocator lifetime.
- For `oxc_allocator::Box`, use `std::ptr::NonNull` which is
specifically designed to be covariant. I don't use
`allocator_api2::boxed::Box` because it holds the allocator for
dropping, so the size is bigger.
## Downside
Now that `oxc_allocator::Box` uses the unsafe `NonNull`, the pointer has to be a
private field to be safe. This makes it impossible to do `Box(....)`
pattern matching.
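A minimal sketch of the variance point, using a hypothetical `ArenaBox` rather than the real `oxc_allocator::Box`. `NonNull<T>` is covariant in `T`, and the `PhantomData` ties the box to the allocator lifetime covariantly, so a box over a long lifetime can be used where a shorter one is expected.

```rust
use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;

/// Hypothetical arena box. The fields must stay private outside this module,
/// otherwise safe code could forge a dangling pointer.
pub struct ArenaBox<'alloc, T: ?Sized>(NonNull<T>, PhantomData<(&'alloc (), T)>);

impl<'alloc, T> ArenaBox<'alloc, T> {
    /// Takes over a slot that lives for `'alloc` (e.g. handed out by an arena).
    pub fn new(slot: &'alloc mut T) -> Self {
        Self(NonNull::from(slot), PhantomData)
    }
}

impl<'alloc, T: ?Sized> Deref for ArenaBox<'alloc, T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: the pointer came from a unique reference valid for `'alloc`,
        // and `self` cannot outlive `'alloc`.
        unsafe { self.0.as_ref() }
    }
}

/// Compiles only because `ArenaBox` is covariant in both parameters.
/// With a `&'alloc mut T` field instead, `T` would be invariant and this
/// function would be rejected.
fn shorten<'short, 'long: 'short>(
    boxed: ArenaBox<'long, &'long str>,
) -> ArenaBox<'short, &'short str> {
    boxed
}
```

The size stays at one pointer, which is the reason given above for not using `allocator_api2::boxed::Box`.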
> The error message emphasizes "empty text" so I would put the span on
the extra text.
> https://github.com/oxc-project/oxc/pull/2893#discussion_r1548843621
To address this, special `Span` handling should be implemented for the
comment part.
So, this PR introduces:
- A `JSDocCommentPart` struct, which holds the raw `.span` and a special
`.span_trimmed_first_line()`
- `JSDocKindPart`, `JSDocTypePart` and `JSDocTypeNamePart`, added in the
same manner
- `JSDocTag` uses these depending on the purpose
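A rough sketch of what a trimmed first-line span could look like. `Span` and `CommentPart` here are simplified stand-ins, and the exact trimming rules of the real `span_trimmed_first_line()` are an assumption: this version takes the first line of the raw text and drops surrounding whitespace, with offsets counted in bytes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: u32,
    end: u32,
}

/// Stand-in for a comment part: raw text plus its span in the source file.
struct CommentPart<'a> {
    raw: &'a str,
    span: Span,
}

impl<'a> CommentPart<'a> {
    /// Span covering only the first line of the raw text, without
    /// leading or trailing whitespace.
    fn span_trimmed_first_line(&self) -> Span {
        let first_line = self.raw.lines().next().unwrap_or("");
        // Bytes of whitespace skipped at the start of the line.
        let leading = first_line.len() - first_line.trim_start().len();
        let trimmed_len = first_line.trim().len();

        let start = self.span.start + leading as u32;
        Span { start, end: start + trimmed_len as u32 }
    }
}
```

A diagnostic about extra text can then point at just that text, instead of a span that includes trailing lines and padding.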
# What This PR Does
Symbols declared inside exported functions and classes were being
incorrectly flagged with `SymbolFlags::Export`.
For example,
```ts
export function foo<T>(a: T) {
let b = String(a)
return b
}
```
`T`, `a`, and `b` were all flagged as exported.
## Further Work
It doesn't seem like exported enums and interfaces are being included in
`ModuleRecord`. Am I looking in the wrong place, or are they actually
missing?