Realized we can get the source type from the AST.
The next PR will introduce `unambiguous` to `SourceType` and directly set `Program::source_type` to either `script` or `module`.
Follow-on after #5232. `oxc_wasm` build scopes text with a single AST traversal. Previous implementation was O($n^2$).
If we can assume scopes are listed in traversal order, then we could do it a bit more efficiently just from `ScopeTree`, but this approach of using `Visit` will handle out-of-order scope IDs (which you'd get if printing a post-transform `ScopeTree`).
Also reduce creating and discarding `String`s for indentation - reuse a single string instead.
- Feat: add `source_type` to `ParserOptions`
- Refactor: use plain objects for options instead of `new
OxcRunOptions()` and `new OxcParserOptions()`, allowing easier
serialization.
We no longer need to build semantic data inside the transformer.
The caller should be responsible for handling semantic data and its
errors.
The best way to achieve this in via `CompilerInterface`.
closes#3565
Previously `ScopeTree::get_child_ids` and `ScopeTree::get_child_ids_mut` returned an `Option`. Instead return the unwrapped value, in line with other methods e.g. `get_flags`.
After studying google closure compiler, I'm leaning towards a multi-ast-pass infrastructure for the minifier.
This is one of the few places where we are going to trade maintainability over performance, given the goal of the minifier is compression size not performance.
All of the terminologies and separation of concerns are aligned with google closure compiler.
Infrastructure of `terser` and `esbuild` are not suitable for us to study nor pursuit. Their code are so tightly coupled - I failed to comprehend any of them every time I try to walk through a piece of optmization. Google closure compiler despite being written in Java, it's actually the most readable minifier out there.
To improve performance between ast passes, I envision a change detection system over a portion of the code.
The benchmark will demonstrate the performance regression of running 5 ast passes instead of 2.
To complete this PR, I need to figure out "fix-point" and order of these ast passes.
Make `ScopeId` a type with a niche, like `SymbolId` and `ReferenceId`. This makes `Option<ScopeId>` 4 bytes instead of 8, and shrinks various AST types e.g. `ArrowFunctionExpression` by 8 bytes, and halves the size of the `Vec` in `ScopeTree::parent_ids`.
The snapshot change on `prefer-hooks-in-order` lint rule appears incidental - it doesn't alter what errors are reported, only the order they're reported in. This appears to be because it changes the order of keys in a hashmap keyed by `ScopeId` that [the rule uses](a49f4915de/crates/oxc_linter/src/rules/jest/prefer_hooks_in_order.rs (L143)).
> This PR is (unfortunately) quite large, but all changes are needed in tandem for this to work properly.
## What This PR Does
Updates the linter to populate diagnostics reported by rules with error codes statically derived from `RuleMeta` + `RuleEnum`.
Doing so required changing how we handle vitest rules. I know @mysterven was hoping to refactor that part of the code, and I think this approach is an improvement (but could probably be cleaned up further).
## Changes
### 1. Auto-Populate Error Codes
`LintContext` now sets an error code scope + error code number for diagnostics reported by lint rules. `LintContext` will not clobber existing codes set by rules, allowing for rule-specific override behavior (e.g. to use `eslint-plugin-react-hooks` as an error scope).
In order to accomplish this, I had to update every diagnostic factory for every rule. While doing this I found some incorrect error messages, or messages that could be easily improved. This is where a large majority of the snapshot diffs come from. Additionally, I was able to reduce string allocations from `format!` usages in diagnostic factories, especially within jest rules.
### 2. Framework and Library Detection
This PR adds `FrameworkFlags`, which specify what (if any) set of libraries and frameworks are being used by a project and/or file. They are passed in two ways:
1. `LintOptions` can specify a set of `framework_hints` that apply to the entire target codebase. Right now these are always empty, but I'm thinking in the future we could sniff `package.json`. It may be helpful for enabling/disabling default rules.
2. When `Linter` gets run on a file, framework information is sniffed from the `LintContext`. Right now, we are only checking for `vitest` imports in `ModuleRecord` and test path prefixes from `source_path`. It may be useful to do something similar for React/NextJS rules in the future. I know that [next/no-html-link-for-pages](https://nextjs.org/docs/messages/no-html-link-for-pages) could benefit greatly from this.
This PR adds a new method `build_with_symbols_and_scopes` to make semantic building optional, there may be prior steps that has the semantic data already built.
It has the same spirit as #3747 but with a much simpler approach. I've used the fact that @Boshen mentioned about linter always using CFG so now we assert `semantic.cfg().is_some()` in the `LintContext::new` because of this assertion we can have a `LintContext::cfg` that unwraps unchecked.
Eliminates unnecessary checks in our hot paths.
It has the best of both worlds, No complicated typing yet we still get the CFG as a non-optional value without extra ASM or branching.
This PR introduces two type alias to avoid the confusing const generic `pub struct Codegen<'a, const MINIFY: bool>`
* CodeGenerator - Code generator without whitespace removal.
* WhitespaceRemover - Code generator with whitespace removal.
Usage is changed to a builder pattern:
```rust
CodeGenerator::new()
.enable_comment(...)
.enable_sourcemap(...)
.build(&program);
```
The typescript transform pass is now required to strip typescript syntax
for codegen to print things properly.
Codegen will now print whatever is in the AST.
part of #3213
We should only have one diagnostic struct instead 353 copies of them, so we don't end up choking LLVM with 50k lines of the same code due to monomorphization.
If the proposed approach is good, then I'll start writing a codemod to turn all the existing structs to plain functions.
---
Background:
Using `--timings`, we see `oxc_linter` is slow on codegen (the purple part).

The crate currently contains 353 miette errors. [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) displays
```
cargo llvm-lines -p oxc_linter --lib --release
Lines Copies Function name
----- ------ -------------
830350 33438 (TOTAL)
29252 (3.5%, 3.5%) 808 (2.4%, 2.4%) <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
23298 (2.8%, 6.3%) 353 (1.1%, 3.5%) miette::eyreish::error::object_downcast
19062 (2.3%, 8.6%) 706 (2.1%, 5.6%) core::error::Error::type_id
12610 (1.5%, 10.1%) 65 (0.2%, 5.8%) alloc::raw_vec::RawVec<T,A>::grow_amortized
12002 (1.4%, 11.6%) 706 (2.1%, 7.9%) miette::eyreish::ptr::Own<T>::boxed
9215 (1.1%, 12.7%) 115 (0.3%, 8.2%) core::iter::traits::iterator::Iterator::try_fold
9150 (1.1%, 13.8%) 1 (0.0%, 8.2%) oxc_linter::rules::RuleEnum::read_json
8825 (1.1%, 14.9%) 353 (1.1%, 9.3%) <miette::eyreish::error::ErrorImpl<E> as core::error::Error>::source
8822 (1.1%, 15.9%) 353 (1.1%, 10.3%) miette::eyreish::error::<impl miette::eyreish::Report>::construct
8119 (1.0%, 16.9%) 353 (1.1%, 11.4%) miette::eyreish::error::object_ref
8119 (1.0%, 17.9%) 353 (1.1%, 12.5%) miette::eyreish::error::object_ref_stderr
7413 (0.9%, 18.8%) 353 (1.1%, 13.5%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Display>::fmt
7413 (0.9%, 19.7%) 353 (1.1%, 14.6%) miette::eyreish::ptr::Own<T>::new
6669 (0.8%, 20.5%) 39 (0.1%, 14.7%) alloc::raw_vec::RawVec<T,A>::try_allocate_in
6173 (0.7%, 21.2%) 353 (1.1%, 15.7%) miette::eyreish::error::<impl miette::eyreish::Report>::from_std
6027 (0.7%, 21.9%) 70 (0.2%, 16.0%) <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
6001 (0.7%, 22.7%) 353 (1.1%, 17.0%) miette::eyreish::error::object_drop
6001 (0.7%, 23.4%) 353 (1.1%, 18.1%) miette::eyreish::error::object_drop_front
5648 (0.7%, 24.1%) 353 (1.1%, 19.1%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Debug>::fmt
```
It's totalling more than 50k llvm lines, and is putting pressure on rustc codegen (the purple part on `oxc_linter` in the image above.
---
It's pretty obvious by looking at https://github.com/zkat/miette/blob/main/src/eyreish/error.rs, the generics can expand out to lots of code.
It seems like we need to rebuild the scopes and symbols while
traversing. We can't utilize the scopes and symbols built by semantic
because they are immutable.