After studying Google Closure Compiler, I'm leaning towards a multi-pass AST infrastructure for the minifier.
This is one of the few places where we are going to trade performance for maintainability, given that the goal of the minifier is compression size, not speed.
All of the terminology and separation of concerns are aligned with Google Closure Compiler.
The infrastructure of `terser` and `esbuild` is not suitable for us to study or pursue. Their code is so tightly coupled that I failed to comprehend it every time I tried to walk through a piece of optimization. Google Closure Compiler, despite being written in Java, is actually the most readable minifier out there.
To improve performance between AST passes, I envision a change detection system over portions of the code.
The benchmark will demonstrate the performance regression of running 5 AST passes instead of 2.
To complete this PR, I need to figure out the "fix-point" and the order of these AST passes.
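The fix-point idea can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the PR: the toy `Expr` type, the `Pass` alias, and the two passes are stand-ins. Each pass reports whether it changed the tree, and the driver repeats the pass list until a full round makes no change.

```rust
/// Toy expression AST (an assumption for illustration, not the real AST).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(i64),
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

/// A pass mutates the tree and returns `true` if it changed anything.
pub type Pass = fn(&mut Expr) -> bool;

/// Apply `rule` bottom-up to every node; report whether anything was rewritten.
fn rewrite(expr: &mut Expr, rule: fn(&Expr) -> Option<Expr>) -> bool {
    let mut changed = false;
    if let Expr::Add(lhs, rhs) = expr {
        changed |= rewrite(lhs, rule);
        changed |= rewrite(rhs, rule);
    }
    if let Some(new_expr) = rule(expr) {
        *expr = new_expr;
        changed = true;
    }
    changed
}

/// Rule: `1 + 2` becomes `3`.
fn fold_rule(expr: &Expr) -> Option<Expr> {
    match expr {
        Expr::Add(lhs, rhs) => match (&**lhs, &**rhs) {
            (Expr::Num(a), Expr::Num(b)) => Some(Expr::Num(a + b)),
            _ => None,
        },
        _ => None,
    }
}

/// Rule: `x + 0` and `0 + x` become `x`.
fn add_zero_rule(expr: &Expr) -> Option<Expr> {
    match expr {
        Expr::Add(lhs, rhs) => match (&**lhs, &**rhs) {
            (Expr::Num(0), other) | (other, Expr::Num(0)) => Some(other.clone()),
            _ => None,
        },
        _ => None,
    }
}

pub fn fold_constants(expr: &mut Expr) -> bool {
    rewrite(expr, fold_rule)
}

pub fn drop_add_zero(expr: &mut Expr) -> bool {
    rewrite(expr, add_zero_rule)
}

/// Run all passes in order, repeating until one full round changes nothing
/// or `max_rounds` is reached. Returns the number of rounds executed.
pub fn run_to_fix_point(expr: &mut Expr, passes: &[Pass], max_rounds: usize) -> usize {
    for round in 1..=max_rounds {
        let mut changed = false;
        for &pass in passes {
            changed |= pass(&mut *expr);
        }
        if !changed {
            return round;
        }
    }
    max_rounds
}
```

With the pass order `[drop_add_zero, fold_constants]`, the input `x + (1 + -1)` needs a second round: folding first produces `x + 0`, which only the next round's `drop_add_zero` can simplify to `x`. This is why both the ordering and the fix-point condition matter.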
This PR introduces two type aliases to avoid the confusing const generic `pub struct Codegen<'a, const MINIFY: bool>`:
* `CodeGenerator` - Code generator without whitespace removal.
* `WhitespaceRemover` - Code generator with whitespace removal.
Usage is changed to a builder pattern:
```rust
CodeGenerator::new()
    .enable_comment(...)
    .enable_sourcemap(...)
    .build(&program);
```
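As a rough sketch of how the two aliases can sit on top of one const-generic struct, consider the toy model below. The fields, the `bool` arguments, and the body of `build` are assumptions for illustration only; the real `Codegen` carries a lifetime and prints an AST rather than joining strings.

```rust
/// Toy stand-in for the real code generator (illustrative only).
#[allow(dead_code)] // the option fields are stored but not read in this sketch
pub struct Codegen<const MINIFY: bool> {
    comments: bool,
    sourcemap: bool,
}

/// Code generator without whitespace removal.
pub type CodeGenerator = Codegen<false>;
/// Code generator with whitespace removal.
pub type WhitespaceRemover = Codegen<true>;

impl<const MINIFY: bool> Codegen<MINIFY> {
    pub fn new() -> Self {
        Self { comments: false, sourcemap: false }
    }

    /// Builder-style option; the `bool` parameter is an assumption.
    pub fn enable_comment(mut self, yes: bool) -> Self {
        self.comments = yes;
        self
    }

    /// Builder-style option; the `bool` parameter is an assumption.
    pub fn enable_sourcemap(mut self, yes: bool) -> Self {
        self.sourcemap = yes;
        self
    }

    /// Toy "printing": joins pre-rendered statements, with or without newlines
    /// depending on the `MINIFY` const parameter.
    pub fn build(self, statements: &[&str]) -> String {
        let separator = if MINIFY { "" } else { "\n" };
        statements.join(separator)
    }
}
```

Callers only ever name `CodeGenerator` or `WhitespaceRemover`, so the `const MINIFY: bool` parameter stays an implementation detail while both variants share one implementation.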
The TypeScript transform pass is now required to strip TypeScript syntax
for codegen to print things properly.
Codegen will now print whatever is in the AST.
Currently, we lack a test that checks whether the TS AST nodes have been completely removed. I have thought of a way to test it: have our idempotency test print the TypeScript code the first time, and print only the JavaScript code the second time. Ideally the TS AST nodes are completely removed, so the two results should be the same. If they do not match, there are still TS AST nodes left behind, or other bugs.
We have concluded that codegen will print whatever is in the AST,
instead of having an option to enable printing TypeScript syntax. I plan
to remove codegen's `enable_typescript` option after we strip out all
TypeScript AST nodes in the transformer's TypeScript plugin.
---------
Co-authored-by: Boshen <boshenc@gmail.com>
`@jsx: react` is a TypeScript option. The Babel TypeScript plugin handles `@jsx` as well, but differently: in Babel, `@jsx` is a [pragma](https://babeljs.io/docs/babel-plugin-transform-react-jsx#pragma) option. So we should use code without these meta options to avoid Babel parsing `@jsx` incorrectly.
We should not print TypeScript code as JavaScript code. Forcing it to print as JavaScript code may result in syntax errors. If we truly want JavaScript code, we can use the `oxc_transformer`.
Move `BabelOptions` to Transformer. The `output.json` is a standard Babel configuration. We can reuse `BabelOptions` to read [babel.config.json](https://babeljs.io/docs/configuration#babelconfigjson) or our own configuration (maybe oxc.config.json).
The current `from_babel_options` implementation is copied from the `transform_options` in `test_case.rs`, which I'll completely reimplement next.
If the submodules are outdated, it'll panic with the following message:
```
Repository is outdated, please run `just submodules` to update it.
```
For us maintainers, we'll need the `UPDATE_SNAPSHOT` environment variable to force an update.
part of #3213
We should only have one diagnostic struct instead of 353 copies of them, so we don't end up choking LLVM with 50k lines of the same code due to monomorphization.
If the proposed approach is good, then I'll start writing a codemod to turn all the existing structs to plain functions.
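A minimal sketch of what "one struct plus plain functions" could look like is shown below. The `Diagnostic` fields, the builder methods, and the two rule functions (including their message and help texts) are made up for illustration and are not the actual `oxc_linter` API. The point is that only one concrete error type ever reaches generic error-handling code, so that code is instantiated once rather than once per rule.

```rust
use std::fmt;

/// Single concrete diagnostic type shared by every rule (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub message: String,
    pub help: Option<String>,
    /// Start and end byte offsets of the offending code.
    pub span: (u32, u32),
}

impl Diagnostic {
    pub fn warning(message: &str, span: (u32, u32)) -> Self {
        Self { message: message.to_string(), help: None, span }
    }

    pub fn with_help(mut self, help: &str) -> Self {
        self.help = Some(help.to_string());
        self
    }
}

impl fmt::Display for Diagnostic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl std::error::Error for Diagnostic {}

// Instead of one error struct per rule, each rule gets a plain function
// that fills in the shared struct. Texts below are illustrative.

pub fn no_debugger(span: (u32, u32)) -> Diagnostic {
    Diagnostic::warning("`debugger` statement is not allowed", span)
}

pub fn no_empty(kind: &str, span: (u32, u32)) -> Diagnostic {
    Diagnostic::warning(&format!("Unexpected empty {kind}"), span)
        .with_help("Add a comment inside it or remove it")
}
```

Because every rule returns the same `Diagnostic` type, any generic wrapper that is parameterized over the error type only needs a single instantiation, regardless of how many rules exist.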
---
Background:
Using `--timings`, we see `oxc_linter` is slow on codegen (the purple part).

The crate currently contains 353 miette errors. [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) displays:
```
cargo llvm-lines -p oxc_linter --lib --release
Lines Copies Function name
----- ------ -------------
830350 33438 (TOTAL)
29252 (3.5%, 3.5%) 808 (2.4%, 2.4%) <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
23298 (2.8%, 6.3%) 353 (1.1%, 3.5%) miette::eyreish::error::object_downcast
19062 (2.3%, 8.6%) 706 (2.1%, 5.6%) core::error::Error::type_id
12610 (1.5%, 10.1%) 65 (0.2%, 5.8%) alloc::raw_vec::RawVec<T,A>::grow_amortized
12002 (1.4%, 11.6%) 706 (2.1%, 7.9%) miette::eyreish::ptr::Own<T>::boxed
9215 (1.1%, 12.7%) 115 (0.3%, 8.2%) core::iter::traits::iterator::Iterator::try_fold
9150 (1.1%, 13.8%) 1 (0.0%, 8.2%) oxc_linter::rules::RuleEnum::read_json
8825 (1.1%, 14.9%) 353 (1.1%, 9.3%) <miette::eyreish::error::ErrorImpl<E> as core::error::Error>::source
8822 (1.1%, 15.9%) 353 (1.1%, 10.3%) miette::eyreish::error::<impl miette::eyreish::Report>::construct
8119 (1.0%, 16.9%) 353 (1.1%, 11.4%) miette::eyreish::error::object_ref
8119 (1.0%, 17.9%) 353 (1.1%, 12.5%) miette::eyreish::error::object_ref_stderr
7413 (0.9%, 18.8%) 353 (1.1%, 13.5%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Display>::fmt
7413 (0.9%, 19.7%) 353 (1.1%, 14.6%) miette::eyreish::ptr::Own<T>::new
6669 (0.8%, 20.5%) 39 (0.1%, 14.7%) alloc::raw_vec::RawVec<T,A>::try_allocate_in
6173 (0.7%, 21.2%) 353 (1.1%, 15.7%) miette::eyreish::error::<impl miette::eyreish::Report>::from_std
6027 (0.7%, 21.9%) 70 (0.2%, 16.0%) <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
6001 (0.7%, 22.7%) 353 (1.1%, 17.0%) miette::eyreish::error::object_drop
6001 (0.7%, 23.4%) 353 (1.1%, 18.1%) miette::eyreish::error::object_drop_front
5648 (0.7%, 24.1%) 353 (1.1%, 19.1%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Debug>::fmt
```
This totals more than 50k LLVM lines and puts pressure on rustc codegen (the purple part on `oxc_linter` in the image above).
---
Looking at https://github.com/zkat/miette/blob/main/src/eyreish/error.rs, it's pretty obvious that the generics can expand out to lots of code.
It seems like we need to rebuild the scopes and symbols while
traversing. We can't utilize the scopes and symbols built by semantic
because they are immutable.