Fixing code is an important part of linter logic. We want to make sure fixers for each rule, and the code responsible for applying those fixes, are included in benchmarks.
As it currently stands, fixer closures are applied regardless of whether the user wants fixers to be applied. However, this is an implementation detail and is subject to change. I also want to bench the performance of `Fixer`.
Transformer benchmark measure only the transformer itself. Parse, generate `Semantic`, and drop `Semantic` outside of the measured section.
This should:
1. Give us greater visibility of how changes to transformer affect its performance.
2. Reduce variance in transformer benchmarks, since they will no longer include the variance introduced by `SemanticBuilder`.
Not ready to merge yet. We should first add an "end to end" benchmark testing the entire compilation process (parse - semantic - transform - minify - codegen).
This PR depends on https://github.com/Boshen/criterion2.rs/pull/49. This PR currently makes Oxc's dependency on `criterion2` a git dependency on that PR's branch. That can be changed once the upstream PR is merged.
Include dropping `Semantic` in semantic benchmark measure.
If you create a `Semantic`, you have to drop it, so the time it takes to
drop is part of the cost of using this API, and we should be working to
reduce it. Therefore I think it should be included in the benchmark.
It'll be interesting to see what effect a PR like #5232 which removes a
bunch of `Vec`s from `Semantic` has on the drop time.
We no longer need to build semantic data inside the transformer.
The caller should be responsible for handling semantic data and its
errors.
The best way to achieve this in via `CompilerInterface`.
closes#3565
Introduce new method `ConcatSourceMapBuilder::from_sourcemaps`.
Where all the sourcemaps being concatenated exist at time that you
create `ConcatSourceMapBuilder`, it's faster to use `from_sourcemaps`,
because it pre-allocates enough space for the data it will hold and so
avoids memory copying.
Before:
```rs
let mut builder = ConcatSourceMapBuilder::default();
builder.add_sourcemap(&sourcemap1, 0);
builder.add_sourcemap(&sourcemap2, 100);
builder.add_sourcemap(&sourcemap3, 100);
let combined = builder.into_sourcemap();
```
After:
```rs
let builder = ConcatSourceMapBuilder::from_sourcemaps(&[
(&sourcemap1, 0),
(&sourcemap2, 100),
(&sourcemap3, 200),
]);
let combined = builder.into_sourcemap();
```
After studying google closure compiler, I'm leaning towards a multi-ast-pass infrastructure for the minifier.
This is one of the few places where we are going to trade maintainability over performance, given the goal of the minifier is compression size not performance.
All of the terminologies and separation of concerns are aligned with google closure compiler.
Infrastructure of `terser` and `esbuild` are not suitable for us to study nor pursuit. Their code are so tightly coupled - I failed to comprehend any of them every time I try to walk through a piece of optmization. Google closure compiler despite being written in Java, it's actually the most readable minifier out there.
To improve performance between ast passes, I envision a change detection system over a portion of the code.
The benchmark will demonstrate the performance regression of running 5 ast passes instead of 2.
To complete this PR, I need to figure out "fix-point" and order of these ast passes.
Make 2 changes to sourcemap benchmark:
1. Move counting line breaks in output text to outside of the measured loop. This operation is reasonably expensive, and isn't part of what we're trying to measure.
2. It looks like benchmark is trying to measure concatenating 2 source maps, but `for i in 0..1` was actually only concatenating 1.
> This PR is (unfortunately) quite large, but all changes are needed in tandem for this to work properly.
## What This PR Does
Updates the linter to populate diagnostics reported by rules with error codes statically derived from `RuleMeta` + `RuleEnum`.
Doing so required changing how we handle vitest rules. I know @mysterven was hoping to refactor that part of the code, and I think this approach is an improvement (but could probably be cleaned up further).
## Changes
### 1. Auto-Populate Error Codes
`LintContext` now sets an error code scope + error code number for diagnostics reported by lint rules. `LintContext` will not clobber existing codes set by rules, allowing for rule-specific override behavior (e.g. to use `eslint-plugin-react-hooks` as an error scope).
In order to accomplish this, I had to update every diagnostic factory for every rule. While doing this I found some incorrect error messages, or messages that could be easily improved. This is where a large majority of the snapshot diffs come from. Additionally, I was able to reduce string allocations from `format!` usages in diagnostic factories, especially within jest rules.
### 2. Framework and Library Detection
This PR adds `FrameworkFlags`, which specify what (if any) set of libraries and frameworks are being used by a project and/or file. They are passed in two ways:
1. `LintOptions` can specify a set of `framework_hints` that apply to the entire target codebase. Right now these are always empty, but I'm thinking in the future we could sniff `package.json`. It may be helpful for enabling/disabling default rules.
2. When `Linter` gets run on a file, framework information is sniffed from the `LintContext`. Right now, we are only checking for `vitest` imports in `ModuleRecord` and test path prefixes from `source_path`. It may be useful to do something similar for React/NextJS rules in the future. I know that [next/no-html-link-for-pages](https://nextjs.org/docs/messages/no-html-link-for-pages) could benefit greatly from this.
This PR adds a new method `build_with_symbols_and_scopes` to make semantic building optional, there may be prior steps that has the semantic data already built.
This PR introduces two type alias to avoid the confusing const generic `pub struct Codegen<'a, const MINIFY: bool>`
* CodeGenerator - Code generator without whitespace removal.
* WhitespaceRemover - Code generator with whitespace removal.
Usage is changed to a builder pattern:
```rust
CodeGenerator::new()
.enable_comment(...)
.enable_sourcemap(...)
.build(&program);
```
This PR builds on #3201 to further speed up the benchmarks and reduce CI
time.
* Build and run each benchmark as separate job (like before).
* But now each bench is only built with the dependencies it needs.
* For linter benchmarks, build benchmark in 1 job (like #3201 does).
* Run each linter fixture in a separate job as they're slow.
This reduces total time to complete benchmarks from between 6m-7m to
~4m40s.
All the individual jobs complete in under 1m30s, except for building
linter benchmark which takes 2m30s. So there won't be the problem of
blocking the CI queue that there was before.
NB: I did try this before, and didn't see a benefit. But I realized
today what I was doing wrong - it only works once the caches are
populated by a previous run on main branch.
So the CI times in this PR won't look good, but once it's merged to
main, it will take effect. Here it is running on main branch of my fork:
https://github.com/overlookmotel/oxc/actions/runs/9030511348
I also added a step to delete the temp artefacts which aren't needed
once the run has completed.
It seems like we need to rebuild the scopes and symbols while
traversing. We can't utilize the scopes and symbols built by semantic
because they are immutable.
Re-use allocator in parser + lexer benchmarks.
I believe this is the recommended usage when parsing a bunch of files -
to re-use one allocator rather than create a fresh one for each run, so
it makes sense to me that this is what the benchmark should measure.
Doesn't show much difference on CodSpeed because it only runs the
benchmark once, and it treats allocations as free anyway. But I imagine
the difference may show up a bit more in a standard criterion benchmark.
The sourcemap implement port from
[rust-sourcemap](https://github.com/getsentry/rust-sourcemap), but has
some different with it.
- Encode sourcemap at parallel, including quote `sourceContent` and
encode token to `vlq` mappings.
- Avoid `Sourcemap` some methods overhead, like `SourceMap::tokens()`
caused extra overhead at common cases. Here using `SourceViewToken` to
instead of it.
Add NodeJS parser to benchmarks.
Previous attempt #2724 did not work due CodSpeed producing very
inaccurate results (https://github.com/CodSpeedHQ/action/issues/96).
This version runs the actual benchmarks without CodSpeed's
instrumentation. Then another faux-benchmark runs within Codspeed's
instrumented action and just performs meaningless calculations in a loop
for as long as is required to take same amount of time as the original
uninstrumented benchmarks took.
It's unfortunate that we therefore don't get flame graphs on CodSpeed,
but this seems to be the best we can do for now.
Follow-on from #2751. Further shards linter benchmarks so each fixture runs in its own job.
This reduces total time to run benchmarks by another ~75 secs. So approx 2.5 mins shaved off in total.
Introduce invariant that only a single `lexer::Source` can exist on a thread at one time.
This is a preparatory step for #2341.
2 notes:
Restriction is only 1 x `ParserImpl` / `Lexer` / `Source` on 1 *thread* at a time, not globally. So this does not prevent parsing multiple files simultaneously on different threads.
Restriction does not apply to public type `Parser`, only `ParserImpl`. `ParserImpl`s are not created in created in `Parser::new`, but instead in `Parser::parse`, where they're created and then immediately consumed. So the end user is also free to create multiple `Parser` instances (if they want to for some reason) on the same thread.
This PR adds benchmarks for the lexer. I'm doing some work on optimizing
the lexer and I thought it'd be useful to see the effects of changes in
isolation, separate from the parser.
These benchmarks may not be ideal to keep long-term, but for now it'd be
useful.
In order to do so, it's necessary for `oxc_parser` crate to expose the
lexer, but have done that without adding it to the docs, and using an
alias `__lexer`.
When we developed linter for #1141 , we needed to configure some
settings for `jsx-a11y`, which was not supported before, but I am trying
to support it now.
like this:
```
fn config() -> serde_json::Value {
serde_json::json!([2,{
"ignoreNonDOM": true
}])
}
fn settings() -> serde_json::Value {
serde_json::json!({
"jsx-a11y": {
"components": {
"Button": "button",
}
}
})
}
let pass = vec![
("<Button />", Some(config()), Some(settings())),
];
```