The lexer benchmarks had a problem. The lexer alone cannot make sense of regexp literals, template literals, or JSX text elements - it needs the parser "driving" it.
So lexer was producing plenty of errors on some benchmarks. This is unrealistic - when driven by the parser, the lexer produces no errors. Generating diagnostics is relatively expensive, so this was skewing the benchmarks somewhat.
Solve this by cleaning up the input source text to replace these syntaxes with string literals prior to running the benchmarks.
Unfortunately lexer benchmarks don't exercise the code paths for these syntaxes, but there isn't much we can do about that. We can judge by the parser benchmarks, which are the more important ones anyway.
More simplification/preparations for nested configurations:
1. renames `LinterBuilder` to `ConfigStoreBuilder`
2. moves the `options` logic out of `LintBuilder`
3. make `ConfigStoreBuilder::build()` return a result (currently always returns OK, but it will return errors when nested config is implemented
The next steps to implement nested config which i hope to do in the next week are:
1. refactor the `from_oxlintrc` to accept a file path
2. introduce a new method on `ConfigStoreBuilder` (name TBC) that walks all child directories, collecting `.oxlintrc` files. these will be put into `ConfigStore` as a hash map of path > config.
Pure refactor. Use `bench_function` instead of `bench_with_input` and
just borrow data from outside closure. This shortens the code and (I
think) makes it easier to read.
The parser returns a simple `ModuleRecord` that is allocated in the arena for performance reasons.
The linter uses a more complicated, `Send` + `Sync` `ModuleRecord` that will hold more cross-module information.
The next step is to return more esm information from the parser to eliminated the need of the `oxc_module_lexer` crate.
This has the benefit of:
* expose dynamic import / import meta info from parser
* 1 less ast shallow in semantic builder
* no ast walk in oxc's module lexer
* some more benefits coming soon
Firstly, a massive thanks to @alisnic for starting this (incredibly complicated) lint rule in https://github.com/oxc-project/oxc/pull/2637 !
still a draft. current state:
3x false positives (all todo with refs)
3x false negatives (TS will catch these
13x false negatvies todo with refs
1x false negative TODO
closes #2637
relates to #2174
> Closes#5046
This PR migrates the linter crate and oxlint app to use the new `LinterBuilder` API. This PR has the following effects:
1. `plugins` in oxlint config files are now supported
2. irons out weirdness when using CLI args and config files together. CLI args are now always applied on top of config file settings, overriding them.
# Breaking Changes
Before, config files would override rules set in CLI arguments. For example, running this command:
```sh
oxlint -A correctness -c oxlintrc.json
```
With this config file
```jsonc
// oxlintrc.json
{
"rules": {
"no-const-assign": "error"
}
}
```
Would result in a single rule, `no-const-assign` being turned on at an error level with all other rules disabled (i.e. set to "allow").
Now, **CLI arguments will override config files**. That same command with the same config file will result with **all rules being disabled**.
## Details
For a more in-depth explanation, assume we are running the below command using the `oxlintrc.json` file above:
```sh
oxlint -A all -W correctness -A all -W suspicious -c oxlintrc.json
```
### Before
> Note: GitHub doesn't seem to like deeply nested lists. Apologies for the formatting.
Here was the config resolution process _before_ this PR:
<details><summary>Before Steps</summary>
1. Start with a default set of filters (["correctness", "warn"]) if no filters were passed to the CLI. Since some were, the filter list starts out empty.
2. Apply each filter taken from the CLI from left to right. When a filter allows a rule or category, it clears the configured set of rules. So applying those filters looks like this
a. start with an empty list `[]`
b. `("all", "allow")` -> `[]`
c. `("correctness", "warn")` -> `[ <correctness rules> ]`
d. `("all", "allow")` -> `[]`
e. `("suspicious", "warn")` -> `[ <suspicious rules> ]`. This is the final rule set for this step
3. Apply overrides from `oxlintrc.json`. This is where things get a little funky, as mentioned in point 2 of what this PR does. At this point, all rules in the rules list are only from the CLI.
a. If a rule is only set in the CLI and is not present in the config file, there's no effect
b. If a rule is in the config file but not the CLI, it gets inserted into the list.
c. If a rule is already in the list and in the config file
i. If the rule is only present once (e.g. `"no-loss-of-precision": "error"`), unconditionally override whatever was in the CLI with what was set in the config file
ii. If the rule is present twice (e.g. `"no-loss-of-precision": "off", "@typescript-eslint/no-loss-of-precision": "error"`),
a. if all rules in the config file are set to `allow`, then turn the rule off
b. If one of them is `warn` or `deny`, then update the currently-set rule's config object, but _leave its severity alone_.
So, for our example, the final rule set would be `[<all suspicious rules: "warn">, no-const-assign: "error"]`
</details>
### After
Rule filters were completely re-worked in a previous PR. Now, lint filters aren't kept on hand-only the rule list is.
<details><summary>After Steps</summary>
1. Start with the default rule set, which is all correctness rules for all enabled plugins (`[<all correctness rules: "warn">]`)
2. Apply configuration from `oxlintrc.json` _first_.
a. If the rule is warn/deny exists in the list already, update its severity and config object. If it's not in the list, add it.
b. If the rule is set to allow, remove it from the list.
The list is now `[<all correctness rules except no-const-assign: "warn">, no-const-assign: "error"]`.
3. Apply each filter taken from the CLI from left to right. This works the same way as before. So, after they're applied, the list is now `[<suspicous rules: "warn">]`
</details>
Minifier benchmark currently includes the time to run parser and semantic as well as the minifier itself. Similar to transformer benchmark, only measure the minifier itself in the benchmark.
This will give us a clearer picture of performance changes, and should also reduce variance in the benchmarks.
Add `SemanticBuilder::with_excess_capacity` method to request that `SemanticBuilder` over-allocate space in `Semantic`'s `Vec`s.
Use this method to reserve 200% extra capacity for transformer to create more scopes, symbols and references.
200% is an unscientific guess of how much extra capacity is required. Obviously it depends on what transforms are enabled and content of the source code.
Realized we can get the source type from the AST.
The next PR will introduce `unambiguous` to `SourceType` and directly set `Program::source_type` to either `script` or `module`.
Fixing code is an important part of linter logic. We want to make sure fixers for each rule, and the code responsible for applying those fixes, are included in benchmarks.
As it currently stands, fixer closures are applied regardless of whether the user wants fixers to be applied. However, this is an implementation detail and is subject to change. I also want to bench the performance of `Fixer`.
Transformer benchmark measure only the transformer itself. Parse, generate `Semantic`, and drop `Semantic` outside of the measured section.
This should:
1. Give us greater visibility of how changes to transformer affect its performance.
2. Reduce variance in transformer benchmarks, since they will no longer include the variance introduced by `SemanticBuilder`.
Not ready to merge yet. We should first add an "end to end" benchmark testing the entire compilation process (parse - semantic - transform - minify - codegen).
This PR depends on https://github.com/Boshen/criterion2.rs/pull/49. This PR currently makes Oxc's dependency on `criterion2` a git dependency on that PR's branch. That can be changed once the upstream PR is merged.
Include dropping `Semantic` in semantic benchmark measure.
If you create a `Semantic`, you have to drop it, so the time it takes to
drop is part of the cost of using this API, and we should be working to
reduce it. Therefore I think it should be included in the benchmark.
It'll be interesting to see what effect a PR like #5232 which removes a
bunch of `Vec`s from `Semantic` has on the drop time.
We no longer need to build semantic data inside the transformer.
The caller should be responsible for handling semantic data and its
errors.
The best way to achieve this in via `CompilerInterface`.
closes#3565
Introduce new method `ConcatSourceMapBuilder::from_sourcemaps`.
Where all the sourcemaps being concatenated exist at time that you
create `ConcatSourceMapBuilder`, it's faster to use `from_sourcemaps`,
because it pre-allocates enough space for the data it will hold and so
avoids memory copying.
Before:
```rs
let mut builder = ConcatSourceMapBuilder::default();
builder.add_sourcemap(&sourcemap1, 0);
builder.add_sourcemap(&sourcemap2, 100);
builder.add_sourcemap(&sourcemap3, 100);
let combined = builder.into_sourcemap();
```
After:
```rs
let builder = ConcatSourceMapBuilder::from_sourcemaps(&[
(&sourcemap1, 0),
(&sourcemap2, 100),
(&sourcemap3, 200),
]);
let combined = builder.into_sourcemap();
```
After studying google closure compiler, I'm leaning towards a multi-ast-pass infrastructure for the minifier.
This is one of the few places where we are going to trade maintainability over performance, given the goal of the minifier is compression size not performance.
All of the terminologies and separation of concerns are aligned with google closure compiler.
Infrastructure of `terser` and `esbuild` are not suitable for us to study nor pursuit. Their code are so tightly coupled - I failed to comprehend any of them every time I try to walk through a piece of optmization. Google closure compiler despite being written in Java, it's actually the most readable minifier out there.
To improve performance between ast passes, I envision a change detection system over a portion of the code.
The benchmark will demonstrate the performance regression of running 5 ast passes instead of 2.
To complete this PR, I need to figure out "fix-point" and order of these ast passes.
Make 2 changes to sourcemap benchmark:
1. Move counting line breaks in output text to outside of the measured loop. This operation is reasonably expensive, and isn't part of what we're trying to measure.
2. It looks like benchmark is trying to measure concatenating 2 source maps, but `for i in 0..1` was actually only concatenating 1.