1954 commits

Author SHA1 Message Date
overlookmotel
0be8397c77
perf(parser): optimize lexing strings (#2366)
Optimize lexing strings a bit.
2024-02-09 23:52:45 +08:00
Boshen
d6d921ea1f
Publish crates v0.7.0 2024-02-09 23:01:12 +08:00
Boshen
70c983d4bb
feat(prettier): print rest parameters (#2370) 2024-02-09 22:59:28 +08:00
Boshen
9b6c313ebf
fix(prettier): flatten all binary expressions for now (#2367) 2024-02-09 21:48:27 +08:00
overlookmotel
c0d1d6b08a
perf(parser): lex strings as bytes (#2357)
Lex string literals as bytes, using same techniques as for identifiers.

Handling escapes could be optimized a bit more, and maybe I'll return to that, but as escapes are fairly rare, it wouldn't be the biggest gain.
2024-02-09 21:00:27 +08:00
Yuji Sugiura
2521b52011
fix(linter/jsx_a11y): Ensure plugin settings are used (#2359)
Previously, some of the rules did not use `settings`; this PR makes sure they
do.

- [x] Align implementation of `getElementType()`
  - (This is the only util that depends on `settings`)

Original rules which use `getElementType()` are:

- [x] 🙅🏻‍♂️ accessible-emoji
- [x]  alt-text
- [x] 🙅🏻‍♂️ anchor-ambiguous-text
- [x]  anchor-has-content
- [x] anchor-is-valid
  - TODO: `from_configuration()` not implemented => #2361 
- [x] 🙆🏻‍♀️ aria-activedescendant-has-tabindex
- [x] 🙆🏻‍♀️ aria-role
- [x] 🙆🏻‍♀️ aria-unsupported-elements
- [x] autocomplete-valid
  - TODO: `from_configuration()` not implemented and needed for this => #2362
- [x] 🙆🏻‍♀️ click-events-have-key-events
- [x] 🙅🏻‍♂️ control-has-associated-label
- [x] heading-has-content
  - TODO: 1 test should fail but passes 🤔 => #2360
- [x] 🙆🏻‍♀️ html-has-lang
- [x]  iframe-has-title
- [x] 🙆🏻‍♀️ img-redundant-alt
- [x] 🙅🏻‍♂️ interactive-supports-focus
- [x] 🙅🏻‍♂️ label-has-associated-control
- [x] 🙅🏻‍♂️ label-has-for
- [x] 🙆🏻‍♀️ lang
- [x] 🙆🏻‍♀️ media-has-caption
- [x]  no-aria-hidden-on-focusable
- [x] 🙆🏻‍♀️ no-autofocus
- [x] 🙆🏻‍♀️ no-distracting-elements
- [x] 🙅🏻‍♂️ no-interactive-element-to-noninteractive-role
- [x] 🙅🏻‍♂️ no-noninteractive-element-interactions
- [x] 🙅🏻‍♂️ no-noninteractive-element-to-interactive-role
- [x] 🙅🏻‍♂️ no-noninteractive-tabindex
- [x] 🙅🏻‍♂️ no-onchange
- [x] 🙆🏻‍♀️ no-redundant-roles
- [x] 🙅🏻‍♂️ no-static-element-interactions
- [x]  prefer-tag-over-role
- [x] 🙆🏻‍♀️ role-has-required-aria-props
- [x] 🙆🏻‍♀️ role-supports-aria-props
- [x] 🙆🏻‍♀️ scope

🙅🏻‍♂️ = Not implemented yet by oxlint /  = Fixed by this PR / 🙆🏻‍♀️ =
Already used
2024-02-09 20:55:50 +08:00
overlookmotel
2f6cf73d51
fix(parser): remove erroneous debug assertion (#2356)
This was a bit of a whoopsie in last batch of PRs. This assertion shouldn't be there, because all reads are now via `source.position().read()`, so this assertion says "you can only read some byte values".

Only reason it didn't blow up conformance tests is that they run in release mode.

Sorry. Please merge soon as you can and cover my shame!
2024-02-09 20:55:12 +08:00
Dunqing
2eb489e996
fix(codegen): format new expression + import expression with the correct parentheses (#2346)
Similar to #2330
2024-02-09 20:51:50 +08:00
Boshen
f49ffb2b63
fix(linter): getter-return false positive with TypeScript syntax (#2363)
closes #2349
2024-02-09 18:22:53 +08:00
Boshen
ca77ccc951
refactor(prettier): add a space!() macro (#2348) 2024-02-09 12:11:42 +08:00
overlookmotel
8376f15b9a
perf(parser): eat whitespace after line break (#2353)
Uses the `byte_search!` macro introduced in #2352 to consume whitespace after a line break.
2024-02-09 12:02:51 +08:00
overlookmotel
d3a59f27f7
perf(parser): lex identifiers as bytes not chars (#2352)
This PR re-implements lexing identifiers with a fast path for the most common case - identifiers which are pure ASCII characters, using the new `Source` / `SourcePosition` APIs.

Lexing identifiers is a hot path, and accounts for the majority of the time the Lexer spends. The performance bump from this change is (if I do say so myself!) quite decent.

I've spent a lot of time tuning the implementation, which gained a further 10-15% on the Lexer benchmarks compared to my first, simpler attempt. Some of the design decisions, if they look odd, are likely motivated by gains in performance.

### Techniques

This implementation uses a few different strategies for performance:

* Search byte-by-byte, not char-by-char.
* Process batches of 32 bytes at a time to reduce bounds checks.
* Mark uncommon paths `#[cold]`.
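
The three techniques above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `IS_IDENT_BYTE`, `Scan`, `scan_identifier`, `end_at` and `slow_path` are hypothetical names, and the set of identifier bytes is simplified to ASCII letters, digits, `_` and `$`.

```rust
/// Build a 256-entry lookup table: `true` for bytes that may continue an
/// ASCII identifier. (Hypothetical helper for this sketch.)
const fn build_ident_table() -> [bool; 256] {
    let mut table = [false; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        table[i] = b.is_ascii_alphanumeric() || b == b'_' || b == b'$';
        i += 1;
    }
    table
}

static IS_IDENT_BYTE: [bool; 256] = build_ident_table();

/// Number of bytes examined per batch.
const BATCH: usize = 32;

/// Result of scanning the start of a byte slice for an identifier.
#[derive(Debug, PartialEq)]
enum Scan {
    /// The identifier is pure ASCII and ends at this offset.
    Ascii(usize),
    /// A non-ASCII byte or `\` was found at this offset.
    NeedsSlowPath(usize),
}

/// Uncommon case, marked `#[cold]` to keep it out of the hot loop.
#[cold]
fn slow_path(offset: usize) -> Scan {
    Scan::NeedsSlowPath(offset)
}

/// Decide what the first non-identifier byte means.
fn end_at(byte: u8, offset: usize) -> Scan {
    if byte >= 0x80 || byte == b'\\' {
        slow_path(offset)
    } else {
        Scan::Ascii(offset)
    }
}

fn scan_identifier(bytes: &[u8]) -> Scan {
    let mut pos = 0;
    // Hot loop: one range check per 32-byte batch, then search bytes, not chars.
    while let Some(batch) = bytes.get(pos..pos + BATCH) {
        if let Some(i) = batch.iter().position(|&b| !IS_IDENT_BYTE[b as usize]) {
            return end_at(batch[i], pos + i);
        }
        pos += BATCH;
    }
    // Fewer than 32 bytes remain: finish byte-by-byte.
    for (i, &b) in bytes[pos..].iter().enumerate() {
        if !IS_IDENT_BYTE[b as usize] {
            return end_at(b, pos + i);
        }
    }
    Scan::Ascii(bytes.len())
}
```

With this sketch, scanning `b"foo = 1"` stops at offset 3 on the fast path, while an identifier containing a multi-byte character is handed to the cold function at the offset of its first non-ASCII byte.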

### Structure

The implementation is built in 3 layers:

1. ASCII characters only.
2. ASCII and Unicode characters.
3. `\` escape sequences (and all the above).

`identifier_name_handler` starts at the top layer, and is optimized for consuming ASCII as fast as possible. Each "layer" is considered more uncommon than the previous, and dropping down a layer is a de-opt.

I'm assuming that 95%+ of JavaScript code does not include either Unicode characters or escapes in identifiers, so the speed of the fast path is prioritised.

That said, once a Unicode character is encountered, the next layer does expect to find further Unicode characters, rather than de-opting over and over again. If an identifier *starts* with a Unicode character, it enters the code straight on the 2nd layer, so is not penalised by going through a `#[cold]` boundary. Lexing Unicode is never going to be as fast as ASCII, but still I felt it was important not to penalise it unnecessarily, so as not to be Anglo-centric.

### ASCII search macro

The main ASCII search is implemented as a macro. I found that, for reasons I don't understand, it's significantly faster to have all the code in a single function, even compared to multiple functions marked `#[inline]` or `#[inline(always)]`. The fastest implementation also requires some code to be repeated twice, which is nicer to do with a macro.

This macro, and the `ByteMatchTable` types that go with it, are designed to be re-usable. Next step will be to apply them for whitespace and strings, which should be fairly simple.

Searching in batches of 32 bytes is also designed to be forward-compatible with SIMD.

### Bye bye `AutoCow`

`AutoCow` is removed. Instead, a string-builder is only created if it's needed, when a `\` escape is first encountered. The string builder is also more efficient than `AutoCow` was, as it copies bytes in chunks, rather than 1-by-1.

This won't make much difference for identifiers, as escapes are so rare anyway, but this same technique can be used for strings, where they're more common.
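
As an illustration of the "only build a string when an escape appears" idea, here is a small sketch. It is not the PR's string builder: `unescape` is a hypothetical function, it uses `std::borrow::Cow` for the return type, and escape handling is reduced to `\n`, `\t` and "keep the next character as-is".

```rust
use std::borrow::Cow;

/// Return the input unchanged (borrowed) if it has no `\`, otherwise build
/// an owned string, copying the text between escapes in whole chunks.
fn unescape(src: &str) -> Cow<'_, str> {
    // Fast path: no escape at all, so no allocation.
    let Some(first) = src.find('\\') else {
        return Cow::Borrowed(src);
    };

    // First escape found: only now create the builder and copy the
    // preceding text in one go.
    let mut out = String::with_capacity(src.len());
    out.push_str(&src[..first]);

    let mut rest = &src[first..];
    while let Some(after_slash) = rest.strip_prefix('\\') {
        let mut chars = after_slash.chars();
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some(c) => out.push(c),
            // A trailing lone backslash is kept as-is in this sketch.
            None => out.push('\\'),
        }
        rest = chars.as_str();
        // Copy everything up to the next escape as a single chunk.
        let chunk_len = rest.find('\\').unwrap_or(rest.len());
        out.push_str(&rest[..chunk_len]);
        rest = &rest[chunk_len..];
    }
    Cow::Owned(out)
}
```

The common case (no escapes) returns a borrowed slice of the source, so the cost of the builder is only paid by the rare inputs that need it.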
2024-02-09 12:01:30 +08:00
overlookmotel
6910e4f71b
refactor(parser): macro for ASCII identifier byte handlers (#2351)
Add a macro for ASCII identifier byte handlers.

This is a preparatory step towards #2352.
2024-02-09 11:55:35 +08:00
overlookmotel
6f597b18bc
refactor(parser): all pointer manipulation through SourcePosition (#2350)
A safer and faster interface for reading source text using pointers than `*ptr`.
2024-02-09 10:26:51 +08:00
Boshen
651b0b15d1
refactor(prettier): s/nodes/stack (#2347) 2024-02-08 23:22:44 +08:00
Dunqing
e4754873ee
fix(prettier): printing value instead of key in BindingProperty (#2334)
fix: #2314
2024-02-08 21:49:10 +08:00
overlookmotel
185b3dbcc3
refactor(parser): fix outdated comment (#2344)
Just fixes an outdated comment.
2024-02-08 19:47:33 +08:00
Yuji Sugiura
63b4741ff3
refactor(linter/config): Use serde::Deserialize for config parsing (#2325)
Fixes #2258 

### Overview

- Re-implemented the config parser to use `serde::Deserialize`
- In order to benefit from it as much as possible, avoided implementing
custom deserializers and tried to use attributes as much as possible
  - This required some changes to the caller signatures...
- Fixed a bug where abbreviations like `"rule-name": 1` were not supported
- Fixed settings that should have been located in `settings.react` but
were not
2024-02-08 16:48:38 +08:00
overlookmotel
f3470163d9
refactor(parser): make Source::set_position safe (#2341)
Make `Source::set_position` a safe function.

This addresses a shortcoming of #2288.

Instead of requiring caller of `Source::set_position` to guarantee that the `SourcePosition` is created from this `Source`, the preceding PRs enforce this guarantee at the type level.

`Source::set_position` is going to be a central API for transitioning the lexer to processing the source as bytes, rather than `char`s (and the anticipated speed-ups that will produce). So making this method safe will remove the need for a *lot* of unsafe code blocks, and boilerplate comments promising "SAFETY: There's only one `Source`", when to the developer, this is blindingly obvious anyway.

So, while splitting the parser into `Parser` and `ParserImpl` (#2339) is an annoying change to have to make, I believe the benefit of this PR justifies it.
2024-02-08 14:56:26 +08:00
overlookmotel
aef593fb50
refactor(parser): promise only one Source on a thread at a time (#2340)
Introduce invariant that only a single `lexer::Source` can exist on a thread at one time.

This is a preparatory step for #2341.

2 notes:

Restriction is only 1 x `ParserImpl` / `Lexer` / `Source` on 1 *thread* at a time, not globally. So this does not prevent parsing multiple files simultaneously on different threads.

Restriction does not apply to public type `Parser`, only `ParserImpl`. `ParserImpl`s are not created in `Parser::new`, but instead in `Parser::parse`, where they're created and then immediately consumed. So the end user is also free to create multiple `Parser` instances (if they want to for some reason) on the same thread.
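
One way to picture a "single instance per thread" invariant is a thread-local flag guarded by a token type. This is only a sketch under assumptions: `SourceToken` and `SOURCE_ALIVE` are hypothetical names, and the check here happens at runtime, which may differ from the mechanism the PR actually uses.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    /// `true` while a `SourceToken` is alive on the current thread.
    static SOURCE_ALIVE: Cell<bool> = Cell::new(false);
}

/// Hypothetical token: holding one means no other token exists on this thread.
/// The raw-pointer `PhantomData` makes it `!Send`, so it cannot migrate to
/// another thread and break the per-thread accounting.
struct SourceToken {
    _not_send: PhantomData<*const ()>,
}

impl SourceToken {
    /// Returns `None` if a token already exists on this thread.
    fn try_new() -> Option<Self> {
        SOURCE_ALIVE.with(|alive| {
            if alive.replace(true) {
                None
            } else {
                Some(SourceToken { _not_send: PhantomData })
            }
        })
    }
}

impl Drop for SourceToken {
    fn drop(&mut self) {
        // Free the slot so a later token can be created on this thread.
        SOURCE_ALIVE.with(|alive| alive.set(false));
    }
}
```

Because the flag is thread-local, a second thread can obtain its own token while the first is still alive, which matches the note above that parsing on multiple threads simultaneously is unaffected.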
2024-02-08 14:51:17 +08:00
Maurice Nicholson
ebc08d4e1e
fix(linter): add missing typescript-eslint(_) prefix for some errors (#2342)
Running latest on one of my projects these warnings jumped out at me
because they were "anonymous" vs the others.

This PR just adds the usual rule-name prefix to the errors where it was
missing
2024-02-08 14:28:56 +08:00
overlookmotel
0bdecb5043
refactor(parser): wrapper type for parser (#2339)
Split parser into public interface `Parser` and internal implementation `ParserImpl`.

This involves no changes to public API.

This change is a bit annoying, but justification is that it's required for #2341, which I believe to be very worthwhile.

The `ParserOptions` type also makes it a bit clearer what the defaults for `allow_return_outside_function` and `preserve_parens` are. It came as a surprise to me that `preserve_parens` defaults to `true`, and this refactor makes that a bit more obvious when reading the code.

All the real changes are in [oxc_parser/src/lib.rs](https://github.com/oxc-project/oxc/pull/2339/files#diff-8e59dfd35fc50b6ac9a9ccd991e25c8b5d30826e006d565a2e01f3d15dc5f7cb). The rest of the diff is basically replacing `Parser` with `ParserImpl` everywhere else.
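
The shape of the split can be sketched like this. It is a toy outline, not the code in `oxc_parser/src/lib.rs`: the `parse` body (counting whitespace-separated tokens) is a stand-in, and the `false` default for `allow_return_outside_function` is an assumption; only the `true` default for `preserve_parens` comes from the description above.

```rust
/// Options gathered in one place so the defaults are easy to read.
#[derive(Clone, Copy)]
struct ParserOptions {
    allow_return_outside_function: bool,
    preserve_parens: bool,
}

impl Default for ParserOptions {
    fn default() -> Self {
        Self {
            // Assumed default for this sketch.
            allow_return_outside_function: false,
            // Stated in the PR description: defaults to `true`.
            preserve_parens: true,
        }
    }
}

/// Public interface: cheap to construct, holds only inputs and options.
pub struct Parser<'a> {
    source_text: &'a str,
    options: ParserOptions,
}

impl<'a> Parser<'a> {
    pub fn new(source_text: &'a str) -> Self {
        Self { source_text, options: ParserOptions::default() }
    }

    pub fn preserve_parens(mut self, value: bool) -> Self {
        self.options.preserve_parens = value;
        self
    }

    /// The internal implementation is created here and consumed immediately.
    pub fn parse(self) -> usize {
        ParserImpl::new(self.source_text, self.options).parse()
    }
}

/// Internal implementation, never exposed to the end user.
struct ParserImpl<'a> {
    source_text: &'a str,
    options: ParserOptions,
}

impl<'a> ParserImpl<'a> {
    fn new(source_text: &'a str, options: ParserOptions) -> Self {
        Self { source_text, options }
    }

    /// Stand-in for real parsing: counts whitespace-separated tokens.
    fn parse(self) -> usize {
        let _ = self.options;
        self.source_text.split_whitespace().count()
    }
}
```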
2024-02-07 23:22:08 +08:00
Dunqing
55011e2793
feat(codegen): avoid printing comma in ArrayAssignmentTarget if the elements is empty (#2331) 2024-02-06 22:45:19 +08:00
Boshen
721f6cb74e
fix(codegen): format new expression + call expression with the correct parentheses (#2330)
closes #2328
2024-02-06 22:06:12 +08:00
Dunqing
40e9541cec
feat(semantic): add export binding for ExportDefaultDeclarations in module record (#2329) 2024-02-06 22:01:16 +08:00
luhc228
8771c6410f
feat: add typescript-eslint rule array-type (#2292)
Ref: https://github.com/oxc-project/oxc/issues/2180
2024-02-06 11:35:29 +08:00
Boshen
6fe9300880
chore(linter): add regression case for require-yield (#2326)
closes #2323
closes #2324
2024-02-05 22:57:28 +08:00
Boshen
1db780960c
Revert "refactor(semantic): get function by scope_id in set_function_node_flag (#2208)"
This reverts commit c62495d23f.
2024-02-05 22:49:10 +08:00
overlookmotel
cdef41d552
refactor(parser): lexer replace Chars with Source (#2288)
This PR replaces the `Chars` iterator in the lexer with a new structure
`Source`.

## What it does

`Source` holds the source text, and allows:

* Iterating through source text char-by-char (same as `Chars` did).
* Iterating byte-by-byte.
* Getting a `SourcePosition` for current position, which can be used
later to rewind to that position, without having to clone the entire
`Source` struct.

`Source` has the same invariants as `Chars` - cursor must always be
positioned on a UTF-8 character boundary (i.e. not in the middle of a
multi-byte Unicode character).

However, unsafe APIs are provided to allow a caller to temporarily break
that invariant, as long as they satisfy it again before they pass
control back to safe code. This will be useful for processing batches of
bytes.
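
A much-simplified picture of the `Source` / `SourcePosition` relationship is sketched below. It deliberately differs from the PR: it stores a byte offset instead of a raw pointer, and `set_position` checks the position at runtime instead of being unchecked. Only the type names and `set_position` come from the description; the other method names are chosen for illustration.

```rust
/// Opaque saved position. Its field is private, so it is only meant to be
/// obtained from `Source::position`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SourcePosition(usize);

/// Cursor over source text that always rests on a UTF-8 character boundary.
struct Source<'a> {
    text: &'a str,
    offset: usize,
}

impl<'a> Source<'a> {
    fn new(text: &'a str) -> Self {
        Self { text, offset: 0 }
    }

    /// Save the current position without cloning the whole `Source`.
    fn position(&self) -> SourcePosition {
        SourcePosition(self.offset)
    }

    /// Rewind (or fast-forward) to a saved position. This sketch validates
    /// the position; the PR describes an unchecked, pointer-based version.
    fn set_position(&mut self, pos: SourcePosition) {
        assert!(self.text.is_char_boundary(pos.0));
        self.offset = pos.0;
    }

    /// Char-by-char iteration, as `Chars` provided.
    fn next_char(&mut self) -> Option<char> {
        let c = self.text[self.offset..].chars().next()?;
        self.offset += c.len_utf8();
        Some(c)
    }

    /// Look at the next byte without decoding a code point.
    fn peek_byte(&self) -> Option<u8> {
        self.text.as_bytes().get(self.offset).copied()
    }

    fn remaining(&self) -> &'a str {
        &self.text[self.offset..]
    }
}
```

A checkpoint in this sketch is a single `usize`, which is the point being made above: rewinding needs only the position, not a copy of the iterator state.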

## Why

I envisage most of the Lexer migrating to byte-by-byte iteration, and I
believe it'll make a significant impact on performance.

It will allow efficiently processing batches of bytes (e.g. consuming
identifiers or whitespace) without the overhead of calculating code
points for every character. It should also make all the many `peek()`,
`next_char()` and `next_eq()` calls faster.

`Source` is also more performant than `Chars` in itself. This wasn't my
intent, but seems to be a pleasant side-effect of it being less opaque
to the compiler than `Chars`, so it can apply more optimizations.

In addition, because checkpoints don't need to store the entire `Source`
struct, but only a `SourcePosition` (8 bytes), I was able to reduce the
size of `LexerCheckpoint` and `ParserCheckpoint`, and make them both
`Copy`.

## Notes on implementation

`Source` is heavily based on Rust's `std::str::Chars` and
`std::slice::Iter` iterators and I've copied the code/concepts from them
as much as possible.

As it's a low-level primitive, it uses raw pointers and contains a *lot*
of unsafe code. I *think* I've crossed the T's and dotted the I's, and
I've commented the code extensively, but I'd appreciate a close review
if anyone has time.

I've split it into 2 commits.

* First commit is all the substantive changes.
* 2nd commit just does away with `lexer.current` which is no longer
needed, and replaces `lexer.current.token` with `lexer.token`
everywhere.

Hopefully looking just at the 1st commit will reduce the noise and make
it easier to review.

### `SourcePosition`

There is one annoyance with the API which I haven't been able to solve:

`SourcePosition` is a wrapper around a pointer, which can only be
created from the current position of `Source`. Due to the invariant
mentioned above, `SourcePosition` is always in bounds of the
source text, and points to a UTF-8 character boundary. So `Source` can
be rewound to a `SourcePosition` cheaply, without any checks. I had
originally envisaged `Source::set_position` being a safe function, as
`SourcePosition` enforces the necessary invariants itself.

The fly in the ointment is that a `SourcePosition` could theoretically
have been created from *another* `Source`. If that was the case, it
would be out of bounds, and it would be instant UB. Consequently,
`Source::set_position` has to be an unsafe function.

This feels rather ridiculous. *Of course* the parser won't create 2
Lexers at the same time. But still it's *possible*, so I think it's better to
take the strict approach and make it unsafe until we can find a way to
statically prove the safety by some other means. Any ideas?

## Oddity in the benchmarks

There's something really odd going on with the semantic benchmark for
`pdf.mjs`.

While I was developing this, small and seemingly irrelevant changes
would flip that benchmark from +0.5% or so to -4%, and then another
small change would flip it back.

What I don't understand is that parsing happens outside of the
measurement loop in the semantic benchmark, so the parser shouldn't have
*any* effect either way on semantic's benchmarks.

If CodSpeed's flame graph is to be believed, most of the negative effect
appears to be a large Vec reallocation happening somewhere in semantic.

I've ruled out a few things: The AST produced by the parser for
`pdf.mjs` after this PR is identical to what it was before. And
semantic's `nodes` and `scopes` Vecs are same length as they were
before. Nothing seems to have changed!

I really am at a loss to explain it. Have you seen anything like this
before?

One possibility is a fault in my unsafe code which is manifesting only
with `pdf.mjs`, and it's triggering UB, which I guess could explain the
weird effects. I'm running the parser on `pdf.mjs` in Miri now and will
see if it finds anything (Miri doesn't find any problem running the
tests). It's been running for over an hour now. Hopefully it'll be done
by morning!

I feel like this shouldn't be merged until that question is resolved, so
marking this as draft in the meantime.
2024-02-05 13:51:46 +00:00
Yuji Sugiura
b27079cf8e
chore(linter): Add more tests for ESLintConfig (#2284)
Before trying #2258, I'd like to prevent regression. 🦺

### Overview

- Rename `ESLintConfig::new(path)` -> `from_file(path)`
- Split `from_file()` implementation into 2 parts
  - Parse path, strip json comment, check `.json` ext part
  - `from_value()`: Read + parse JSON contents part
    - ☝🏻used in tests
- Add tests for parsing rules, settings, env

### TODOs found, for next PR
- `rules` parser should handle `"no-debugger": 1` form
- `settings.xxx_components` should go under `settings.react.`

### Notes

- `rules`'s type
  - https://github.com/eslint/eslint/blob/main/lib/shared/types.js#L12
- `settings`'s type is `Object` 😅 
  - https://github.com/eslint/eslint/blob/main/lib/shared/types.js#L53
  - and its usage is extended by each plugin
    - https://github.com/jsx-eslint/eslint-plugin-react?tab=readme-ov-file#configuration-legacy-eslintrc-
    - https://github.com/jsx-eslint/eslint-plugin-jsx-a11y/?tab=readme-ov-file#configurations
    - https://nextjs.org/docs/pages/building-your-application/configuring/eslint#eslint-plugin
- `env`'s type is just a `Record<string, boolean>`
  - https://github.com/eslint/eslint/blob/main/lib/shared/types.js#L40
2024-02-05 20:42:03 +08:00
magic-akari
577d7ab72f
feat(prettier): Support TSImportEqualsDeclaration (#2321) 2024-02-05 20:37:26 +08:00
magic-akari
c6273732f6
feat(prettier): Support TSExportAssignment (#2320) 2024-02-05 20:33:03 +08:00
Dunqing
d571839ab8
feat(ast): enter AstKind::ExportDefaultDeclaration, AstKind::ExportNamedDeclaration and AstKind::ExportAllDeclaration (#2317) 2024-02-05 17:43:30 +08:00
Dunqing
a3570d41f0
feat(semantic): report parameter related errors for setter/getter (#2316) 2024-02-05 17:38:43 +08:00
Dunqing
9ca13d040d
feat(semantic): report type parameter list cannot be empty (#2315) 2024-02-05 16:05:51 +08:00
Boshen
a762d17603
feat(linter): promote no-this-before-super to correctness (#2313)
I've tested this in all real world test repos and found no false
positives. Thank you so much @u9g @TzviPM for making this happen!
2024-02-05 16:01:09 +08:00
renovate[bot]
41d1876650
chore(deps): update rust crates (#2302) 2024-02-05 14:36:53 +08:00
Dunqing
540b2a0396
fix(semantic): remove unnecessary SymbolFlags::Import (#2311) 2024-02-05 14:16:29 +08:00
Dunqing
f53c54ced9
feat(semantic): report unexpected type annotation in ArrayPattern (#2309) 2024-02-05 13:45:52 +08:00
Dunqing
f3035f1bbe
feat(semantic): apply ImportSpecifier's binder and remove ModuleDeclaration's binder (#2307)
Added in #2230, but I forgot to call it.
2024-02-05 13:16:05 +08:00
overlookmotel
9811c3a2c3
refactor(parser): name byte handler functions (#2301)
This PR solves the problem of lexer byte handlers all being called
`core::ops::function::FnOnce::call_once` in the flame graphs on
CodSpeed, by defining them as named functions instead of closures.

Pure refactor, no substantive changes.
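
To illustrate the difference, here is a small sketch of a byte-dispatch table whose entries are named functions. Everything in it (`Kind`, `Lexer`, `ByteHandler`, `BYTE_HANDLERS` and the handler names) is hypothetical and far simpler than the real lexer. The idea is that each entry is an ordinary `fn` item with its own symbol name, where a closure would appear in a profile as `core::ops::function::FnOnce::call_once`.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    Ident,
    Whitespace,
    Unknown,
}

struct Lexer<'a> {
    bytes: &'a [u8],
    pos: usize,
}

/// A handler is a plain function pointer.
type ByteHandler = fn(&mut Lexer<'_>) -> Kind;

/// Named handler: consumes the first byte, then any ASCII alphanumerics.
fn ident_handler(lexer: &mut Lexer<'_>) -> Kind {
    lexer.pos += 1;
    while matches!(lexer.bytes.get(lexer.pos), Some(b) if b.is_ascii_alphanumeric()) {
        lexer.pos += 1;
    }
    Kind::Ident
}

/// Named handler: consumes a run of spaces and tabs.
fn whitespace_handler(lexer: &mut Lexer<'_>) -> Kind {
    lexer.pos += 1;
    while matches!(lexer.bytes.get(lexer.pos), Some(b' ' | b'\t')) {
        lexer.pos += 1;
    }
    Kind::Whitespace
}

/// Named handler: consumes a single unrecognised byte.
fn unknown_handler(lexer: &mut Lexer<'_>) -> Kind {
    lexer.pos += 1;
    Kind::Unknown
}

/// Fill the 256-entry table at compile time.
const fn build_handlers() -> [ByteHandler; 256] {
    let mut table = [unknown_handler as ByteHandler; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        if b.is_ascii_alphabetic() || b == b'_' || b == b'$' {
            table[i] = ident_handler;
        } else if b == b' ' || b == b'\t' {
            table[i] = whitespace_handler;
        }
        i += 1;
    }
    table
}

static BYTE_HANDLERS: [ByteHandler; 256] = build_handlers();

impl Lexer<'_> {
    /// Dispatch on the next byte; `None` at end of input.
    fn next_kind(&mut self) -> Option<Kind> {
        let byte = *self.bytes.get(self.pos)?;
        Some(BYTE_HANDLERS[byte as usize](self))
    }
}
```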
2024-02-05 13:06:09 +08:00
Dunqing
cb17a83f4f
fix(semantic): remove ignore cases (#2300) 2024-02-04 22:40:41 +08:00
Boshen
6002560fa1
feat(span): fix memory leak by implementing inlineable string for oxc_allocator (#2294)
closes #1803

This string is currently unsafe, but I want to get miri working before
introducing more changes.

I want to progress from memory leak to unsafe, then to safety.
It's harder to do the steps in one go.
2024-02-04 19:28:23 +08:00
Boshen
1822cfe18d
refactor(ast): fix BigInt memory leak by removing it (#2293)
relates

We'll need to evaluate the value by other means.
2024-02-04 16:47:00 +08:00
Boshen
b5e43fbc5d
fix(linter): fix no_dupe_keys false positive on similar key names (#2291)
closes #2287
2024-02-04 14:54:09 +08:00
Tzvi Melamed
0060d6a730
feat(linter): Implement no_this_before_super with cfg (#2254)
Implements `eslint/no-this-before-super` in #479.

Closes #2279
2024-02-04 13:51:04 +08:00
Boshen
d2b304b1f8
Publish crates v0.6.0 2024-02-03 22:35:30 +08:00
Wenzhe Wang
0c225a49aa
fix(codegen): print space before with clause in import (#2278) 2024-02-02 14:52:32 +00:00
Dunqing
37a2676e1e
fix(linter): AllowFunction doesn't support generator (#2277) 2024-02-02 21:53:44 +08:00
Boshen
28daf83b19
feat(semantic): report no class name error (#2273)
closes #2144
2024-02-02 19:05:00 +08:00