Closes#6358
@preyneyv I know you've been working on this problem.
This is an implementation that has been dormant on my local for a while.
- All tests are passing
- However, the approach is simple but not general, so there might be some edge cases that were missed
- There's also room for improvement in terms of performance
For these reasons, it was marked as WIP for me.
I believe the test cases and other parts are usable, so feel free to fork and replace them with your implementation if you'd like.
Preparation for #6141
`oxc_regular_expression` can already parse and validate both `/regexp-literal/` and `new RegExp("string-literal")`.
But one thing that is not well-supported was reporting `Span` for the `RegExp("string-literal-with-\\escape")` case.
For example, these two cases produce the same `RegExp` instances in JavaScript:
- `/\d+/`
- `new RegExp("\\d+")`
For now, mainly in `oxc_linter`, the latter case is parsed with `oxc_parser` -> `ast::literal::StringLiteral` AST node -> `value` property.
At this point, escape sequences are resolved(!), `oxc_regular_expression` can handle aligned `&str` as an argument without any problem in both cases.
However, in terms of `Span` representation, these cases should be handled differently because of the `\\` in string literals...
As a result, the parsed AST's `Span` for `new RegExp("string-literal")` is not accurate if it contains escape sequences.
e.g. a01a5dfdaf/crates/oxc_linter/src/snapshots/no_invalid_regexp.snap (L118-L122)
Each time the `\` appears, the subsequent position is shifted. `_` should be placed under `*` in this case.
So... to resolve this issue, we need to implement `string_literal_parser` first, and use them as reading units of `oxc_regular_expression`.
Beginning of #6347. Instead of using serde-derive, we generate
`Serialize` impls manually.
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: overlookmotel <theoverlookmotel@gmail.com>
This PR makes 2 changes to improve the existing API that are not very useful.
- Remove `(Literal)Parser` and `FlagsParser` and their ASTs
- Add `with_flags(flags_text)` helper to `ParserOptions`
Here are the details.
> Remove `(Literal)Parser` and `FlagsParser` and their ASTs
Previously, the `oxc_regular_expression` crate exposed 3 parsers.
- `(Literal)Parser`: assumes `/pattern/flags` format
- `PatternParser`: assumes `pattern` part only
- `FlagsParser`: assumes `flags` part only
However, it turns out that in actual usecases, only the `PatternParser` is actually sufficient, as the pattern and flags are validated and sliced in advance on the `oxc_parser` side.
The current usecase for `(Literal)Parser` is mostly for internal testing.
There were also some misuses of `(Literal)Parser` that restore `format!("/{pattern}/{flags}")` back and use `(Literal)Parser`.
Therefore, only `PatternParser` is now published, and unnecessary ASTs have been removed.
(This also obsoletes #5592 .)
> Added `with_flags(flags_text)` helper to `ParserOptions`
Strictly speaking, there was a subtle difference between the "flag" strings that users were aware of and the "mode" recognised by the parser.
Therefore, it was a common mistake to forget to enable `unicode_mode` when using the `v` flag.
With this helper, crate users no longer need to distinguish between flags and modes.
- resolves https://github.com/oxc-project/oxc/issues/5977
- supersedes https://github.com/oxc-project/oxc/pull/5951
To facilitate easier traversal of the Regex AST, this PR defines a `Visit` trait with default implementations that will walk the entirety of the Regex AST. Methods in the `Visit` trait can be overridden with custom implementations to do things like analyzing only certain nodes in a regular expression, which will be useful for regex-related `oxc_linter` rules.
In the future, we should consider automatically generating this code as it is very repetitive, but for now a handwritten visitor is sufficient.
As of now if we remove the implementation of a trait for a type and implement the method on that type directly it wouldn't break while it isn't the original trait anymore so that method might do something entirely different.
This change is more explicit on trait calls so we hit compile errors on these kinds of changes.
- Fix example to handle `new RegExp()` too
- Update NOTE comments
- - -
Until I tried interacting with the actual AST parsed by `oxc_parser`, I thought that the current `oxc_regular_expression` lacked support for the `RegExp` constructor due to escape sequences.
This was because `"\""` remained `"\""` after reading the source text from `.js` files.
However, once it was parsed by `oxc_parser`, I found that everything was [resolved](8ef85a43c0/crates/oxc_parser/src/lexer/string.rs)! (Wonderful work as usual. 👏🏻 )
Now there is nothing to worry about. 😌