oxc/crates
leaysgur 368364d47b feat(regex_parser): Implement RegExp parser (#3824)
Part of #1164

## Progress updates 🗞️

Waiting for review and advice, while thinking about how to handle escaped strings in the `new RegExp(pat)` case.

## TODOs

- [x] `RegExp(Literal = Body + Flags)#parse()` structure
- [x] Base `Reader` impl to handle both Unicode (u32) and UTF-16 (u16) units
- [x] Global `Span` and local offset conversion
- [x] Design AST shapes
  - [x] Keep `enum` size small by `Box<'a, T>`
  - [x] Rework AST shapes
- [x] Split body and flags w/ validating literal
- [x] Parse `RegExpFlags`
- [x] Parse `RegExpBody` = `Pattern`
- [x] Parse `Pattern` > `Disjunction`
- [x] Parse `Disjunction` > `Alternative`
- [x] Parse `Alternative` > `Term`
- [x] Parse `Term` > `Assertion`
	- [x] Parse `BoundaryAssertion`
	- [x] Parse `LookaroundAssertion`
- [x] Parse `Term` > `Quantifier`
- [x] Parse `Term` > `Atom`
	- [x] Parse `Atom` > `PatternCharacter`
	- [x] Parse `Atom` > `.`
	- [x] Parse `Atom` > `\AtomEscape`
		- [x] Parse `\AtomEscape` > `DecimalEscape`
		- [x] Parse `\AtomEscape` > `CharacterClassEscape`
			- [x] Parse `CharacterClassEscape` > `\d, \D, \s, \S, \w, \W`
			- [x] Parse `CharacterClassEscape` > `\p{UnicodePropertyValueExpression}, \P{UnicodePropertyValueExpression}`
		- [x] Parse `\AtomEscape` > `CharacterEscape`
			- [x] Parse `CharacterEscape` > `ControlEscape`
			- [x] Parse `CharacterEscape` > `c AsciiLetter`
			- [x] Parse `CharacterEscape` > `0`
			- [x] Parse `CharacterEscape` > `HexEscapeSequence`
			- [x] Parse `CharacterEscape` > `RegExpUnicodeEscapeSequence`
			- [x] Parse `CharacterEscape` > `IdentityEscape`
		- [x] Parse `\AtomEscape` > `kGroupName`
	- [x] Parse `Atom` > `[CharacterClass]`
		- [x] Parse `[CharacterClass]` > `ClassContents` > `[~UnicodeSetsMode] NonemptyClassRanges`
		- [x] Parse `[CharacterClass]` > `ClassContents` > `[+UnicodeSetsMode] ClassSetExpression`
			- [x] Parse `ClassSetExpression` > `ClassUnion`
			- [x] Parse `ClassSetExpression` > `ClassIntersection`
			- [x] Parse `ClassSetExpression` > `ClassSubtraction`
			- [x] Parse `ClassSetExpression` > `ClassSetOperand`
			- [x] Parse `ClassSetExpression` > `ClassSetRange`
			- [x] Parse `ClassSetExpression` > `ClassSetCharacter`
	- [x] Parse `Atom` > `(GroupSpecifier)`
	- [x] Parse `Atom` > `(?:Disjunction)`
- [x] Annex B
	- [x] Parse `QuantifiableAssertion`
	- [x] Parse `ExtendedAtom`
		- [x] Parse `ExtendedAtom` > `\ [lookahead = c]`
		- [x] Parse `ExtendedAtom` > `InvalidBracedQuantifier`
		- [x] Parse `ExtendedAtom` > `ExtendedPatternCharacter`
		- [x] Parse `ExtendedAtom` > `\AtomEscape` > `CharacterEscape` > `LegacyOctalEscapeSequence`
- [x] Early errors
	- [x] Pattern :: Disjunction(1/2)
	- [x] Pattern :: Disjunction(2/2)
	- [x] QuantifierPrefix :: { DecimalDigits , DecimalDigits }
	- [x] ExtendedAtom :: InvalidBracedQuantifier (Annex B)
	- [x] AtomEscape :: k GroupName
	- [x] AtomEscape :: DecimalEscape
	- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(1/2)
	- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(2/2)
	- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(Annex B)
	- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(1/2)
	- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(2/2)
	- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(Annex B)
	- [x] RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence
	- [x] RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
	- [x] RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence
	- [x] RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
	- [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(1/2)
	- [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(2/2)
	- [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(1/2)
	- [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(2/2)
	- [x] CharacterClassEscape :: P{ UnicodePropertyValueExpression }
	- [x] CharacterClass :: [^ ClassContents ]
	- [x] NestedClass :: [^ ClassContents ]
	- [x] ClassSetRange :: ClassSetCharacter - ClassSetCharacter
- [x] Add `Span` to `Err(OxcDiagnostic::error())` calls
- [x] Perf improvement
	- [x] `Reader#peek()` should avoid `iter.next()` equivalent
	- [x] ~~Use `char` everywhere and split and push 2 surrogates(pair) for `Character`?~~
	- [x] ~~Try 1(+1) loop parsing for capturing groups?~~
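
The `Reader` items above (code point vs. UTF-16 units, and a `peek()` that does not go through `iter.next()`) can be illustrated with a minimal sketch. This is not the crate's actual implementation: the struct layout, the `unicode_mode` parameter, and the `peek2`, `advance`, and `eat` helpers are assumptions made for illustration. It assumes the units are collected up front so that peeking is a plain slice index.

```rust
/// Hypothetical reader over a pattern's source text (not the real `oxc_regexp_parser` API).
/// In Unicode mode each unit is a code point; otherwise each unit is a UTF-16 code unit.
/// Both are stored as `u32` so the parser can treat them uniformly.
struct Reader {
    units: Vec<u32>,
    index: usize,
}

impl Reader {
    fn new(source: &str, unicode_mode: bool) -> Self {
        let units: Vec<u32> = if unicode_mode {
            source.chars().map(u32::from).collect()
        } else {
            source.encode_utf16().map(u32::from).collect()
        };
        Self { units, index: 0 }
    }

    /// Look at the current unit without consuming it.
    /// This is a slice lookup; no iterator is cloned or advanced.
    fn peek(&self) -> Option<u32> {
        self.units.get(self.index).copied()
    }

    /// Look one unit past the current one.
    fn peek2(&self) -> Option<u32> {
        self.units.get(self.index + 1).copied()
    }

    /// Consume one unit, if any is left.
    fn advance(&mut self) {
        if self.index < self.units.len() {
            self.index += 1;
        }
    }

    /// Consume the current unit only if it equals `ch`.
    fn eat(&mut self, ch: char) -> bool {
        if self.peek() == Some(u32::from(ch)) {
            self.advance();
            true
        } else {
            false
        }
    }
}

fn main() {
    // U+1F600 is one code point but two UTF-16 code units (a surrogate pair).
    let source = "a\u{1F600}b";

    let mut unicode = Reader::new(source, true);
    let mut legacy = Reader::new(source, false);

    assert!(unicode.eat('a'));
    assert!(legacy.eat('a'));

    println!("unicode mode sees {:X?}", unicode.peek());
    println!("utf-16 mode sees {:X?} then {:X?}", legacy.peek(), legacy.peek2());
}
```

In Unicode mode the reader sees the emoji as a single unit, while in UTF-16 mode it sees the lead and trail surrogates separately, which is the distinction the `u32`/`u16` item refers to. Collecting into a `Vec` trades some memory for constant-time lookahead; other designs are possible.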

## Follow up

- [x] @Boshen Test suite > #4242
  - [x] Investigate CI errors...
- Next...
  - Support ES2025 Duplicate named capturing groups?
  - Support ES20XX Stage3 Modifiers?
2024-08-20 02:19:24 +00:00

| Name | Last commit | Date |
|---|---|---|
| oxc | refactor(transform_conformance): add driver (#4969) | 2024-08-19 07:27:39 +00:00 |
| oxc_allocator | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_ast | refactor(ast)!: Change order of fields in CallExpression (#4859) | 2024-08-20 09:47:12 +08:00 |
| oxc_ast_macros | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_cfg | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_codegen | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_diagnostics | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_index | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_isolated_declarations | fix(isolated_declarations): namespaces that are default exported should be considered for expando functions (#4935) | 2024-08-19 11:24:05 +08:00 |
| oxc_language_server | refactor(linter): use diagnostic codes in lint rules (#4349) | 2024-07-20 03:35:00 +00:00 |
| oxc_linter | refactor(ast)!: Change order of fields in CallExpression (#4859) | 2024-08-20 09:47:12 +08:00 |
| oxc_macros | chore(linter): update docs for declare_oxc_lint! (#4825) | 2024-08-11 15:27:54 +00:00 |
| oxc_mangler | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_minifier | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_module_lexer | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_parser | refactor(ast)!: Change order of fields in CallExpression (#4859) | 2024-08-20 09:47:12 +08:00 |
| oxc_prettier | chore: remove unsafe_code = "warn" rust lint | 2024-07-15 10:39:08 +08:00 |
| oxc_regexp_parser | feat(regex_parser): Implement RegExp parser (#3824) | 2024-08-20 02:19:24 +00:00 |
| oxc_semantic | refactor(ast)!: Change order of fields in CallExpression (#4859) | 2024-08-20 09:47:12 +08:00 |
| oxc_sourcemap | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_span | refactor(span): clarify Atom conversion methods lifetimes (#4978) | 2024-08-19 11:53:01 +00:00 |
| oxc_syntax | Release crates v0.24.3 (#4950) | 2024-08-18 14:16:25 +08:00 |
| oxc_transformer | refactor(ast)!: Change order of fields in CallExpression (#4859) | 2024-08-20 09:47:12 +08:00 |
| oxc_traverse | refactor(ast)!: Change order of fields in CallExpression (#4859) | 2024-08-20 09:47:12 +08:00 |
| oxc_wasm | refactor(minifier): ast passes infrastructure (#4625) | 2024-08-04 11:58:39 +00:00 |