oxc/.typos.toml
leaysgur 368364d47b feat(regex_parser): Implement RegExp parser (#3824)
Part of #1164

## Progress updates 🗞️

Waiting for review and advice, while thinking about how to handle escaped strings in the `new RegExp(pat)` case.

## TODOs

- [x] `RegExp(Literal = Body + Flags)#parse()` structure
- [x] Base `Reader` impl to handle both Unicode (u32) and UTF-16 (u16) units
- [x] Global `Span` and local offset conversion
- [x] Design AST shapes
  - [x] Keep `enum` size small by `Box<'a, T>`
  - [x] Rework AST shapes
- [x] Split body and flags w/ validating literal
- [x] Parse `RegExpFlags`
- [x] Parse `RegExpBody` = `Pattern`
- [x] Parse `Pattern` > `Disjunction`
- [x] Parse `Disjunction` > `Alternative`
- [x] Parse `Alternative` > `Term`
- [x] Parse `Term` > `Assertion`
	- [x] Parse `BoundaryAssertion`
	- [x] Parse `LookaroundAssertion`
- [x] Parse `Term` > `Quantifier`
- [x] Parse `Term` > `Atom`
	- [x] Parse `Atom` > `PatternCharacter`
	- [x] Parse `Atom` > `.`
	- [x] Parse `Atom` > `\AtomEscape`
		- [x] Parse `\AtomEscape` > `DecimalEscape`
		- [x] Parse `\AtomEscape` > `CharacterClassEscape`
			- [x] Parse `CharacterClassEscape` > `\d, \D, \s, \S, \w, \W`
			- [x] Parse `CharacterClassEscape` > `\p{UnicodePropertyValueExpression}, \P{UnicodePropertyValueExpression}`
		- [x] Parse `\AtomEscape` > `CharacterEscape`
			- [x] Parse `CharacterEscape` > `ControlEscape`
			- [x] Parse `CharacterEscape` > `c AsciiLetter`
			- [x] Parse `CharacterEscape` > `0`
			- [x] Parse `CharacterEscape` > `HexEscapeSequence`
			- [x] Parse `CharacterEscape` > `RegExpUnicodeEscapeSequence`
			- [x] Parse `CharacterEscape` > `IdentityEscape`
		- [x] Parse `\AtomEscape` > `kGroupName`
	- [x] Parse `Atom` > `[CharacterClass]`
		- [x] Parse `[CharacterClass]` > `ClassContents` > `[~UnicodeSetsMode] NonemptyClassRanges`
		- [x] Parse `[CharacterClass]` > `ClassContents` > `[+UnicodeSetsMode] ClassSetExpression`
			- [x] Parse `ClassSetExpression` > `ClassUnion`
			- [x] Parse `ClassSetExpression` > `ClassIntersection`
			- [x] Parse `ClassSetExpression` > `ClassSubtraction`
			- [x] Parse `ClassSetExpression` > `ClassSetOperand`
			- [x] Parse `ClassSetExpression` > `ClassSetRange`
			- [x] Parse `ClassSetExpression` > `ClassSetCharacter`
	- [x] Parse `Atom` > `(GroupSpecifier)`
	- [x] Parse `Atom` > `(?:Disjunction)`
- [x] Annex B
	- [x] Parse `QuantifiableAssertion`
	- [x] Parse `ExtendedAtom`
		- [x] Parse `ExtendedAtom` > `\ [lookahead = c]`
		- [x] Parse `ExtendedAtom` > `InvalidBracedQuantifier`
		- [x] Parse `ExtendedAtom` > `ExtendedPatternCharacter`
		- [x] Parse `ExtendedAtom` > `\AtomEscape` > `CharacterEscape` > `LegacyOctalEscapeSequence`
- [x] Early errors
	- [x] Pattern :: Disjunction(1/2)
	- [x] Pattern :: Disjunction(2/2)
	- [x] QuantifierPrefix :: { DecimalDigits , DecimalDigits }
	- [x] ExtendedAtom :: InvalidBracedQuantifier (Annex B)
	- [x] AtomEscape :: k GroupName
	- [x] AtomEscape :: DecimalEscape
	- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(1/2)
	- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(2/2)
	- [x] NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents(Annex B)
	- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(1/2)
	- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(2/2)
	- [x] NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents(Annex B)
	- [x] RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence
	- [x] RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
	- [x] RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence
	- [x] RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
	- [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(1/2)
	- [x] UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue(2/2)
	- [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(1/2)
	- [x] UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue(2/2)
	- [x] CharacterClassEscape :: P{ UnicodePropertyValueExpression }
	- [x] CharacterClass :: [^ ClassContents ]
	- [x] NestedClass :: [^ ClassContents ]
	- [x] ClassSetRange :: ClassSetCharacter - ClassSetCharacter
- [x] Add `Span` to `Err(OxcDiagnostic::error())` calls
- [x] Perf improvement
	- [x] `Reader#peek()` should avoid `iter.next()` equivalent
	- [x] ~~Use `char` everywhere and split and push 2 surrogates(pair) for `Character`?~~
	- [x] ~~Try 1(+1) loop parsing for capturing groups?~~
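
For readers unfamiliar with the `Reader` items above, here is a minimal sketch of the idea, not the code from this PR. It assumes the source is pre-split into units (code points in Unicode mode, UTF-16 code units otherwise) so that `peek()` is an index lookup rather than an `iter.next()` equivalent. The field names, `unit_count`, and `eat` are hypothetical.

```rust
/// Hypothetical sketch of a regex source reader; names are illustrative only.
pub struct Reader {
    /// Code points (Unicode mode) or UTF-16 code units, widened to u32.
    units: Vec<u32>,
    /// Position of the next unit to read.
    index: usize,
}

impl Reader {
    /// Split `source` once, up front, according to the mode.
    pub fn new(source: &str, unicode_mode: bool) -> Self {
        let units = if unicode_mode {
            // One unit per Unicode scalar value.
            source.chars().map(|c| c as u32).collect()
        } else {
            // One unit per UTF-16 code unit; astral characters become two surrogates.
            source.encode_utf16().map(u32::from).collect()
        };
        Self { units, index: 0 }
    }

    /// Number of units in the whole source.
    pub fn unit_count(&self) -> usize {
        self.units.len()
    }

    /// Look at the current unit without consuming it (plain index lookup).
    pub fn peek(&self) -> Option<u32> {
        self.units.get(self.index).copied()
    }

    /// Consume the current unit, if any.
    pub fn advance(&mut self) {
        if self.index < self.units.len() {
            self.index += 1;
        }
    }

    /// Consume the current unit only if it equals `ch`.
    pub fn eat(&mut self, ch: char) -> bool {
        if self.peek() == Some(ch as u32) {
            self.advance();
            true
        } else {
            false
        }
    }
}
```

With this shape, `Reader::new("a😀", true)` holds two units and `Reader::new("a😀", false)` holds three, since the emoji is split into a surrogate pair in UTF-16 mode. Pre-splitting trades memory for cheap lookahead; whether the PR makes the same trade-off is not stated here.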

## Follow up

- [x] @Boshen Test suite > #4242
  - [x] Investigate CI errors...
- Next...
  - Support ES2025 Duplicate named capturing groups?
  - Support ES20XX Stage3 Modifiers?
2024-08-20 02:19:24 +00:00


# https://github.com/crate-ci/typos
# cargo install typos-cli
# typos
[files]
extend-exclude = [
"**/*.snap",
"**/*/CHANGELOG.md",
"crates/oxc_linter/fixtures",
"crates/oxc_linter/src/rules/eslint/no_unused_vars/ignored.rs",
"crates/oxc_linter/src/rules/eslint/no_unused_vars/options.rs",
"crates/oxc_linter/src/rules/eslint/no_unused_vars/tests/eslint.rs",
"crates/oxc_linter/src/rules/jsx_a11y/aria_props.rs",
"crates/oxc_linter/src/rules/jsx_a11y/img_redundant_alt.rs",
"crates/oxc_linter/src/rules/react/no_unknown_property.rs",
"crates/oxc_parser/src/lexer/byte_handlers.rs",
"crates/oxc_syntax/src/xml_entities.rs",
"pnpm-lock.yaml",
"tasks/coverage/babel",
"tasks/coverage/test262",
"tasks/coverage/typescript",
"tasks/prettier_conformance/prettier",
]
[default]
extend-ignore-re = [
"(?Rm)^.*(#|//)\\s*spellchecker:disable-line$",
"(?s)(#|//)\\s*spellchecker:off.*?\\n\\s*(#|//)\\s*spellchecker:on",
]
[default.extend-words]
trivias = "trivias"
trivia = "trivia"
xdescribe = "xdescribe"
seeked = "seeked"
labeledby = "labeledby"
[default.extend-identifiers]
IIFEs = "IIFEs"
allowIIFEs = "allowIIFEs"