Part of #1880
`Token` size is reduced from 32 to 16 bytes by changing the previous
token value `Option<&'a str>` to a u32 index handle.
It would be nice if this handle could be eliminated entirely, because
in the normal case the string is always
`&source_text[token.span.start..token.span.end]`.
Unfortunately, JavaScript allows escaped characters to appear in
identifiers, strings and templates. These strings need to be unescaped
for equality checks, i.e. `"\a" === "a"`.
This leads us to adding an `escaped_strings[]` vec for storing these
unescaped, allocated strings.
The performance regression from adding this vec should be minimal because
escaped strings are rare.
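A minimal sketch of the handle idea, not the actual lexer code: `Lexer`, `escaped_index`, `save_escaped` and `token_str` are hypothetical names, and the real `Token` carries more fields than shown here. Tokens without escapes slice the source directly; only escaped ones go through the side vec.

```rust
/// Byte offsets into the source text.
#[derive(Clone, Copy)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

/// Simplified token (assumption): it holds an index handle, not a borrowed string.
#[derive(Clone, Copy)]
pub struct Token {
    pub span: Span,
    /// Index into `Lexer::escaped_strings`; `None` when the token has no escapes.
    pub escaped_index: Option<u32>,
}

pub struct Lexer<'a> {
    source_text: &'a str,
    /// Unescaped, allocated strings for the rare tokens that contain escapes.
    escaped_strings: Vec<String>,
}

impl<'a> Lexer<'a> {
    pub fn new(source_text: &'a str) -> Self {
        Self { source_text, escaped_strings: Vec::new() }
    }

    /// Stores an unescaped string and returns its handle.
    pub fn save_escaped(&mut self, unescaped: String) -> u32 {
        let index = self.escaped_strings.len() as u32;
        self.escaped_strings.push(unescaped);
        index
    }

    /// Resolves a token to its string value.
    pub fn token_str(&self, token: Token) -> &str {
        match token.escaped_index {
            // Rare case: look up the unescaped copy.
            Some(index) => &self.escaped_strings[index as usize],
            // Normal case: slice the source text, no allocation.
            None => &self.source_text[token.span.start as usize..token.span.end as usize],
        }
    }
}
```

With the source `a \u0061`, the plain identifier and the escaped one would both resolve to `a`, which is what the equality check needs.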
Background Reading:
* https://floooh.github.io/2018/06/17/handles-vs-pointers.html
This PR is part of #1880.
`Token` size is reduced from 48 to 40 bytes.
To reconstruct the regex pattern and flags within the parser, the regex
string is re-parsed from the end by reading all valid flags.
In order to make things work nicely, the lexer will no longer recover
from an invalid regex.
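A rough sketch of re-parsing from the end, assuming the lexer has already validated the literal. `split_regex` and `is_regex_flag` are hypothetical helpers, not the parser's real functions. Scanning backwards works because flags are plain letters, so the first non-flag character from the end is the closing `/`, even when the pattern itself contains `/`.

```rust
/// Splits a raw regex literal such as `/ab+c/gi` into `(pattern, flags)`.
/// Returns `None` if the text is not delimited by slashes.
pub fn split_regex(raw: &str) -> Option<(&str, &str)> {
    // Walk backwards over the trailing flag characters and remember
    // the byte offset of the earliest one.
    let flags_start = raw
        .char_indices()
        .rev()
        .take_while(|&(_, c)| is_regex_flag(c))
        .last()
        .map_or(raw.len(), |(index, _)| index);

    let (body, flags) = raw.split_at(flags_start);
    // What remains must be `/pattern/`.
    let pattern = body.strip_prefix('/')?.strip_suffix('/')?;
    Some((pattern, flags))
}

/// Flag letters accepted by this sketch.
fn is_regex_flag(c: char) -> bool {
    matches!(c, 'd' | 'g' | 'i' | 'm' | 's' | 'u' | 'v' | 'y')
}
```

Since the lexer no longer recovers from an invalid regex, a helper like this only ever sees well-formed literals.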
This PR partially fixes #1803 and is part of #1880.
BigInt is removed from the `Token` value, so that the token size can be
reduced once we have removed all the variants.
`Token` is now also `Copy`, which removes all the `clone` and `drop`
calls.
This yields a 5% performance improvement for the parser.
The parser incorrectly identifies string literals as directives if they
follow `import`s, `export`s, or decorators.
In all of these cases, `'use strict'` produces a directive in the AST,
where it should be parsed as an `ExpressionStatement` containing a
`StringLiteral`:
```js
import x from 'foo';
'use strict';
```
```js
export {x};
'use strict';
```
```js
@foo
'use strict';
```
[Playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC0gICAgICAgIC0G8rnONK89ITJ3zrK%2FUP7OmSZPgHQzStr3yMtwFTU%2BD1WPt09JgqZJLoYooydbGsM5vGcf34BnIA%3D)
This PR should fix that.
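For reference, the intended rule can be sketched like this. This is a toy model, not the parser's code: `Statement`, `Program` and `split_directives` are made up for illustration. String-literal statements only count as directives while we are still in the leading run of such statements; anything else ends the prologue.

```rust
#[derive(Debug, PartialEq)]
pub enum Statement {
    /// An expression statement consisting solely of a string literal.
    StringLiteralExpression(String),
    Import,
    Other,
}

#[derive(Debug, Default, PartialEq)]
pub struct Program {
    pub directives: Vec<String>,
    pub body: Vec<Statement>,
}

/// Moves the leading run of string-literal statements into `directives`.
pub fn split_directives(statements: Vec<Statement>) -> Program {
    let mut program = Program::default();
    let mut in_prologue = true;
    for statement in statements {
        match statement {
            // Still at the top: this string literal is a directive.
            Statement::StringLiteralExpression(value) if in_prologue => {
                program.directives.push(value);
            }
            // The first non-directive statement ends the prologue for good.
            other => {
                in_prologue = false;
                program.body.push(other);
            }
        }
    }
    program
}
```

Under this model, a `'use strict'` that comes after an import stays in `body` as an ordinary statement.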
I'm not sure about the decorator case, though. I assume it's not a
directive. But is prefixing a string literal with a decorator even legal
syntax anyway?
And a side nit: If I'm reading it right, I don't think the `continue`
statement in the decorator arm of the match does anything. Do I have
that right?
Last question: Where does one go about putting a test? I guess these
silly cases aren't covered by Babel etc's tests.
---------
Co-authored-by: Boshen <boshenc@gmail.com>
`Token` and `Span` both represent `start` and `end` as `u32`.
This limits the size of source which can be parsed to `u32::MAX` bytes.
19577709db/crates/oxc_span/src/span.rs (L14-L20)
However, this constraint is currently not enforced.
In a release build, code will not panic on arithmetic overflow, so
`start`/`end` could wrap around back to zero if source is 4 GiB or more.
That'd produce nonsense spans. But worse, the lexer relies in some
places on `self.current.token.start` being correct, so if the value
wrapped around, possibly it'd keep rewinding to the start of the source
and lexing it again, causing an infinite loop.
In the worst case, if for some reason an application's public API used OXC's
parser with user-supplied source code (parser-as-a-service!), this could
be exploited for denial of service.
This PR adds an assertion to catch this at the start of parsing instead.
This does add an extra instruction, but I imagine the effect will be
negligible compared to the work required to parse the code.
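Roughly, the check looks like the sketch below. `MAX_LEN` and `check_source_len` are names invented for illustration, and the example assumes a 64-bit target so that lengths above `u32::MAX` are representable.

```rust
/// Largest source length, in bytes, that `u32` offsets can address.
/// (Hypothetical constant name.)
pub const MAX_LEN: usize = u32::MAX as usize;

/// Panics up front if `len` cannot be represented by `u32` offsets,
/// instead of letting `start`/`end` wrap around later.
pub fn check_source_len(len: usize) {
    assert!(len <= MAX_LEN, "source text is too long for u32 offsets");
}

/// Shows the failure mode the assertion guards against: without overflow
/// checks, advancing an offset past `u32::MAX` wraps back to zero.
pub fn advance_unchecked(offset: u32, by: u32) -> u32 {
    offset.wrapping_add(by)
}
```

The comparison happens once per parse, which is why the cost should be negligible.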
The parser, `trivias` and `trivias_builder` were edited to collect all
whitespace. The `Trivias` struct now stores comments and a whitespaces
`Vec`. After that, I will implement the no-irregular-whitespace rule.
P.S.: There isn't a way to implement this feature without losing a little
bit of performance. Compared with my last PR #1819, to minimize this
problem the irregular whitespace is stored as `u32` instead of `Span`,
and I removed a map iterator and an unused function. If
you have a suggestion about it, please give me feedback.
Just removes a couple of lines of redundant code from the lexer.
A note on the 2nd one:
```rs
let mut builder = AutoCow::new(lexer);
let c = lexer.consume_char();
builder.push_matching(c);
```
`push_matching()` is a no-op unless
`force_allocation_without_current_ascii_char()` has already been called.
Here the `AutoCow` has just been freshly created, so we know it hasn't.
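To illustrate why the call is redundant, here is a toy model of a lazily allocating builder. It is an assumption about the general shape only, not the real `AutoCow`: `ToyCow` and all of its methods are invented for this example.

```rust
/// Toy builder: it borrows from the source until an allocation is forced.
pub struct ToyCow {
    /// `None` while the value can still be sliced straight from the source.
    value: Option<String>,
}

impl ToyCow {
    pub fn new() -> Self {
        Self { value: None }
    }

    /// Switches to an owned buffer seeded with the text seen so far.
    pub fn force_allocation(&mut self, seen_so_far: &str) {
        if self.value.is_none() {
            self.value = Some(seen_so_far.to_string());
        }
    }

    /// Appends `c` only if an owned buffer already exists; otherwise a no-op.
    pub fn push_matching(&mut self, c: char) {
        if let Some(buffer) = &mut self.value {
            buffer.push(c);
        }
    }

    pub fn is_allocated(&self) -> bool {
        self.value.is_some()
    }

    pub fn as_str(&self) -> Option<&str> {
        self.value.as_deref()
    }
}
```

In this model, calling `push_matching` right after `new()` changes nothing, which mirrors the reasoning for removing the line.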
Most TypeScript types can be eliminated during the code generation phase
by not printing the corresponding AST nodes.
The changes in this PR enable applying a similar technique to the `this`
parameter.
The ECMA specification seems to have added the "Tokens" section to the
specification as 12.6. This pushed all the other sections down,
resulting in e.g. the former 12.6 now being 12.7. Comments in the parser
mention this part of the specification. All the mentions of section
12.6+ are therefore outdated now. This pull request tries to fix that by
updating all the comments.
closes #949
closes #950
closes #951
All minifier tests are disabled as of this PR.
We are going to fix the compilation errors first, then the behavioral
errors.
When this was initially written, types were not in the symbol table. Now
that types are in the symbol table, it makes sense, given
```ts
type A = 1
type B = A
```
that you can get to the symbol id for `A` from `type B = A`.
Please correct me if I'm wrong about how I implemented this. I also
verified that occurrence (I believe this is the correct word) behaves
how I would expect.
```ts
type RecursiveType = string | {[x: string]: RecursiveType}
```
Does populate a reference.
---------
Co-authored-by: Boshen <boshenc@gmail.com>