59 commits

Author SHA1 Message Date
overlookmotel
0bdecb5043
refactor(parser): wrapper type for parser (#2339)
Split parser into public interface `Parser` and internal implementation `ParserImpl`.

This involves no changes to public API.

This change is a bit annoying, but justification is that it's required for #2341, which I believe to be very worthwhile.

The `ParserOptions` type also makes it a bit clearer what the defaults for `allow_return_outside_function` and `preserve_parens` are. It came as a surprise to me that `preserve_parens` defaults to `true`, and this refactor makes that a bit more obvious when reading the code.

All the real changes are in [oxc_parser/src/lib.rs](https://github.com/oxc-project/oxc/pull/2339/files#diff-8e59dfd35fc50b6ac9a9ccd991e25c8b5d30826e006d565a2e01f3d15dc5f7cb). The rest of the diff is basically replacing `Parser` with `ParserImpl` everywhere else.
2024-02-07 23:22:08 +08:00
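
A minimal sketch of the wrapper-type split described in #2339, not the actual `oxc_parser` code. The names `Parser`, `ParserImpl`, `ParserOptions`, `allow_return_outside_function` and `preserve_parens` come from the commit message. `ParseSummary`, the method signatures, and the `false` default for `allow_return_outside_function` are assumptions made for illustration.

```rust
/// Options for the parser. The field names come from the commit message;
/// the struct layout itself is an illustrative assumption.
#[derive(Debug, Clone, Copy)]
pub struct ParserOptions {
    pub allow_return_outside_function: bool,
    pub preserve_parens: bool,
}

impl Default for ParserOptions {
    fn default() -> Self {
        Self {
            // Assumed default; the commit message does not state it.
            allow_return_outside_function: false,
            // Stated in the commit message: defaults to `true`.
            preserve_parens: true,
        }
    }
}

/// Hypothetical stand-in for a parse result.
#[derive(Debug)]
pub struct ParseSummary {
    pub source_len: usize,
    pub options: ParserOptions,
}

/// Public interface: only holds the inputs and forwards to `ParserImpl`.
pub struct Parser<'a> {
    source_text: &'a str,
    options: ParserOptions,
}

impl<'a> Parser<'a> {
    pub fn new(source_text: &'a str) -> Self {
        Self { source_text, options: ParserOptions::default() }
    }

    /// Builder-style setter, so callers never construct `ParserOptions` directly.
    pub fn preserve_parens(mut self, preserve: bool) -> Self {
        self.options.preserve_parens = preserve;
        self
    }

    pub fn parse(self) -> ParseSummary {
        ParserImpl::new(self.source_text, self.options).parse()
    }
}

/// Internal implementation: private, so it can change without touching the public API.
struct ParserImpl<'a> {
    source_text: &'a str,
    options: ParserOptions,
}

impl<'a> ParserImpl<'a> {
    fn new(source_text: &'a str, options: ParserOptions) -> Self {
        Self { source_text, options }
    }

    fn parse(self) -> ParseSummary {
        // A real implementation would lex and parse here.
        ParseSummary { source_len: self.source_text.len(), options: self.options }
    }
}
```

With this shape, `Parser::new("(a)").preserve_parens(false).parse()` reads the same to callers whether or not `ParserImpl` exists, which is why the split needs no public API change.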
overlookmotel
71898ffdd5
refactor(parser): move source length check into lexer (#2206)
This change makes little difference in itself, but moving the check into
the lexer will allow some optimizations in lexer using unsafe code which
depend on this invariant.
2024-01-29 22:29:02 +08:00
overlookmotel
e123be0a00
fix(parser): correct MAX_LEN for 32-bit systems (#2204)
Maximum length of source parser can accept is limited on 32-bit systems
to `isize::MAX` (i.e. `i32::MAX` not `u32::MAX`) because Rust [limits
the size of
allocations](https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.from_size_align)
to `isize::MAX`.

This PR takes that constraint into account when calculating
`Parser::MAX_LEN`.

It also speeds up the `overlong_source` test so it runs in under 500ms
(previously it took ~4 secs on an M1 MacBook Pro).
2024-01-29 21:45:45 +08:00
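
The constraint in #2204 can be sketched as taking the smaller of `u32::MAX` and `isize::MAX`. This is an illustration only: the helper `max_len` is hypothetical, and the real `Parser::MAX_LEN` may be computed differently.

```rust
/// Hypothetical helper: the largest source length that fits both a `u32`
/// offset and Rust's `isize::MAX` allocation limit. `isize_max` is passed
/// in as a `u64` so the 32-bit case can be checked on any host.
pub const fn max_len(isize_max: u64) -> u64 {
    let u32_max = u32::MAX as u64;
    if isize_max < u32_max {
        isize_max // 32-bit targets: limited by `isize::MAX` (i.e. `i32::MAX`)
    } else {
        u32_max // 64-bit targets: limited by the `u32` span offsets
    }
}

/// The limit for the target this code is compiled for.
pub const MAX_LEN: usize = max_len(isize::MAX as u64) as usize;
```

On a 64-bit target `MAX_LEN` comes out as `u32::MAX`; on a 32-bit target it comes out as `i32::MAX`.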
overlookmotel
36c718ee82
feat(tasks): benchmarks for lexer (#2101)
This PR adds benchmarks for the lexer. I'm doing some work on optimizing
the lexer and I thought it'd be useful to see the effects of changes in
isolation, separate from the parser.

These benchmarks may not be ideal to keep long-term, but for now it'd be
useful.

In order to do so, it's necessary for the `oxc_parser` crate to expose the
lexer, but I have done that without adding it to the docs, using the
alias `__lexer`.
2024-01-21 14:32:50 +00:00
Boshen
09c7570560
ci: use miri to detect memory leak for the parser (#2037)
We'll merge this and then eventually turn it on as a nightly check; it's
a manual run for now.
2024-01-15 15:11:02 +00:00
Boshen
4706765d2a
refactor(parser): reduce Token size from 32 to 16 bytes (#1962)
Part of #1880

`Token` size is reduced from 32 to 16 bytes by changing the previous
token value `Option<&'a str>` to a u32 index handle.

It would be nice if this handle could be eliminated entirely, because
the normal case for a string is always
`&source_text[token.span.start..token.span.end]`

Unfortunately, JavaScript allows escaped characters to appear in
identifiers, strings and templates. These strings need to be unescaped
for equality checks, i.e. `"\a" === "a"`.

This leads us to adding an `escaped_strings[]` vec for storing these
unescaped and allocated strings.

Performance regression for adding this vec should be minimal because
escaped strings are rare.

Background Reading:

* https://floooh.github.io/2018/06/17/handles-vs-pointers.html
2024-01-09 15:17:02 +08:00
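
A rough sketch of the handle idea from #1962, under stated assumptions. Only `Token`, `span`, `source_text` and `escaped_strings` are named in the commit message. The `Lexer` methods, the `escaped` field and its "0 means no entry" encoding are hypothetical, and the real `Token` carries more fields than shown here.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

#[derive(Debug, Clone, Copy)]
pub struct Token {
    pub span: Span,
    /// Hypothetical handle: 0 means "no escapes, slice the source";
    /// any other value is an index into `escaped_strings`, plus one.
    pub escaped: u32,
}

pub struct Lexer<'a> {
    source_text: &'a str,
    /// Unescaped, allocated strings. Expected to stay small because
    /// escaped identifiers and strings are rare.
    escaped_strings: Vec<String>,
}

impl<'a> Lexer<'a> {
    pub fn new(source_text: &'a str) -> Self {
        Self { source_text, escaped_strings: Vec::new() }
    }

    /// Record a token; `unescaped` is only supplied when the raw text
    /// contained escape sequences.
    pub fn token(&mut self, start: u32, end: u32, unescaped: Option<String>) -> Token {
        let escaped = match unescaped {
            Some(value) => {
                self.escaped_strings.push(value);
                self.escaped_strings.len() as u32
            }
            None => 0,
        };
        Token { span: Span { start, end }, escaped }
    }

    /// Resolve a token to its string value.
    pub fn token_str(&self, token: Token) -> &str {
        if token.escaped == 0 {
            // The normal case: slice straight out of the source.
            &self.source_text[token.span.start as usize..token.span.end as usize]
        } else {
            &self.escaped_strings[token.escaped as usize - 1]
        }
    }
}
```

For the source `\u0061bc abc`, a token covering `\u0061bc` would carry a handle to the stored string `abc`, while the plain `abc` token would carry handle 0 and be sliced from the source. Both resolve to the same value for equality checks.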
overlookmotel
eb2966c512
fix(parser): fix incorrectly identified directives (#1885)
Parser incorrectly identifies string literals as directives if they
follow after `import`s, `export`s, or decorators.

In all of these cases, `'use strict'` produces a directive in the AST,
where it should be parsed as an `ExpressionStatement` containing a
`StringLiteral`:

```js
import x from 'foo';
'use strict';
```

```js
export {x};
'use strict';
```

```js
@foo
'use strict';
```


[Playground](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC0gICAgICAgIC0G8rnONK89ITJ3zrK%2FUP7OmSZPgHQzStr3yMtwFTU%2BD1WPt09JgqZJLoYooydbGsM5vGcf34BnIA%3D)

This PR should fix that.

I'm not sure about the decorator case, though. I assume it's not a
directive. But is prefixing a string literal with a decorator even legal
syntax anyway?

And a side nit: If I'm reading it right, I don't think the `continue`
statement in the decorator arm of the match does anything. Do I have
that right?

Last question: Where does one go about putting a test? I guess these
silly cases aren't covered by Babel etc's tests.

---------

Co-authored-by: Boshen <boshenc@gmail.com>
2024-01-04 13:39:15 +00:00
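
The rule behind #1885 can be sketched as follows: directives are only the string-literal expression statements at the very start of a body, so the first statement of any other kind ends the prologue. The `Statement` enum and `directive_prologue` function are hypothetical simplifications, not the oxc AST or parser code.

```rust
/// Hypothetical, heavily simplified statement kinds.
#[derive(Debug, PartialEq)]
pub enum Statement<'a> {
    /// An expression statement consisting only of a string literal.
    StringLiteral(&'a str),
    Import,
    Export,
    Other,
}

/// Collect the directive prologue: the leading run of string-literal
/// statements. Anything after the first non-string statement is an
/// ordinary statement, even if it looks like `'use strict'`.
pub fn directive_prologue<'a>(body: &[Statement<'a>]) -> Vec<&'a str> {
    body.iter()
        .map_while(|statement| match statement {
            Statement::StringLiteral(value) => Some(*value),
            _ => None,
        })
        .collect()
}
```

With this rule, a body of `[Import, StringLiteral("use strict")]` yields no directives, matching the first example in the commit message.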
overlookmotel
62bc8c5cea
fix(parser): error on source larger than 4 GiB (#1860)
`Token` and `Span` both represent `start` and `end` as `u32`.

This limits size of source which can be parsed to `u32::MAX`.


19577709db/crates/oxc_span/src/span.rs (L14-L20)

However, this constraint is currently not enforced.

In a release build, code will not panic on arithmetic overflow, so
`start`/`end` could wrap around back to zero if source is 4 GiB or more.

That'd produce nonsense spans. But worse, the lexer relies in some
places on `self.current.token.start` being correct, so if the value
wrapped around, possibly it'd keep rewinding to the start of the source
and lexing it again, causing an infinite loop.

In the worst case, if for some reason an application's public API used OXC's
parser with user-supplied source code (parser-as-a-service!), this could
be exploited for denial of service.

This PR adds an assertion to catch this at the start of parsing instead.

This does add an extra instruction, but I imagine the effect will be
negligible compared to the work required to parse the code.
2024-01-02 11:05:28 +08:00
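
To illustrate the wrap-around described in #1860, here is a small sketch. Both function names are hypothetical, and the check returns a `Result` purely for demonstration, whereas the PR itself adds an assertion at the start of parsing.

```rust
/// Mimics what unchecked `u32` offset arithmetic does in a release build:
/// on overflow the value wraps around instead of panicking.
pub fn advance_wrapping(offset: u32, by: u32) -> u32 {
    offset.wrapping_add(by)
}

/// Hypothetical up-front check: reject any source whose length cannot be
/// represented as a `u32` offset, before lexing starts.
pub fn check_source_len(len: u64) -> Result<u32, String> {
    u32::try_from(len).map_err(|_| format!("source length {len} exceeds u32::MAX"))
}
```

`advance_wrapping(u32::MAX, 1)` returns `0`, which is the "rewind to the start of the source" hazard, while `check_source_len(1 << 32)` fails before any offset can wrap.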
Boshen
07b010912a
feat(parser): add preserve_parens option (default: true) (#1474)
closes #1461
2023-11-21 11:16:30 +08:00
Boshen
dd7749f949
improve README (#800)
closes #686

Rendered: https://github.com/web-infra-dev/oxc/blob/readme/README.md

This is a refinement for the README, which should include information
for different interests: first time reader, explorer, rust crate / napi
user etc.
2023-08-27 22:36:17 +08:00
Boshen
2f48bdf26f
fix(parser,semantic): make semantic own Trivias (#711)
closes #708

Making the parser return Rc<Trivias> is not a good API, and ideally
`Semantic` should just own `Trivias` so it can process or mutate it.
2023-08-10 15:30:32 +08:00
Boshen
608ee9116b
refactor(parser): remove portable simd because it is not stable Rust (#645)
related #626
2023-07-27 12:43:11 +08:00
Sg
2203d08199
refactor: remove unstable feature slice_as_chunks (#632) 2023-07-26 19:21:35 +08:00
Boshen
8aba8bcbb5
feat(oxc): a single oxc crate (#522) 2023-07-06 13:35:25 +08:00
Boshen
19b839efe9
perf(semantic): use IndexVec instead of indextree for ast nodes (#462) 2023-06-20 15:21:58 +08:00
Carter Snook
c0726e444f
feat(lexer): use linear lexing on WASM (#436)
Co-authored-by: Boshen <boshenc@gmail.com>
2023-06-13 15:18:02 +08:00
Carter Snook
985b8f21d9
feat: support hashbang interpreter comments (#431) 2023-06-11 23:55:58 +08:00
Carter Snook
23d2a9f6d7
chore(typo): expect -> except (#415) 2023-06-08 14:28:29 +08:00
Boshen
a8641b9921
chore(parser): move inline tests to snapshot testing 2023-05-21 12:05:25 +08:00
Boshen
14720e7c69
refactor: move SourceType from oxc_ast to oxc_span (#351)
related #350
2023-05-12 23:16:14 +08:00
Boshen
7f93e58f10
chore: remove all #[must_use] 2023-05-11 21:08:00 +08:00
Boshen
cd276c2850
feat: add oxc_span crate (#323) 2023-04-27 21:51:15 +08:00
Boshen
08dfbc98b2
fix(oxc_ast,oxc_parser): fix clippy warnings 2023-04-22 16:24:50 +08:00
Boshen
6cbfc29c90
refactor(parser): remove some useless derive(Debug) 2023-04-16 12:19:39 +08:00
Boshen
48736e53af
fix(parser): fix panic in unexpected token 2023-04-10 21:57:31 +08:00
Boshen
024f1a1552
fix(parser): fix [+In] Destructuring Binding Pattern Initializer (#267) 2023-04-06 21:47:07 +08:00
Boshen
398dbfd2a7
fix(parser): parse [+In] in template (#266)
relates #255
2023-04-06 21:37:34 +08:00
Boshen
6360bdad31
fix(parser): fix [+in] context in CallArguments (#265)
relates #255
2023-04-06 21:02:30 +08:00
Boshen
dc090208c4
fix(parser): fix crashing on empty ParenthesizedExpression with comments (#263)
relates #232
2023-04-06 17:16:15 +08:00
Boshen
0674899b88
Fuzz async (#257)
* fix(parser): parse `async(...null)` as call expression

relates #255

* fix(parser): parse `null?async():null`

relates #255
2023-04-05 14:36:37 +08:00
Boshen
36f4a12b9f chore: update README about conformance 2023-04-02 16:41:53 +08:00
Boshen
96ad67db92
fix(parser): clean up type arguments parsing (#242)
closes #169
2023-04-02 12:06:52 +08:00
Boshen
e576b50fe2
docs(parser): add details 2023-04-02 00:23:21 +08:00
Boshen
61d2aedd43
docs(oxc_parser): add section on performance and visitor 2023-04-02 00:11:19 +08:00
Boshen
b11f774c41 refactor(oxc_parser): clean up doc 2023-04-01 19:03:33 +08:00
Boshen
d917348f9b refactor(ast,parser): move parsing context from ast to parser 2023-04-01 18:01:33 +08:00
Boshen
f2fcbb30c3 refactor(oxc_parser): remove unneeded generic from unexpected function 2023-04-01 15:59:42 +08:00
Boshen
d4ff0bb40e refactor(oxc_parser): parser and lexer do not need to share the errors vec 2023-04-01 15:59:42 +08:00
Boshen
174330561c
fix(parser): fix panic on multi-byte characters (#233)
* fix(oxc_parser): fix panic when EOF on a multi-byte character

relates #232

* fix(parser): fix panic on multi-byte char in private identifier

relates #232
2023-04-01 13:34:18 +08:00
Boshen
d232199e1c
refactor(parser): return Rc<Trivias> from TriviaBuilder (#231)
closes #229
2023-03-31 09:02:48 -07:00
Boshen
37c7b7a752
refactor(oxc_parser): simplify diagnostic messages 2023-03-18 14:39:44 +08:00
Boshen
bee548b945
fix(coverage): correct the number on AST Parsed 2023-03-17 11:15:33 +08:00
Boshen
f36e3301fd
refactor(lexer): change TokenValue::String(Atom) to TokenValue::String(&str) (#174) 2023-03-13 09:33:44 +08:00
Boshen
4d32bfb55e
refactor: remove all declarations of const fn, which is useless for us 2023-03-07 21:29:47 +08:00
Shannon Rothe
6647752e03
refactor(ast): change Option<Vec> to Vec for decorators (#84)
* remove `Option<Vec>` from `FormalParameter`

* `unwrap` -> `unwrap_or_else`

* prefer `AstBuilder` helper

* implement `consume_decorators`
2023-03-02 15:52:46 +08:00
Boshen
73ea3d6361 feat(ast,lexer,linter): save and check comments 2023-02-27 12:31:57 +08:00
Boshen
915518b614 refactor(oxc_diagnostics): s/PError/Error 2023-02-26 02:02:05 +08:00
Boshen
4f4a9802b7 refactor(diagnostics,parser): move diagnostics to parser 2023-02-22 19:23:01 +08:00
Boshen
5390d3e6b4 refactor(diagnostic): change Err type to miette::Error
This is the prerequisite for breaking up the large Diagnostic enum.
2023-02-22 11:08:21 +08:00
Boshen
4c6407b152 refactor(ast): s/node/span
This corrects the jargon for span. The term `node` came from `estree`,
which is a bit misleading here in Rust.

closes #9
2023-02-21 19:17:49 +08:00