Pure refactor. Introduce `ParserImpl::alloc` method. Shorten `self.ast.alloc(...)` to `self.alloc(...)`.
Also reduce `alloc` calls by using `AstBuilder` methods which already allocate where possible.
Previously it included a newline in the value
```
"hashbang": {
"type": "Hashbang",
"start": 0,
"end": 16,
"value": "/usr/bin/node\n"
},
```
This change will also make the lexer emit a `\n` token, which will make comment position detection correct.
`UniquePromise::new_for_tests` is not used in tests, so produces a dead code warning when running tests. Prevent that.
Also, rename it to `new_for_benchmarks`, since that's where it's used.
See https://babel.dev/docs/options#misc-options for background on `unambiguous`
Once `SourceType::Unambiguous` is parsed, it will correctly set the returned `Program::source_type` to either `module` or `script`.
This PR serves two purposes, First off it would lower the amount of characters we have to type in for a simple operation such as wrapping an expression in a vector. Secondly, it would follow the generated names more closely since nowhere else in the builder we do have `new_xxx`, We always say `xxx` since a builder always constructs something.
```
new_vec -> vec
new_vec_single -> vec1*
new_vec_from_iter -> vec_from_iter
new_vec_with_capacity -> vec_with_capacity
new_str -> str
new_atom -> atom
```
`*` This one is the main motivation behind this PR, It saves 10 characters!
part of #3213
We should only have one diagnostic struct instead 353 copies of them, so we don't end up choking LLVM with 50k lines of the same code due to monomorphization.
If the proposed approach is good, then I'll start writing a codemod to turn all the existing structs to plain functions.
---
Background:
Using `--timings`, we see `oxc_linter` is slow on codegen (the purple part).

The crate currently contains 353 miette errors. [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) displays
```
cargo llvm-lines -p oxc_linter --lib --release
Lines Copies Function name
----- ------ -------------
830350 33438 (TOTAL)
29252 (3.5%, 3.5%) 808 (2.4%, 2.4%) <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
23298 (2.8%, 6.3%) 353 (1.1%, 3.5%) miette::eyreish::error::object_downcast
19062 (2.3%, 8.6%) 706 (2.1%, 5.6%) core::error::Error::type_id
12610 (1.5%, 10.1%) 65 (0.2%, 5.8%) alloc::raw_vec::RawVec<T,A>::grow_amortized
12002 (1.4%, 11.6%) 706 (2.1%, 7.9%) miette::eyreish::ptr::Own<T>::boxed
9215 (1.1%, 12.7%) 115 (0.3%, 8.2%) core::iter::traits::iterator::Iterator::try_fold
9150 (1.1%, 13.8%) 1 (0.0%, 8.2%) oxc_linter::rules::RuleEnum::read_json
8825 (1.1%, 14.9%) 353 (1.1%, 9.3%) <miette::eyreish::error::ErrorImpl<E> as core::error::Error>::source
8822 (1.1%, 15.9%) 353 (1.1%, 10.3%) miette::eyreish::error::<impl miette::eyreish::Report>::construct
8119 (1.0%, 16.9%) 353 (1.1%, 11.4%) miette::eyreish::error::object_ref
8119 (1.0%, 17.9%) 353 (1.1%, 12.5%) miette::eyreish::error::object_ref_stderr
7413 (0.9%, 18.8%) 353 (1.1%, 13.5%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Display>::fmt
7413 (0.9%, 19.7%) 353 (1.1%, 14.6%) miette::eyreish::ptr::Own<T>::new
6669 (0.8%, 20.5%) 39 (0.1%, 14.7%) alloc::raw_vec::RawVec<T,A>::try_allocate_in
6173 (0.7%, 21.2%) 353 (1.1%, 15.7%) miette::eyreish::error::<impl miette::eyreish::Report>::from_std
6027 (0.7%, 21.9%) 70 (0.2%, 16.0%) <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
6001 (0.7%, 22.7%) 353 (1.1%, 17.0%) miette::eyreish::error::object_drop
6001 (0.7%, 23.4%) 353 (1.1%, 18.1%) miette::eyreish::error::object_drop_front
5648 (0.7%, 24.1%) 353 (1.1%, 19.1%) <miette::eyreish::error::ErrorImpl<E> as core::fmt::Debug>::fmt
```
It's totalling more than 50k llvm lines, and is putting pressure on rustc codegen (the purple part on `oxc_linter` in the image above.
---
It's pretty obvious by looking at https://github.com/zkat/miette/blob/main/src/eyreish/error.rs, the generics can expand out to lots of code.
Should be merged after #2829, Tried a few times to get it done with
graphite stacking but found no success. I guess it either doesn't work
with forks or It is just a skill issue since I'm not familiar with it.
closes: #2829closes: #2830
---------
Co-authored-by: Dmytro Maretskyi <maretskii@gmail.com>
This gets all the new TS types working to the same level TS output was
before and fixes a bunch of other codegen
---------
Co-authored-by: Boshen <boshenc@gmail.com>
Introduce invariant that only a single `lexer::Source` can exist on a thread at one time.
This is a preparatory step for #2341.
2 notes:
Restriction is only 1 x `ParserImpl` / `Lexer` / `Source` on 1 *thread* at a time, not globally. So this does not prevent parsing multiple files simultaneously on different threads.
Restriction does not apply to public type `Parser`, only `ParserImpl`. `ParserImpl`s are not created in created in `Parser::new`, but instead in `Parser::parse`, where they're created and then immediately consumed. So the end user is also free to create multiple `Parser` instances (if they want to for some reason) on the same thread.
Split parser into public interface `Parser` and internal implementation `ParserImpl`.
This involves no changes to public API.
This change is a bit annoying, but justification is that it's required for #2341, which I believe to be very worthwhile.
The `ParserOptions` type also makes it a bit clearer what the defaults for `allow_return_outside_function` and `preserve_parens` are. It came as a surprise to me that `preserve_parens` defaults to `true`, and this refactor makes that a bit more obvious when reading the code.
All the real changes are in [oxc_parser/src/lib.rs](https://github.com/oxc-project/oxc/pull/2339/files#diff-8e59dfd35fc50b6ac9a9ccd991e25c8b5d30826e006d565a2e01f3d15dc5f7cb). The rest of the diff is basically replacing `Parser` with `ParserImpl` everywhere else.
This change makes little difference in itself, but moving the check into
the lexer will allow some optimizations in lexer using unsafe code which
depend on this invariant.
Maximum length of source parser can accept is limited on 32-bit systems
to `isize::MAX` (i.e. `i32::MAX` not `u32::MAX`) because Rust [limits
the size of
allocations](https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.from_size_align)
to `isize::MAX`.
This PR takes that constraint into account when calculating
`Parser::MAX_LEN`.
It also speeds up the `overlong_source` test so it runs in under 500ms
(previously it took ~4 secs on a M1 Macbook Pro).
This PR adds benchmarks for the lexer. I'm doing some work on optimizing
the lexer and I thought it'd be useful to see the effects of changes in
isolation, separate from the parser.
These benchmarks may not be ideal to keep long-term, but for now it'd be
useful.
In order to do so, it's necessary for `oxc_parser` crate to expose the
lexer, but have done that without adding it to the docs, and using an
alias `__lexer`.
Part of #1880
`Token` size is reduced from 32 to 16 bytes by changing the previous
token value `Option<&'a str>` to a u32 index handle.
It would be nice if this handle is eliminated entirely because
the normal case for a string is always
`&source_text[token.span.start.token.span.end]`
Unfortunately, JavaScript allows escaped characters to appear in
identifiers, strings and templates. These strings need to be unescaped
for equality checks, i.e. `"\a" === "a"`.
This leads us to adding a `escaped_strings[]` vec for storing these
unescaped and allocated
strings.
Performance regression for adding this vec should be minimal because
escaped strings are rare.
Background Reading:
* https://floooh.github.io/2018/06/17/handles-vs-pointers.html