The sourcemap implement port from
[rust-sourcemap](https://github.com/getsentry/rust-sourcemap), but has
some different with it.
- Encode sourcemap at parallel, including quote `sourceContent` and
encode token to `vlq` mappings.
- Avoid `Sourcemap` some methods overhead, like `SourceMap::tokens()`
caused extra overhead at common cases. Here using `SourceViewToken` to
instead of it.
Change the `profile.dev.debug` value from `limited` to `1` which is the
same thing according to
[this](https://doc.rust-lang.org/cargo/reference/profiles.html#debug).
For some reason, the numeric value was failing when running the codspeed
benchmark.
------
#### Edit:
it was resulting in the following error:
```
failed to parse manifest at `/home/runner/work/oxc/oxc/Cargo.toml`
```
Related to #2812
Unlike on other OS, on Windows there is no wildcard expansion/globbing
by the shell. Instead the application has to handle this. Therefore I
used the `glob` package to handle wildcards on Windows.
I also had to make the parent directory check more strict due to the
glob package resolving `..` in the middle of the path as well.
This closes#2695.
This PR re-implements lexing identifiers with a fast path for the most common case - identifiers which are pure ASCII characters, using the new `Source` / `SourcePosition` APIs.
Lexing identifiers is a hot path, and accounts for the majority of the time the Lexer spends. The performance bump from this change is (if I do say so myself!) quite decent.
I've spent a lot of time tuning the implementation, which gained a further 10-15% on the Lexer benchmarks compared to my first, simpler attempt. Some of the design decisions, if they look odd, are likely motivated by gains in performance.
### Techniques
This implementation uses a few different strategies for performance:
* Search byte-by-byte, not char-by-char.
* Process batches of 32 bytes at a time to reduce bounds checks.
* Mark uncommon paths `#[cold]`.
### Structure
The implementation is built in 3 layers:
1. ASCII characters only.
2. ASCII and Unicode characters.
3. `\` escape sequences (and all the above).
`identifier_name_handler` starts at the top layer, and is optimized for consuming ASCII as fast as possible. Each "layer" is considered more uncommon than the previous, and dropping down a layer is a de-opt.
I'm assuming that 95%+ of JavaScript code does not include either Unicode characters or escapes in identifiers, so the speed of the fast path is prioritised.
That said, once a Unicode character is encountered, the next layer does expect to find further Unicode characters, rather than de-opting over and over again. If an identifier *starts* with a Unicode character, it enters the code straight on the 2nd layer, so is not penalised by going through a `#[cold]` boundary. Lexing Unicode is never going to be as fast as ASCII, but still I felt it was important not to penalise it unnecessarily, so as not to be Anglo-centric.
### ASCII search macro
The main ASCII search is implemented as a macro. I found that, for reasons I don't understand, it's significantly faster to have all the code in a single function, even compared to multiple functions marked `#[inline]` or `#[inline(always)]`. The fastest implementation also requires some code to be repeated twice, which is nicer to do with a macro.
This macro, and the `ByteMatchTable` types that go with it, are designed to be re-usable. Next step will be to apply them for whitespace and strings, which should be fairly simple.
Searching in batches of 32 bytes is also designed to be forward-compatible with SIMD.
### Bye bye `AutoCow`
`AutoCow` is removed. Instead, a string-builder is only created if it's needed, when a `\` escape is first encountered. The string builder is also more efficient than `AutoCow` was, as it copies bytes in chunks, rather than 1-by-1.
This won't make much difference for identifiers, as escapes are so rare anyway, but this same technique can be used for strings, where they're more common.
closes#1803
This string is currently unsafe, but I want to get miri working before
introducing more changes.
I want to make a progress from memory leak to unsafe then to safety.
It's harder to do the steps in one go.