1. add a `test.js` file to the project root:
```js
class A extends B {
constructor() {
try {
super();
} finally {
this.a;
}
}
}
```
2. run:
```bash
$ cargo run -p oxc_semantic --example simple
Compiling oxc_semantic v0.5.0 (/home/tzvipm/src/github.com/tzvipm/oxc/crates/oxc_semantic)
Finished dev [unoptimized + debuginfo] target(s) in 32.07s
Running `target/debug/examples/simple`
Wrote AST to: test.ast.txt
Wrote CFG blocks to: test.cfg.txt
Wrote CFG dot diagram to: test.dot
```
3. resulting graph from .dot file:

This fixes a mistake I made in #2237.
I was confused by the `!(...)` wrapping of the preceding `if` test and
missed that there are definitely 2 chars to consume, so can use
`consume_char()` instead of `next_char()`. This makes no difference to
behavior, but it follows the convention to always prefer
`consume_char()` when possible.
I've also refactored the code which confused me, so hopefully others
won't be confused too!
Fixes#2217
Previously we were using the miette default, but given ours is a fork,
and for example now prints slightly more relevant line- and
column-numbers vs miette, we should dog-food our own have the tests tell
us if/when the output changes.
I did actually scan all the snapshot deltas and all look correct to me.
One funny one I noticed was this
```diff
diff --git a/crates/oxc_linter/src/snapshots/no_empty_file.snap b/crates/oxc_linter/src/snapshots/no_empty_file.snap
index cfc53e1c..6f001fbd 100644
--- a/crates/oxc_linter/src/snapshots/no_empty_file.snap
+++ b/crates/oxc_linter/src/snapshots/no_empty_file.snap
@@ -2,6 +2,7 @@
source: crates/oxc_linter/src/tester.rs
expression: no_empty_file
---
+
⚠ eslint-plugin-unicorn(no-empty-file): Empty files are not allowed.
╭─[no_empty_file.tsx:1:1]
╰────
@@ -29,7 +30,7 @@ expression: no_empty_file
help: Delete this file or add some code to it.
⚠ eslint-plugin-unicorn(no-empty-file): Empty files are not allowed.
- ╭─[no_empty_file.tsx:1:1]
+ ╭─[no_empty_file.tsx:0:1]
0 │
· ▲
╰────
@@ -149,4 +150,3 @@ expression: no_empty_file
╰────
help: Delete this file or add some code to it.
-
```
...which I suppose is technically correct but also a bit confusing
perhaps? Should we make the line **minimum 1**? If so I can create
another PR for that.
There is a subtle change in whitespace too - each file gains a newline
at the start but looses one at the end. My assumption is that oxc's
reporter is adding a newline at the start of each report (compared to
miette's), plus I removed the extra newline in `tester.rs` or else the
snapshot diffs would have been even larger.
Finally there are no changes to reports with *multi-line* annotations
like this:
```
⚠ typescript-eslint(ban-types): Prefer explicitly define the object shape
╭─[ban_types.tsx:1:1]
1 │ ╭─▶ const emptyObj: {
2 │ │
3 │ ╰─▶ } = {foo: "bar"};
╰────
help: This type means "any non-nullish value", which is slightly better than 'unknown', but it's still a broad type
```
Again I can create a separate PR to improve those and we should see
snapshot diffs when I make that change 😄
I'd appreciate a quick review on this one if at all possible, given the
high chance of conflict.
This PR replaces most usages of `lexer.current.chars.next()` with
`lexer.consume_char()`, or a new function `lexer.next_char()`.
This is a preparatory step towards replacing the `Chars` iterator with
something more flexible which can also consume bytes (not `char`s), and
this PR was intended as pure refactor. But surprised to see there is a
small performance bump (no idea why!).
There's an additional benefit: Using `consume_char()` everywhere where
we believe there's definitely a char there to be consumed will make
logic errors produce a panic, rather than silently outputting garbage.
This adds a separate byte handler to the lexer for byte values which
should never be encountered:
1. UTF-8 continuation bytes (i.e. middle of a multi-byte UTF-8 byte
sequence).
2. Bytes values which are illegal in valid UTF-8 strings.
At present, this function is impossible to reach, because
`std::str::Chars` ensures the next byte is always the *start* of a valid
UTF-8 byte sequence. But later changes I intend introducing unsafe code
will make it possible (but highly undesirable!). In the meantime, I
don't think it does any harm to handle this case.
This PR has a large diff, but it contains no substantive changes
whatsoever. It purely breaks up the lexer into multiple smaller files.
I've been working quite intensively on the lexer over past few weeks,
but still have been finding it hard to make sense of, due to most of the
logic currently being contained in [a single 1800-line
file](018675ceb1/crates/oxc_parser/src/lexer/mod.rs).
I feel that breaking it up into multiple files makes it much easier to
navigate and understand.
An additional benefit is that many functions can have their visibility
reduced to module scope, so sub-systems for e.g. lexing numbers have
fewer exposed functions. This makes it clearer what the entry points
are, and makes it harder to make mistakes when working on the lexer.
I intend to later make changes to the lexer for performance which will
introduce unsafe code. Keeping that unsafe code encapsulated in modules
will make it more viable to validate the workings of that code, and
avoid accidental UB.
There is one downside to this change. Previously
[`lexer/mod.rs`](018675ceb1/crates/oxc_parser/src/lexer/mod.rs)
was laid out in same order as the JS spec. If you were trying to
validate the lexer against the spec, this would make it easier. However,
as OXC's parser is fairly mature at this point, and I imagine most
spec-compliance issues have been flushed out by now, in my opinion this
advantage is less compelling than it probably used to be. So in my view
it's outweighed by the benefit of more readable code.
Reviewing this could be a bit of a battle due to the size of the diff. I
do have further changes I'd like to make, but I've intentionally kept
this PR as 100% just:
1. Moving code around.
2. Reducing visibility of functions to module/super scope where that's
possible to do without changing anything else.
Aside from that, not even a single comment has changed.
If you're willing to trust me on that promise, I think it can be merged
without poring through it line by line.
A faster way to calculate offset in the lexer.
This only moves the needle because it's on the hottest path in the lexer
- `Lexer::offset` is called for every token in `Lexer::read_next_token`.
Attempt to improve the graphical reporter source file **line number**
and **column number** .
Currently `oxlint` renders the **line number** as the first line in the
snippet **column number** is always `1` AFAICT. Example of current
version on VS Code repo:

This means that when CTRL-clicking on the source-file:line:col "link" it
takes me to the wrong place.
This PR is an attempt to improve things by getting closer to the source
of the issue. Here is my version again on VS Code:

I'm a bit worried that none of the tests failed though - would that be
expected?
Also the approach feels naïve and there may be several cases it won't
work for, or won't work quite so well. Specifically it's just finding
the first labelled-span, but I'm guessing in some cases that could be
providing context, rather than the thing the user would want to change.
As before I'm no expert and any feedback is very much appreciated.
Also in my local testing, VS Code is still a bit unpredictable - even
with the correct line + col it will always go to the correct line number
but doesn't always go to the correct column 🤷 But that's not our issue 😄
All the ASCII `ByteHandler`s are unsafe to call. I forgot to mark them
as unsafe when making that change.
This PR fixes that, and will make it harder for someone to accidentally
call one of them without considering the safety invariants.
Counter-intuitively, it seems that *increasing* the size of `Token`
improves performance slightly.
This appears to be because when `Token` is 16 bytes, copying `Token` is
a single 16-byte load/store. At present, it's 12 bytes which requires an
8-byte load/store + a 4-byte load/store.
https://godbolt.org/z/KPYsn3ab7
This suggests that either:
1. #2010 could be reverted at no cost, and the overhead of the hash
table removed.
or:
2. We need to get `Token` down to 8 bytes!
I have an idea how to *maybe* do (2), so I'd suggest leaving it as is
for now until I've been able to research that.
NB I also tried putting `#[repr(align(16))]` on `Token` so that copying
uses aligned loads/stores. That [hurt the benchmarks very
slightly](https://codspeed.io/overlookmotel/oxc/branches/lexer-pad-token),
though it might produce a gain on architectures where unaligned loads
are more expensive (ARM64 I think?). But I can't test that theory, so
have left it out.
This change makes little difference in itself, but moving the check into
the lexer will allow some optimizations in lexer using unsafe code which
depend on this invariant.
Maximum length of source parser can accept is limited on 32-bit systems
to `isize::MAX` (i.e. `i32::MAX` not `u32::MAX`) because Rust [limits
the size of
allocations](https://doc.rust-lang.org/std/alloc/struct.Layout.html#method.from_size_align)
to `isize::MAX`.
This PR takes that constraint into account when calculating
`Parser::MAX_LEN`.
It also speeds up the `overlong_source` test so it runs in under 500ms
(previously it took ~4 secs on a M1 Macbook Pro).
I think `UnusedLabeled` can do more than that.
1. Collect unused label
2. Support check duplication label
3. Support check label in `BreakStatement`
4. Support check label in `ContinueStatement` (Not yet)
But then the `UnusedLabeled` name wouldn't fit, so I renamed it
`LabelBuilder` and moved it to `label.rs`