closes#2231
Submodules are a blocker for beginners, we should make it clone on
demand.
It is also a blocker for people who wants to target this repo as a crate
for testing purposes, cargo will do a full clone if you specify
oxc_parser = { git = "this repo" } in Cargo.toml
1. add a `test.js` file to the project root:
```js
class A extends B {
constructor() {
try {
super();
} finally {
this.a;
}
}
}
```
2. run:
```bash
$ cargo run -p oxc_semantic --example simple
Compiling oxc_semantic v0.5.0 (/home/tzvipm/src/github.com/tzvipm/oxc/crates/oxc_semantic)
Finished dev [unoptimized + debuginfo] target(s) in 32.07s
Running `target/debug/examples/simple`
Wrote AST to: test.ast.txt
Wrote CFG blocks to: test.cfg.txt
Wrote CFG dot diagram to: test.dot
```
3. resulting graph from .dot file:

This fixes a mistake I made in #2237.
I was confused by the `!(...)` wrapping of the preceding `if` test and
missed that there are definitely 2 chars to consume, so can use
`consume_char()` instead of `next_char()`. This makes no difference to
behavior, but it follows the convention to always prefer
`consume_char()` when possible.
I've also refactored the code which confused me, so hopefully others
won't be confused too!
Fixes#2217
Previously we were using the miette default, but given ours is a fork,
and for example now prints slightly more relevant line- and
column-numbers vs miette, we should dog-food our own have the tests tell
us if/when the output changes.
I did actually scan all the snapshot deltas and all look correct to me.
One funny one I noticed was this
```diff
diff --git a/crates/oxc_linter/src/snapshots/no_empty_file.snap b/crates/oxc_linter/src/snapshots/no_empty_file.snap
index cfc53e1c..6f001fbd 100644
--- a/crates/oxc_linter/src/snapshots/no_empty_file.snap
+++ b/crates/oxc_linter/src/snapshots/no_empty_file.snap
@@ -2,6 +2,7 @@
source: crates/oxc_linter/src/tester.rs
expression: no_empty_file
---
+
⚠ eslint-plugin-unicorn(no-empty-file): Empty files are not allowed.
╭─[no_empty_file.tsx:1:1]
╰────
@@ -29,7 +30,7 @@ expression: no_empty_file
help: Delete this file or add some code to it.
⚠ eslint-plugin-unicorn(no-empty-file): Empty files are not allowed.
- ╭─[no_empty_file.tsx:1:1]
+ ╭─[no_empty_file.tsx:0:1]
0 │
· ▲
╰────
@@ -149,4 +150,3 @@ expression: no_empty_file
╰────
help: Delete this file or add some code to it.
-
```
...which I suppose is technically correct but also a bit confusing
perhaps? Should we make the line **minimum 1**? If so I can create
another PR for that.
There is a subtle change in whitespace too - each file gains a newline
at the start but looses one at the end. My assumption is that oxc's
reporter is adding a newline at the start of each report (compared to
miette's), plus I removed the extra newline in `tester.rs` or else the
snapshot diffs would have been even larger.
Finally there are no changes to reports with *multi-line* annotations
like this:
```
⚠ typescript-eslint(ban-types): Prefer explicitly define the object shape
╭─[ban_types.tsx:1:1]
1 │ ╭─▶ const emptyObj: {
2 │ │
3 │ ╰─▶ } = {foo: "bar"};
╰────
help: This type means "any non-nullish value", which is slightly better than 'unknown', but it's still a broad type
```
Again I can create a separate PR to improve those and we should see
snapshot diffs when I make that change 😄
I'd appreciate a quick review on this one if at all possible, given the
high chance of conflict.
This PR replaces most usages of `lexer.current.chars.next()` with
`lexer.consume_char()`, or a new function `lexer.next_char()`.
This is a preparatory step towards replacing the `Chars` iterator with
something more flexible which can also consume bytes (not `char`s), and
this PR was intended as pure refactor. But surprised to see there is a
small performance bump (no idea why!).
There's an additional benefit: Using `consume_char()` everywhere where
we believe there's definitely a char there to be consumed will make
logic errors produce a panic, rather than silently outputting garbage.
This adds a separate byte handler to the lexer for byte values which
should never be encountered:
1. UTF-8 continuation bytes (i.e. middle of a multi-byte UTF-8 byte
sequence).
2. Bytes values which are illegal in valid UTF-8 strings.
At present, this function is impossible to reach, because
`std::str::Chars` ensures the next byte is always the *start* of a valid
UTF-8 byte sequence. But later changes I intend introducing unsafe code
will make it possible (but highly undesirable!). In the meantime, I
don't think it does any harm to handle this case.
This PR has a large diff, but it contains no substantive changes
whatsoever. It purely breaks up the lexer into multiple smaller files.
I've been working quite intensively on the lexer over past few weeks,
but still have been finding it hard to make sense of, due to most of the
logic currently being contained in [a single 1800-line
file](018675ceb1/crates/oxc_parser/src/lexer/mod.rs).
I feel that breaking it up into multiple files makes it much easier to
navigate and understand.
An additional benefit is that many functions can have their visibility
reduced to module scope, so sub-systems for e.g. lexing numbers have
fewer exposed functions. This makes it clearer what the entry points
are, and makes it harder to make mistakes when working on the lexer.
I intend to later make changes to the lexer for performance which will
introduce unsafe code. Keeping that unsafe code encapsulated in modules
will make it more viable to validate the workings of that code, and
avoid accidental UB.
There is one downside to this change. Previously
[`lexer/mod.rs`](018675ceb1/crates/oxc_parser/src/lexer/mod.rs)
was laid out in same order as the JS spec. If you were trying to
validate the lexer against the spec, this would make it easier. However,
as OXC's parser is fairly mature at this point, and I imagine most
spec-compliance issues have been flushed out by now, in my opinion this
advantage is less compelling than it probably used to be. So in my view
it's outweighed by the benefit of more readable code.
Reviewing this could be a bit of a battle due to the size of the diff. I
do have further changes I'd like to make, but I've intentionally kept
this PR as 100% just:
1. Moving code around.
2. Reducing visibility of functions to module/super scope where that's
possible to do without changing anything else.
Aside from that, not even a single comment has changed.
If you're willing to trust me on that promise, I think it can be merged
without poring through it line by line.