Fix reducePos for zero-length contextual tokens #65
Summary
Zero-length tokens produced by contextual tokenizers should still advance `reducePos`. When `end == this.pos`, `pos` doesn’t change, but `reducePos` must, so that repeat reductions compute correct sizes.
This patch updates `reducePos` for non-skipped tokens regardless of whether `end > this.pos`, fixing negative-size repeat nodes and buffer reordering.
Root cause
`Stack.shift` only updated `reducePos` inside the `end > this.pos` branch. For zero-length tokens, `pos` stays the same and `reducePos` is left behind. When a `+`/`*` repeat reduces, it computes `size = reducePos - start`, which becomes negative. `storeNode` then sees a node with `end < start` and can move it before skipped tokens, leading to TreeBuffer underflow (e.g. comment nodes ending up at 65536).
Reproduction (minimal)
A contextual tokenizer emits a zero-length `statementEnd` token and a skip token for line comments.
Grammar sketch:
Inputs:
`# Comment\nname` → OK
`# Comment\n\nname` → BUG (comment node positioned at 65536; TreeBuffer underflow)
With this change, both parse correctly.
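The arithmetic described under Root cause can be illustrated with a toy model. This is only a sketch of the position bookkeeping, not the `@lezer/lr` implementation: `Positions`, `skip`, `shiftToken`, and `repeatSize` are hypothetical names, and the 11 skipped characters are an assumed stand-in for a comment plus newlines. The `patched` flag models the behavior this description proposes (update `reducePos` even when `end == pos`).

```typescript
// Toy model of the pos / reducePos bookkeeping only.
// All names here are hypothetical; this is not the @lezer/lr source.
interface Positions {
  pos: number
  reducePos: number
}

// Skipped tokens (comments, newlines) advance pos but leave reducePos alone.
function skip(p: Positions, end: number): Positions {
  return { pos: end, reducePos: p.reducePos }
}

// Shift a non-skipped token that ends at `end`.
function shiftToken(p: Positions, end: number, patched: boolean): Positions {
  if (end > p.pos) return { pos: end, reducePos: end }
  // Zero-length token: pos is unchanged. Only the patched variant
  // brings reducePos up to the token's end.
  return { pos: p.pos, reducePos: patched ? end : p.reducePos }
}

// Size that a repeat reduction would compute for a single zero-length
// token shifted after 11 skipped characters (assumed input length).
function repeatSize(patched: boolean): number {
  let p: Positions = { pos: 0, reducePos: 0 }
  p = skip(p, 11)
  const start = p.pos // the zero-length token starts where skipping stopped
  p = shiftToken(p, start, patched)
  return p.reducePos - start
}

console.log(repeatSize(false)) // -11
console.log(repeatSize(true)) // 0
```

In the unpatched variant `reducePos` stays at 0 while the node starts at 11, so the computed size is negative, which is the condition under which `storeNode` is described as seeing `end < start`.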
Why this repro matters
`statementEnd` is used as a disambiguator in a language I’m working on that is not fully newline-sensitive. Line terminators are skipped, and a contextual tokenizer inserts a zero-length `statementEnd` only when a line break can terminate a statement and the next token does not indicate a line continuation (open delimiters, infix operators, certain keywords, etc.).
The grammar then allows `statementEnd*` / `statementEnd+` between items to tolerate blank lines and comments.
That’s the same shape as the minimal repro: a zero-length `statementEnd`, skipped comments/newlines, and repeated separators.
Tests
`@lezer/lr` doesn’t have tests; verified via the minimal repro above.
This seems like a good idea. Attached patch does this in a slightly simplified way.
This fix caused a regression (JavaScript statements with an inserted semicolon suddenly started covering the whitespace after the statement). Attached patch tweaks it. Could you verify that this doesn't reintroduce the issue you were having?
Pull request closed