Parenthesized assignment expressions fail to parse after a certain AST size #34

Closed
opened 2024-04-23 13:36:06 +02:00 by szuend · 2 comments
szuend commented 2024-04-23 13:36:06 +02:00 (Migrated from github.com)

Side-note: I apologize for filing all these bugs. I'm currently exploring replacing Acorn with Lezer in DevTools for pretty-printing, and I've run into some bugs when parsing minified code. Feel free to close as "Won't fix".

The following snippet fails to parse:

(a=b(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41, 42))

Note that it's sensitive to the number of arguments. Removing the 42 results in a successful parse. Adding more arguments fails the parse.
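To narrow down the exact threshold, the snippet can be generated for any argument count and bisected. The following is only a sketch: `makeCallSnippet` and `findFirstFailingArgCount` are names made up for this illustration, and `parsesCleanly` is a hypothetical caller-supplied predicate (for example one that parses the source with `@lezer/javascript` and reports whether the resulting tree contains no error nodes). It assumes the failure is monotonic, as observed above.

```javascript
// Build "(a=b(1,2,...,n))", the same shape as the failing snippet above.
function makeCallSnippet(argCount) {
  const args = Array.from({ length: argCount }, (_, i) => i + 1);
  return `(a=b(${args.join(",")}))`;
}

// Return the smallest argument count in [1, max] for which the predicate
// reports a failed parse, or null if nothing fails up to `max`.
// Assumption: once a count fails, every larger count fails too.
// `parsesCleanly` is hypothetical and must be supplied by the caller.
function findFirstFailingArgCount(parsesCleanly, max = 256) {
  if (parsesCleanly(makeCallSnippet(max))) return null;
  let lo = 1;
  let hi = max; // invariant: the snippet with `hi` arguments fails
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (parsesCleanly(makeCallSnippet(mid))) {
      lo = mid + 1; // first failure must be above mid
    } else {
      hi = mid; // mid fails, so the first failure is at or below mid
    }
  }
  return lo;
}
```

The predicate is left abstract on purpose, so the same bisection can be pointed at any parser build without changing the search itself.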

Note that it's not necessarily the argument list. The following snippet also fails to parse:

(a = function(b,c,d) {
  const e = 1 + 1 + 1;
  const f = 2 + 2 + 2;
  const g = 3 + 3 + 3;
  const h = 4 + 4 + 4;
  const i = 5 + 5 + 5;
  const j = 6 + 6 + 6;
  const k = 7 + 7 + 7;
  const l = 8 + 8 + 8;
}(1,2,3))

Again, it's sensitive to the number of AST nodes inside the function. Deleting one or more lines makes the parse succeed. Adding more const m = ... lines keeps it failing.

I enabled the debug logging, and it seems that in the failing case the parser gets stuck trying to reduce with a ParamList (instead of an ArgList) for the CallExpression, but that might just be a red herring.

marijnh commented 2024-04-23 13:53:38 +02:00 (Migrated from github.com)

After a given number of tokens in which multiple parses run alongside each other, @lezer/lr will drop one parallel parse even if both parses fully match the input (to avoid situations where huge stretches of input get parsed in multiple ways due to allowed ambiguity in the grammar). In this case the ambiguity is probably between an arrow function parameter list and a parenthesized expression, both of which could potentially go on, in a syntactically valid way, for megabytes (and even branch out into more inner ambiguities). I don't really see a way to avoid this issue with our current architecture.
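As a rough illustration of why a length-dependent failure follows from this, here is a toy model. It is not the actual @lezer/lr code: `toyParse`, the branch names, the `ambiguityBudget` parameter, and the rule for which branch gets dropped are all assumptions invented for the sketch. Two readings of the input stay alive until the last token, and one of them is discarded once they have run side by side for more than the budget.

```javascript
// Toy model only; NOT how @lezer/lr is implemented.
// Both hypothetical branches accept every token except the final one,
// where exactly one of them fits.
function toyParse(tokens, ambiguityBudget) {
  let branches = [
    { name: "ArrowParams", acceptsEnd: (t) => t === "=>" },
    { name: "ParenExpr", acceptsEnd: (t) => t === "<eof>" },
  ];
  let sharedTokens = 0; // tokens consumed while more than one branch is alive
  for (let i = 0; i < tokens.length - 1; i++) {
    if (branches.length > 1) {
      sharedTokens++;
      if (sharedTokens > ambiguityBudget) {
        // Arbitrary choice for this toy: keep the first branch.
        branches = branches.slice(0, 1);
      }
    }
  }
  const last = tokens[tokens.length - 1];
  const survivor = branches.find((b) => b.acceptsEnd(last));
  return survivor ? survivor.name : null; // null models a failed parse
}
```

With a generous budget, `toyParse(["(", "a", ")", "<eof>"], 10)` resolves to the parenthesized reading. With a small budget and a longer input, the branch that would have matched the ending may already have been dropped, so the same kind of input stops parsing once it grows past a certain size.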

szuend commented 2024-04-23 14:10:41 +02:00 (Migrated from github.com)

Ack, then let's close this bug if this is a known limitation.

Reference
lezer/javascript#34