LineComments get incorrectly nested following an IfStatement #51

Closed
opened 2025-11-03 13:35:02 +01:00 by smahs · 3 comments
smahs commented 2025-11-03 13:35:02 +01:00 (Migrated from github.com)

Consider this code:

const fn = (x: boolean) => {
  if (x) {}
  // comment
  // comment
}

This gets parsed as follows (this is a custom printing format that indents nested child nodes by 2 spaces; if there is a preferred format, please comment and I can update it):

Script (from 0 to 68)
  VariableDeclaration (from 0 to 68)
    const (from 0 to 5)
    VariableDefinition (from 6 to 8)
    Equals (from 9 to 10)
    ArrowFunction (from 11 to 68)
      ParamList (from 11 to 23)
        ( (from 11 to 12)
        VariableDefinition (from 12 to 13)
        TypeAnnotation (from 13 to 22)
          : (from 13 to 14)
          TypeName (from 15 to 22)
        ) (from 22 to 23)
      Arrow (from 24 to 26)
      Block (from 27 to 68)
        { (from 27 to 28)
        IfStatement (from 31 to 40)
          if (from 31 to 33)
          ParenthesizedExpression (from 34 to 37)
            ( (from 34 to 35)
            VariableName (from 35 to 36)
            ) (from 36 to 37)
          Block (from 38 to 40)
            { (from 38 to 39)
            } (from 39 to 40)
          LineComment (from 43 to 53)
          LineComment (from 56 to 66)
        } (from 67 to 68)
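
The printer used for the output above is not shown in the report. As a rough sketch of how such a format could be produced, the following assumes a hypothetical plain node shape (`PrintableNode`, not part of the Lezer API) and reproduces an abbreviated excerpt of the reported `Block`:

```typescript
// Hypothetical node shape for illustration only; this is NOT the Lezer API.
interface PrintableNode {
  name: string;
  from: number;
  to: number;
  children?: PrintableNode[];
}

// Render a node and its descendants, indenting each nesting level by two spaces.
function printTree(node: PrintableNode, depth = 0): string {
  const line = `${"  ".repeat(depth)}${node.name} (from ${node.from} to ${node.to})`;
  const childLines = (node.children ?? []).map((child) => printTree(child, depth + 1));
  return [line, ...childLines].join("\n");
}

// Abbreviated excerpt of the reported tree: the comments are nested under IfStatement.
const reportedBlock: PrintableNode = {
  name: "Block",
  from: 27,
  to: 68,
  children: [
    {
      name: "IfStatement",
      from: 31,
      to: 40,
      children: [
        { name: "LineComment", from: 43, to: 53 },
        { name: "LineComment", from: 56, to: 66 },
      ],
    },
  ],
};

console.log(printTree(reportedBlock));
```

With a real Lezer tree, the same indentation could presumably be driven by a tree traversal instead of a `children` array; that part is left out here.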

For comparison, this is how tree-sitter parses it:

(program ; [0, 0] - [5, 0]
  (lexical_declaration ; [0, 0] - [4, 1]
    (variable_declarator ; [0, 6] - [4, 1]
      name: (identifier) ; [0, 6] - [0, 8]
      value: (arrow_function ; [0, 11] - [4, 1]
        parameters: (formal_parameters ; [0, 11] - [0, 23]
          (required_parameter ; [0, 12] - [0, 22]
            pattern: (identifier) ; [0, 12] - [0, 13]
            type: (type_annotation ; [0, 13] - [0, 22]
              (predefined_type)))) ; [0, 15] - [0, 22]
        body: (statement_block ; [0, 27] - [4, 1]
          (if_statement ; [1, 2] - [1, 11]
            condition: (parenthesized_expression ; [1, 5] - [1, 8]
              (identifier)) ; [1, 6] - [1, 7]
            consequence: (statement_block)) ; [1, 9] - [1, 11]
          (comment) ; [2, 2] - [2, 12]
          (comment)))))) ; [3, 2] - [3, 12]

The comments are within body but outside the if_statement.
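
One way to spot this kind of misnesting mechanically is to check that every child's range lies within its parent's range. In the reported output the comments start at 43 and 56, after the IfStatement ends at 40. The sketch below uses a hypothetical plain node shape (`SpanNode`, not the Lezer API) and the offsets from the report:

```typescript
// Hypothetical node shape for illustration only; this is NOT the Lezer API.
interface SpanNode {
  name: string;
  from: number;
  to: number;
  children?: SpanNode[];
}

// Collect a description of every node whose range is not contained in its parent's range.
function findOutOfRangeChildren(node: SpanNode, found: string[] = []): string[] {
  for (const child of node.children ?? []) {
    if (child.from < node.from || child.to > node.to) {
      found.push(
        `${child.name} (${child.from}-${child.to}) outside ${node.name} (${node.from}-${node.to})`
      );
    }
    findOutOfRangeChildren(child, found);
  }
  return found;
}

const comments: SpanNode[] = [
  { name: "LineComment", from: 43, to: 53 },
  { name: "LineComment", from: 56, to: 66 },
];

// Shape from the report: comments nested inside the IfStatement.
const reportedShape: SpanNode = {
  name: "Block",
  from: 27,
  to: 68,
  children: [{ name: "IfStatement", from: 31, to: 40, children: comments }],
};

// Shape matching the tree-sitter output: comments are siblings of the IfStatement.
const expectedShape: SpanNode = {
  name: "Block",
  from: 27,
  to: 68,
  children: [{ name: "IfStatement", from: 31, to: 40 }, ...comments],
};

console.log(findOutOfRangeChildren(reportedShape)); // two entries, one per comment
console.log(findOutOfRangeChildren(expectedShape)); // empty array
```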

I also noticed that this only happens if there are two or more lines of comments. If there is a single comment line, then it gets parsed correctly:

const fn = (x: boolean) => {
  if (x) {}
  // comment
}

Script (from 0 to 55)
  VariableDeclaration (from 0 to 55)
    ArrowFunction (from 11 to 55)
      Block (from 27 to 55)
        IfStatement (from 31 to 40)
        LineComment (from 43 to 53)
marijnh commented 2025-11-03 15:00:26 +01:00 (Migrated from github.com)

This isn't what I'm seeing (the comment nodes are outside the IfStatement node when I run this). Could it be that you're using an old @lezer/lr package?

marijnh commented 2025-11-03 15:02:52 +01:00 (Migrated from github.com)

Oh, actually, it seems I was testing with patch https://github.com/lezer-parser/lr/commit/3eaa5d3751dc0a865421f78494db1882c4fbeb78, but that hadn't been released yet. Try with @lezer/lr 1.4.3.
smahs commented 2025-11-03 15:17:37 +01:00 (Migrated from github.com)

The patch fixed it. Thanks.

Reference
lezer/javascript#51