Make entity parsing conform to XML standard #3

Merged
siefkenj merged 1 commit from entities into main 2023-09-05 09:52:56 +02:00
siefkenj commented 2023-09-03 17:11:02 +02:00 (Migrated from github.com)

The XML specification says entity references must use a valid XML `Name` as their name. This PR adjusts the grammar to match the specification. The grammar will now produce parse errors when invalidly named entities are used.
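For reference, XML 1.0 (Fifth Edition) defines `EntityRef ::= '&' Name ';'`, where a `Name` is one `NameStartChar` followed by any number of `NameChar`s; character references (`&#60;`, `&#x3C;`) are a separate `CharRef` production. The TypeScript sketch below encodes those productions as a regular expression so the rule is easy to try out. It is only an illustration under that assumption: `isValidReference` is a hypothetical helper, not part of the lezer/xml API, and the PR itself changes the Lezer grammar rather than using a regex.

```typescript
// XML 1.0 (Fifth Edition) NameStartChar, written as a regex character-class body.
const NAME_START_CHAR =
  ":A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}\\u{370}-\\u{37D}" +
  "\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}" +
  "\\u{3001}-\\u{D7FF}\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";

// NameChar adds "-", ".", ASCII digits, U+00B7 and two combining-mark ranges.
const NAME_CHAR =
  NAME_START_CHAR + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";

// Reference ::= EntityRef | CharRef
// EntityRef ::= '&' Name ';'
// CharRef   ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
// The "u" flag is required for the \u{...} escapes and astral ranges.
const REFERENCE = new RegExp(
  `^&(?:[${NAME_START_CHAR}][${NAME_CHAR}]*|#[0-9]+|#x[0-9a-fA-F]+);$`,
  "u",
);

// Hypothetical helper: true only if `text` is exactly one well-formed reference.
function isValidReference(text: string): boolean {
  return REFERENCE.test(text);
}

for (const sample of ["&amp;", "&#x3C;", "&foo\n;", "&foo&bar;", "&foo!bar;"]) {
  console.log(JSON.stringify(sample), isValidReference(sample));
}
```

The three strings cited in the discussion, `&foo\n;`, `&foo&bar;`, and `&foo!bar;`, all fail this check because newline, `&`, and `!` are not `NameChar`s, while `&amp;` and `&#x3C;` pass.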
marijnh commented 2023-09-04 08:16:21 +02:00 (Migrated from github.com)

This is not a validator, though. In what way is having error nodes in the tree in this case an improvement over the existing behavior?
siefkenj commented 2023-09-04 14:57:44 +02:00 (Migrated from github.com)

It is helpful for syntax highlighting in a text editor, which is where I came across the issue. One could instead add more special characters to the existing exceptions list and keep the rule slightly more permissive (e.g. `&foo\n;`, `&foo&bar;`, and `&foo!bar;` are currently accepted as entities, which is quite surprising), but by that point you're already pretty close to the standard anyway.
marijnh commented 2023-09-05 09:54:11 +02:00 (Migrated from github.com)

Fair enough. I guess XML is well-defined enough that people aren't using weird dialects that'd be impacted by this. Merged and followed up with 2e8b402
Reference
lezer/xml!3