Proposed exports #16

Open
opened 2022-05-18 08:20:52 +02:00 by willcrichton · 4 comments
willcrichton commented 2022-05-18 08:20:52 +02:00 (Migrated from github.com)

For Nota, I have implemented a new syntax as a Markdown extension using the interface in @lezer-parser/markdown: https://github.com/nota-lang/nota/blob/markdown/packages/nota-syntax/lib/parse.ts

Everything worked quite well -- great job on the flexible API design! But I ran into a few cases where I needed functionality that's currently private to markdown.ts. I'd like to show what those cases are, and in turn propose they be exported.

  1. Type should be exported from index.ts: I need Type in order to post-process the Markdown AST. However, although Type is exported from markdown.ts, it's not accessible through the module root. My bundler can't seem to resolve import "@lezer/markdown/dist/markdown" so it's not easy for me to import it from the file directly.
  2. BlockResult should be exported: this type appears in a public interface (BlockParser.parse), so it would be nice to name it directly.
  3. InlineContext constructor should be exported: similar to the problems described in #6, I had issues dealing with relative vs. global indexes inside block parsers. My solution was to just construct an InlineContext for a given line, use absolute indexes everywhere, and then let the context methods handle the translation to relative indexes for me. But this strategy requires access to the InlineContext constructor, which isn't exported.

If you're ok with any of these, let me know and I will put up a pull request.

For [Nota](https://github.com/nota-lang/nota/), I have implemented a new syntax as a Markdown extension using the interface in @lezer-parser/markdown: https://github.com/nota-lang/nota/blob/markdown/packages/nota-syntax/lib/parse.ts Everything worked quite well -- great job on the flexible API design! But I ran into a few cases where I needed functionality that's currently private to `markdown.ts`. I'd like to show what those cases are, and in turn propose they be exported. 1. **`Type` should be exported from `index.ts`**: I need `Type` in order to post-process the Markdown AST. However, although `Type` is exported from `markdown.ts`, it's not accessible through the module root. My bundler can't seem to resolve `import "@lezer/markdown/dist/markdown"` so it's not easy for me to import it from the file directly. 2. **`BlockResult` should be exported**: this type appears in a public interface (`BlockParser.parse`), so it would be nice to name it directly. 3. **`InlineContext` constructor should be exported**: similar to the problems described in #6, I had issues dealing with relative vs. global indexes inside block parsers. My solution was to just construct an `InlineContext` for a given line, use absolute indexes everywhere, and then let the context methods handle the translation to relative indexes for me. But this strategy requires access to the `InlineContext` constructor, which isn't exported. If you're ok with any of these, let me know and I will put up a pull request.
marijnh commented 2022-05-18 10:49:30 +02:00 (Migrated from github.com)

Since Type only covers a part of the node types in an extended parser, and the library provides separate vocabulary for defining node types, I intentionally kept that enum out of the public interface. You can construct something like it, for a specific parser, by iterating through parser.nodeSet.types and building up an object. Or, if you want to refer to node types statically, use their names. Does that help?

BlockResult should be exported:

It's boolean | null, which is only a few characters more than the name. What is gained by exporting this?

My solution was to just construct an InlineContext for a given line

There's a bunch of other internal stuff that would probably have to be exposed for this (such as resolveMarkers) and I'm not sure I want to complicate the public interface for this rather specific use. What kind of block are you defining here, and where do the problems with relative/absolute indices come from?

Since `Type` only covers a part of the node types in an extended parser, and the library provides separate vocabulary for defining node types, I intentionally kept that enum out of the public interface. You can construct something like it, for a specific parser, by iterating through `parser.nodeSet.types` and building up an object. Or, if you want to refer to node types statically, use their names. Does that help? > BlockResult should be exported: It's `boolean | null`, which is only a few characters more than the name. What is gained by exporting this? > My solution was to just construct an InlineContext for a given line There's a bunch of other internal stuff that would probably have to be exposed for this (such as `resolveMarkers`) and I'm not sure I want to complicate the public interface for this rather specific use. What kind of block are you defining here, and where do the problems with relative/absolute indices come from?
willcrichton commented 2022-05-18 17:59:42 +02:00 (Migrated from github.com)

You can construct something like it, for a specific parser, by iterating through parser.nodeSet.types and building up an object.

I can just do this if you want to keep Type private. I already do this to extract the numeric type ids for my Markdown extension.

It's boolean | null, which is only a few characters more than the name. What is gained by exporting this?

IMO it's just a bit of a surprising type, since I would normally expect this to be an enum with names for each case. The type name conveys that it's very specific to the block parsing context. Not a huge deal though.

There's a bunch of other internal stuff that would probably have to be exposed for this (such as resolveMarkers)

In my use case at least, I was able to stick to the current public interface with the exception of the constructor.

What kind of block are you defining here, and where do the problems with relative/absolute indices come from?

For example, I have a composite block that has a "header" on the first line followed by indented blocks. Such as:

@div[class: "foo"]:
  **content!**

Then I have code like this that needs to parse through the header line, which involves many lookups to line.text. I try to only use absolute positions everywhere to be consistent and avoid confusion. However the Line structure does not expose methods to lookup substrings or characters based on absolute positions (exactly the issue in #6).

An alternative here would be a method like Line.toInlineContext(cx: BlockContext): InlineContext that abstracts the details of the constructor.

> You can construct something like it, for a specific parser, by iterating through parser.nodeSet.types and building up an object. I can just do this if you want to keep `Type` private. I already do this to extract the numeric type ids for my Markdown extension. > It's boolean | null, which is only a few characters more than the name. What is gained by exporting this? IMO it's just a bit of a surprising type, since I would normally expect this to be an enum with names for each case. The type name conveys that it's very specific to the block parsing context. Not a huge deal though. > There's a bunch of other internal stuff that would probably have to be exposed for this (such as resolveMarkers) In my use case at least, I was able to stick to the current public interface with the exception of the constructor. > What kind of block are you defining here, and where do the problems with relative/absolute indices come from? For example, I have a composite block that has a "header" on the first line followed by indented blocks. Such as: ``` @div[class: "foo"]: **content!** ``` Then I have code [like this](https://github.com/nota-lang/nota/blob/markdown/packages/nota-syntax/lib/parse.ts#L219-L256) that needs to parse through the header line, which involves many lookups to `line.text`. I try to only use absolute positions everywhere to be consistent and avoid confusion. However the `Line` structure does not expose methods to lookup substrings or characters based on absolute positions (exactly the issue in #6). An alternative here would be a method like `Line.toInlineContext(cx: BlockContext): InlineContext` that abstracts the details of the constructor.
marijnh commented 2022-05-20 08:22:24 +02:00 (Migrated from github.com)

The code you linked seems to have already moved in the meantime.

In any case, I can see the use for an abstraction to help here, but simply exporting the InlineContext constructor feels dodgy—that does a bunch of things that don't make sense for a task like this. Maybe we should split off a superclass of InlineContext that just does the position-offsetting and element-building, and export that instead? Not too sure about the name, though. Maybe InputWindow or something.

The code you linked seems to have already moved in the meantime. In any case, I can see the use for an abstraction to help here, but simply exporting the `InlineContext` constructor feels dodgy—that does a bunch of things that don't make sense for a task like this. Maybe we should split off a superclass of `InlineContext` that just does the position-offsetting and element-building, and export that instead? Not too sure about the name, though. Maybe `InputWindow` or something.
connebs commented 2023-11-23 04:52:24 +01:00 (Migrated from github.com)

I actually would also like to see the Type enum exported. When dealing with the syntax tree, I find that I frequently am checking if a certain node is a certain type. Although yes I can clearly do node.type.name === 'StrongEmphasis', I think it would be much nicer from a dev perspective to be able to do node.type.name === MarkdownType.StrongEmphasis.

In general, I find enums much better to use in this case, as they are a better guarantee of forwards compatibility and they improve discoverability. Although of course I can just make my own version of the enum, this enum would become outdated if a new node type was ever added to lezer-parser/markdown. Although this is highly unlikely in this particular case (as the Markdown spec is fairly stable at this point), recreating an enum from a library is quite code-smelly. I'm also not sure I understand the reticence – what is lost by making this an export from the lib? Sure, the enum is not "complete" in an extended parser, but I can make it complete with:

import { Type as LezerMarkdownType } from '@lezer/markdown'

type ExtendedMarkdownType = MyCustomMarkdownNodeType | LezerMarkdownType;
I actually would also like to see the `Type` enum exported. When dealing with the syntax tree, I find that I frequently am checking if a certain node is a certain type. Although yes I can clearly do `node.type.name === 'StrongEmphasis'`, I think it would be much nicer from a dev perspective to be able to do `node.type.name === MarkdownType.StrongEmphasis`. In general, I find enums much better to use in this case, as they are a better guarantee of forwards compatibility and they improve discoverability. Although of course I can just make my own version of the enum, this enum would become outdated if a new node type was ever added to `lezer-parser/markdown`. Although this is highly unlikely in this particular case (as the Markdown spec is fairly stable at this point), recreating an enum from a library is quite code-smelly. I'm also not sure I understand the reticence – what is lost by making this an export from the lib? Sure, the enum is not "complete" in an extended parser, but I can make it complete with: ```ts import { Type as LezerMarkdownType } from '@lezer/markdown' type ExtendedMarkdownType = MyCustomMarkdownNodeType | LezerMarkdownType; ```
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lezer/markdown#16
No description provided.