Add delimiterResolvers hook for custom delimiter handling #60

Open
bxff wants to merge 7 commits from bxff/main into main
bxff commented 2025-12-28 09:47:04 +01:00 (Migrated from github.com)

Summary

This adds a delimiterResolvers hook that lets extensions customize delimiter resolution before the standard CommonMark algorithm runs. It also makes InlineContext.parts public as it's required for resolvers to function.

The Problem

Extensions that need to handle delimiters differently from CommonMark have no clean way to do it. The current resolveMarkers function enforces a strict tree structure: find a closer, look backward for an opener, build a node. Unmatched openers become plain text.

For my use case, I needed to:

  • Extend unmatched emphasis markers to the block end (so `*hello` becomes emphasis immediately)
  • Handle overlapping delimiter ranges that can't be expressed as a tree

The only way to achieve this was monkey-patching resolveMarkers and accessing the @internal parts array, which is fragile and breaks with updates.

I've built an extension that demonstrates this use case: lezer-markdown-partial-emphasis (https://github.com/bxff/lezer-markdown-partial-emphasis). It creates "live" emphasis that appears as you type, using the new API.

The Solution

Add a delimiterResolvers array to MarkdownConfig. Each resolver receives the InlineContext and can inspect or modify cx.parts before standard resolution.

```typescript
// In MarkdownConfig
delimiterResolvers?: readonly ((cx: InlineContext) => void)[]

// In resolveMarkers()
for (let resolver of this.parser.delimiterResolvers) resolver(this)
```

Resolvers can replace InlineDelimiter objects with Element objects or null them out. The standard algorithm skips already-resolved positions.

Making parts public is necessary for this API to work. Resolvers need to read delimiter positions, types, and side flags, then modify the array in place.
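To make the flow concrete, here is a minimal, self-contained sketch of how such a hook could behave. It does not use the real `@lezer/markdown` classes: `Delimiter`, `Elt`, `Part`, `Resolver`, and `runResolvers` are simplified stand-ins invented for illustration, and the actual shape of the `parts` array may differ.

```typescript
// Simplified stand-ins for the library's internal types (hypothetical shapes).
interface Delimiter { kind: "delimiter"; char: string; from: number; to: number; side: number }
interface Elt { kind: "element"; type: string; from: number; to: number }
type Part = Delimiter | Elt | null
type Resolver = (parts: Part[]) => void

// Runs every registered resolver before the standard algorithm would start.
// With an empty resolver list this loop does nothing.
function runResolvers(parts: Part[], resolvers: readonly Resolver[]): void {
  for (const resolver of resolvers) resolver(parts)
}

// Example resolver: null out asterisk delimiters, leave underscores alone.
const clearAsterisks: Resolver = parts => {
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i]
    if (part && part.kind === "delimiter" && part.char === "*") parts[i] = null
  }
}

const parts: Part[] = [
  { kind: "delimiter", char: "*", from: 0, to: 1, side: 1 },
  { kind: "delimiter", char: "_", from: 3, to: 4, side: 1 },
  { kind: "delimiter", char: "_", from: 7, to: 8, side: 2 },
]
runResolvers(parts, [clearAsterisks])

// Positions that the standard algorithm would still have to look at.
const remaining = parts.filter(p => p !== null).length
```

In this sketch the resolver mutates the array in place, which is the reason the proposal needs `parts` to be readable and writable from extension code.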

Why This Design

I considered alternatives but they all had issues:

  • Per-delimiter resolve methods: The main loop is "closer-driven" and can't handle unmatched openers cleanly
  • Post-process hooks: Inefficient and can't change text already parsed as literals
  • Replace resolveMarkers entirely: Breaks composition if multiple extensions need customization

This approach follows the existing wrap pattern: an array of processors that compose without conflicts. It's opt-in with zero overhead when unused.

Testing

Added test/test-delimiter-resolvers.ts demonstrating the API. The test extension shows how to access and modify delimiters added by the built-in Emphasis parser (clearing asterisks while preserving underscores).

README Changes

Regenerated the README to include documentation for the new API. This also picked up docs from an earlier commit (4d5b25c) about block context nodes and close tokens that had been missed.

marijnh commented 2025-12-29 19:19:48 +01:00 (Migrated from github.com)

I'm not happy about exporting a weird data structure like InlineContext.parts. I'm guessing you hand-edited the readme rather than using npm run build-readme, since the builddocs tool wouldn't use an unexported type (InlineDelimiter) in a signature.

In general, this feels like too messy and ad-hoc an API for me to commit to. If you can formulate it in some narrower way, for example by providing some way for unclosed delimiters to be auto-closed (rather than turned into plain text) somehow, that would probably be more attractive.

bxff commented 2026-01-02 21:54:44 +01:00 (Migrated from github.com)

Thanks for the feedback! Let me clear up a couple things first.

The README was auto-generated with npm run build-readme. That inline type signature for parts is just builddocs inlining the unexported InlineDelimiter class structure, not hand-editing.

The core issue is overlapping emphasis ranges that can't be a tree. For `*a **b* c**`, I need both:

  • Emphasis wrapping `*a **b*` (positions 0-7)
  • StrongEmphasis wrapping `**b* c**` (positions 3-11)

Standard resolution picks the `*...*` first, leaving `**...` as plain text. A post-resolution hook can't fix this, because the overlapping region is already locked in as literals. That's why I need pre-resolution access.

Auto-closing unclosed delimiters wouldn't help here either. That only extends unmatched openers to block end, but can't apply multiple styles to the same text region. The overlapping case needs both styles, not just extension.
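As a quick check of the positions above, the following sketch tests whether two ranges cross, meaning they intersect without one containing the other, which is exactly the situation a tree cannot represent. The `Span` type and the helper names are made up for this illustration; end positions are treated as exclusive.

```typescript
// Hypothetical helper types; `to` is exclusive.
interface Span { from: number; to: number }

// True when the ranges share at least one position.
function intersects(a: Span, b: Span): boolean {
  return a.from < b.to && b.from < a.to
}

// True when one range fully contains the other (a valid parent/child pair).
function nests(a: Span, b: Span): boolean {
  return (a.from <= b.from && b.to <= a.to) || (b.from <= a.from && a.to <= b.to)
}

// Crossing ranges intersect but cannot be nested, so no tree can hold both.
function crosses(a: Span, b: Span): boolean {
  return intersects(a, b) && !nests(a, b)
}

const sample = "*a **b* c**"
const emphasis: Span = { from: 0, to: 7 }   // covers "*a **b*"
const strong: Span = { from: 3, to: 11 }    // covers "**b* c**"

const emphasisText = sample.slice(emphasis.from, emphasis.to)
const strongText = sample.slice(strong.from, strong.to)
const cannotBeTree = crosses(emphasis, strong)
```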

Making this a built-in PartialEmphasis extension would dump 225 lines of complex, single-purpose logic on you to maintain. It solves one specific behavior and isn't generalizable.

I've implemented a tighter API that doesn't expose internals:

```typescript
interface PreResolveContext {
  readonly delimiters: ReadonlyArray<{
    readonly type: DelimiterType
    readonly from: number
    readonly to: number
    readonly side: number  // 1=open, 2=close, 3=both
  }>
  readonly blockEnd: number
  markResolved(index: number): void
  addElement(element: Element): void
  elt(type: string, from: number, to: number, children?: readonly Element[]): Element
  slice(from: number, to: number): string
}

interface MarkdownConfig {
  preResolveDelimiters?: readonly ((ctx: PreResolveContext) => void)[]
}
```

Resolvers get an immutable delimiter list and explicit methods to mark resolved or add elements. No raw arrays, no class exposure. Zero overhead when unused.
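For illustration, a resolver written against this interface might look roughly like the sketch below. It is only a sketch under assumptions: `Elt` stands in for the library's `Element`, delimiter types are simplified to strings, and `mockContext` is a hypothetical in-memory context that exists only to exercise the resolver. The `autoClose` resolver extends an opener that has no later closer to the block end.

```typescript
// Local stand-ins; names mirror the proposal but the shapes are assumptions.
interface Elt { type: string; from: number; to: number; children: readonly Elt[] }
interface DelimInfo { readonly type: string; readonly from: number; readonly to: number; readonly side: number }
interface PreResolveContext {
  readonly delimiters: ReadonlyArray<DelimInfo>
  readonly blockEnd: number
  markResolved(index: number): void
  addElement(element: Elt): void
  elt(type: string, from: number, to: number, children?: readonly Elt[]): Elt
  slice(from: number, to: number): string
}

// Hypothetical in-memory context, only for exercising a resolver.
function mockContext(text: string, delimiters: DelimInfo[]) {
  const resolved: number[] = []
  const added: Elt[] = []
  const ctx: PreResolveContext = {
    delimiters,
    blockEnd: text.length,
    markResolved: index => { resolved.push(index) },
    addElement: element => { added.push(element) },
    elt: (type, from, to, children = []) => ({ type, from, to, children }),
    slice: (from, to) => text.slice(from, to),
  }
  return { ctx, resolved, added }
}

// Resolver: an opener with no later closer of the same type runs to block end.
function autoClose(ctx: PreResolveContext): void {
  ctx.delimiters.forEach((d, i) => {
    if ((d.side & 1) === 0) return // not an opener
    const hasCloser = ctx.delimiters.some((c, j) => j > i && c.type === d.type && (c.side & 2) !== 0)
    if (hasCloser) return
    ctx.markResolved(i)
    const mark = ctx.elt("EmphasisMark", d.from, d.to)
    ctx.addElement(ctx.elt("Emphasis", d.from, ctx.blockEnd, [mark]))
  })
}

const { ctx, resolved, added } = mockContext("*hello", [{ type: "Emphasis", from: 0, to: 1, side: 1 }])
autoClose(ctx)
```

The resolver never touches a raw array here: it only reads `delimiters` and calls `markResolved` and `addElement`.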

Would this narrower approach work for you?

bxff commented 2026-01-08 07:54:37 +01:00 (Migrated from github.com)

Hey @marijnh, just wanted to gently bump this when you have a moment. I know you're busy, no pressure at all.

marijnh commented 2026-01-08 10:39:12 +01:00 (Migrated from github.com)

I have been looking back at this periodically trying to like it or to find an acceptable alternative for you, but I have been successful at neither so far.

bxff commented 2026-01-08 10:51:03 +01:00 (Migrated from github.com)

Really appreciate that :)

marijnh commented 2026-01-19 09:37:57 +01:00 (Migrated from github.com)

So I think the main issue with this approach is that it breaks the current hierarchical model of inline resolution. In the current code, the inline content in a block can be resolved piece-by-piece via takeContent, but that always acts on the range between startIndex and the current parse position, and since from comes from findOpeningDelimiter, it always points into unresolved content (resolved content has no delimiters). A pre-resolve context allows user code to insert elements and resolve delimiters willy-nilly, creating a whole bunch of new failure modes that I'm not keen on having.

I assume the code that you're intending to use this with fully takes over resolution of asterisk and underscore delimiters? That seems to be the only use case for which this interface really works. As such, I'm still unconvinced it is a good direction. If you can tell me a bit more about what your implementation looks like—is it an adjusted version of resolveMarkers, or something different?—maybe I can think of an interface that would work for you.

bxff commented 2026-01-23 02:04:43 +01:00 (Migrated from github.com)

Hey, thanks for diving into this again. Here's the extension code: https://github.com/bxff/lezer-markdown-partial-emphasis/blob/master/extension/partial-emphases.ts

You're right that my extension completely takes over `*` and `_` resolution. I register an inline parser that runs before the built-in Emphasis one, scoop up all those delimiters, then use the hook to run my own matching logic.

The implementation is basically a tweaked resolveMarkers. Two main differences:

  • I match delimiters atomically (`*` only pairs with `*`, `**` with `**`) to avoid the "star stealing" issue where `**` partially consumes a single `*`. This keeps delimiter pairs clean.

  • After matching, any leftover openers get extended to block end as partial emphasis rather than being turned into plain text.
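The two differences above can be sketched as follows. This is not the code from the linked extension: `Delim`, `Span`, and `matchAtomically` are hypothetical names, the can-open and can-close flags are assumed to be computed elsewhere, and only runs of one or two characters are distinguished.

```typescript
// Hypothetical shapes; the real extension's types may differ.
interface Delim { from: number; to: number; canOpen: boolean; canClose: boolean }
interface Span { type: "Emphasis" | "StrongEmphasis"; from: number; to: number; partial: boolean }

function matchAtomically(text: string, delims: readonly Delim[], blockEnd: number): Span[] {
  const run = (d: Delim): string => text.slice(d.from, d.to)
  const typeOf = (d: Delim): Span["type"] => (d.to - d.from === 2 ? "StrongEmphasis" : "Emphasis")
  const spans: Span[] = []
  const openers: Delim[] = [] // unmatched openers, oldest first

  for (const d of delims) {
    let matched = false
    if (d.canClose) {
      // Look backward for the nearest opener with exactly the same run.
      for (let i = openers.length - 1; i >= 0 && !matched; i--) {
        const o = openers[i]
        if (o && run(o) === run(d)) {
          spans.push({ type: typeOf(d), from: o.from, to: d.to, partial: false })
          openers.splice(i, 1) // other openers stay, so ranges may overlap
          matched = true
        }
      }
    }
    if (!matched && d.canOpen) openers.push(d)
  }
  // Leftover openers run to the end of the block as partial emphasis.
  for (const o of openers) spans.push({ type: typeOf(o), from: o.from, to: blockEnd, partial: true })
  return spans.sort((a, b) => a.from - b.from)
}

const sample = "*a **b* c**"
const spans = matchAtomically(sample, [
  { from: 0, to: 1, canOpen: true, canClose: false },
  { from: 3, to: 5, canOpen: true, canClose: false },
  { from: 6, to: 7, canOpen: false, canClose: true },
  { from: 9, to: 11, canOpen: false, canClose: true },
], sample.length)
```

For this input the sketch yields an Emphasis span over 0-7 and a StrongEmphasis span over 3-11, the overlapping pair discussed earlier in the thread.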

The resolver builds Span objects for each pair, handles overlaps by clipping at intersections, then constructs nested Elements and replaces the corresponding sections of cx.parts.
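One possible reading of "clipping at intersections" is sketched below: cut the block at every span boundary and record which styles are active in each resulting segment, so that each segment can then be emitted as ordinary nested elements. This interpretation, along with the `Span`, `Segment`, and `clipAtIntersections` names, is an assumption made for illustration and may not match the linked implementation.

```typescript
// Hypothetical shapes; `to` is exclusive.
interface Span { type: string; from: number; to: number }
interface Segment { from: number; to: number; types: string[] }

function clipAtIntersections(spans: readonly Span[]): Segment[] {
  // Collect every distinct boundary position.
  const cuts: number[] = []
  for (const s of spans) {
    for (const pos of [s.from, s.to]) if (cuts.indexOf(pos) < 0) cuts.push(pos)
  }
  cuts.sort((a, b) => a - b)

  // Each pair of neighbouring cuts is a segment with a fixed set of styles.
  const segments: Segment[] = []
  for (let i = 0; i + 1 < cuts.length; i++) {
    const from = cuts[i] as number
    const to = cuts[i + 1] as number
    const types = spans.filter(s => s.from <= from && to <= s.to).map(s => s.type)
    if (types.length > 0) segments.push({ from, to, types })
  }
  return segments
}

const segments = clipAtIntersections([
  { type: "Emphasis", from: 0, to: 7 },
  { type: "StrongEmphasis", from: 3, to: 11 },
])
```

With the two spans from `*a **b* c**`, this gives three segments: 0-3 with Emphasis only, 3-7 with both styles, and 7-11 with StrongEmphasis only.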

I think I see what you mean about breaking the hierarchical model now. The current takeContent flow assumes resolved content has no delimiters left. My approach sidesteps this by handling emphasis completely before standard resolution runs—the hook nulls out my delimiters and inserts finished Elements, so when resolveMarkers continues, those positions are either null or already resolved as Elements.

Reference
lezer/markdown!60