Add delimiterResolvers hook for custom delimiter handling #60
Summary
This adds a `delimiterResolvers` hook that lets extensions customize delimiter resolution before the standard CommonMark algorithm runs. It also makes `InlineContext.parts` public, as it's required for resolvers to function.
The Problem
Extensions that need to handle delimiters differently from CommonMark have no clean way to do it. The current `resolveMarkers` function enforces a strict tree structure: find a closer, look backward for an opener, build a node. Unmatched openers become plain text.
For my use case, I needed to:
`*hello` becomes emphasis immediately)
The only way to achieve this was monkey-patching `resolveMarkers` and accessing the `@internal` `parts` array, which is fragile and breaks with updates.
I've built an extension that demonstrates this use case: lezer-markdown-partial-emphasis. It creates "live" emphasis that appears as you type, using the new API.
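
For readers unfamiliar with the tree-structured resolution described above (find a closer, look backward for an opener, build a node, leave unmatched openers as plain text), here is a minimal sketch. It is a simplified model, not the actual `resolveMarkers` code: `Delim`, `EmphasisNode`, and `resolveTree` are hypothetical names, and every delimiter is treated as a single `*`.

```typescript
// Simplified model of tree-structured delimiter resolution.
// Assumption: each delimiter is one "*" already classified as opener/closer.
interface Delim { pos: number; canOpen: boolean; canClose: boolean }
interface EmphasisNode { from: number; to: number }

function resolveTree(delims: Delim[]): { nodes: EmphasisNode[]; literal: number[] } {
  const pending: (Delim | null)[] = delims.slice();
  const nodes: EmphasisNode[] = [];
  const literal: number[] = [];
  for (let i = 0; i < pending.length; i++) {
    const closer = pending[i];
    if (!closer || !closer.canClose) continue;
    // Look backward for the nearest opener that is still unresolved.
    for (let j = i - 1; j >= 0; j--) {
      const opener = pending[j];
      if (!opener || !opener.canOpen) continue;
      nodes.push({ from: opener.pos, to: closer.pos + 1 });
      // Delimiters trapped inside the new node can no longer match anything.
      for (let k = j + 1; k < i; k++) {
        const trapped = pending[k];
        if (trapped) literal.push(trapped.pos);
      }
      for (let k = j; k <= i; k++) pending[k] = null;
      break;
    }
  }
  // Whatever is still pending at the end becomes plain text.
  for (const rest of pending) if (rest) literal.push(rest.pos);
  literal.sort((a, b) => a - b);
  return { nodes, literal };
}

// "*a *b* c": the first "*" never finds a closer and ends up as a literal.
const treeResult = resolveTree([
  { pos: 0, canOpen: true, canClose: false },
  { pos: 3, canOpen: true, canClose: false },
  { pos: 5, canOpen: false, canClose: true },
]);
console.log(treeResult); // nodes: [{ from: 3, to: 6 }], literal: [0]
```

The point of the sketch is only the last step: an opener without a closer is demoted to text, which is the behavior the extension needs to change.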
The Solution
Add a `delimiterResolvers` array to `MarkdownConfig`. Each resolver receives the `InlineContext` and can inspect or modify `cx.parts` before standard resolution.
Resolvers can replace `InlineDelimiter` objects with `Element` objects or null them out. The standard algorithm skips already-resolved positions.
Making `parts` public is necessary for this API to work. Resolvers need to read delimiter positions, types, and side flags, then modify the array in place.
Why This Design
I considered alternatives but they all had issues:
This approach follows the existing `wrap` pattern: an array of processors that compose without conflicts. It's opt-in with zero overhead when unused.
Testing
Added `test/test-delimiter-resolvers.ts` demonstrating the API. The test extension shows how to access and modify delimiters added by the built-in Emphasis parser (clearing asterisks while preserving underscores).
README Changes
Regenerated the README to include documentation for the new API. This also picked up docs from an earlier commit (`4d5b25c`) about block context nodes and close tokens that had been missed.
I'm not happy about exporting a weird data structure like `InlineContext.parts`. I'm guessing you hand-edited the readme rather than using `npm run build-readme`, since the builddocs tool wouldn't use an unexported type (`InlineDelimiter`) in a signature.
In general, this feels like too messy and ad-hoc an API for me to commit to. If you can formulate it in some narrower way, for example by providing some way for unclosed delimiters to be auto-closed (rather than turned into plain text) somehow, that would probably be more attractive.
Thanks for the feedback! Let me clear up a couple things first.
The README was auto-generated with `npm run build-readme`. That inline type signature for `parts` is just builddocs inlining the unexported `InlineDelimiter` class structure, not hand-editing.
The core issue is overlapping emphasis ranges that can't be a tree. For `*a **b* c**`, I need both:
`*a **b*` (positions 0-7)
`**b* c**` (positions 3-11)
Standard resolution picks the `*...*` first, leaving `**...` as plain text. A post-resolution hook can't fix this: the overlapping region is already locked in as literals. That's why I need pre-resolution access.
Auto-closing unclosed delimiters wouldn't help here either. That only extends unmatched openers to block end, but can't apply multiple styles to the same text region. The overlapping case needs both styles, not just extension.
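
To make the "can't be a tree" claim concrete, the two ranges quoted above can be classified with a small helper. This is only an illustrative sketch: `Span`, `Relation`, and `relate` are hypothetical names, and the positions are read as half-open `[from, to)` offsets.

```typescript
// Half-open [from, to) ranges, matching the positions quoted above.
interface Span { from: number; to: number }

type Relation = "disjoint" | "nested" | "crossing";

// A set of ranges can form a tree only if every pair is disjoint or nested.
function relate(a: Span, b: Span): Relation {
  if (a.to <= b.from || b.to <= a.from) return "disjoint";
  const aInB = a.from >= b.from && a.to <= b.to;
  const bInA = b.from >= a.from && b.to <= a.to;
  return aInB || bInA ? "nested" : "crossing";
}

const singleStar: Span = { from: 0, to: 7 };  // *a **b*
const doubleStar: Span = { from: 3, to: 11 }; // **b* c**
console.log(relate(singleStar, doubleStar)); // prints: crossing
```

Because the pair is crossing rather than nested, no single parent/child arrangement can represent both ranges without splitting one of them.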
Making this a built-in `PartialEmphasis` extension would dump 225 lines of complex, single-purpose logic on you to maintain. It solves one specific behavior and isn't generalizable.
I've implemented a tighter API that doesn't expose internals:
Resolvers get an immutable delimiter list and explicit methods to mark resolved or add elements. No raw arrays, no class exposure. Zero overhead when unused.
Would this narrower approach work for you?
Hey @marijnh, just wanted to gently bump this when you have a moment. I know you're busy, no pressure at all.
I have been looking back at this periodically trying to like it or to find an acceptable alternative for you, but I have been successful at neither so far.
Really appreciate that :)
So I think the main issue with this approach is that it breaks the current hierarchical model of inline resolution. In the current code, the inline content in a block can be resolved piece-by-piece via `takeContent`, but that always acts on the range between `startIndex` and the current parse position, and since `from` comes from `findOpeningDelimiter`, it always points into unresolved content (resolved content has no delimiters). A pre-resolve context allows user code to insert elements and resolve delimiters willy-nilly, creating a whole bunch of new failure modes that I'm not keen on having.
I assume the code that you're intending to use this with fully takes over resolution of asterisk and underscore delimiters? That seems to be the only use case for which this interface really works. As such, I'm still unconvinced it is a good direction. If you can tell me a bit more about what your implementation looks like (is it an adjusted version of `resolveMarkers`, or something different?), maybe I can think of an interface that would work for you.
Hey, thanks for diving into this again. Here's the extension code: https://github.com/bxff/lezer-markdown-partial-emphasis/blob/master/extension/partial-emphases.ts
You're right that my extension completely takes over `*` and `_` resolution. I register an inline parser that runs before the built-in Emphasis one, scoop up all those delimiters, then use the hook to run my own matching logic.
The implementation is basically a tweaked `resolveMarkers`. Two main differences:
I match delimiters atomically (`*` only pairs with `*`, `**` with `**`) to avoid the "star stealing" issue where `**` partially consumes a single `*`. This keeps delimiter pairs clean.
After matching, any leftover openers get extended to block end as partial emphasis rather than being turned into plain text.
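
The two differences listed above can be sketched roughly as follows. This is not the code from the linked extension. It is a simplified model with hypothetical names (`Run`, `Span`, `matchAtomically`), and it assumes delimiter runs have already been classified as openers or closers and that an opener skipped over by a closer of a different size stays pending.

```typescript
// Hypothetical simplified model: a run is "*" (size 1) or "**" (size 2).
interface Run { pos: number; size: 1 | 2; canOpen: boolean; canClose: boolean }
interface Span { from: number; to: number; size: 1 | 2; partial: boolean }

function matchAtomically(runs: Run[], blockEnd: number): Span[] {
  const openers: Run[] = [];
  const spans: Span[] = [];
  for (const run of runs) {
    let matched = false;
    if (run.canClose) {
      // Search backward for the nearest pending opener of exactly the same
      // size, so "**" never consumes part of a "*" and vice versa.
      for (let i = openers.length - 1; i >= 0; i--) {
        const opener = openers[i];
        if (opener && opener.size === run.size) {
          spans.push({ from: opener.pos, to: run.pos + run.size, size: run.size, partial: false });
          openers.splice(i, 1); // openers of other sizes stay pending
          matched = true;
          break;
        }
      }
    }
    if (!matched && run.canOpen) openers.push(run);
  }
  // Leftover openers become partial emphasis that runs to the block end.
  for (const opener of openers) {
    spans.push({ from: opener.pos, to: blockEnd, size: opener.size, partial: true });
  }
  return spans.sort((a, b) => a.from - b.from);
}

// "*a **b* c**" yields the two overlapping ranges 0-7 and 3-11.
const overlapping: Run[] = [
  { pos: 0, size: 1, canOpen: true, canClose: false },
  { pos: 3, size: 2, canOpen: true, canClose: false },
  { pos: 6, size: 1, canOpen: false, canClose: true },
  { pos: 9, size: 2, canOpen: false, canClose: true },
];
console.log(matchAtomically(overlapping, 11));

// "*hello" has no closer, so the opener is extended to the block end (offset 6).
console.log(matchAtomically([{ pos: 0, size: 1, canOpen: true, canClose: false }], 6));
```

Note that the spans this produces may cross each other, which is why a separate step is needed before they can be turned into nested nodes.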
The resolver builds Span objects for each pair, handles overlaps by clipping at intersections, then constructs nested Elements and replaces the corresponding sections of `cx.parts`.
I think I see what you mean about breaking the hierarchical model now. The current `takeContent` flow assumes resolved content has no delimiters left. My approach sidesteps this by handling emphasis completely before standard resolution runs: the hook nulls out my delimiters and inserts finished Elements, so when `resolveMarkers` continues, those positions are either `null` or already resolved as Elements.
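
One possible reading of "clipping at intersections", mentioned above, is to split any span that crosses the end of an enclosing span, so that the pieces nest cleanly and can be emitted as a tree. The sketch below illustrates that idea only. `Span`, `byStart`, and `clipToTree` are hypothetical names, and the linked extension may do this differently.

```typescript
// Half-open [from, to) spans tagged with the kind of emphasis they carry.
interface Span { from: number; to: number; kind: string }

// Earlier start first; for equal starts, the longer (outer) span first.
function byStart(a: Span, b: Span): number {
  return a.from - b.from || b.to - a.to;
}

// Split spans that cross the end of an enclosing span so the output nests.
function clipToTree(input: Span[]): Span[] {
  const queue = input.slice().sort(byStart);
  const out: Span[] = [];
  const openEnds: number[] = []; // end offsets of spans enclosing the current position
  while (queue.length > 0) {
    const span = queue.shift() as Span;
    // Drop enclosing spans that end at or before this span's start.
    while (openEnds.length > 0 && (openEnds[openEnds.length - 1] as number) <= span.from) {
      openEnds.pop();
    }
    const limit = openEnds.length > 0 ? (openEnds[openEnds.length - 1] as number) : Infinity;
    if (span.to > limit) {
      // Crossing: keep the part inside the enclosing span, re-queue the rest.
      queue.push({ from: limit, to: span.to, kind: span.kind });
      queue.sort(byStart);
      out.push({ from: span.from, to: limit, kind: span.kind });
      openEnds.push(limit);
    } else {
      out.push(span);
      openEnds.push(span.to);
    }
  }
  return out;
}

// The crossing pair from "*a **b* c**" becomes em 0-7 containing strong 3-7,
// followed by a sibling strong 7-11.
console.log(clipToTree([
  { from: 0, to: 7, kind: "em" },
  { from: 3, to: 11, kind: "strong" },
]));
```

After clipping, every pair of spans is either disjoint or nested, so building finished nodes from them no longer conflicts with a tree-shaped output.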