Improve search within NFKD normalized text #19
I work at Langfuse, where we use this package, and one of our customers reported a search issue involving Japanese text.
The problem seems to be that SearchCursor can miss matches when NFKD normalization expands a single source character into multiple characters. In particular, partial matches inside that expansion, or matches that cross from one expanded character into the next, were not being returned reliably.
I do not speak Japanese myself, so I added concrete test cases from the customer-provided examples rather than make broader assumptions about the text handling.
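
For readers unfamiliar with the expansion described above, here is a minimal sketch of it. The helper `normalizeWithMap` is hypothetical and is not part of CodeMirror or of this patch. It normalizes one code point at a time (which can differ from normalizing the whole string at once) and records which source offset each normalized code unit came from.

```typescript
// Hypothetical helper, for illustration only (not CodeMirror's implementation).
// Returns the NFKD-normalized text plus, for every normalized code unit,
// the offset of the source character that produced it.
function normalizeWithMap(source: string): { text: string; map: number[] } {
  let text = "";
  const map: number[] = [];
  let pos = 0;
  for (const ch of source) {
    const expanded = ch.normalize("NFKD");
    // Every code unit of the expansion points back at the same source offset.
    for (let i = 0; i < expanded.length; i++) map.push(pos);
    text += expanded;
    pos += ch.length;
  }
  return { text, map };
}

const { text, map } = normalizeWithMap("5㌢");
console.log(text);               // "5センチ": one source character became three
console.log(map);                // [0, 1, 1, 1]
console.log(text.indexOf("ン")); // 2, a match strictly inside the expansion of "㌢"
```

A match at normalized index 2 has no source position of its own: it can only be reported as lying somewhere within the single character at source offset 1.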
This would be my first contribution to the CodeMirror codebase, so I would appreciate any direction if you think there is a more idiomatic way to handle this in SearchCursor.
Thanks for the patches. Unfortunately, doing it like this would have a potentially problematic effect on how replace works: if "㌢" is taken to match "ン", replacing "ン" with something else will completely consume the "㌢" character, which I'd consider data loss (you're removing content that wasn't actually matched). So it seems that for searching, this match is desired, but for replacing, it should be skipped.
To that purpose my patch (linked above) adds a precise field to search cursor matches, which is false when one of the sides doesn't correspond to an actual character boundary in the document. It sets up replace commands to look at that flag.
Could you say a bit more about what the code in your patch that looks at extending characters is trying to do?
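
The idea of a precise flag, as described above, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked patch: `Match`, `findNormalized`, and `replaceFirst` are hypothetical names, and only the field name precise comes from the discussion.

```typescript
// Illustrative only: these names do not mirror the actual patch.
interface Match { from: number; to: number; precise: boolean }

function findNormalized(source: string, query: string): Match | null {
  // One entry per normalized code unit: the source range it came from and
  // whether it is the first or last unit of that character's expansion.
  const units: { from: number; to: number; first: boolean; last: boolean }[] = [];
  let text = "";
  let pos = 0;
  for (const ch of source) {
    const expanded = ch.normalize("NFKD");
    for (let i = 0; i < expanded.length; i++) {
      units.push({ from: pos, to: pos + ch.length, first: i === 0, last: i === expanded.length - 1 });
    }
    text += expanded;
    pos += ch.length;
  }
  const needle = query.normalize("NFKD");
  const start = text.indexOf(needle);
  if (start < 0 || needle.length === 0) return null;
  const head = units[start];
  const tail = units[start + needle.length - 1];
  // precise is false when either side falls inside an expanded character.
  return { from: head.from, to: tail.to, precise: head.first && tail.last };
}

// A replace that skips imprecise matches, so unmatched content is never consumed.
function replaceFirst(source: string, query: string, replacement: string): string {
  const m = findNormalized(source, query);
  if (!m || !m.precise) return source;
  return source.slice(0, m.from) + replacement + source.slice(m.to);
}

console.log(findNormalized("5㌢", "ン"));         // { from: 1, to: 2, precise: false }
console.log(replaceFirst("5㌢", "ン", "X"));      // "5㌢" (left untouched)
console.log(replaceFirst("5㌢", "センチ", "cm")); // "5cm"
```

Searching still reports the partial match, so it can be highlighted, while replacing only acts when both ends of the match line up with real character boundaries in the source.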