Support tracking changes in marks and attributes #21
This PR adds an option to configure the `ChangeSet` class so that it can find changes in marks and attributes, as well as in nodes.

Approach

Adds a `tokenEncoder` configuration option to the `ChangeSet` class. It accepts a `TokenEncoder` object, which lets developers customize how characters and nodes are encoded for diffing. This lets the developer find changes not only in nodes and content, but also in marks and attributes of their choice.

Adds 3 built-in `TokenEncoder` objects:

- `BaseEncoder` (default): encodes only the character and node type info inside tokens. It is equivalent to using the `ChangeSet` class before this PR.
- `MarkEncoder`: encodes the mark data inside tokens.
- `AttributeEncoder`: encodes the data of marks and attributes inside tokens.

Breaking changes

There are no breaking changes. The default behavior of the `ChangeSet` class is still the same; this PR only adds the `tokenEncoder` configuration option.

Changes

- Modifies the `ChangeSet` class's `tokens` function so that it supports custom token encoders.

Motivation for this change
At Tiptap, we're building tools to help track the changes made by multiple users in the same document. Some changes might involve only modifying a mark of the document (for example, changing a word from normal to bold). We want to be able to detect these changes with the prosemirror-changeset library.
We're also building workflows where an AI assistant edits the document, and the user reviews the changes it made. We want to be able to detect changes in formatting made by the AI, and show them to the user in a diff format.
Please let us know what you think of the approach we took to solve the problem, and whether you'd recommend solving it in another way. Also let us know if there are any issues in formatting/tests/docs that we should fix. Thanks for reviewing 😄.
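To make the idea concrete, here is a minimal, self-contained sketch of mark-aware token encoding. This is illustrative only: the data shapes and the `encodeToken` function below are hypothetical stand-ins, not this PR's or the library's actual API.

```javascript
// Sketch: encode each character together with its active mark names, so
// that a formatting-only change produces different tokens and therefore
// shows up in a character-level diff.

// Tiny stand-in for ProseMirror text content: characters with mark names.
const before = [
  { char: "h", marks: [] },
  { char: "i", marks: [] },
];
const after = [
  { char: "h", marks: ["strong"] },
  { char: "i", marks: ["strong"] },
];

// A MarkEncoder-style token: the character plus its sorted mark names.
function encodeToken({ char, marks }) {
  return char + "\0" + [...marks].sort().join(",");
}

const beforeTokens = before.map(encodeToken);
const afterTokens = after.map(encodeToken);

// With a plain character encoder both streams would be identical ("h", "i");
// with mark data folded into the tokens, the bolding becomes visible.
console.log(beforeTokens[0] === afterTokens[0]); // false
```

Sorting the mark names keeps the token stable regardless of the order in which marks were applied.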
Closes #4
This seems like a good idea, but are you certain we need this encoder abstraction? Can you think of any modes beyond the three you're providing here that could be useful?
Also, I'm a bit worried about the performance of the string concatenation and JSON encoding — these will be run a lot.
I believe some developers might want to compare some attributes and ignore others; that's why I added the option to define your own encoder.
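For instance, a selective encoder along these lines could fold only a chosen set of attributes into each token, so untracked attributes never affect the diff. The names here (`makeAttrEncoder`, `encode`) are illustrative, not part of the PR's API.

```javascript
// Sketch: build an encoder that only tracks the named attributes.
function makeAttrEncoder(trackedAttrs) {
  return function encode(node) {
    const parts = [node.type];
    for (const name of trackedAttrs) {
      // Only tracked attributes influence the token.
      parts.push(name + "=" + JSON.stringify(node.attrs[name]));
    }
    return parts.join("|");
  };
}

// Track the heading level, but ignore e.g. an "id" attribute.
const encode = makeAttrEncoder(["level"]);

const a = { type: "heading", attrs: { level: 1, id: "x" } };
const b = { type: "heading", attrs: { level: 2, id: "x" } };
const c = { type: "heading", attrs: { level: 1, id: "y" } };

console.log(encode(a) === encode(b)); // false: level changed
console.log(encode(a) === encode(c)); // true: only untracked id differs
```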
Yes, I think you've got a point, especially for the attribute encoder. The `JSON.stringify` method would run approximately once for each letter and node in the text. This can be a lot, especially if you want the changes to be re-computed on every transaction.

An alternative solution could be: instead of storing the tokens as a string/number, store them as a Token object that can have attributes and metadata. Then, in the `TokenEncoder`, define an equality function that determines if two tokens are equal. This solution would not involve any `JSON.stringify` calls or string concatenation. What do you think of it?

Does the attached patch, which includes a compare function in the encoder abstraction (and intentionally doesn't provide any alternative custom encoder implementations), look like it would work for you?
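A self-contained sketch of this alternative, with tokens as plain objects and an encoder-supplied equality function instead of string serialization. The names (`makeToken`, `compareTokens`) are illustrative only, not the actual prosemirror-changeset API.

```javascript
// A token carries the character and a reference to its marks array;
// nothing is serialized.
function makeToken(char, marks) {
  return { char, marks };
}

// Encoder-supplied equality: no JSON.stringify, no string concatenation.
function compareTokens(a, b) {
  if (a.char !== b.char) return false;
  if (a.marks.length !== b.marks.length) return false;
  // Mark objects in ProseMirror are typically shared instances, so a
  // cheap identity comparison is used in this sketch.
  return a.marks.every((m, i) => m === b.marks[i]);
}

const strong = { name: "strong" };
const t1 = makeToken("a", [strong]);
const t2 = makeToken("a", [strong]);
const t3 = makeToken("a", []);

console.log(compareTokens(t1, t2)); // true: same char, same marks
console.log(compareTokens(t1, t3)); // false: marks differ
```

Because equality is a function call on object fields, the per-character cost stays constant instead of growing with the size of the serialized attribute data.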
Hi Marijn. Thank you very much for your commit. Yes, I think that it would work for us.
I have left a review comment, in case you want to consider it.
github.com/ProseMirror/prosemirror-changeset@562c61c674 (r156383467)

Because you already made a commit with the changes of this PR, I'm closing it.
@marijnh feel free to close issue #21 too, if you think it's resolved.
Pull request closed