Improve parser performance by 50% #79
This PR significantly improves `parse` & `parseSlice` performance by optimizing how ProseMirror does tag matching: it calls `querySelectorAll` on the main DOM node once, instead of calling `matches` on each node individually for selectors like `li`, `p`, etc. This avoids a browser API call per node and works much faster. These changes shouldn't break anything.
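The diff itself isn't reproduced in this thread, but the core idea can be sketched outside the browser with synthetic nodes (all names below are hypothetical, and "selector matching" is reduced to tag-name comparison, which is enough to show the pattern):

```javascript
// Synthetic stand-ins for DOM nodes: tags alternate between "li" and "p".
const nodes = Array.from({ length: 9 }, (_, i) => ({ tag: i % 3 === 0 ? "li" : "p" }));
const selectors = ["li", "p"];

// Before: one matches()-style check per node per rule (work done once per match).
function matchPerNode(nodes, selectors) {
  return nodes.filter((node) => selectors.some((sel) => node.tag === sel));
}

// After: do the selector work once per parse. In the real DOM this would be a
// single querySelectorAll(selectors.join(",")) call on the root node; each
// per-node check then becomes a cheap Set lookup instead of a browser call.
function matchOncePerParse(nodes, selectors) {
  const wanted = new Set(selectors);            // built once per parse
  const matched = new Set(nodes.filter((n) => wanted.has(n.tag)));
  return nodes.filter((n) => matched.has(n));   // O(1) check per node
}

console.log(matchPerNode(nodes, selectors).length);      // 9
console.log(matchOncePerParse(nodes, selectors).length); // 9
```

The win comes from moving the expensive selector evaluation out of the per-node loop, not from changing what matches.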
Benchmarks
Before:
After:
That's a solid ~40% improvement.
I tried running the tests but I keep getting this error:
What kind of things are you doing where parser performance is a big issue? I'm not sure adding this complexity for a 50% increase is worth it. Adding a simple branch for plain tag matching to `matches()` might be a reasonable idea (though I'm not sure why browser implementations of `DOMElement.matches` wouldn't already be optimized that way).

You're probably using yarn or a very old npm to install your dependencies if you're getting duplicated modules like that.
We are currently using ProseMirror in a note-taking app which stores notes as HTML that the user can later view and edit. For that to work, the parser is obviously involved. It isn't an issue for smaller notes, but for large notes (> 300K words) optimizing the parser brings down the waiting time between the user clicking on a note and it appearing on screen, ready to edit. Even a small improvement helps.
I mean, it is a total of 30 lines and 2 methods, and even that can be reduced further by refactoring. A 50% increase is not trivial for the amount of change this PR makes.
Unfortunately, `matches` is on the hot path; adding any branch to it will only make things "slightly" better. Browser implementations are optimized, but when you start matching thousands of nodes and CSS selectors in a loop, it's obvious why the browser can't do it fast enough.

The changes I have made follow a simple rule: do the work once per parse instead of once per match. They can be restructured to look less complex; they really aren't doing a whole lot.
Of course, the decision rests with you. I'd be happy to make any further changes.
No, actually, I am running npm v10.2.4. I just ran `npm i` and then `npm run test`.

Benchmarking these changes by parsing and re-parsing the example document on the dev demo page, I don't see an actual noticeable speed improvement. Are you using any particularly complex selectors in your schema?
The speed difference is most noticeable on Chromium-based browsers, unfortunately. Here's a really simple snippet that benchmarks the changes made in this PR:
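(The snippet itself didn't survive in this copy of the thread; a rough sketch of that kind of micro-benchmark might look like the following. Outside a browser there is no DOM, so a tiny fake `document` supporting only comma-separated tag-name selectors is shimmed in; in a real browser the shim is skipped and the native APIs are used.)

```javascript
// Shim a minimal DOM when running outside the browser, so the comparison
// below is runnable anywhere. Element count and tags are made up.
if (typeof document === "undefined") {
  const makeEl = (tag) => ({
    tagName: tag,
    matches(sel) {
      return sel.split(",").map((s) => s.trim()).includes(tag);
    },
  });
  const els = Array.from({ length: 50000 }, (_, i) => makeEl(i % 2 ? "p" : "div"));
  globalThis.document = {
    body: {
      querySelectorAll: (sel) => els.filter((e) => e.matches(sel)),
      getElementsByTagName: () => els,
    },
  };
}

const root = document.body;
const SEL = "p, li";
const all = root.getElementsByTagName("*");

// Strategy A (this PR): one querySelectorAll up front, then a Set lookup per
// node. Note timeA includes the cost of building the set, as discussed below.
let t0 = performance.now();
const matched = new Set(root.querySelectorAll(SEL));
let hitsA = 0;
for (const el of all) if (matched.has(el)) hitsA++;
const timeA = performance.now() - t0;

// Strategy B (current behavior): call matches() on every node individually.
t0 = performance.now();
let hitsB = 0;
for (const el of all) if (el.matches(SEL)) hitsB++;
const timeB = performance.now() - t0;

console.log({ hitsA, hitsB, timeA, timeB });
```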
I ran these in Google Chrome and Firefox, and got the following results:
Chrome:

Firefox:

This tries to exclude the time it takes to loop over the elements (not super accurately, though). The crux is that the speed difference is most significant on Chrome, while on Firefox it actually gets slower. However, if you include the cost of creating the set of nodes, this PR isn't looking all that great (at least, that's what independent benchmarking is showing). That is very different from what I see after benchmarking ProseMirror with these changes, which makes me wonder whether the performance difference is due to something else.
I benchmarked both Firefox and Chrome. Neither showed a significant difference. I'm not interested in micro-benchmarks that just run one leaf function—there'd have to be a noticeable improvement in `DOMParser.parse`'s time for this to be interesting.

How many nodes are in the document you benchmarked on? I'll run the benchmarks again. It's possible that I am doing something wrong/different, because I am seeing a 40% improvement in `DOMParser.parse`.

I was parsing a 3000-node document in a loop for a few seconds, counting the number of parses per second.
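A parses-per-second harness like the one described can be sketched as follows (the `parseOnce` workload is a dummy stand-in, since the real schema and document aren't shown in this thread):

```javascript
// Runs fn in a loop for roughly durationMs and reports iterations per second.
function opsPerSecond(fn, durationMs = 200) {
  const start = performance.now();
  let count = 0;
  while (performance.now() - start < durationMs) {
    fn();
    count++;
  }
  return count / ((performance.now() - start) / 1000);
}

// Dummy stand-in for DOMParser.parse over a 3000-node document.
const doc = Array.from({ length: 3000 }, (_, i) => ({ type: i % 2 ? "p" : "li" }));
function parseOnce() {
  let paragraphs = 0;
  for (const node of doc) if (node.type === "p") paragraphs++;
  return paragraphs;
}

const rate = opsPerSecond(parseOnce);
console.log(`${Math.round(rate)} parses/sec`);
```

Measuring whole parses this way, rather than timing one leaf function, captures any fixed per-parse overhead the optimization adds.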
Pull request closed