Empty paragraphs are pasted in with an extra <br> when copied from Google Docs #1511
I don't actually know whether this is specific to Google Docs's copy format or whether it would always occur for empty paragraphs like this.
To reproduce:
The HTML we get in this case looks, if I remove the attributes, like this:
ProseMirror assumes the stray <br> actually stands for a break element, so it parses it to a hard_break node. That node must occur in an inline context, so it gets a parent paragraph. There is a hack in ProseMirror's parser that will drop <br> nodes at the end of their parent block, since those are typically used as placeholders in empty textblocks. But the way it completely replaces the entire paragraph with a loose break node in this situation isn't a common style, and is hard to distinguish from situations where the break node is intended to be part of the document.

Hmm. Google at it again... This is going to come up a lot for the community using my application; lots of authors copy-paste from GDocs. But maybe if that's not common usage for ProseMirror overall, it should be fixed on my end? That would be transformPastedHTML, correct?

Edit: I went ahead and just fixed it on my end. Feel free to do what you like with this issue.
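For anyone wanting to try the same kind of fix on their own end, here is a minimal sketch of a function that could be passed as the transformPastedHTML prop. It is not the poster's actual code and not anything ProseMirror ships: the name wrapStrayBreaks and the BLOCK_TAGS list are assumptions made for illustration, and the regex-based tag scan is a simplification that will not cope with every kind of clipboard HTML. It also cannot tell whether a top-level <br> was meant as a real line break, which is the ambiguity described above.

```typescript
// Block-level tags inside which a <br> is left alone as a real line break.
// This list is an assumption for the sketch, not ProseMirror's own logic.
const BLOCK_TAGS = new Set([
  "p", "div", "li", "td", "th", "blockquote", "pre",
  "h1", "h2", "h3", "h4", "h5", "h6",
]);

// Replace every <br> that is not inside one of the block elements above
// with an empty paragraph. Inline wrappers such as <b> or <span> are ignored.
function wrapStrayBreaks(html: string): string {
  let depth = 0; // number of currently open block elements
  return html.replace(
    /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g,
    (tag: string, slash: string, name: string): string => {
      const lower = name.toLowerCase();
      if (lower === "br") return depth === 0 ? "<p></p>" : tag;
      if (BLOCK_TAGS.has(lower)) {
        if (slash) depth = Math.max(0, depth - 1);
        else if (!tag.endsWith("/>")) depth += 1;
      }
      return tag;
    },
  );
}
```

Assuming a schema whose paragraph node parses from <p>, the function would be wired in as the transformPastedHTML prop of the EditorView, so that a loose <br> between two blocks arrives at the parser as an empty paragraph instead.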
I'm going to leave this open and see if more people are running into it. It's quite possible that this is a recent change in Google Docs. I'm pretty sure that last time I looked, empty paragraphs had a <p> tag around the <br> in their clipboard format.

I don't think it is recent but I am not positive. Would appreciate a special case here as I sporadically paste a lot from Google Docs.
Right now, we're using a somewhat messy transformPastedHTML in production to handle pasting from Google Docs, MS Word and OpenOffice. However, it doesn't feel reliable enough, as users report issues every few months. Plus, I'm not sure if such logic should be the responsibility of an application using ProseMirror.

It would be great if ProseMirror natively preserved the formatting of content pasted from Google Docs and other Word-like editors.
I'm not sure how you expect that to work. Firstly, ProseMirror is schema-agnostic, so it doesn't magically know how the nodes you define would map to whatever equivalent constructs exist in the various word processing systems. Secondly, as you found, these spit out all kinds of completely ludicrous HTML, and in some situations it's not even clear how to extract the semantic meaning from that.
What the hell! A <br> outside of a paragraph loses information. If I set line-height, margin-top, and margin-bottom on an empty paragraph, all of them disappear from the pasted HTML. Google Docs saves paragraph info in the id attribute of the first paragraph. It seems designed to prevent other editors from getting the same content out of the HTML.

Google seems to always use just a <br> in place of <p><br></p>. I don't actually know how to fix this consistently, because it's not just between paragraphs. If I put a heading, then an empty line, then a paragraph, for example, when copy-pasting into ProseMirror that single <br> between the heading and the paragraph becomes <p><br><br></p>. I don't think I can reliably use transformPastedHTML to fix this and I'm not sure what other workarounds I could even employ. Any ideas?

When I have for example the following clipboard content:
And paste this inside the editor, this leads to multiple line breaks instead of one:
Is this expected?
(It also happens when the initial Content has such markup.)
Yes. The first <br> was found in the clipboard content, so the editor assumes it refers to an actual hard break that should be part of the content. The second is just there to prevent the browser from collapsing the first one, and is not part of the actual document content.

Thanks for the explanation.
I can't completely follow, but maybe that's down to a lack of understanding on my part.
Why should the browser be prevented from doing something? At least in our case, we end up with two "visible" line breaks because of that (we are using TipTap).
In an editing context, you want the line after a trailing <br> to be visible, so that the user can put the cursor after the break. A <p><br></p> would just display a single line (before the break), so ProseMirror adds a dummy break at the end to make the second line show up, similar to how it adds a dummy to entirely empty textblocks to make them show up at all.

So in the end, every paragraph without real text is getting this <br class="ProseMirror-trailingBreak"> part? For the situation when only a <br> exists, it's really strange, because this leads to two visible new lines, and when you look in the source, it's only one. For sure, later in the document content, but this is not what the user is seeing at this moment.

Shouldn't the assumption be that places we copy out of are using <br>'s in exactly the way that you're using them? To make empty blocks actually show?

That's only appropriate if the content is copied from a (poorly implemented) web editor. If it's regular HTML, or an editor like ProseMirror which is considerate enough to strip off such internal details when you copy, the assumption that such <br> nodes should be dropped doesn't hold.

I've been investigating extra BRs in the context of https://github.com/ueberdosis/tiptap/issues/1500 (preventing empty paragraphs and lines from "collapsing" in serialized HTML, so that the HTML looks identical to ProseMirror's rendered state) and also noticed this Google Docs behavior. Here is how various editors handle empty paragraphs / lines (all in Mac Firefox, today):
Empty paragraph
To ensure the selection boundary is where I think it is, I always copy/paste an empty paragraph surrounded by non-empty paragraphs, i.e., <p>A</p> <p></p> <p>B</p>.

<br /><p><br></p><p></p><p>A</p> <br /> <p>B</p>); empty p becomes:<p>A</p> <p><br></p> <p>B</p>); empty p becomes:<p>A</p> <p></p> <p>B</p>); empty p becomes:<p> </p><p></p>

Empty line at end of paragraph
A paragraph that ends in a line break (<br />). E.g. a paragraph that in plain text is "First line\n".

I again copy/paste the paragraph surrounded by non-empty paragraphs.

<span ...><br /><br /></span>*<br><br>*<br><br />\n <br>

* If you don't include the subsequent paragraph in your selection, there is only one BR.
** This might be issue with my test setup or Tiptap's HardBreak extension.
TLDR
, then strips them out when re-parsing the HTML. ProseMirror doesn't do this, but it's easy enough to wrap Tiptap's getHTML/setContent with some code that does (perhaps using <br /> instead of nbsp). I intend to work around https://github.com/ueberdosis/tiptap/issues/1500 that way in our app.

@marijnh After some playing around and a deeper look, I now understand what you were saying. In the end it's a difficult situation, and I would say the source of the clipboard content is already wrong; they are not doing it in a "correct" way.

So the only workaround would be to remove this stuff beforehand, but even then you can't be sure whether it was an "expected" new line from the user or not.
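The idea from the TLDR comment of wrapping Tiptap's getHTML/setContent could look roughly like the sketch below. The two helper names, fillEmptyParagraphs and stripPlaceholderBreaks, are hypothetical and not part of Tiptap or ProseMirror, and the regexes only handle paragraphs whose sole content is the placeholder. Note the limitation: a paragraph that genuinely contains only a hard break is indistinguishable from a filled empty paragraph and would be emptied on the way back in.

```typescript
// Hypothetical helper: apply to the string returned by getHTML() so that
// empty paragraphs do not collapse when the HTML is rendered elsewhere.
function fillEmptyParagraphs(html: string): string {
  // $1 preserves any attributes on the <p> tag (it is empty if there are none).
  return html.replace(/<p(\s[^>]*)?><\/p>/g, "<p$1><br /></p>");
}

// Hypothetical helper: apply to stored HTML before handing it to
// setContent(), undoing the placeholder added above.
function stripPlaceholderBreaks(html: string): string {
  return html.replace(/<p(\s[^>]*)?><br\s*\/?><\/p>/g, "<p$1></p>");
}
```

In use, the app would store fillEmptyParagraphs(editor.getHTML()) and later call editor.commands.setContent(stripPlaceholderBreaks(stored)), assuming a Tiptap editor instance named editor.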