Insertion before change #8

Merged
floorish merged 5 commits from insertion-fail into master 2019-03-12 10:54:24 +01:00
floorish commented 2019-03-11 14:06:19 +01:00 (Migrated from github.com)

As described in https://github.com/ProseMirror/prosemirror-changeset/issues/6

Fixes an insertion before a change:

ab|c                                  []
<delete b>       + [2, 3, 2, 2]
a|c                                   = [2, 3, 2, 2]
<insert x>       + [2, 2, 2, 3]
ax|c                                  = [2, 3, 2, 3]
<move left>      + []
a|xc                                  = [2, 3, 2, 3]
<insert x>       + [2, 2, 2, 3]
ax|xc                                 = [2, 4, 2, 4] (old, incorrect) [2, 3, 2, 4] (new, correct)
As described in https://github.com/ProseMirror/prosemirror-changeset/issues/6 Fixes an insertion before a change: ``` ab|c [] <delete b> + [2, 3, 2, 2] a|c = [2, 3, 2, 2] <insert x> + [2, 2, 2, 3] ax|c = [2, 3, 2, 3] <move left> + [] a|xc = [2, 3, 2, 3] <insert x> + [2, 2, 2, 3] ax|xc = [2, 4, 2, 4] (old, incorrect) [2, 3, 2, 4] (new, correct) ```
marijnh commented 2019-03-11 14:23:08 +01:00 (Migrated from github.com)

Thanks for looking into this.

range 3 - 12 cannot be deleted since initial input is not that large.

The initial doc contains a paragraph with 10 characters in it, so including the paragraph start/end tokens, it's 12 tokens long, and 3-12 is a valid range.

I had, at one point, a fuzzer for this that found some bugs for which I created test cases. So these tests are machine generated, and while it's possible that the program that created them contained a bug, or that I transcribed them wrong, a failure might also well point at a real issue. I know they are terribly cryptic, but I'm worried that changing this one to make it pass might cover up a reintroduction of the bug that it originally caught.

Since the inner loop in merge is terribly subtle for zero-length changes, and your condition makes it even more tricky (and seems like the same issue might occur with the (inY && pos == curY.fromA) branch), would it maybe make sense to keep enteredX and enteredY variables, which track whether we've already processed curX.deleted and curY.inserted? You'd have to clear them when curX or curY is updated, and set them in the appropriate branches. The result is probably easier to understand.

Thanks for looking into this. > range 3 - 12 cannot be deleted since initial input is not that large. The initial doc contains a paragraph with 10 characters in it, so including the paragraph start/end tokens, it's 12 tokens long, and 3-12 is a valid range. I had, at one point, a fuzzer for this that found some bugs for which I created test cases. So these tests are machine generated, and while it's possible that the program that created them contained a bug, or that I transcribed them wrong, a failure might also well point at a real issue. I know they are terribly cryptic, but I'm worried that changing this one to make it pass might cover up a reintroduction of the bug that it originally caught. Since the inner loop in `merge` is terribly subtle for zero-length changes, and your condition makes it even more tricky (and seems like the same issue might occur with the `(inY && pos == curY.fromA)` branch), would it maybe make sense to keep `enteredX` and `enteredY` variables, which track whether we've already processed `curX.deleted` and `curY.inserted`? You'd have to clear them when `curX` or `curY` is updated, and set them in the appropriate branches. The result is probably easier to understand.
floorish commented 2019-03-11 19:02:54 +01:00 (Migrated from github.com)

The initial doc contains a paragraph with 10 characters in it, so including the paragraph start/end tokens, it's 12 tokens long, and 3-12 is a valid range.

Yes, it is valid but in this case incorrect :) My wording was perhaps badly chosen. The paragraph end token is not removed in the transaction, the end result of the fuzz test is as follows:

{"type":"doc","content":[{"type":"paragraph","content":[{"type":"text","text":"eAqXUDKkMf"}]}]}

0   1  2  3  4  5  6  7  8  9  10  11  12
  p   e  A  M  I  S  W  g  a  u  f   /p
  p   e  A  q  X  U  D  K  k  M  f   /p
          |>                   <|       

The part between eA and f has changed, i.e. [3, 10, 3, 10]. I think the bug as fixed in this pull request is also recorded by the fuzzer.

and seems like the same issue might occur with the (inY && pos == curY.fromA) branch

Think so too, it's just hard to come up with a test that would cover this.

would it maybe make sense to keep enteredX and enteredY variables, which track whether we've already processed curX.deleted and curY.inserted

I'll try to do this and update the pull request later.

> The initial doc contains a paragraph with 10 characters in it, so including the paragraph start/end tokens, it's 12 tokens long, and 3-12 is a valid range. Yes, it is valid but in this case incorrect :) My wording was perhaps badly chosen. The paragraph end token is not removed in the transaction, the end result of the fuzz test is as follows: ``` {"type":"doc","content":[{"type":"paragraph","content":[{"type":"text","text":"eAqXUDKkMf"}]}]} 0 1 2 3 4 5 6 7 8 9 10 11 12 p e A M I S W g a u f /p p e A q X U D K k M f /p |> <| ``` The part between `eA` and `f` has changed, i.e. `[3, 10, 3, 10]`. I think the bug as fixed in this pull request is also recorded by the fuzzer. > and seems like the same issue might occur with the (inY && pos == curY.fromA) branch Think so too, it's just hard to come up with a test that would cover this. > would it maybe make sense to keep enteredX and enteredY variables, which track whether we've already processed curX.deleted and curY.inserted I'll try to do this and update the pull request later.
floorish commented 2019-03-12 10:05:41 +01:00 (Migrated from github.com)

I've added your suggestions. The inY && !inX and inX && !inY branches both logically cannot be applied twice I believe, so they don't need the same treatment.

I've added your suggestions. The `inY && !inX` and `inX && !inY` branches both logically cannot be applied twice I believe, so they don't need the same treatment.
marijnh commented 2019-03-12 10:54:28 +01:00 (Migrated from github.com)

Yes, it is valid but in this case incorrect

Ah, I see—doing that replace will still preserve the end-of-paragraph token, so indeed, the actual generated step will have a different range and that test was confused.

Thanks for implementing this!

> Yes, it is valid but in this case incorrect Ah, I see—doing that replace will still preserve the end-of-paragraph token, so indeed, the actual generated step will have a different range and that test was confused. Thanks for implementing this!
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
prosemirror/prosemirror-changeset!8
No description provided.