Wrong diff in merge view #7106

Open
opened 2024-09-09 13:37:46 +02:00 by marijnh · 10 comments
marijnh commented 2024-09-09 13:37:46 +02:00 (Migrated from gitlab.com)

Hi,

I ran into weird diff behavior when it shows change for non-changed line when it follows new line ending with comma. I tried to reproduce the issue in demo of underlying library (here) but with no success, therefore I think the problem might be with implementation to CM5.

https://github.com/user-attachments/assets/d3679517-64ee-43a9-9f44-a2db89a952fa

Can you please confirm the bug and hopefully suggest fix?

Thanks!

Hi, I ran into weird diff behavior when it shows change for non-changed line when it follows new line ending with comma. I tried to reproduce the issue in demo of underlying library ([here](https://neil.fraser.name/software/diff_match_patch/demos/diff.html)) but with no success, therefore I think the problem might be with implementation to CM5. https://github.com/user-attachments/assets/d3679517-64ee-43a9-9f44-a2db89a952fa Can you please confirm the bug and hopefully suggest fix? Thanks!
marijnh commented 2024-09-09 13:56:30 +02:00 (Migrated from gitlab.com)

Can you please confirm the bug

Without the inputs that lead to it, probably not.

> Can you please confirm the bug Without the inputs that lead to it, probably not.
marijnh commented 2024-09-09 14:01:00 +02:00 (Migrated from gitlab.com)

It happened to me with the following:

Left:

CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    "Review_Count",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;

Right:

CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
    HAVING
        COUNT("Airline_Reviews"."id") > 1
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;

The Right line 27 with "Rating_StdDev", is incorrectly show as added.

It happened to me with the following: Left: ``` CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS WITH Aggregated_Reviews AS ( SELECT "Review_Routes"."origin" AS "Origin_Airport", "Review_Routes"."destination" AS "Destination_Airport", AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating", STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev", COUNT("Airline_Reviews"."id") AS "Review_Count" FROM "Airline_Reviews" JOIN "Review_Routes" ON "Airline_Reviews"."id" = "Review_Routes"."review_id" WHERE "Airline_Reviews"."Overall_Rating" != 'n' GROUP BY "Review_Routes"."origin", "Review_Routes"."destination" ) SELECT "Origin_Airport", "Destination_Airport", "Average_Rating", "Rating_StdDev", "Review_Count", CASE WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier' WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier' ELSE 'Normal' END AS "Outlier_Status" FROM Aggregated_Reviews; ``` Right: ``` CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS WITH Aggregated_Reviews AS ( SELECT "Review_Routes"."origin" AS "Origin_Airport", "Review_Routes"."destination" AS "Destination_Airport", AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating", STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev", COUNT("Airline_Reviews"."id") AS "Review_Count" FROM "Airline_Reviews" JOIN "Review_Routes" ON "Airline_Reviews"."id" = "Review_Routes"."review_id" WHERE "Airline_Reviews"."Overall_Rating" != 'n' GROUP BY "Review_Routes"."origin", "Review_Routes"."destination" HAVING COUNT("Airline_Reviews"."id") > 1 ) SELECT "Origin_Airport", "Destination_Airport", "Average_Rating", "Rating_StdDev", CASE WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier' WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier' ELSE 'Normal' END AS "Outlier_Status" FROM Aggregated_Reviews; ``` The Right line 27 with `"Rating_StdDev",` is incorrectly show as added.
marijnh commented 2024-09-09 14:07:08 +02:00 (Migrated from gitlab.com)

image

var target = document.getElementById("view");
  target.innerHTML = "";
dv = CodeMirror.MergeView(target, {
    value: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    "Review_Count",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;`,
    origLeft: panes == 3 ? orig1 : null,
    orig: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
    HAVING
        COUNT("Airline_Reviews"."id") > 1
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;`,
    lineNumbers: true,
    mode: "text/html",
    highlightDifferences: highlight,
    connect: connect,
    collapseIdentical: collapse
  });
![image](https://github.com/user-attachments/assets/84d1a41a-cb5d-4499-bc95-4842048fbf41) ``` var target = document.getElementById("view"); target.innerHTML = ""; dv = CodeMirror.MergeView(target, { value: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS WITH Aggregated_Reviews AS ( SELECT "Review_Routes"."origin" AS "Origin_Airport", "Review_Routes"."destination" AS "Destination_Airport", AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating", STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev", COUNT("Airline_Reviews"."id") AS "Review_Count" FROM "Airline_Reviews" JOIN "Review_Routes" ON "Airline_Reviews"."id" = "Review_Routes"."review_id" WHERE "Airline_Reviews"."Overall_Rating" != 'n' GROUP BY "Review_Routes"."origin", "Review_Routes"."destination" ) SELECT "Origin_Airport", "Destination_Airport", "Average_Rating", "Rating_StdDev", "Review_Count", CASE WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier' WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier' ELSE 'Normal' END AS "Outlier_Status" FROM Aggregated_Reviews;`, origLeft: panes == 3 ? orig1 : null, orig: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS WITH Aggregated_Reviews AS ( SELECT "Review_Routes"."origin" AS "Origin_Airport", "Review_Routes"."destination" AS "Destination_Airport", AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating", STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev", COUNT("Airline_Reviews"."id") AS "Review_Count" FROM "Airline_Reviews" JOIN "Review_Routes" ON "Airline_Reviews"."id" = "Review_Routes"."review_id" WHERE "Airline_Reviews"."Overall_Rating" != 'n' GROUP BY "Review_Routes"."origin", "Review_Routes"."destination" HAVING COUNT("Airline_Reviews"."id") > 1 ) SELECT "Origin_Airport", "Destination_Airport", "Average_Rating", "Rating_StdDev", CASE WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier' WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier' ELSE 'Normal' END AS "Outlier_Status" FROM Aggregated_Reviews;`, lineNumbers: true, mode: "text/html", highlightDifferences: highlight, connect: connect, collapseIdentical: collapse }); ```
marijnh commented 2024-09-09 14:11:35 +02:00 (Migrated from gitlab.com)

Oh, I see what's going on. If you look at the green underline, it matches the comma at the end of the line in the shorter document to a comma on the next line in the longer document (which is also a valid diff). There's multiple valid diffs for any given change like this, and this library doesn't really try to influence the diffing library to pick a particular one. As such, the change spans two lines when displayed, even though another diff could fit on a single line. I don't think this is going to change in this version of the library (@codemirror/merge for CM6 should behave better).

Oh, I see what's going on. If you look at the green underline, it matches the comma at the end of the line in the shorter document to a comma on the next line in the longer document (which is also a valid diff). There's multiple valid diffs for any given change like this, and this library doesn't really try to influence the diffing library to pick a particular one. As such, the change spans two lines when displayed, even though another diff could fit on a single line. I don't think this is going to change in this version of the library (`@codemirror/merge` for CM6 should behave better).
marijnh commented 2024-09-09 14:30:00 +02:00 (Migrated from gitlab.com)

Oh, I see! So the root cause is "misleading highlighting" in our template where the actual diff is not highlighted, but only lines spanning the diff are highlighted. Thanks!

Oh, I see! So the root cause is "misleading highlighting" in our template where the actual diff is not highlighted, but only lines spanning the diff are highlighted. Thanks!
marijnh commented 2024-09-09 16:41:10 +02:00 (Migrated from gitlab.com)

You are right. Thank you very much for the clarity!

You are right. Thank you very much for the clarity!
marijnh (Migrated from gitlab.com) closed this issue 2024-09-09 16:41:10 +02:00
marijnh commented 2024-09-09 17:09:40 +02:00 (Migrated from gitlab.com)

Well, in the end I think the issue is still somewhere else. When trying it in CM6 there is still the same problem. But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff. Please see the video - in the beginning the comma on line 25 is not highlighted, but after adding some change to lines 20-21, the comma on line 25 is highlighted now.

https://github.com/user-attachments/assets/f415ffdd-05fd-4e2b-b276-ad1288eabe4a

Well, in the end I think the issue is still somewhere else. When trying it in [CM6](https://codemirror.net/try/?example=Merge%20View) there is still the same problem. But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff. Please see the video - in the beginning the comma on line 25 is not highlighted, but after adding some change to lines 20-21, the comma on line 25 is highlighted now. https://github.com/user-attachments/assets/f415ffdd-05fd-4e2b-b276-ad1288eabe4a
marijnh (Migrated from gitlab.com) reopened this issue 2024-09-09 17:09:40 +02:00
marijnh commented 2024-09-09 19:46:01 +02:00 (Migrated from gitlab.com)

But what constitutes the diff is still the diff library responsibility, not he code editor's, isn't it?

But what constitutes the diff is still the diff library responsibility, not he code editor's, isn't it?
marijnh commented 2024-09-09 21:35:41 +02:00 (Migrated from gitlab.com)

I don't know as the diff library has lots of options which CodeMirror doesn't propagate. I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror.

I don't know as the diff library has lots of options which CodeMirror doesn't propagate. I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror.
marijnh commented 2024-09-10 10:25:06 +02:00 (Migrated from gitlab.com)

But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff.

Again, diffs are not deterministic based on input. There's often multiple correct diffs for a given set of documents.

But it seems the line boundary alignment in @codemirror/merge wasn't working as well as I thought it was. Attached patch should help with that.

I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror.

I don't think diff_match_patch gives any guarantees on where it locates changes relative to line breaks, so I'm not really sure what 'reproducing the issue in the diff library' would look like.

> But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff. Again, diffs are not deterministic based on input. There's often multiple correct diffs for a given set of documents. But it seems the line boundary alignment in `@codemirror/merge` wasn't working as well as I thought it was. Attached patch should help with that. > I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror. I don't think diff_match_patch gives any guarantees on where it locates changes relative to line breaks, so I'm not really sure what 'reproducing the issue in the diff library' would look like.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
codemirror/codemirror5#7106
No description provided.