feat: add getDiagnostics function

nathanlesage commented

2022-11-13 20:24:19 +01:00

(Migrated from github.com)

Adds a function that can be used to retrieve all diagnostics that are present in the provided state. This PR basically addresses this discussion over on discuss.codemirror.net that was originally posted for CodeMirror 5, but since this functionality doesn't exist in the v6 as well, I figured it would be great to finally have it.

Description

What this PR does is add a new function, getDiagnostics that returns all diagnostics that are present within a provided state. To do so, it basically just takes the lintState field and iterates over the generated decorations, collecting the attached diagnostics along the way and returning them as a single array.

This can be used to attach various linters to the state and retrieve them. Users can then afterwards filter them (e.g. look at their source property) and do something with it. This can prove handy for example when you have one or more linters active that probe a file for problems and need to access them outside of the linter function in order to prevent double work.

Rationale

Normally, the action field can be used to define some actions the user can perform to fix or improve the linter error. However, sometimes the potential actions are too many or need to be fetched on-demand. Therefore, it is handy to have a function to quickly retrieve whatever the linters have found and therefore to avoid having to check the file's contents twice.

Example usecase

I've basically moved forward with this because I'm using the built-in linter function to create a custom spell checker (because I can't completely rely on the OS for doing the spell checking).

Said spell checker contains a ton of code to ensure we're spell-checking correctly (i.e. takes into account brackets and custom quote symbols). However, while functionally a spell checker is a linter, potential actions that can be taken are normally provided in a context menu and not via actions like other linters.

In order to avoid having to run the spellchecker twice over the text, one can use this function to retrieve all diagnostics currently attached to the state whenever the user requests a context menu, filter out anything whose source is my spell checker and check if there's a spelling error underneath the cursor position, and provide suggestions as applicable.

Caveats

Currently, this simply takes all diagnostics irrespective of the originating Facets, both because I'm still not comfortably proficient with the use of facets, and because normally the amount of linter errors/warnings/infos should be fairly manageable. Therefore, this PR basically rests on the range iterator being fast.

Also, one could also simply save the decorations manually in the stateField, but I felt this implementation is the least intrusive into the existing code, hence preventing potential other problems.

I tried to stick to the conventions of the repo, but please feel free to outline any potential problems with this implementation and what I can change.

Thank you for your outstanding work – CodeMirror 6 is simply amazing!

Adds a function that can be used to retrieve all diagnostics that are present in the provided state. This PR basically addresses [this discussion over on discuss.codemirror.net](https://discuss.codemirror.net/t/i-want-to-check-if-there-are-lint-errors/2171) that was originally posted for CodeMirror 5, but since this functionality doesn't exist in the v6 as well, I figured it would be great to finally have it. ## Description What this PR does is add a new function, `getDiagnostics` that returns all diagnostics that are present within a provided state. To do so, it basically just takes the `lintState` field and iterates over the generated decorations, collecting the attached diagnostics along the way and returning them as a single array. This can be used to attach various linters to the state and retrieve them. Users can then afterwards filter them (e.g. look at their `source` property) and do something with it. This can prove handy for example when you have one or more linters active that probe a file for problems and need to access them outside of the linter function in order to prevent double work. ## Rationale Normally, the `action` field can be used to define some actions the user can perform to fix or improve the linter error. However, sometimes the potential actions are too many or need to be fetched on-demand. Therefore, it is handy to have a function to quickly retrieve whatever the linters have found and therefore to avoid having to check the file's contents twice. ## Example usecase I've basically moved forward with this because I'm using the built-in linter function to create a custom spell checker (because I can't completely rely on the OS for doing the spell checking). Said spell checker contains a ton of code to ensure we're spell-checking correctly (i.e. takes into account brackets and custom quote symbols). However, while functionally a spell checker is a linter, potential `actions` that can be taken are normally provided in a context menu and not via `actions` like other linters. In order to avoid having to run the spellchecker twice over the text, one can use this function to retrieve all diagnostics currently attached to the state whenever the user requests a context menu, filter out anything whose `source` is my spell checker and check if there's a spelling error underneath the cursor position, and provide suggestions as applicable. ## Caveats Currently, this simply takes all diagnostics irrespective of the originating `Facet`s, both because I'm still not comfortably proficient with the use of facets, and because normally the amount of linter errors/warnings/infos should be fairly manageable. Therefore, this PR basically rests on the range iterator being fast. Also, one could also simply save the decorations manually in the stateField, but I felt this implementation is the least intrusive into the existing code, hence preventing potential other problems. I tried to stick to the conventions of the repo, but please feel free to outline any potential problems with this implementation and what I can change. Thank you for your outstanding work – CodeMirror 6 is simply amazing!

marijnh commented

2022-11-14 13:38:37 +01:00

(Migrated from github.com)

Would it work to listen to transactions with the setDiagnosticsEffect and act on the array of diagnostics they provide, instead of formulating this as a state accessor?

Would it work to listen to transactions with the `setDiagnosticsEffect` and act on the array of diagnostics they provide, instead of formulating this as a state accessor?

nathanlesage commented

2022-11-14 14:40:10 +01:00

(Migrated from github.com)

It would somehow work, but unnecessarily increase the complexity of the state management.

Utilizing a getDiagnostics accessor as I propose here has the benefit that it changes nothing about either my own code or the linter plugin's code. As soon as I need to actually have a look at those diagnostics, I can access the diagnostics with a single function call.

Furthermore, I am here talking of the use case of some external code taking a look at the state to see if something is in there. For purely internal code that already utilizes StateFields, the effect listener is definitely a viable option.

In the case of external code needing to take a look at the state, the getDiagnostics accessor has, in my view, some benefits:

It does not add any overhead to how the linter works.
It utilizes only what's already in the plugin, i.e. it does not increase either the amount of data stored in the linter plugin, nor of some external code one would have to write to listen to the effect.
It is very simple code-wise and keeps the state management centralized within the plugin.
It makes sense because there is already a setter for diagnostics – having no getter makes it appear that there is some intentionality behind fencing away the diagnostics (the effect listener is not the most straight forward way).

The alternative that you suggest has, if I don't misunderstand anything blatantly (please say so, if I do), several drawbacks for the use-case I'm describing:

If external code needs to look at diagnostics, one has to write an additional StateField that simply collects diagnostics as they are being set by the lint plugin and that can then be queried with view.state.field(customField). This would essentially replicate the changes I propose in this PR with an extra step.
It doubles the amount of diagnostics in the state and increases the risk of introducing bugs
Because anotherStateField has to touch the diagnostics every iteration they change, this imposes a tiny overhead over just accessing them when absolutely necessary
It decentralizes the responsibility: We have a state management in (1) the linter, (2) the custom field, (3) whatever external code needs to look at that.

Or, is there some benefit that the detour via the state effect listener has that I am not seeing here?

It would somehow work, but unnecessarily increase the complexity of the state management. Utilizing a `getDiagnostics` accessor as I propose here has the benefit that it changes nothing about either my own code or the linter plugin's code. As soon as I need to actually have a look at those diagnostics, I can access the diagnostics with a single function call. Furthermore, I am here talking of the use case of some *external* code taking a look at the `state` to see if something is in there. For purely internal code that already utilizes `StateField`s, the effect listener is definitely a viable option. In the case of external code needing to take a look at the state, the `getDiagnostics` accessor has, in my view, some benefits: * It does not add any overhead to how the linter works. * It utilizes only what's already in the plugin, i.e. it does not increase either the amount of data stored in the linter plugin, nor of some external code one would have to write to listen to the effect. * It is very simple code-wise and keeps the state management centralized within the plugin. * It makes sense because there is already a setter for diagnostics – having no getter makes it appear that there is some intentionality behind fencing away the diagnostics (the effect listener is not the most straight forward way). The alternative that you suggest has, if I don't misunderstand anything blatantly (please say so, if I do), several drawbacks for the use-case I'm describing: * If external code needs to look at diagnostics, one has to write an additional `StateField` that simply collects diagnostics as they are being set by the `lint` plugin and that can then be queried with `view.state.field(customField)`. This would essentially replicate the changes I propose in this PR with an extra step. * It doubles the amount of diagnostics in the state and increases the risk of introducing bugs * Because another`StateField` has to touch the diagnostics every iteration they change, this imposes a tiny overhead over just accessing them when absolutely necessary * It decentralizes the responsibility: We have a state management in (1) the linter, (2) the custom field, (3) whatever external code needs to look at that. Or, is there some benefit that the detour via the state effect listener has that I am not seeing here?

marijnh commented

2022-11-14 14:56:38 +01:00

(Migrated from github.com)

If people are going to call this a lot (which is a reasonable thing to do, if it is an exported function), converting the range set to an array again and again is definitely not an example of efficiency. Additionally, the diagnostic objects in the range set may hold positions that no longer correspond to their current position, or even be filtered out entirely, making this a rather error-prone interface that I'm not willing to commit to. See also #7 and #8.

nathanlesage commented

2022-11-14 15:49:22 +01:00

(Migrated from github.com)

Hm, I've read through #7 and #8 and now I have a few more questions.

You state here as well as in #8 that the diagnostics being returned by that accessor could be outdated if the document has in the meantime been updated and the linter did not (yet) re-run the linter sources. However, if we use an update listener, doesn't this approach have exactly the same issue, just that the (potentially outdated) diagnostics now reside in a different state field ? After all, the effect will only be fired once the sources are all done computing their set of diagnostics. Therefore, any approach in this regard would be somewhat error-prone, except you add even more code to always clean out the diagnostics array when the document changes and wait for the state effect to arrive.

I absolutely agree that the solution advocated for in #7, #8, and #9 does impose some overhead when it comes to iterating over the decoration ranges and re-converting them to diagnostics. The other way to implement this then would be to simply store the diagnostics separately of the decorations instead of attaching them to the specs. Then, returning the array of current diagnostics would add only very little overhead.

I think if the getDiagnostics accessor is properly documented, one should expect users of this to be sane enough to know that, since the linters run not synchronous to state updates, these decorations may be outdated.

Nevertheless, if you decide that it is not worth implementing it as-is, I have another proposal for you that would address the issue you describe and would prevent errors much better, regardless of whether the state management is located within the lint plugin or externalized to custom-written state fields:

Store the diagnostics array as-is in the lintState field (to ease accessing them)
Additionally, divide the diagnostics up in two levels of precision: "Maybe outdated" and "definitely corresponding to the rest of the current state", e.g. currentDiagnostics and oldDiagnostics
Whenever new diagnostics have been calculated, they are written to currentDiagnostics
When the document changes, the diagnostics are moved from currentDiagnostics to oldDiagnostics. As soon as new diagnostics have been computed, they are written to currentDiagnostics, except the state has changed again, in which case they will be directly written to oldDiagnostics
The getDiagnostics accessor receives another parameter, precise (similarly to some other interfaces that CodeMirror exposes which are also somewhat volatile). If precise is true, only currentDiagnostics is returned, even if it is just an empty array because the sources have not yet finished linting the document. If not, oldDiagnostics may also be returned while the linter computes the next batch of diagnostics.

Depending on how much you would be willing to commit to having "outdated" diagnostics stored in the state, we could also just get rid of the second array, which would make the required changes less difficult to implement and would make the accessor definitely return 100% safe diagnostics that are not outdated.

This would have the following benefits:

People do not need to have additional StateFields that do not shield them from the outdating-issue that you describe
External code has an easier way accessing diagnostics on the state
The interface is by default 100% coupled to the state and does not have possibly wrong diagnostics (but it provides the option)
Since this would include greater state management efforts, it is wise to keep that as close to the source of the diagnostics as possible.
We can also expose this API to update listeners so that those people who can use StateFields because they need to access the diagnostics only internally also can be much more certain about the precision of these diagnostics.

Would this be a compromise you're willing to settle on? Given that apparently there now have been three consecutive PRs that all address this one, single issue, I think there are definitely benefits to it, and people find it curious why there is no simple accessor.

If you're willing to give it a shot, I will implement the changes as necessary and update this PR accordingly. I would very much welcome if we could work this out :)

Hm, I've read through #7 and #8 and now I have a few more questions. You state here as well as in #8 that the diagnostics being returned by that accessor could be outdated if the document has in the meantime been updated and the linter did not (yet) re-run the linter sources. However, if we use an update listener, doesn't this approach have exactly the same issue, just that the (potentially outdated) diagnostics now reside in a *different* state field ? After all, the effect will only be fired once the sources are all done computing their set of diagnostics. Therefore, any approach in this regard would be somewhat error-prone, except you add *even more code* to always clean out the diagnostics array when the document changes and wait for the state effect to arrive. I absolutely agree that the solution advocated for in #7, #8, and #9 does impose some overhead when it comes to iterating over the decoration ranges and re-converting them to diagnostics. The other way to implement this then would be to simply store the diagnostics separately of the decorations instead of attaching them to the specs. Then, returning the array of current diagnostics would add only very little overhead. I think if the `getDiagnostics` accessor is properly documented, one should expect users of this to be sane enough to know that, since the linters run not synchronous to state updates, these decorations may be outdated. Nevertheless, if you decide that it is not worth implementing it as-is, I have another proposal for you that would address the issue you describe and would prevent errors much better, regardless of whether the state management is located within the lint plugin or externalized to custom-written state fields: 1. Store the diagnostics array as-is in the `lintState` field (to ease accessing them) 2. Additionally, divide the diagnostics up in two levels of precision: "Maybe outdated" and "definitely corresponding to the rest of the current state", e.g. `currentDiagnostics` and `oldDiagnostics` 3. Whenever new diagnostics have been calculated, they are written to `currentDiagnostics` 4. When the document changes, the diagnostics are moved from `currentDiagnostics` to `oldDiagnostics`. As soon as new diagnostics have been computed, they are written to `currentDiagnostics`, except the state has changed *again*, in which case they will be directly written to `oldDiagnostics` 5. The `getDiagnostics` accessor receives another parameter, `precise` (similarly to some other interfaces that CodeMirror exposes which are also somewhat volatile). If `precise` is `true`, only `currentDiagnostics` is returned, even if it is just an empty array because the sources have not yet finished linting the document. If not, `oldDiagnostics` may also be returned while the linter computes the next batch of diagnostics. Depending on how much you would be willing to commit to having "outdated" diagnostics stored in the state, we could also just get rid of the second array, which would make the required changes less difficult to implement and would make the accessor definitely return 100% safe diagnostics that are not outdated. This would have the following benefits: 1. People do not need to have additional `StateFields` that do not shield them from the outdating-issue that you describe 2. External code has an easier way accessing diagnostics on the state 3. The interface is by default 100% coupled to the state and does not have possibly wrong diagnostics (but it provides the option) 4. Since this would include greater state management efforts, it is wise to keep that as close to the source of the diagnostics as possible. 5. We can also expose this API to update listeners so that those people who can use StateFields because they need to access the diagnostics only internally also can be much more certain about the precision of these diagnostics. Would this be a compromise you're willing to settle on? Given that apparently there now have been three consecutive PRs that all address this one, single issue, I think there are definitely benefits to it, and people find it curious why there is no simple accessor. If you're willing to give it a shot, I will implement the changes as necessary and update this PR accordingly. I would very much welcome if we could work this out :)

marijnh commented

2022-11-14 16:07:00 +01:00

(Migrated from github.com)

However, if we use an update listener, doesn't this approach have exactly the same issue, just that the (potentially outdated) diagnostics now reside in a different state field ?

Given the nature of transaction effects, the fact that this is a snapshot is a lot more obvious in the current interface.

The approach with current and old diagnostics seems to complicate the concept without providing much benefit.

An internal iterator that runs a function over the diagnostics rangeset, passing the current position + the original object might be a simple compromise. Would that work for you?

> However, if we use an update listener, doesn't this approach have exactly the same issue, just that the (potentially outdated) diagnostics now reside in a different state field ? Given the nature of transaction effects, the fact that this is a snapshot is a lot more obvious in the current interface. The approach with current and old diagnostics seems to complicate the concept without providing much benefit. An internal iterator that runs a function over the diagnostics rangeset, passing the current position + the original object might be a simple compromise. Would that work for you?

nathanlesage commented

2022-11-14 16:23:51 +01:00

(Migrated from github.com)

That sounds good. How do you imagine this to be implemented?

marijnh commented

2022-11-14 16:37:31 +01:00

(Migrated from github.com)

Does attached patch look like it'd cover your use case?

nathanlesage commented

2022-11-14 19:04:24 +01:00

(Migrated from github.com)

Absolutely, yes!