Use AbstractTrees and Iterators to express the relevant operations at a higher level.
|
Based on my understanding, it doesn't seem possible that full tokens could have the same performance as semitokens. A full token contains a pointer to the data structure. This means that the run-time system needs to know about the existence of every token in order to know when the data structure itself can be garbage-collected. I don't see how a compiler optimization could remove this bottleneck. In some usages of tokens, the compiler could determine from the name scoping that the data structure lives longer than a token, but it is easy to imagine scenarios in which the lifetime of the token versus the lifetime of the data structure cannot be determined at compile time. |
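To make the distinction above concrete, here is a minimal sketch of the two kinds of handle. The `SemiToken` and `Token` structs below are hypothetical stand-ins written for illustration, not the types from the package, and a plain `Vector` plays the role of the sorted container. It only shows that one handle is plain bits while the other carries a reference the garbage collector has to know about; it makes no claim about what the compiler can or cannot elide.

```julia
# Hypothetical stand-ins for illustration only; not the package's real types.
struct SemiToken
    address::Int          # position inside the container's internal storage
end

struct Token{C}
    container::C          # reference to the heap-allocated container
    semitoken::SemiToken  # the position part, same as above
end

container = [10, 20, 30]  # a plain Vector stands in for a sorted container
st = SemiToken(2)
t = Token(container, st)

# A semitoken is plain bits: it holds no reference to track.
println(isbitstype(SemiToken))   # true

# A full token is not: it holds a reference to the (mutable) container.
println(isbitstype(typeof(t)))   # false
```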
|
Tokens are generally short-lived, so the compiler's allocation elision pass should be able to see through the allocation of a token and remove it. |
|
Keno, it is true that tokens are commonly short-lived, but this is not always the case. Indeed, in my application, semitokens are stored in another data structure. This is similar to the case of arrays. Array subscripts are usually short-lived, but not always. Sometimes they are stored in another data structure (e.g., in the implementation of SparseMatrixCSC). I would advise against removing semitokens even though they clutter the interface. Julia programmers do not mind slogging through lengthy documentation if it means better performance. |
|
I don't mind keeping semitokens, but I don't think they should be anywhere near the primary interface. Would it be sufficient to have a method that converts full tokens into semitokens for storage?
While it is true that Julia programmers do not mind slogging through lengthy documentation if it means better performance, it is also true that they expect friendly interfaces to use at the REPL and to get decent, even if not amazing, performance. Right now I feel like the learning curve for this API is too steep. |
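The conversion idea raised in this comment (full tokens as the primary interface, with semitokens extracted only for storage) could look roughly like the sketch below. Everything here is an assumption made for illustration: the structs and the helpers `semitoken`, `token`, and `value_at` are hypothetical names, not existing API, and a plain `Vector` again stands in for the sorted container.

```julia
# Hypothetical stand-ins for illustration only; not the package's real types.
struct SemiToken
    address::Int
end

struct Token{C}
    container::C
    semitoken::SemiToken
end

# Hypothetical helpers: strip a token down for storage, rebuild it later.
semitoken(t::Token) = t.semitoken
token(container, st::SemiToken) = Token(container, st)

# Hypothetical accessor: read the element a token points at.
value_at(t::Token) = t.container[t.semitoken.address]

data = [10, 20, 30]               # stands in for a sorted container
t = Token(data, SemiToken(3))     # what the primary interface would hand out

stored = SemiToken[]              # compact side storage of positions only
push!(stored, semitoken(t))

t2 = token(data, stored[1])       # reassemble a full token when needed
println(value_at(t2))             # 30
```

In this arrangement the caller who stores positions has to remember which container they belong to, which is the trade-off the secondary interface would be accepting.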
|
Keno, I agree that it is plausible that a compiler could optimize away the overhead with tokens if the token is converted to a semitoken inside a loop (and then discarded). The last time I checked this with a timing test (maybe a year ago), however, the compiler did not make this optimization. If current or near-term versions of Julia are able to make this optimization, then I agree with you that it should be OK to make tokens the main interface, and there could be a secondary interface based on extracting semitokens from tokens and reassembling tokens later. |
|
@Keno, this has gone a bit stale. Shall we close it? It seems like a good idea to me, but would probably take a rewrite at this point. |
This is a WIP to refactor sorted containers.
Currently, this only refactors the 2-3 tree to make use of the relevant abstractions in AbstractTrees.jl and Iterators.jl to express the algorithms at a higher level, making the code quite a bit shorter and hopefully easier to understand. I have done some basic testing, but there's quite a bit more left to do. I've also not done any performance tuning yet, so this is likely significantly slower than the previous version. However, I've written the code in a way that makes me confident I will be able to fully recover the performance without having to give up clarity.
Other potential things I've been thinking about:
cc @StephenVavasis @kmsquire (probably not quite ready for thorough review yet, but in case you want an early look).