
WIP: Refactor Sorted Containers #213

Open
Keno wants to merge 1 commit into master from kf/refactorsorted



@Keno
Contributor

Keno commented Aug 1, 2016

This is a WIP to refactor sorted containers.

Currently, this only refactors the 2-3 tree to make use of the relevant abstractions in AbstractTrees.jl and Iterators.jl, expressing the algorithms at a higher level. This makes the code quite a bit shorter and, hopefully, easier to understand. I have done some basic testing, but there's quite a bit more left to do. I also haven't done any performance tuning yet, so this is likely significantly slower than the previous version. However, I've written the code in such a way that I'm confident I can fully recover the performance without giving up clarity.
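For anyone skimming, here is a rough Python analogue of what "expressing the algorithms at a higher level" can look like. The traversal and search logic is written once against a single `children` function (in the spirit of the `children`-based interface in AbstractTrees.jl), and the 2-3 tree only has to supply that function plus its split keys. The `Node` layout and the names `leaves`, `descend`, and `find` are hypothetical, not the code in this PR, and the sketch leaves out insertion and rebalancing entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Node:
    """Leaf-oriented 2-3 tree node (hypothetical layout).

    Internal nodes have 2 or 3 kids; splitkeys[i] is the smallest key in kids[i + 1].
    Leaves have no kids and carry one data item.
    """
    splitkeys: List[int] = field(default_factory=list)
    kids: List["Node"] = field(default_factory=list)
    data: Optional[int] = None


def children(node: Node) -> List[Node]:
    # The only tree-specific hook the generic helpers below rely on.
    return node.kids


def leaves(root, children_fn: Callable) -> Iterator:
    """Generic left-to-right leaf iteration for any tree exposing children_fn."""
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children_fn(node)
        if kids:
            stack.extend(reversed(kids))  # so the leftmost child is popped first
        else:
            yield node


def descend(root, children_fn: Callable, choose: Callable[[object], int]):
    """Generic root-to-leaf walk; `choose` returns the index of the child to follow."""
    node = root
    while children_fn(node):
        node = children_fn(node)[choose(node)]
    return node


def find(root: Node, key: int) -> Node:
    # Follow the subtree whose key range contains `key` (count of split keys <= key).
    # Returns the leaf with the greatest key <= `key`, or the leftmost leaf if none.
    return descend(root, children, lambda n: sum(1 for s in n.splitkeys if s <= key))
```

Sorted iteration is then just `(leaf.data for leaf in leaves(root, children))`, and the same `leaves` and `descend` helpers would work unchanged for any other tree type that provides its own `children`.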

Other potential things I've been thinking about:

  • Get rid of semitokens in the exported user interface and use only tokens. I believe that in most cases the compiler should be able to perform whatever optimization is needed to remove the redundant information, and if it can't, that is a missing compiler optimization rather than a reason to complicate the API.
  • Add a getindex method to SortedMultiDict that returns an iterator over all values for a given key, including the ability to binary search by insertion order (perhaps somewhat similar to #206).
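A rough sketch of the proposed SortedMultiDict lookup, written in Python under the assumption that entries are kept sorted by (key, insertion number). Indexing by a key yields an iterator over that key's values in insertion order, and a second binary search on the insertion number narrows the range. The class, `values_since`, and the use of a flat sorted list in place of a 2-3 tree are hypothetical simplifications for illustration only.

```python
from bisect import bisect_left, insort
from itertools import count

_INF = float("inf")


class SortedMultiDict:
    """Toy multidict: a flat list of (key, seq, value) tuples sorted by (key, seq).

    seq is a unique insertion number, so tuple comparison never reaches `value`.
    """

    def __init__(self):
        self._entries = []
        self._next_seq = count()

    def insert(self, key, value):
        seq = next(self._next_seq)
        insort(self._entries, (key, seq, value))
        return seq  # the caller may keep this to search by insertion order later

    def _range(self, key, min_seq):
        # (key, min_seq) sorts just before the first entry of `key` with seq >= min_seq;
        # (key, inf) sorts just after every entry of `key`.
        lo = bisect_left(self._entries, (key, min_seq))
        hi = bisect_left(self._entries, (key, _INF))
        return range(lo, hi)

    def values_since(self, key, min_seq):
        # Values stored under `key` that were inserted at or after `min_seq`.
        return (self._entries[i][2] for i in self._range(key, min_seq))

    def __getitem__(self, key):
        # Iterator over all values stored under `key`, in insertion order.
        return self.values_since(key, -_INF)
```

For example, after inserting `("a", 1)`, `("b", 10)`, `("a", 2)`, `("b", 20)`, `list(d["b"])` gives `[10, 20]`.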

cc @StephenVavasis @kmsquire (probably not quite ready for thorough review yet, but in case you want an early look).

Use AbstractTrees and Iterators to express the relevant operations at a higher level.
@StephenVavasis
Contributor

Based on my understanding, it doesn't seem possible that full tokens could have the same performance as semitokens. A full token contains a pointer to the data structure. This means that the run-time system needs to know about the existence of every token in order to know when the data structure itself can be garbage-collected. I don't see how a compiler optimization could remove this bottleneck. In some usages of tokens, the compiler could determine from the name scoping that the data structure lives longer than a token, but it is easy to imagine scenarios in which the lifetime of the token versus the lifetime of the data structure cannot be determined at compile time.

@Keno
Contributor Author

Keno commented Aug 1, 2016

Tokens are generally short lived. This should mean that the compiler's allocation elision pass should be able to see through the allocation of a token and remove it.

@StephenVavasis
Contributor

Keno,

It is true that tokens are commonly short-lived, but this is not always the case. Indeed, in my application, semitokens are stored in another data structure. This is similar to the case of arrays: array subscripts are usually short-lived, but not always. Sometimes they are stored in another data structure (e.g., in the implementation of SparseMatrixCSC).

I would advise against removing semitokens even though they clutter the interface. Julia programmers do not mind slogging through lengthy documentation if it means better performance.

@Keno
Contributor Author

Keno commented Aug 1, 2016

I don't mind keeping semitokens, but I don't think they should be anywhere near the primary interface. Would it be sufficient to have a method that converts full tokens into semitokens for storage?

> Julia programmers do not mind slogging through lengthy documentation if it means better performance.

While this is true, it is also true that Julia programmers expect friendly interfaces that can be used at the REPL and that give decent, if not amazing, performance. Right now I feel the learning curve for this API is too steep.

@StephenVavasis
Contributor

Keno,

I agree that it is plausible that a compiler could optimize away the overhead with tokens if the token is converted to a semitoken inside a loop (and then discarded). The last time I checked this with a timing test (maybe a year ago), however, the compiler did not make this optimization.

If current or near-term versions of Julia are able to make this optimization, then I agree with you that it should be OK to make tokens the main interface, and there could be a secondary interface based on extracting semitokens from tokens and reassembling tokens later.
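To make the proposed split concrete, here is a toy Python model of the interface being discussed. It is not the DataStructures.jl API: `SortedStore`, `Token`, `SemiToken`, `item_at`, and `token` are hypothetical names. A semitoken is a bare slot index with no reference to the container, so it can be stored cheaply in another data structure. A full token pairs the container with a semitoken and can be reassembled from a stored semitoken later. The model covers only the interface shape; it says nothing about whether the Julia compiler can elide token allocations.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class SemiToken:
    """Bare slot index; holds no reference to the container."""
    slot: int


@dataclass(frozen=True)
class Token:
    """(container, semitoken) pair: the friendlier, REPL-style handle."""
    container: "SortedStore"
    semi: SemiToken

    def deref(self):
        return self.container.item_at(self.semi)


class SortedStore:
    """Toy sorted container. Items live in append-only slots, so a slot index
    stays valid while other items are inserted (this toy supports no deletion)."""

    def __init__(self):
        self._slots = []  # item storage, indexed by SemiToken.slot
        self._order = []  # slot indices kept sorted by item

    def insert(self, item):
        slot = len(self._slots)
        self._slots.append(item)
        keys = [self._slots[s] for s in self._order]  # O(n); fine for a sketch
        self._order.insert(bisect_right(keys, item), slot)
        return Token(self, SemiToken(slot))

    def item_at(self, semi):
        return self._slots[semi.slot]

    def token(self, semi):
        # Reassemble a full token from a previously stored semitoken.
        return Token(self, semi)

    def sorted_items(self):
        return [self._slots[s] for s in self._order]
```

Typical use: keep `t.semi` in some external table (much as SparseMatrixCSC stores array subscripts), then call `store.token(saved_semi)` when a full token is needed again.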

@kmsquire
Member

@Keno, this has gone a bit stale. Shall we close it? It seems like a good idea to me, but would probably take a rewrite at this point.


3 participants