WIP: Refactor Sorted Containers by Keno · Pull Request #213 · JuliaCollections/DataStructures.jl

Keno · 2016-08-01T16:16:36Z

This is a WIP to refactor sorted containers.

Currently, this only refactors the 2-3 tree to use the relevant abstractions in AbstractTrees.jl and Iterators.jl, which lets the algorithms be expressed at a higher level and makes the code quite a bit shorter and hopefully easier to understand. I have done some basic testing, but there's quite a bit more left to do. I also haven't done any performance tuning yet, so this is likely significantly slower than the previous version. However, I've written the code in such a way that I'm confident I can fully recover the performance without giving up clarity.

Other potential things I've been thinking about:

- Get rid of semi tokens from the exported user interface and only use tokens. I believe in most cases the compiler should be able to do any required optimization to remove the redundant information, and if not, that's just a missing compiler optimization as opposed to a reason for complicating the API.
- Add a getindex method to SortedMultiDict that returns an iterator over all values of that key, including the ability to binary search by insertion order (some similarity perhaps to #206).

cc @StephenVavasis @kmsquire (probably not quite ready for thorough review yet, but in case you want an early look).

Use AbstractTrees and Iterators, to express the relevant operations at a higher level.

StephenVavasis · 2016-08-01T17:06:53Z

Based on my understanding, it doesn't seem possible that full tokens could have the same performance as semitokens. A full token contains a pointer to the data structure. This means that the run-time system needs to know about the existence of every token in order to know when the data structure itself can be garbage-collected. I don't see how a compiler optimization could remove this bottleneck. In some usages of tokens, the compiler could determine from the name scoping that the data structure lives longer than a token, but it is easy to imagine scenarios in which the lifetime of the token versus the lifetime of the data structure cannot be determined at compile time.
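To make the distinction in this comment concrete, here is a minimal, language-neutral sketch in Python rather than Julia. Everything in it (`ToySortedMap`, `Token`, `container_survives`) is a hypothetical stand-in and not the DataStructures.jl API. It assumes a semitoken is just a stable slot index and a full token is that index paired with a reference to the container, so a stored token keeps the container reachable while a stored semitoken does not.

```python
import bisect
import gc
import weakref
from typing import Any, List, NamedTuple, Tuple


class ToySortedMap:
    """Hypothetical sorted container (not DataStructures.jl's implementation)."""

    def __init__(self) -> None:
        self._slots: List[Tuple[Any, Any]] = []  # append-only: slot -> (key, value)
        self._order: List[int] = []              # slot indices, sorted by key

    def insert(self, key: Any, value: Any) -> int:
        """Insert or overwrite; return a semitoken (a stable slot index)."""
        # O(n) key scan: acceptable for a toy, a real tree would be O(log n).
        keys = [self._slots[s][0] for s in self._order]
        pos = bisect.bisect_left(keys, key)
        if pos < len(keys) and keys[pos] == key:
            slot = self._order[pos]
            self._slots[slot] = (key, value)
            return slot
        self._slots.append((key, value))
        slot = len(self._slots) - 1
        self._order.insert(pos, slot)
        return slot

    def deref(self, semitoken: int) -> Tuple[Any, Any]:
        """Look up the (key, value) pair a semitoken refers to."""
        return self._slots[semitoken]

    def keys_in_order(self) -> List[Any]:
        return [self._slots[s][0] for s in self._order]


class Token(NamedTuple):
    """A full token: the container plus a semitoken into it."""
    container: ToySortedMap
    semitoken: int


def deref(token: Token) -> Tuple[Any, Any]:
    return token.container.deref(token.semitoken)


def container_survives(keep_token: bool) -> bool:
    """Drop the only direct reference to a container while holding either
    a full token or a bare semitoken, and report whether it is still alive."""
    m = ToySortedMap()
    slot = m.insert("b", 2)
    handle = Token(m, slot) if keep_token else slot
    probe = weakref.ref(m)  # observes the container without keeping it alive
    del m
    gc.collect()
    alive = probe() is not None
    del handle
    return alive
```

Under these assumptions, `container_survives(True)` reports that the container stays alive while a token exists, and `container_survives(False)` reports that it can be reclaimed when only a semitoken is held. That is the lifetime coupling the comment describes; whether a compiler can elide it for short-lived tokens is the open question in this thread.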

Keno · 2016-08-01T17:08:47Z

Tokens are generally short lived. This should mean that the compiler's allocation elision pass should be able to see through the allocation of a token and remove it.

StephenVavasis · 2016-08-01T17:42:35Z

Keno,

It is true that tokens are commonly short-lived, but this is not always the case. Indeed, in my application, semitokens are stored in another data structure. This is similar to the case of arrays. Array subscripts are usually short-lived but not always. Sometimes they are stored in another data structure (e.g., in the implementation of SparseMatrixCSC).

I would advise against removing semitokens even though they clutter the interface. Julia programmers do not mind slogging through lengthy documentation if it means better performance.

Keno · 2016-08-01T17:48:55Z

I don't mind keeping semitokens, but I don't think they should be anywhere near the primary interface. Would it be sufficient to have a method that converts full tokens into semi tokens for storage?

> Julia programmers do not mind slogging through lengthy documentation if it means better performance.

While this is true, it is also true that Julia programmers expect friendly interfaces that can be used at the REPL and that give decent, even if not amazing, performance. Right now I feel like the learning curve for this API is too steep.

StephenVavasis · 2016-08-02T01:06:13Z

Keno,

I agree that it is plausible that a compiler could optimize away the overhead with tokens if the token is converted to a semitoken inside a loop (and then discarded). The last time I checked this with a timing test (maybe a year ago), however, the compiler did not make this optimization.

If current or near-term versions of Julia are able to make this optimization, then I agree with you that it should be OK to make tokens the main interface, and there could be a secondary interface based on extracting semitokens from tokens and reassembling tokens later.
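The two-tier interface described here (tokens as the main interface, with semitokens extracted for storage and reassembled later) could look roughly like the following sketch. It is written in Python for neutrality, and all names (`SortedKeys`, `Token`, `extract_semitoken`, `reassemble`, `semitokens_where`) are hypothetical illustrations, not functions that exist in DataStructures.jl. The sketch assumes a read-only container where a semitoken is a plain index.

```python
from typing import Any, Callable, Iterable, List, NamedTuple


class SortedKeys:
    """Hypothetical read-only sorted container; a semitoken is a plain index."""

    def __init__(self, keys: Iterable[Any]) -> None:
        self._keys = sorted(keys)

    def __len__(self) -> int:
        return len(self._keys)

    def first(self) -> "Token":
        """Primary interface: hand out full tokens, never bare semitokens."""
        return Token(self, 0)


class Token(NamedTuple):
    """Full token: container plus semitoken."""
    container: SortedKeys
    semitoken: int

    def deref(self) -> Any:
        return self.container._keys[self.semitoken]

    def advance(self) -> "Token":
        return Token(self.container, self.semitoken + 1)

    def at_end(self) -> bool:
        return self.semitoken >= len(self.container)


def extract_semitoken(token: Token) -> int:
    """Secondary interface: strip the container reference for compact storage."""
    return token.semitoken


def reassemble(container: SortedKeys, semitoken: int) -> Token:
    """Secondary interface: rebuild a full token from a stored semitoken."""
    return Token(container, semitoken)


def semitokens_where(container: SortedKeys, pred: Callable[[Any], bool]) -> List[int]:
    """Walk the container with tokens, but store only semitokens."""
    stored = []
    tok = container.first()
    while not tok.at_end():
        if pred(tok.deref()):
            stored.append(extract_semitoken(tok))  # token is discarded after this
        tok = tok.advance()
    return stored
```

In this sketch the loop only ever handles tokens, and each one is converted and discarded within a single iteration, which is the pattern where allocation elision would have to apply. Later code would call `reassemble(container, s)` on each stored semitoken to get back a usable token.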

kmsquire · 2018-09-11T06:40:12Z

@Keno, this has gone a bit stale. Shall we close it? It seems like a good idea to me, but would probably take a rewrite at this point.

WIP: Refactor Balanced Trees

775b691

Use AbstractTrees and Iterators, to express the relevant operations at a higher level.

Keno wants to merge 1 commit into master from kf/refactorsorted
