Draft of potential masked array implementation. #849
andrei-papou wants to merge 2 commits into rust-ndarray:master from …
Is the end goal of this PR (and further work) to be as close as possible to the masked arrays in numpy? I ask because I'm a long-time user of numpy, as is everyone in the company I work for, and nobody used this … In brief, I see 3 problems with this (not your PR specifically, but the concept): …

What I would gladly use is a … but this is somewhat irrelevant to the current discussion :)
I appreciate reading your sketch, andrei; you're more productive than me, just having a go at a draft instead of trying to make something perfect. I think it's been mentioned before, yeah: the question of whether to have masked arrays or masked operations on arrays. I dread the complexity of either. Thanks, nilgoyette, for the candid thoughts too. I think we should start with masked operations; that's what a masked array type (if it were to exist) would need as a basis anyway. It also allows keeping the mask separate, which should hopefully be more efficient (packed or sparse bitmap?).
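To make "masked operations with a separate mask" concrete, here is a minimal std-only sketch. It is not ndarray code and not the API from this PR; `BitMask` and `masked_sum` are hypothetical names. The mask is a packed bitmap (one bit per element, 64 elements per `u64` word) kept beside an ordinary data slice, and a set bit means "masked", following numpy's convention that `True` marks an invalid entry.

```rust
/// Hypothetical packed bitmap mask: one bit per element, 64 per `u64` word.
/// A set bit means the element is masked (excluded), as in numpy.
pub struct BitMask {
    words: Vec<u64>,
    len: usize,
}

impl BitMask {
    /// Creates a mask for `len` elements with nothing masked.
    pub fn new(len: usize) -> Self {
        BitMask { words: vec![0; (len + 63) / 64], len }
    }

    /// Marks element `i` as masked.
    pub fn set(&mut self, i: usize) {
        assert!(i < self.len, "index out of bounds");
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Returns true if element `i` is masked.
    pub fn is_masked(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }

    /// Counts masked elements a whole word at a time.
    pub fn count_masked(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

/// A masked operation: sums only the unmasked elements of `data`.
/// The data stays an ordinary slice; the mask lives beside it.
pub fn masked_sum(data: &[f64], mask: &BitMask) -> f64 {
    assert_eq!(data.len(), mask.len, "data and mask lengths differ");
    data.iter()
        .enumerate()
        .filter(|(i, _)| !mask.is_masked(*i))
        .map(|(_, x)| x)
        .sum()
}
```

For example, masking index 1 of `[1.0, 2.0, 3.0, 4.0]` makes `masked_sum` return `8.0`. A sparse alternative would store only the masked indices (say, a sorted `Vec<usize>`), which likely pays off only when very few elements are masked.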
I realise this is an older PR, but I would vote in favour of masks. I'm by no means experienced in numpy, so the following may have a different solution. My current use case is a 2D jagged array of ids. Each row represents the following:

[parent_id, child_id, facet1_id, facet2_id, ...facetN_id]

Since the numpy docs state 2D arrays must be rectangular, not jagged, I use … Following that, I mask the …

```python
facet_groups = [[A, B], [C]]
filtered = arr[np.logical_and.reduce([np.isin(arr, facet_ids).any(axis=1) for facet_ids in facet_groups])]
```

Now I can remove the mask and extract all the parent_id/child_id values with a slice:

```python
filtered.mask = np.ma.nomask
parent_ids = filtered[:,0]
child_ids = filtered[:,1]
```

In addition, I use … All in all, it's a very concise bit of code that performs very well for a small dataset of a few hundred thousand values.
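For comparison, a rough std-only Rust sketch of the same filter, assuming the rows are held as a plain `Vec<Vec<u32>>`, so jagged rows need no padding or mask at all. `Row`, `filter_rows` and `parent_child_ids` are hypothetical names. Like the `np.isin(arr, ...)` call above, it checks every column of a row, not only the facet columns.

```rust
/// One row: [parent_id, child_id, facet1_id, ..., facetN_id].
/// Rows may differ in length, so no padding is required.
type Row = Vec<u32>;

/// Keeps rows containing at least one id from *every* facet group,
/// mirroring `np.logical_and.reduce([... .any(axis=1) ...])`.
fn filter_rows<'a>(rows: &'a [Row], facet_groups: &[Vec<u32>]) -> Vec<&'a Row> {
    rows.iter()
        .filter(|row| {
            facet_groups
                .iter()
                .all(|group| row.iter().any(|id| group.contains(id)))
        })
        .collect()
}

/// Extracts the parent_id and child_id columns (the `[:,0]` and `[:,1]` slices).
/// Assumes every row has at least two elements.
fn parent_child_ids(rows: &[&Row]) -> (Vec<u32>, Vec<u32>) {
    rows.iter().map(|row| (row[0], row[1])).unzip()
}
```

With rows `[1, 10, 5, 7]`, `[2, 20, 5]` and `[3, 30, 7, 9, 11]` and facet groups `[[5, 6], [7]]`, only the first row passes, giving parent ids `[1]` and child ids `[10]`.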
Yes, I strongly agree with @stuarta0: there needs to be a masked_fill feature in ndarray. Currently, you would do this on an n-dimensional array using loops: wherever an element is masked, replace it with z.
At least, this is what I do; there is most likely a simpler way out there.
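A minimal sketch of such a `masked_fill`, assuming the data and a boolean mask of the same shape are both available as contiguous slices in the same memory order (for example both row-major), so the number of dimensions does not matter. The function name and signature are hypothetical, not an existing ndarray API.

```rust
/// Replaces every element whose mask entry is `true` with `z`.
/// Works for any number of dimensions, provided `data` and `mask`
/// have the same shape and the same memory order.
fn masked_fill<T: Clone>(data: &mut [T], mask: &[bool], z: T) {
    assert_eq!(
        data.len(),
        mask.len(),
        "data and mask must have the same number of elements"
    );
    for (x, &masked) in data.iter_mut().zip(mask) {
        if masked {
            *x = z.clone();
        }
    }
}
```

In ndarray itself, the same loop can probably be written without flattening via `Zip::from(&mut a).and(&mask).for_each(|x, &m| if m { *x = z })`, where `mask` is a `bool` array with the same shape as `a`.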
There are two files:

- `src/ma/mod.rs`: the masked array implementation; all the types and traits live there.
- `tests/ma.rs`: a couple of tests that demonstrate the potential public API of the masked array.

The main idea is to have a `Mask` trait which is pretty generic and can be implemented not just by `ArrayBase`, but also by, for example, a set of whitelisted/blacklisted indices, a set of whitelisted/blacklisted values, etc.
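The PR's actual `Mask` trait is not shown here, so the following is only a hypothetical sketch of the idea: one trait, several implementors. `Mask`, `is_visible`, `BoolMask`, `IndexWhitelist`, `ValueBlacklist` and `visible` are all illustrative names, and `BoolMask` is a plain `Vec<bool>` standing in for a boolean `ArrayBase`.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical mask trait: decides whether the element at flat `index`
/// with value `value` is visible. Not the PR's actual signature.
trait Mask<T> {
    fn is_visible(&self, index: usize, value: &T) -> bool;
}

/// Dense boolean mask (stand-in for an `ArrayBase` of bools); `true` = hidden.
/// Assumes it has the same length as the data it masks.
struct BoolMask(Vec<bool>);

impl<T> Mask<T> for BoolMask {
    fn is_visible(&self, index: usize, _value: &T) -> bool {
        !self.0[index]
    }
}

/// Whitelisted indices: only these positions are visible.
struct IndexWhitelist(HashSet<usize>);

impl<T> Mask<T> for IndexWhitelist {
    fn is_visible(&self, index: usize, _value: &T) -> bool {
        self.0.contains(&index)
    }
}

/// Blacklisted values: any element equal to one of these is hidden.
struct ValueBlacklist<T>(HashSet<T>);

impl<T: Eq + Hash> Mask<T> for ValueBlacklist<T> {
    fn is_visible(&self, _index: usize, value: &T) -> bool {
        !self.0.contains(value)
    }
}

/// Collects references to the visible elements of `data` under any mask.
fn visible<'a, T, M: Mask<T>>(data: &'a [T], mask: &M) -> Vec<&'a T> {
    data.iter()
        .enumerate()
        .filter(|(i, x)| mask.is_visible(*i, *x))
        .map(|(_, x)| x)
        .collect()
}
```

Making the query generic over the mask, rather than fixing one mask representation, is what would let index- and value-based masks avoid materializing a full boolean array.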