29th July, 2020
Attending: John Kirkham, Matthias Bussonnier, Josh Moore, Ryan Abernathy,
Ward Fisher, Martin Durant, Nicholas Sofroniew, Alistair Miles,
Dennis Heimbigner, Andrew
Apologies:
-
Standing items
Introductions~~: newcomers should feel welcome to introduce themselves and their interest in zarr/n5.~~
-
Alistair time off, Report: All was good.
-
**QuantStack:**
- Some administration work remaining. (John to ping, done)
- Scope of project: C vs. C++
- Which stores to implement
- Some boundary issues between libraries (xarray+dask+zarr v xtensor+zarr)
- Support not-multiple-of-8bit types?
- Dynamic extensions should be fine. All a question of tradeoffs (DLL vs. header-only library vs. …)
- How much of numcodecs to implement? Cover next Friday?
-
Blosc project, applying for EOSS grant (Josh)
-
Blosc 2 codec
-
NDLZ codec
-
- What is the main motivation for separate development?
- What functionalities do they offer (that zarr doesn’t)?
-
-
TileDB (Ryan)
-
Reached out for a conversation.
-
30M Series-A funding round. 20 people, full-time.
-
Launching TileDB Cloud
-
Different motivations. Discussed avenues for collaboration.
-
Looking at xarray backend. Interested in netcdf compatibility.
- Discussed how the TileDB format works. Clever things. Don’t outsource compression. Have 2 tiers of chunking. 64KB at the low level to align with L2 caches, etc. Also works around having too many files.
-
Strategy for the relationship.
-
Dennis: single implementation problem like HDF5?
- Ryan: the benefit of Zarr is the openness of the process and the multiple implementations.
- Martin: simplicity is worth a lot if you are going to read all of your data in reasonable chunks in parallel
- Alistair: would like to see benchmarks for different access patterns.
- Ryan: to do it right requires dedicated funding (but would also like to see it done). CS PhD student? Worse to do it wrong than not at all.
- To TileDB: “Go for it”
-
Nick: was also reached out to.
- Perhaps reslicing is a benefit?
- Martin: in place updating?
- Ryan: Both analysis-optimized on-disk formats
- Martin: look at some of their implementations, e.g. subranges. Possible, but depends on which compression formats.
- Ryan: dask has dask v. spark page. Very honest. We should try to do the same: v HDF5 and v TileDB (+1)
- Nick: see NASA comparison – https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20200001178.pdf
- John: inclusivity of the spec. Engaging on spec development?
-
-
Add Community Topics Here
-
Rechunker (Ryan)
-
Nick: if you don’t know the use case yet, do you pick the intermediate format? Ryan: good question, probably, yes. Emergent best practices. Suggests figuring out how to peek into the chunks.
-
NB: would be good to have wheels of numcodecs!
- JK: PR (https://github.com/zarr-developers/numcodecs/pull/224) tries to solve this, but could benefit from some reviews :)
-
Ryan: No docker. Fast installation from a .txt
- Matthias: requirement on the C implementation? A static executable. (Ryan: +1)
-
Nick: useful for non-orthogonal views? Not yet, though you could use a large number of chunks in the intermediate
-
Benefits from Martin’s async work
-