12th August, 2020
Attending: Dennis, Ola Tarkowska, Josh Moore, Martin Durant, John Kirkham, Matthias, Andrew, Ryan Williams
Apologies:
-
Standing items
- Introductions: new joinees should feel welcome to introduce
themselves and their interest in zarr/n5.
- Ola: looking into zarr (IO performance, features, compare to
HDF5) after being introduced to Alistair. Context of imaging data.
-
Add community stuff here (30 minutes)
- Josh: likely to have TileDB versions of the zarr images for
(the start of) a benchmarking comparison
- Martin: running locally? Josh: would think both. Martin: would like to look into multi-get/get_items and fsspec (async s3fs isn’t released yet; is breaking pandas currently). Got blocked by the various implementations of Nested (which only shows up in the tests).
- Josh: could also see an intermediate solution (pre-v3) for the storage issue.
- Matthias: also fighting with Windows. For async everywhere, need to rethink the zarr API. Martin: work at the moment is focused on the fetching of multiple keys (lots of overhead on s3), e.g. xarray with a few hundred bytes per chunk. Looked at Matthias’ implementation (async def), but am closer to dask-distributed (i.e. asyncio). Can be added to v2, though not very deep (e.g. file listings).
- **John:** saturated with RAPIDS
- **Dennis:** just gave talk to NASA about netcdf cloud work with
Ward
-
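Illustration of the multi-key fetching point above (a minimal sketch, not zarr, fsspec or s3fs code): when every request pays a fixed latency, issuing the requests concurrently with asyncio costs roughly one round trip instead of one per key. `FAKE_STORE`, `fetch_one`, `fetch_sequential` and `fetch_many` are hypothetical names, and the remote latency is simulated with `asyncio.sleep`.

```python
import asyncio
import time

LATENCY = 0.02  # assumed per-request round trip, in seconds (simulated)

# Hypothetical in-memory stand-in for a remote store with many tiny chunks.
FAKE_STORE = {f"temperature/0.{i}": bytes([i]) * 4 for i in range(10)}


async def fetch_one(key):
    """Simulate one high-latency GET for a single key."""
    await asyncio.sleep(LATENCY)
    return FAKE_STORE[key]


async def fetch_sequential(keys):
    """One request at a time: total time is about len(keys) * LATENCY."""
    return {key: await fetch_one(key) for key in keys}


async def fetch_many(keys):
    """All requests in flight at once: total time is about one LATENCY."""
    values = await asyncio.gather(*(fetch_one(key) for key in keys))
    return dict(zip(keys, values))


def timed(fetcher, keys):
    """Run an async fetcher to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(fetcher(keys))
    return result, time.perf_counter() - start


keys = list(FAKE_STORE)
seq_result, seq_time = timed(fetch_sequential, keys)
con_result, con_time = timed(fetch_many, keys)
print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

With chunks of a few hundred bytes the transfer time is negligible, so the per-request latency dominates and the concurrent version should finish in a fraction of the sequential time.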
Add core stuff here (30 minutes)
- Matthias: QuantStack got contract on Monday. Will be joining Friday meetings; Wednesdays are too late. Have them participate on the C implementation or green field? At least list implementations in one place: people are confused that Zarr isn’t just Python. Something early in the docs, as well as some form of test suite to compare some set of files.
- John: cf.
- Matthias: tests passing (though not on Windows due to slashes). Talked to Sylvain. May need to rethink types that are not a multiple of 8. Worth the complexity?
- Matthias: could use testers / mergers for various smallish
pull requests. Josh: any priority? Not really.
- Andrew: ditto on https://github.com/zarr-developers/zarr-python/pull/584
- Discussion about evaluating byte access patterns, esp. with partial decompression
- John: https://github.com/zarr-developers/zarr-python/blob/692f8e6b27c878c7dbad80f55366db7eea4f7d58/zarr/indexing.py
- John: tried memory mapping? Helpful? A: haven’t tried it. (still learning about blosc)
- Martin: came across case (in pangeo?) where blosc in a separate thread was slower than in the main thread. I.e. guessing performance will always be difficult.
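Illustration of the Windows slash issue noted above (a sketch under assumptions; the notes do not say where the failing tests build their keys): zarr store keys use `/` as the separator, so keys derived from native Windows paths need normalising before they are compared or looked up. `path_to_key` is a hypothetical helper, not part of zarr.

```python
from pathlib import PurePosixPath, PureWindowsPath


def path_to_key(relative_path):
    """Turn a relative filesystem path into a '/'-separated store key."""
    return relative_path.as_posix()


# The same chunk, as it would be spelled on Windows and on POSIX systems.
windows_key = path_to_key(PureWindowsPath(r"group\array\0.0"))
posix_key = path_to_key(PurePosixPath("group/array/0.0"))
print(windows_key, posix_key)
```

Using the pure path classes keeps the sketch runnable on any platform, which is also handy for testing Windows behaviour from a non-Windows CI machine.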