2024-03-20
Attending: Davis Bennett (DB), Sanket Verma (SV), Janos Zimmermann (JZ), Gābor Kovācs (GB)
TL;DR:
The team discussed the recent HTTP Extension meeting and potential community interest, async loading in Zarr-Python, and optimizing chunk sizes for S3. Other topics included PR reviews, tutorial examples for cloud storage, Repo-Review suggestions, fixing issues with Pytest, and updates on Zarr-Python V3 and N5 support.
Updates:
- Join ZulipChat: https://ossci.zulipchat.com/
- HTTP Extension meeting took place on 3/14
- Trying to figure out the best way forward, i.e. a ZEP or not
- Guaging interest and use cases from others in the community
Meeting Minutes:
- DB: Zarr-Python doesn’t use async while loading the chunks - but it’s fairly easily to parallelize the chunks as they are mostly files
- JZ: User should have ability/freedom to do parallelism on their own
- DB: No concrete proposals rn - scheduling the reading of chunks via cloud
- JZ: Using Zarr rn to write data to S3 - chunk size 1MB
- DB: It’s Dask fault - every dask task cost 1ms for task grapher - with million Zarr chunks you’ll be spending a lot of time on dask graph
- JZ: Rechunking
- DB: Rechunking on Dask array?
- JZ: Yes!
- DB: Issues you’ll face:
- no. of task will be v. v. large
- Complexity in the order of O(no. of chunks)
- JZ: Got speed of 1TB/S for local - got 200MB/S when switched to S3 - S3 is inserting keys and putting it S3 store
- DB: Makes sense to benchmark S3 until you timeout - at some point you start getting error code - once you’re there you can’t go any faster
- JZ: Currently trying to find the sweet size of chunks - will also look into Dask
- DB: Dask introduces some complexity - I use Dask in it’s primitive form - and write custom functions for other stuff
- Building on https://github.com/zarr-developers/zarr-python/pull/1713
- Should we also add similar examples for AWS and Azure?
- Or move the existing material from tutorial section to examples?
- DB: Basically fall under the tutorials sections
- SV: Will ask on the PR if any objections to put it under
tutorials.rst
- Dimitri is working on applying Repo-Review suggestions to Zarr-Python
main
- Any reason to not have in
V3
branch? - DB: If there’s nothing disrupting the day-to-day work it’s good to have those PRs - we can always bring the stuff later in V3
- Any reason to not have in
- Fixing https://github.com/zarr-developers/zarr-python/pull/1671
- PR by Davis: https://github.com/zarr-developers/zarr-python/pull/1714/ - any objections?
- DB: Looks good for now but we should also check what changes does Pytest 8.0.1 brings
- GB: Any new updates to Zarr-Python V3?
- DB: Will be funded until May to work on Zarr-Python by Earthmover
- DB: Do you use N5?
- GB: Little bit
- DB: There’s a proposal to remove N5 from Zarr-Python - you’d need to install N5py to work with N5
- DB: Saalfeld is making changes/developing N5 - adding major versions
- GB: N5 may support V3 - had a chat with Saalfeld