2024-03-20

Attending: Davis Bennett (DB), Sanket Verma (SV), Janos Zimmermann (JZ), Gābor Kovācs (GB)

TL;DR:

The team discussed the recent HTTP Extension meeting and potential community interest, async loading in Zarr-Python, and optimizing chunk sizes for S3. Other topics included PR reviews, tutorial examples for cloud storage, Repo-Review suggestions, fixing issues with Pytest, and updates on Zarr-Python V3 and N5 support.

Updates:

  • Join ZulipChat: https://ossci.zulipchat.com/
  • HTTP Extension meeting took place on 3/14
    • Trying to figure out the best way forward, i.e. a ZEP or not
    • Guaging interest and use cases from others in the community

Meeting Minutes:

  • DB: Zarr-Python doesn’t use async while loading the chunks - but it’s fairly easily to parallelize the chunks as they are mostly files
    • JZ: User should have ability/freedom to do parallelism on their own
    • DB: No concrete proposals rn - scheduling the reading of chunks via cloud
    • JZ: Using Zarr rn to write data to S3 - chunk size 1MB
    • DB: It’s Dask fault - every dask task cost 1ms for task grapher - with million Zarr chunks you’ll be spending a lot of time on dask graph
    • JZ: Rechunking
    • DB: Rechunking on Dask array?
    • JZ: Yes!
    • DB: Issues you’ll face:
      • no. of task will be v. v. large
      • Complexity in the order of O(no. of chunks)
    • JZ: Got speed of 1TB/S for local - got 200MB/S when switched to S3 - S3 is inserting keys and putting it S3 store
    • DB: Makes sense to benchmark S3 until you timeout - at some point you start getting error code - once you’re there you can’t go any faster
    • JZ: Currently trying to find the sweet size of chunks - will also look into Dask
    • DB: Dask introduces some complexity - I use Dask in it’s primitive form - and write custom functions for other stuff
  • Building on https://github.com/zarr-developers/zarr-python/pull/1713
    • Should we also add similar examples for AWS and Azure?
    • Or move the existing material from tutorial section to examples?
    • DB: Basically fall under the tutorials sections
    • SV: Will ask on the PR if any objections to put it under tutorials.rst
  • Dimitri is working on applying Repo-Review suggestions to Zarr-Python main
    • Any reason to not have in V3 branch?
    • DB: If there’s nothing disrupting the day-to-day work it’s good to have those PRs - we can always bring the stuff later in V3
  • Fixing https://github.com/zarr-developers/zarr-python/pull/1671
  • GB: Any new updates to Zarr-Python V3?
    • DB: Will be funded until May to work on Zarr-Python by Earthmover
    • DB: Do you use N5?
    • GB: Little bit
    • DB: There’s a proposal to remove N5 from Zarr-Python - you’d need to install N5py to work with N5
    • DB: Saalfeld is making changes/developing N5 - adding major versions
    • GB: N5 may support V3 - had a chat with Saalfeld