20th October, 2021

Attending: Josh Moore, Davis Bennett, Dennis Heimbigner, Ward Fisher, Martin Durant, Ryan Williams, Gregory Lee

Davis:
- interested in
  
  https://github.com/zarr-developers/zarr-python/issues/843
- also have a branch for fixing how writing of chunks reads
  - has a race condition in test_sync. Use Juan’s timeout.
  - PR to be opened.
Josh:
- Investigating xarray interaction improvements with b-open
- Also investigating sharding prototype API with scalable minds.
  - May want to make use of a translation layer like kerchunk
Martin
- zarr is still the primary use for
  
  kerchunk
  - Talking to people, astro data will work only if very regular
  - List of chunk sizes in dask array
  - Could imagine implementing the dask scheme in zarr (breaks assumptions)
  - Cause: Detectors of complicated sizes; mosaics with overlaps. Also in the time domain. (e.g. in geo & climate where 365 v 366 days causes a problem)
  - Dennis: switching to **variable chunk size **would allow Zarr to support lazy rechunking – take a given chunk and over time divide it into smaller chunks (i.e. attractive). Martin: may need to move chunk names after re-indexing. cf. tiledb’s macrochunks & microchunks.
- numcodecs
  - interested to use entrypoints for declaring codecs
  - kerchunk desires this for GRIB files (c-compiled thing, so writes to temporary local file; using codecs for this)
  - imagecodecs from cgohlke
  - Josh: still need a unique identifier
  - Martin: in dask you would need to use an environment file for the right libraries.
- Sparse?
  - Josh: see Eric below. Expect to see a PR eventually.
  - Davis: how is that not a codec?
  - Martin: more a convention concern. Working with the awkward arrays.
- PyData Global!
  - Talk called “All you need is Zarr”
  - many in the HDF5 community are interested in Zarr+Kerchunk to get cloud friendly storage. Basically a translation layer. Likely to need more codecs.
  - Josh: strong relationship to the sharding work.
  - Martin: metadata tricks can get you a lot without too many downsides. (Even though you’ll need to scan your data)
  - May be pushing on this
Ward
- Unidata moving forward with solicitation (v3 + parallelization)
- Letter of collaboration? Josh: sure. Template → Logo → Good.
Martin: pangeo meeting pointed to a concern of there being no R implementation
- see
  
  https://github.com/zarr-developers/community/issues/18
- Dennis talking to someone at the R group. Thinking about using
  
  nczarr. (also for MATLAB)
- Contact has been made.
- Internal issue for tracking. Will try to move it more publicly.
Greg: preliminaries for V3 implementation
- https://github.com/zarr-developers/zarr-python/pull/789
  
  (BaseStore)
- https://github.com/zarr-developers/zarr-python/pull/839
  
  (Metadata handling)
- Josh: Integrating them into an 2.11 alpha release
- Greg update
Davis: re: dimension_separator travail
- Move keys from strings to tuple of strings
- Playing around with that. But breaks the ability to use
  
  dictionaries for storage
- i.e. perhaps using dict for storage is important
- Advantage of splitting prefix & chunk key is nice.
- How sacred is dictionary as a store?
- Josh: or a Key object?
- Dennis: nczarr uses tuples internally with on-the-fly conversion
  
  of tuple ot string.
  - Gotcha was converting from string to tuple (need heuristics)
  - Davis: when is that needed? in S3 you ask for all strings below a certain point. accessing this individually requires parsing them to tuples.
  - Josh: the “Hierarchy”-style with a cached test of .zarray existence would also work.
  - Dennis: that’s preferable.