25th August, 2021

Attendees: Josh Moore, John K., Eric Perlman, Ward, Hailey, Greg Lee, Ryan A.

Josh: zarr-python 2.9 (without Python 3.6)
- (The peasants rejoice)
Greg & V3: going back to store PR (based on newly merged PRs)
- recursive imports, etc. Needs some fixing up.
- Zarr test suite is largely working on v3 (fixed by changing the defaults)
- Edge case of treating root/data/ as a group
- (PRs to be split up logically for opening)
- Majority of uses are: zarr.open, dask.from_zarr, xarray, etc.
- Ryan: less painful than Python 2 to 3 …
  - Use configuration options for xarray at least
- Ryan: also consolidated metadata is used a lot
  - Boosted cloud performance strongly!
  - https://github.com/zarr-developers/zarr-specs/issues/113
- Greg: also some tests are using datetime, but that’s not in the base spec.
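
As a rough illustration of why consolidated metadata helps on cloud stores: every small metadata document is copied under a single key, so a reader needs one request instead of one per group or array. The sketch below uses only the standard library and a plain dict as a stand-in for a bucket; the `consolidate` helper and the example keys are hypothetical, and the `.zmetadata` layout follows the Zarr v2 convention rather than anything decided at this meeting.

```python
import json


def consolidate(store):
    """Gather every metadata document in `store` under one key.

    `store` is any mutable mapping of str -> bytes (here: a dict).
    """
    meta_suffixes = (".zgroup", ".zarray", ".zattrs")
    # Collect and parse all metadata documents; chunk keys are skipped.
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.endswith(meta_suffixes)
    }
    doc = {"zarr_consolidated_format": 1, "metadata": metadata}
    # One key now answers every "what is in this hierarchy?" question.
    store[".zmetadata"] = json.dumps(doc).encode()
    return doc


# Hypothetical in-memory store standing in for a cloud bucket.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "temp/.zarray": b'{"zarr_format": 2, "shape": [4], "chunks": [2]}',
    "temp/.zattrs": b'{"units": "K"}',
    "temp/0": b"\x00\x01",
    "temp/1": b"\x02\x03",
}
consolidated = consolidate(store)
```

A reader can then fetch `.zmetadata` once and look up `consolidated["metadata"]["temp/.zarray"]` locally instead of issuing a separate request for each metadata key.
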
Ryan & Filecoin/IPFS
- Protocol Labs reached out to Pangeo (for climate data)
- “distributed content-addressable storage layer for the internet”
- all data is identified with a hash; decentralized
- Filecoin incentivizes miners to provide public storage
- potential alternative to S3 for internet-scale storage
- creating fsspec interface to IPFS. should “just work”
- https://github.com/zarr-developers/zarr-specs/issues/82
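
To make "all data is identified with a hash" concrete, here is a minimal sketch of a content-addressed store. It is a simplification: IPFS uses CIDs rather than bare hex digests, and the `ContentAddressedStore` class is hypothetical, not part of fsspec or any IPFS client.

```python
import hashlib


class ContentAddressedStore:
    """Toy store where the key of a block is the SHA-256 of its bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not chosen by the writer.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Any reader can verify the bytes against the address they asked for.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("block does not match its address")
        return data


cas = ContentAddressedStore()
chunk_address = cas.put(b"chunk bytes")
duplicate_address = cas.put(b"chunk bytes")  # same content, same address
```

Because identical bytes always map to the same address, duplicate chunks are stored once, and a reader does not need to trust whichever node served the block.
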
Ryan & Caterva
- two levels of chunking like tiledb
- looking into slicing along n dimensions.
- https://github.com/zarr-developers/zarr-python/issues/713#issuecomment-903755601
- Need a numcodecs wrapper (C-API)
- No explicit cloud solution
- Eric: similar to sharding in neuroglancer
- Ryan: could imagine (in 2 years) where this is the default codec (builds on blosc)
- Eric: if you throw out general nature, like append-only, then you can optimize.
- cf. https://blosc.github.io/caterva-scipy21/#/14
- Eric: blosc-1 v. blosc-2? John: e.g. 32-bit to 64-bit breaking changes
- Josh: identifying the changes (also for IPFS) that Zarr needs to add
- John: invite Francesc to this meeting
- Ryan: need to update the numcodecs interface so that we can ask for a slice from an ND array. (needs understanding of caterva, blosc, cython, …)
- John: implement chunk store (?) on top of caterva and turn off filtering, …
  - “Smarter chunk for Zarr” (deprecating partial selection, etc.)
  - Ryan: instead of reading bytes, decompressing, coercing to numpy array, caterva array would be our chunk.
    - might need to coerce less eagerly.
    - then slicing could propagate through.
- John: technical problem first then the social problem.
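
The appeal of two levels of chunking for slicing can be sketched with a little index arithmetic: if each chunk is split into independently compressed blocks, a slice only needs the blocks it overlaps rather than whole chunks. The example below is a 1-D simplification with hypothetical names (`blocks_for_slice`, `chunk_len`, `block_len`); it does not use or describe the Caterva API, which works in n dimensions.

```python
def blocks_for_slice(start, stop, chunk_len, block_len):
    """Return the (chunk, block) index pairs that overlap [start, stop).

    Assumes a 1-D array whose chunks are evenly divided into blocks.
    """
    if chunk_len % block_len != 0:
        raise ValueError("chunk_len must be a multiple of block_len")
    if start >= stop:
        return []
    blocks_per_chunk = chunk_len // block_len
    first_block = start // block_len
    last_block = (stop - 1) // block_len
    # divmod turns a global block index into (chunk index, block within chunk).
    return [divmod(b, blocks_per_chunk) for b in range(first_block, last_block + 1)]


# Elements 90..129 of an array with 100-element chunks and 25-element blocks.
needed = blocks_for_slice(90, 130, chunk_len=100, block_len=25)
```

With single-level chunking the same request would decompress both 100-element chunks in full; here only three of the eight blocks in those chunks are touched.
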
Ryan & hierarchical support in xarray
- https://github.com/pydata/xarray/issues/4118#issuecomment-873179375
- Tom Nicholas has something working
- Good time to check in and see what’s going on.