22nd September, 2021

Attending: Josh Moore, Norman Rzepka, Erik Welch, Ward Fisher, John Kirkham, Greg Lee, Hailey Johnson

2.10 released 🎉
Josh: https://www.outreachy.org/ draft proposals → community issue (& twitter) soon
Norman (if he can make it): Sharded chunk storage
- from: https://scalableminds.com/
- built: https://webknossos.org/ for
  
  collab. looking at large 3D volumes
- presented: Slides on sharding
  - Caterva → Webknossos
  - Chunk file → Shard
  - Blocks → Chunks
- about: Format implementation
  
  (similar to neuroglancer)
- Interested in adopting Zarr as a primary format but need
  
  something like shards for the same performance in streaming the data while storing on a cluster system.
- Looking for how to get involved, spec it out, etc.
- John: familiar with Caterva
  
  (#713)? Read up about it
- Josh: cloud IO problem. Any solution? No, only on
  
  (local/distributed) filesystem
  - Neuroglancer access buckets directly (Python for disk)
  - Only need the file URI.
- Conversation on Caterva
  
  (15.Sep)
- Neuroglancer: Uses range queries to read the index (size known
  
  beforehand). Cached locally. Another range query to pick out the chunk data.
- Languages X Cloud providers+FS
- John: Raised an fsspec issue for support of range queries.
  
  Didn’t see existing support. So hopefully we can discuss more there:
  - https://github.com/intake/filesystem_spec/issues/766
Ward: V3 questions
- Meeting @ Unidata. NSF solicitation. Focus on continued
  
  maintainability of s/w.
- Adoption of V3 into netcdf could play a part of that.
- Josh: if need be can put a stamp on it, otherwise waiting on
  
  Alistair
  - However people are starting to code against it.
- Ward: have a couple of months before proposals are due.
- Greg: looked at xarray & dask test suites with v3
  - for dasks, it works but you need to specify “to_zarr(component=’’)”. None is ok for V2.
  - for xarray, lot of dtypes in their tests that aren’t supported (need to write to the metadata). complex, datatype, structured, object arrays are all in zarr tests. xarray has unicode and byte ← need adding.
  - Josh: possible “low hanging fruit” or “needs help” issue for outreachy contributors
  - see: https://github.com/zarr-developers/zarr-python/pull/789
Erik: “have a plan” (for sparse data in zarr and non-zarr)
- Just presented/led a discussion about designing sparse file
  
  format at HPEC
- https://docs.google.com/presentation/d/e/2PACX-1vRFN9gy5Rexzge63kxwakZvQqk1zVlWUjqPtPNXDlllP7jaZ_uQ9nD46yDfADEpMRfnQrnS4p3egaQH/pub