22nd September, 2021

Attending: Josh Moore, Norman Rzepka, Erik Welch, Ward Fisher, John Kirkham, Greg Lee, Hailey Johnson

  • 2.10 released 🎉

  • Josh: https://www.outreachy.org/ draft proposals → community issue (& twitter) soon

  • Norman (if he can make it): Sharded chunk storage

    • from: https://scalableminds.com/

    • built: https://webknossos.org/ for collaboratively looking at large 3D volumes

    • presented: Slides on sharding

      • Caterva → Webknossos
      • Chunk file → Shard
      • Blocks → Chunks
    • about: Format implementation (similar to neuroglancer)

    • Interested in adopting Zarr as a primary format, but need something like shards to get the same performance when streaming the data while storing it on a cluster system.

    • Looking for how to get involved, spec it out, etc.

    • John: familiar with Caterva (#713)? Read up about it

    • Josh: cloud IO problem. Any solution? No, only on (local/distributed) filesystem

      • Neuroglancer accesses buckets directly (Python for disk)
      • Only need the file URI.
    • Conversation on Caterva (15.Sep)

    • Neuroglancer: Uses range queries to read the index (size known beforehand). Cached locally. Another range query to pick out the chunk data.

    • Languages X Cloud providers+FS

    • John: Raised an fsspec issue for support of range queries. Didn’t see existing support. So hopefully we can discuss more there:

  • Ward: V3 questions

    • Meeting @ Unidata. NSF solicitation. Focus on continued maintainability of software.

    • Adoption of V3 into netcdf could play a part in that.

    • Josh: if need be can put a stamp on it, otherwise waiting on Alistair

      • However people are starting to code against it.
    • Ward: have a couple of months before proposals are due.

    • Greg: looked at xarray & dask test suites with v3

      • for dask, it works but you need to specify “to_zarr(component='')”. None is ok for V2.
      • for xarray, a lot of dtypes in their tests aren’t supported (need to write to the metadata). complex, datatype, structured, and object arrays are all in the zarr tests. xarray also has unicode and byte ← need adding.
      • Josh: possible “low hanging fruit” or “needs help” issue for outreachy contributors
      • see: https://github.com/zarr-developers/zarr-python/pull/789
  • Erik: “have a plan” (for sparse data in zarr and non-zarr)
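
Sketch of the two-step range-query read described under the sharding item (read the fixed-size index, cache it, then read just the chunk). This is only an illustration under assumptions: the shard layout (index of little-endian offset/length pairs at the start of the shard) and the names build_shard, range_read and read_chunk are hypothetical, not the Neuroglancer format, the Webknossos format, or any Zarr spec. The byte slice in range_read stands in for a real range request against a file or bucket.

```python
import struct

# Hypothetical index entry: (offset, length) as two little-endian uint64 values.
INDEX_ENTRY = struct.Struct("<QQ")


def build_shard(chunks):
    """Pack chunk payloads behind a fixed-size index (assumed layout)."""
    index_size = INDEX_ENTRY.size * len(chunks)
    index = bytearray()
    offset = index_size  # first payload starts right after the index
    for payload in chunks:
        index += INDEX_ENTRY.pack(offset, len(payload))
        offset += len(payload)
    return bytes(index) + b"".join(chunks)


def range_read(blob, start, length):
    """Stand-in for a range request (HTTP Range header, file seek+read, ...)."""
    return blob[start:start + length]


def read_chunk(blob, n_chunks, i, index_cache=None):
    """Return (chunk bytes, index bytes) using at most two range reads."""
    # First range query: the index, whose size is known beforehand.
    if index_cache is None:
        index_cache = range_read(blob, 0, INDEX_ENTRY.size * n_chunks)
    offset, length = INDEX_ENTRY.unpack_from(index_cache, i * INDEX_ENTRY.size)
    # Second range query: only the bytes of the requested chunk.
    return range_read(blob, offset, length), index_cache


shard = build_shard([b"chunk-0", b"c1", b"chunk-two"])
data, cache = read_chunk(shard, 3, 2)          # two range reads
again, _ = read_chunk(shard, 3, 1, cache)      # index cached: one range read
```

With the index cached locally, every later chunk costs a single range read, which is why range-query support in the IO layer matters for sharded storage.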