2023-03-09

Attending: Hailiang Zhang (HZ), Dieu My Nguyen (DMN), Josh Moore (JM), Ryan Abernathey (RA), Jeremy Maitin-Shepard (JMS)

TL;DR:

The meeting started with a discussion of sharding as a transformer vs. a codec. Sharding is implemented as a transformer, and we discussed the pros and cons of implementing it as a codec instead. JMS also briefly discussed #219. After this, HZ gave a short overview of ZEP0005 but couldn’t present his slides due to Zoom/screen-sharing issues, so the presentation was tabled for the next community meeting.

Meeting Minutes:

  • JMS: sharding as transform vs. codec
    • https://github.com/zarr-developers/zarr-specs/issues/220
    • RA: been playing with sharding, trying to get the most out of it
      • Key question (impl or spec?): if sharding is a codec, how does the outer layer specify which range it wants? (general problem in zarr-python for blosc)
        • requires passing context to the codec
        • transform is explicit; codec is less clear
    • JMS: in zarrita he has the codec take an indexing expression (optional?). defer some of that for ZEP2? arrays vs. bytes vs. additional concept of arrays.
    • RA: similar to Martin’s request
    • JMS: first codec is fine, but the next one is less clear. need to be more explicit about the interface.
    • RA: need to solve this. what information needs to be passed in between (implementors and at spec level)
    • RA: e.g. could be a codec that takes an HDF5 file (blosc2, etc.) missed a chance to build the right abstractions there.
    • JMS: codecs := array|bytestream in; array|bytestream out
    • JM: recursive zarrs all the way down?
    • JMS: concatenation of other arrays
    • RA: Norman’s justification. JMS’ proposal. re: how to integrate other things like referencing between arrays, shards defining their own chunking, etc. (doesn’t change anything in ZEP1)
    • JMS: transforms as bytes, and codecs can access arrays
    • JMS: NB: MD wants low level store to be aware of array indexing
    • JM: always thought of codecs as the lowest thing that is unaware of arrays
    • JMS: combined compression with filters (which can operate on arrays, transpose)
    • RA: sharding fundamentally breaks the core abstraction between store / codec. at the impl. level, want efficient/fast code to fetch chunks of a shard, make smart decisions, close to the metal. but the naive thing isn’t fast. do the core abstractions break down? no longer using the key/value store API; using offsets into storage.
    • JMS: don’t see byte range as breaking. addition to the interface.
    • RA: not just a file format, but a protocol for addressing chunks.
  • JMS: dimension names metadata
  • HZ/DMN: ZEP5 presentation (recorded)
    • https://zarr.dev/zeps/draft/ZEP0005.html
    • HZ: Tabling because of Zoom issues.
    • RA: re: expectations – very limited due to the number of people working on the spec. (it’s taken years) so … 6 months?
    • HZ: this is an extension, doesn’t block anything.
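
Illustration of the byte-range point from the sharding discussion: a minimal sketch, assuming a toy shard layout (chunk count, then an index of offset/length pairs, then the chunk payloads). This is not the Zarr sharding format or the zarr-python/zarrita API. `MemoryStore`, `write_shard`, and `read_inner_chunk` are hypothetical names. It shows the caller passing its selection (an inner chunk number) down, so that only the index entry and the requested chunk are read through a byte-range extension of the key/value `get`.

```python
from __future__ import annotations

import struct
from typing import Optional


class MemoryStore:
    """Hypothetical key/value store whose get() also accepts a byte range."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self.bytes_read = 0  # counts bytes returned by get(), for illustration

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str, byte_range: Optional[tuple[int, int]] = None) -> bytes:
        value = self._data[key]
        if byte_range is not None:
            offset, length = byte_range
            value = value[offset:offset + length]
        self.bytes_read += len(value)
        return value


COUNT = struct.Struct("<Q")         # number of inner chunks
INDEX_ENTRY = struct.Struct("<QQ")  # (offset, length) of one inner chunk


def write_shard(store: MemoryStore, key: str, chunks: list[bytes]) -> None:
    """Write a toy shard (assumed layout): count, index, then payloads."""
    offset = COUNT.size + INDEX_ENTRY.size * len(chunks)
    index = b""
    for chunk in chunks:
        index += INDEX_ENTRY.pack(offset, len(chunk))
        offset += len(chunk)
    store.set(key, COUNT.pack(len(chunks)) + index + b"".join(chunks))


def read_inner_chunk(store: MemoryStore, key: str, i: int) -> bytes:
    """Fetch inner chunk i using two byte-range reads instead of the whole shard."""
    entry_offset = COUNT.size + INDEX_ENTRY.size * i
    entry = store.get(key, (entry_offset, INDEX_ENTRY.size))
    offset, length = INDEX_ENTRY.unpack(entry)
    return store.get(key, (offset, length))


store = MemoryStore()
write_shard(store, "c/0/0", [b"aaaa", b"bb", b"cccccc"])
chunk = read_inner_chunk(store, "c/0/0", 1)
```

In this sketch the shard is 68 bytes, but reading inner chunk 1 touches only 18 of them (one 16-byte index entry plus the 2-byte payload). Without the selection being passed down, the outer layer would have to fetch and decode the full shard.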