2024-06-13

Attending: Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS)

TL;DR:

The meeting discussed the timing for moving ZEPs 1 & 2 from “Accepted” to “Final,” potential changes to ZEP1 related to v3 codec metadata and variable chunking, and considerations around implementing a chunk manifest with URL support.

Updates:

Meeting Minutes:

  • discussion for a future meeting: when are ZEPs 1 & 2 no longer just “Accepted”
    • see https://zarr.dev/zeps/accepted_zeps/
    • when zarr-python v3 goes GA? some other time?
    • Sanket: after “accepted” is “final” for non-process
    • Davis: “active” isn’t connected in the flowchart
    • Sanket: https://github.com/zarr-developers/zeps/pull/59
    • Davis: defining ZEP with a ZEP seems problematic
    • Josh: certainly not necessary, but that requires “Yet Another Document”
    • Sanket: also definitely made a mistake of not taking into account the changes of ZEP0000
    • Davis: not a lot of ZEPs. writing the ZEP wasn’t a good use of my time.
  • jeremy joins
  • ZEP1 changes (“bugs”)
    • Davis: v3 codec metadata is cumbersome. could be json metadata rather than a list
      • could do this backwards compatible
      • as ZEP? Josh: suggest an issue first (like implicit group) then we can discuss
      • Jeremy: less likely to go in. need a high benefit (existing data out there, churn, etc.)
      • Davis: would argue that this is a wart in the spec and good to document that.
    • Davis: clarify relationship between the v3 spec and the codecs
      • current spec document is inconsistent
      • may impact implementations
      • Jeremy: intention was that whether in specs or in the “codecs” that there is a definition. i.e., no problem there and probably an editorial change.
    • Davis: variable chunking. extension defines a place to define the chunking (name=regular i.e. rectilinear)
      • minor version incremement that just uses variable chunks (“easier”?)
      • on the implementations, if there’s only one it’s easier in the long-run
      • Jeremy: that’s probably how the implementation will work. but hard to know if there are other types of chunking in the future. possibly geospatial.
      • Davis: propose not supporting the old version (one list of chunk sizes)
      • Jeremy: don’t think that’s workable (to always require full); but for each dimension, to allow an integer or a list. ok to have the identifier and not a lot of work to convert.
    • Jeremy: chunk manifest (tabled)
      • likely makes it necessary to have URL support
      • https://github.com/zarr-developers/zarr-specs/issues/287
      • examples use s3://...
      • Josh: concerned that it’s bigger than Zarr
        • other things: fsspec, intake, …
      • Davis: is this another way to do sharding? pros / cons to the codec approach? (serves same perhaps as the shard header)
      • Jeremy: except there’s the binary / plain split.
      • Davis: just that there’s more than one way to do something
      • Jeremy: not unusual that a complicated system has more than one way to do something
      • Davis: decision point for people to make. Not something we had in zarr v2
      • Jeremy: use shard if you’re writing it; use manifest if you have some stuff
      • Josh: also you could potentially choose to put a manifest in front of a (old) sharded