2024-06-13

Attending: Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS)

TL;DR:

The meeting discussed the timing for moving ZEPs 1 & 2 from “Accepted” to “Final,” potential changes to ZEP1 related to v3 codec metadata and variable chunking, and considerations around implementing a chunk manifest with URL support.

Updates:

Meeting Minutes:

Discussion for a future meeting: when are ZEPs 1 & 2 no longer just “Accepted”?
- see https://zarr.dev/zeps/accepted_zeps/
- when zarr-python v3 goes GA? some other time?
- Sanket: after “Accepted” comes “Final” for non-process ZEPs
- Davis: “active” isn’t connected in the flowchart
- Sanket: https://github.com/zarr-developers/zeps/pull/59
- Davis: defining ZEP with a ZEP seems problematic
- Josh: certainly not necessary, but that requires “Yet Another Document”
- Sanket: also definitely made the mistake of not taking into account the changes to ZEP0000
- Davis: not a lot of ZEPs. writing the ZEP wasn’t a good use of my time.
Jeremy joins
ZEP1 changes (“bugs”)
- Davis: v3 codec metadata is cumbersome. could be json metadata rather than a list
  - could do this backwards compatible
  - as ZEP? Josh: suggest an issue first (like implicit group) then we can discuss
  - Jeremy: less likely to go in. need a high benefit (existing data out there, churn, etc.)
  - Davis: would argue that this is a wart in the spec and good to document that.
- Davis: clarify relationship between the v3 spec and the codecs
  - current spec document is inconsistent
  - may impact implementations
  - Jeremy: intention was that whether in specs or in the “codecs” that there is a definition. i.e., no problem there and probably an editorial change.
- Davis: variable chunking. extension defines a place to define the chunking (name=regular i.e. rectilinear)
  - minor version increment that just uses variable chunks (“easier”?)
  - on the implementation side, if there’s only one it’s easier in the long run
  - Jeremy: that’s probably how the implementation will work. but hard to know if there are other types of chunking in the future. possibly geospatial.
  - Davis: propose not supporting the old version (one list of chunk sizes)
  - Jeremy: don’t think that’s workable (to always require full); but for each dimension, to allow an integer or a list. ok to have the identifier and not a lot of work to convert.
- Jeremy: chunk manifest (tabled)
  - likely makes it necessary to have URL support
  - https://github.com/zarr-developers/zarr-specs/issues/287
  - examples use s3://...
  - Josh: concerned that it’s bigger than Zarr
    - other things: fsspec, intake, …
  - Davis: is this another way to do sharding? pros / cons to the codec approach? (serves same perhaps as the shard header)
  - Jeremy: except there’s the binary / plain split.
  - Davis: just that there’s more than one way to do something
  - Jeremy: not unusual that a complicated system has more than one way to do something
  - Davis: decision point for people to make. Not something we had in zarr v2
  - Jeremy: use shard if you’re writing it; use manifest if you have some stuff
  - Josh: also you could potentially choose to put a manifest in front of an (old) sharded array
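
Illustration of the variable chunking point above (Jeremy: "for each dimension, to allow an integer or a list"): a minimal sketch of how an implementation might convert such a mixed specification into explicit per-dimension chunk lengths. The function name `expand_chunks` is hypothetical, and two behaviours are assumptions of this sketch rather than anything decided in the meeting or stated in a spec: an integer is expanded with the final chunk clipped to the array extent, and a list must sum exactly to the dimension length.

```python
from typing import List, Sequence, Union

# Per-dimension chunk spec: a single integer (regular) or a list (variable).
ChunkSpec = Union[int, Sequence[int]]


def expand_chunks(shape: Sequence[int], chunks: Sequence[ChunkSpec]) -> List[List[int]]:
    """Expand a mixed int/list chunk spec into explicit chunk lengths.

    Hypothetical helper; the clipping and sum rules are assumptions.
    """
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same number of dimensions")

    expanded: List[List[int]] = []
    for size, spec in zip(shape, chunks):
        if isinstance(spec, int):
            if spec <= 0:
                raise ValueError("chunk size must be positive")
            # Regular chunking: repeat the size, clipping the last chunk.
            full, remainder = divmod(size, spec)
            edges = [spec] * full + ([remainder] if remainder else [])
        else:
            # Variable chunking: take the explicit list as given.
            edges = list(spec)
            if any(e <= 0 for e in edges):
                raise ValueError("chunk sizes must be positive")
            if sum(edges) != size:
                raise ValueError("chunk sizes must sum to the dimension length")
        expanded.append(edges)
    return expanded


# Dimension 0 is regular (size 4), dimension 1 is fully specified.
print(expand_chunks((10, 6), (4, [1, 2, 3])))  # [[4, 4, 2], [1, 2, 3]]
```

Under these assumptions the conversion is a few lines per dimension, which is consistent with the remark that it is "not a lot of work to convert."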