2024-06-13
Attending: Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS)
TL;DR:
The meeting discussed the timing for moving ZEPs 1 & 2 from “Accepted” to “Final,” potential changes to ZEP1 related to v3 codec metadata and variable chunking, and considerations around implementing a chunk manifest with URL support.
Updates:
Meeting Minutes:
- discussion for a future meeting: when are ZEPs 1 & 2 no longer just “Accepted”?
- see https://zarr.dev/zeps/accepted_zeps/
- when zarr-python v3 goes GA? some other time?
- Sanket: after “accepted” comes “final” for non-process ZEPs
- Davis: “active” isn’t connected in the flowchart
- Sanket: https://github.com/zarr-developers/zeps/pull/59
- Davis: defining ZEP with a ZEP seems problematic
- Josh: certainly not necessary, but that requires “Yet Another Document”
- Sanket: also definitely made a mistake of not taking into account the changes of ZEP0000
- Davis: not a lot of ZEPs. writing the ZEP wasn’t a good use of my time.
- Jeremy joins
- ZEP1 changes (“bugs”)
- Davis: v3 codec metadata is cumbersome. could be json metadata rather than a list
- could do this backwards compatible
- as ZEP? Josh: suggest an issue first (like implicit group) then we can discuss
- Jeremy: less likely to go in. need a high benefit (existing data out there, churn, etc.)
- Davis: would argue that this is a wart in the spec and good to document that.
- Davis: clarify relationship between the v3 spec and the codecs
- current spec document is inconsistent
- may impact implementations
- Jeremy: intention was that whether in specs or in the “codecs” that there is a definition. i.e., no problem there and probably an editorial change.
- Davis: variable chunking. extension defines a place to define the chunking (`name=regular`, i.e. rectilinear)
- minor version increment that just uses variable chunks (“easier”?)
- on the implementations, if there’s only one it’s easier in the long-run
- Jeremy: that’s probably how the implementation will work. but hard to know if there are other types of chunking in the future. possibly geospatial.
- Davis: propose not supporting the old version (one list of chunk sizes)
- Jeremy: don’t think that’s workable (to always require full); but for each dimension, to allow an integer or a list. ok to have the identifier and not a lot of work to convert.
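To make the "integer or a list per dimension" idea concrete, here is a minimal sketch of how an implementation might normalize such a chunk specification into explicit chunk sizes. The function name `chunk_edges` and the input layout are hypothetical and not taken from any spec or from zarr-python. Clipping the final regular chunk to the array extent is also an assumption made for illustration.

```python
from typing import List, Sequence, Union

# Hypothetical per-dimension spec: an int means regular chunks of that size,
# a list means explicit (variable) chunk sizes along that dimension.
DimSpec = Union[int, Sequence[int]]


def chunk_edges(shape: Sequence[int], chunks: Sequence[DimSpec]) -> List[List[int]]:
    """Expand a mixed int/list chunk spec into explicit sizes per dimension."""
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same number of dimensions")
    result = []
    for extent, spec in zip(shape, chunks):
        if isinstance(spec, int):
            if spec <= 0:
                raise ValueError("chunk size must be positive")
            full, rem = divmod(extent, spec)
            # Assumption: the trailing partial chunk is clipped to the extent.
            sizes = [spec] * full + ([rem] if rem else [])
        else:
            sizes = list(spec)
            if sum(sizes) != extent:
                raise ValueError("explicit chunk sizes must sum to the extent")
        result.append(sizes)
    return result


# Dimension 0 is regular (size 4), dimension 1 is variable.
print(chunk_edges((10, 6), (4, [1, 2, 3])))
```

Under this sketch a purely regular grid is just the special case where every dimension is given as an integer, which matches the point that converting between the two forms is not much work.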
- Jeremy: chunk manifest (tabled)
- likely makes it necessary to have URL support
- https://github.com/zarr-developers/zarr-specs/issues/287
- examples use `s3://...`
- Josh: concerned that it’s bigger than Zarr
- other things: fsspec, intake, …
- Davis: is this another way to do sharding? pros / cons to the codec approach? (serves same perhaps as the shard header)
- Jeremy: except there’s the binary / plain split.
- Davis: just that there’s more than one way to do something
- Jeremy: not unusual that a complicated system has more than one way to do something
- Davis: decision point for people to make. Not something we had in zarr v2
- Jeremy: use shard if you’re writing it; use manifest if you have some stuff
- Josh: also you could potentially choose to put a manifest in front of a (old) sharded
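As a rough illustration of why a chunk manifest pulls in URL support, the sketch below models a manifest as a mapping from chunk key to a URL plus byte range. The `ChunkRef` type, the key format, the bucket name, and the helper functions are all hypothetical. The actual proposal is in the linked zarr-specs issue and may differ.

```python
from typing import Dict, NamedTuple, Optional


class ChunkRef(NamedTuple):
    """Hypothetical manifest entry: where the bytes of one chunk live."""
    url: str
    offset: int
    length: int


# Hypothetical manifest: two chunks stored back to back in one existing object.
manifest: Dict[str, ChunkRef] = {
    "0.0": ChunkRef("s3://example-bucket/existing-file.bin", 0, 100),
    "0.1": ChunkRef("s3://example-bucket/existing-file.bin", 100, 100),
}


def resolve(manifest: Dict[str, ChunkRef], key: str) -> Optional[ChunkRef]:
    """Return the reference for a chunk key, or None if the chunk is absent."""
    return manifest.get(key)


def byte_range_header(ref: ChunkRef) -> str:
    """Format an HTTP Range header value (end offset is inclusive)."""
    return f"bytes={ref.offset}-{ref.offset + ref.length - 1}"


ref = resolve(manifest, "0.1")
if ref is not None:
    print(ref.url, byte_range_header(ref))
```

This mirrors the "use manifest if you have some stuff" case: the data already exists somewhere and the manifest only points at it, whereas a shard is written together with its own binary index.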