2022-09-08
Attending: Sanket Verma (SV), Josh Moore (JM), Ward Fisher (WF), Jonathan Striebel (JS), Norman Rzepka (NR), Ryan Abernathey (RA), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS)
TL;DR
This was the first ZEP meeting ever. Representatives from the Zarr Implementations Council joined the meeting. Most discussions revolved around ZEP1. One of the critical decisions about ZEP1 was to accept it ‘Provisionally’ and move forward with the implementation in various programming languages.
Updates:
- SV: open (“draft”) ZEPs:
- SV: Author discussion on all the comments on ZEP0001
- Have proposed resolutions for a number of those
- Meeting again tomorrow to try to finish the list
- https://hackmd.io/sOos8rxrRvKCJPbbUKWtwA?view
- SV: Critically propose marking ZEP0001 as “provisionally accepted” after the above are handled and passed by the ZIC
- Implementations are free (and encouraged!) to start implementing.
- Any blocking changes could still be handled.
- Otherwise, “feature freeze”.
- Feedback?
- RA: process to get to provisionally accepted
- SV: draft == under review.
- on vote, can move to provisionally or accepted state.
- once implemented, moves to final.
- could move to “deferred” state if the ZIC vetoes
- WF: “ready to implement” jumped out (and caused anxiety but only since there’s too much to do)
- https://zarr.dev/zeps/active/ZEP0000.html#review-and-resolution
- JMS: no substantial changes since early draft
- JM: editors are preparing a rebuttal (Alistair’s paper model)
- JMS: not sure a paper model is best
- RA: not in the sense that there’s only one round and someone will decide. iterative
- good to have authors who are organizing.
- now in revision and we can continue until everyone is happy
- gone slowly for various reasons (availability, summer, and it’s our first time & massive)
- would be useful to go through the outstanding issues
- JS: the only limit on iterations in this cycle is the limited time.
- but for now, trying to make batched changes
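The status flow SV described above (draft, then provisionally accepted or accepted on a vote, final once implemented, deferred on a ZIC veto) can be sketched as a small state table. This is only an illustration of the notes: the names `ZepStatus`, `ALLOWED`, and `can_move` are hypothetical, and the assumption that "final" is reached from either the provisional or the accepted state is ours. ZEP0000 remains the authoritative description.

```python
from enum import Enum


class ZepStatus(Enum):
    """Status labels as mentioned in the minutes (hypothetical enum)."""
    DRAFT = "draft"                      # draft == under review
    PROVISIONAL = "provisionally accepted"
    ACCEPTED = "accepted"
    FINAL = "final"
    DEFERRED = "deferred"


# Allowed transitions as we read them from the discussion.
# Assumption: "once implemented, moves to final" applies to both
# the provisional and the accepted state.
ALLOWED = {
    ZepStatus.DRAFT: {ZepStatus.PROVISIONAL, ZepStatus.ACCEPTED, ZepStatus.DEFERRED},
    ZepStatus.PROVISIONAL: {ZepStatus.FINAL},
    ZepStatus.ACCEPTED: {ZepStatus.FINAL},
    ZepStatus.FINAL: set(),
    ZepStatus.DEFERRED: set(),
}


def can_move(current: ZepStatus, target: ZepStatus) -> bool:
    """Return True if the sketch permits moving from current to target."""
    return target in ALLOWED[current]
```

Under this reading, ZEP0001 would go from draft to provisionally accepted after the ZIC vote, and to final once implementations exist.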
Meeting Minutes:
- JS: review of the memory order decision (item 16 on the list)
- zarr’s goal is interoperability. therefore propose to keep C & F (benefit for community)
- could support read only, even with a transpose (if too slow, add a warning?)
- JMS: agree. but would like an arbitrary permutation.
- DH: good use case?
- JMS: dimension that represents time. order you display to the user is logical for them but need not be logical for compression/access patterns.
- JM/JS: core or extension?
- RA: that’s a key question
- NR: re: backwards compatibility C/F is in V2 therefore that would need to be in core. but arbitrary could be an extension.
- RA: but v3 is a chance to break backwards compatibility (explicitly not a goal)
- NR: upgrade path? so be able to upgrade without re-writing the chunks.
- RA: v2 will still be supported.
- WF: that would be the hope, but worry about netcdf & archival – assuming software will support it without it being expressed somewhere. aspirational sure but makes us nervous.
- e.g. will future software implement the v2 standard?
- RA: transform-based solution? (but only if we support F) if we say the chunks should be backwards compatible.
- WF/DH: no one has ever asked for arbitrary. Someone at NOAA asks for things that would help their lab. Technical debt. (Won’t even request a pull request) See the trap that the HDF group fell into (single-writer-multiple-reader, several orders of magnitude that they are trying to recover from.)
- JMS: arbitrary seems most natural. pass to numpy.transpose()
- WF: shocked at the assertion that there wouldn’t be a migration path
- JM: clarification – we’re only differentiating if binary transformation is needed
- can add requirement to v3 that implementations read v2
- WF: requirement of netcdf. can decide if that’s a requirement.
- DH: depends if it’s a lot. operational definition - “too painful to copy v2 to v3”
- (for RA): petabytes of data
- JS: RA proposed a transformer strategy - essentially rewriting metadata. formalize it?
- DH: how comfortable are you not supporting older version?
- JM: for OME, got agreement but that’s a layer higher
- DH: will there be new implementations without V2 support?
- NR: think there will be
- JM: but it’s so easy to implement
- WF: people won’t do that…what do we do if a popular implementation doesn’t support v2?
- other packages?
- RA: recommend storage layer / translation?
- JM: agreed but that’s SHOULD (versus MUST)
- JMS: only way to force it is a standardization
- JM: agreed, but we can only do what the spec document allows us (i.e. labeling something as “compliant”)
- JS: it’s a new major version and people know what we mean. (as a user, I wouldn’t expect support for v2 if an implementation says “v3”)
- WF: convinced myself I’m worrying too much instead:
- WF: in 18 months how do you know which Zarr is used to open it?
- JM: metadata file is different (essentially the magic number). The proposal for .zr3 has been turned down for now.
- SV: data type naming
- JM: dropping the python-ness
- JMS: helps provide a more natural scheme for some datatypes (and endianness as a codec)
- no argument against (just “convenient in Python”)
- JM: will need names
- JMS: in https://github.com/zarr-developers/zarr-specs/pull/155
- DH: netcdf ncchar type equivalent to 8-bit ascii, no equivalent in Zarr. Needed? NC uses it all the time. Why not in numpy?
- JMS: thought numpy has char.
- RA: revisit char question? JMS: different than varstring
- RA: where does the encoding go? DH: in an attribute. “ascii” (or “utf-8”)
- RA: used for? DH: you see a lot of flags stored that way.
- also historical: NC-3 didn’t have strings of any type. (arrays of chars workaround)
- JM: extension mechanism?
- DH: where the wheel hits the road
- JMS: just metadata?
- RA: disagree, influences …
- JMS: agreed, but doesn’t change how hard it is to implement
- JM: but need to feel confident that they are low cost so we can change when we discuss these things
- JMS: will changes start appearing?
- JM: Very soon!
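To make the memory-order item from earlier in these minutes concrete: C and F order can both be seen as special cases of an arbitrary axis permutation, which is the generalization JMS suggested passing to numpy.transpose(). The sketch below is only an illustration with NumPy, not text from the ZEP0001 spec. The helper names `encode_chunk` and `decode_chunk` and the `order` argument are hypothetical.

```python
import numpy as np


def encode_chunk(chunk, order):
    """Serialize a chunk with its axes stored in the given permutation.

    order=(0, 1, ..., n-1) matches C order; the reversed tuple matches F order.
    """
    return np.ascontiguousarray(np.transpose(chunk, order)).tobytes()


def decode_chunk(buf, shape, dtype, order):
    """Rebuild the logical chunk from bytes written by encode_chunk."""
    stored_shape = tuple(shape[i] for i in order)
    stored = np.frombuffer(buf, dtype=dtype).reshape(stored_shape)
    # Invert the permutation to get back to the logical axis order.
    inverse = tuple(int(i) for i in np.argsort(order))
    return np.transpose(stored, inverse)


# Hypothetical logical layout: (time, y, x).
chunk = np.arange(24, dtype=np.int32).reshape(2, 3, 4)

c_bytes = encode_chunk(chunk, (0, 1, 2))      # same bytes as C order
f_bytes = encode_chunk(chunk, (2, 1, 0))      # same bytes as F order
custom_bytes = encode_chunk(chunk, (1, 2, 0)) # time stored as the fastest-varying axis

restored = decode_chunk(custom_bytes, chunk.shape, chunk.dtype, (1, 2, 0))
```

A reader that only understands C order could still serve F-ordered or permuted chunks by applying the inverse transpose on read, which is the read-only support JS mentioned. Whether the permutation belongs in core or in an extension is the open question from the discussion.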