2022-12-01
Attending: Sanket Verma (SV), Josh Moore (JM), Jonathan Striebel (JS), Jeremy Maitin-Shepard (JMS), Ward Fisher (WF), Dennis Heimbigner (DH), Ryan Abernathey (RA)
TL;DR:
RA started a discussion on drop /meta
prefix. See the issue here, which basically led to chain reaction of several conversations around topics related to each other. These discussions are mostly around some lingering issues around the finalisation of Zarr V3 spec.
RA, JMS and JS took some action items which can be seen at the bottom
Updates:
- Conversations (issues and feedback) on ZEP1 PR are now resolved. Check this. Thanks to Jonathan Striebel! 🙌🏻
- The conversations which needs additional input have been moved to separate issues
- Jeremy Maitin-Shepard promoted as one of the authors for ZEP0001
- Current status of ZEP1 can be viewed here
Meeting minutes:
- RA: suggest focusing on the meta/ prefix discussion
- https://github.com/zarr-developers/zarr-specs/issues/177
- JMS: not sure it’s solving a problem (optimally). nice feature of v2 is copying out an array
- JS: was for performance, use exclusion mechanism
- RA: never need to list chunks (even if implementations do…)
- NB: don’t like trying to open files to know things (404-based)
- JM: so we all agree? Yes. But what’s the default?
- RA: suggest: drop meta, use .json on the array
- Can then drop the root metadata?
- DH: there is dataset-level metadata (superblock)
- JS: discuss those separately?
- Agreed
- JS: so to that suggestion, how do you list all metadata?
- RA: don’t think we should plan for discovering all metadata (millions of arrays)
- JS:
- RA: listing recursively isn’t ok?
- JS: not with implicit groups
- RA: use storage transformer to get the previous behavior
- RA: data is out there so need to provide a mechanism
- DH: don’t think that’s fair
- RA: nice feature of this proposal if we could keep it.
- JM: how is conslidated metadata related?
- RA: that’s another problem.
- RA: had thought about explicitly list the children (stac catalog)
- DH: nczarr does that as well.
- RA: downside is the concurrency issue
- JS: good extension for groups (listing children)
- JS: but could also have consolidated per group
- JM: different commnunities here. some are definitely asking for listability
- DH: not lots of formats that are listable without tools. They are asking for something powerful.
- RA: so that’s the root feature of the separate hierarchies
- RA: we should look at some data. offer to write a script
- JS one other alternative: chunks in an extra subfolder
- JMS: k/v versus directory based will have different performance behavior
- RA: does this require us to give up on implicit groups?
- summary: foo/array.json and foo/chunks/0/0
- JMS: re: dropping of root metadata
- solution is perhaps storage transformer would need to write something in array metadata
- JS: but then everything in the array metadata and you’d need be able to read it
- JMS: don’t always need to access the metadata directly. more like a safety measure.
- JMS: could copy “global” metadata into each array
- JM: that works to the design goal of being able to freely copy an array
- DH: that assumes no global extensions.
- RA: need to think through this separately:
- portability (“invariance” of v2 that any group is standalone)
- RA: danger is that you can build zarrs through the command-line (need no library)
- DH: sure you want to do that?
- RA: it was at least part of the design of V2
- DH: as a principle, if that’s what you want it’s really critical to the v3 process
- DH: cf. intellij – fiddling in text file then going back (bypassing the tool)
- JM: perhaps CLI manipulations are inherently “extension-less” and therefore this is “safe”
- DH: one tool being a verification tool?
- JS: consolidated metadata as an example
- RA: primarily used as a way to allow easy listing
- but you know that you can’t touch the store. JS: you need a root to do fancy things
- action items:
- RA to do some performance benchmarking
- JMS to propose a new storage layout for v3
- next time: root metadata discussion issue (JS)