9th February, 2022
Attending: Jonathan Striebel, Davis Bennett, Eric Perlman, Josh Moore (JAM), Sanket Verma, Jeremy Maitin-Shepard, Hailey Johnson, Erik Welch (Anaconda → NVIDIA), John Kirkham, Dennis Heimbigner, Dave Mellert, Greggory Lee, Matt McCormick
- Davis: gdoc → hackmd? Sure!
    - Sanket: suggestions
- Sanket:
    - updated webpage, blog update
    - Intros & various links here
- Erik: Mr. Sparse
    - ….
- Jeremy:
    - C(++) implementations: quick report to Dennis from
    - v3 support for non-zero origin (issue)
- Dennis: would an attribute specifying the origin be enough? Atmospheric and related domains have benefited from CF conventions, which standardize the meaning of attributes.
- JMS: benefit of allowing indexing to be affected.
- DH: second system syndrome problem (link)
- DB: see the utility (working with cutouts) but the workflow puts us outside what’s in the core spec. A Zarr array shouldn’t have metadata that references another array. Perhaps a nice formalization of transforms, since it defines a coordinate space.
- JMS: Julia has offset arrays (NumPy is always 0-origined)
- DB: meaning of 0-origin is that there’s a coordinate space
- EP: like the functionality, but it can be at a different level.
- DH: specify the array it came from as well as the origin?
- JMS: have translate-to-origin
- DB: in xarray, a piece of data is used for coordinate-aware indexing; two methods of getting into an array.
- JAM: prioritizing the many recent spec proposals
- DB: suggest talking to other consumers and see what they want.
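The origin discussion above could be sketched roughly as follows. This is a hypothetical illustration only: the `origin` attribute name and the `read_region` helper are assumptions for the sake of the example, not part of the Zarr v2 or v3 spec.

```python
# Hypothetical sketch: origin-aware indexing layered on a 0-origin array.
# The "origin" attribute name is an assumption, not part of the Zarr spec.
import numpy as np

def read_region(arr, attrs, start, stop):
    """Translate origin-based coordinates into 0-based array indices."""
    origin = attrs.get("origin", [0] * arr.ndim)
    slices = tuple(slice(s - o, e - o) for s, e, o in zip(start, stop, origin))
    return arr[slices]

# An array that conceptually covers coordinates [5:15, 5:15] of a larger volume:
data = np.arange(100).reshape(10, 10)
region = read_region(data, {"origin": [5, 5]}, (5, 5), (7, 7))
print(region.shape)  # (2, 2)
```

This is the attribute-only approach Dennis raised; JMS's point is that an implementation would also have to consult the attribute during indexing, as above, rather than treating it as opaque metadata.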
- Addition of sharding in spec v3 (issue/py → issue/spec)
    - Status: 2 prototype PRs (plus a [*minimal implementation*](https://github.com/alimanfoo/zarrita/pull/40) in zarrita)
    - Moving forward with v3 would be fine.
    - Try a PR on the v3 spec for a translation layer (i.e. generically)
        - sharding, checksumming, IPFS, etc.
- JMS: don’t see the relationship between sharding & checksumming
- JAM: due to content-addressable storage
- DH: just as a compressor/filter that attaches the checksum
- JS: partial read would need to be handled somewhere that’s not in the compression
- JAM: kerchunk API of `key → (uri, offset, length)`
- JMS: for the write path it is more complicated
- DH: makes me nervous when we worry about limitations of the underlying store w.r.t. the specification. The spec should be storage-agnostic.
- DH: v2 is agnostic to how chunks/metadata are laid out on disk relative to one another; they don’t need to be “next to” each other. This proposal introduces pieces that must stay together.
- Be sure you want to get rid of the independence property
- JK: a simpler way of describing it: an ordering.
- DH: the proposal needs to specify the relationships between chunks that are supported.
- DB: agreed complicated but 100% worth it for some domains.
- re-writing methods?
    - disallowed
    - only if uncompressed?
        - JS: not yet, but doesn’t currently exist for simplicity
    - rewrite index
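The `key → (uri, offset, length)` mapping JAM mentioned can be sketched as below. All names, keys, and the in-memory "store" are illustrative stand-ins, not the actual kerchunk API; the point is that many logical chunks can live in one shard object and be served by byte-range reads.

```python
# Illustrative sketch of a kerchunk-style reference map: each logical chunk
# key resolves to a (uri, offset, length) triple. Names/keys are made up.
refs = {
    "data/0.0": ("s3://bucket/shard-000.bin", 0, 4096),
    "data/0.1": ("s3://bucket/shard-000.bin", 4096, 4096),
}

def read_chunk(key, range_read):
    """Resolve a chunk key via the reference map, then do a range read."""
    uri, offset, length = refs[key]
    return range_read(uri, offset, length)

# A fake range reader standing in for an HTTP/S3 byte-range request:
store = {"s3://bucket/shard-000.bin": bytes(range(256)) * 32}
blob = read_chunk("data/0.1", lambda u, o, n: store[u][o:o + n])
print(len(blob))  # 4096
```

As JMS noted, the write path is harder: updating one chunk in place means rewriting (or appending to) the shared shard object and fixing up the offsets, which is where the rewrite-index idea comes in.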
- v3 (Greg)
    - In terms of the dtypes supported, I have not worked on the protocol extensions related to that. Is that something I should spend time on? The other thing I could do is make a WIP PR to Dask and Xarray with minimal changes showing how they could support v3 as currently implemented in that branch.
- ‘zarr_implementations’: remote implementations: http/s3/etc
    - good point
    - EP to create an issue
- JAM:
- Tabled
    - xarray: dataclasses / datatree (issue) - (if Matt McCormick shows up)