9th February, 2022

Attending: Jonathan Striebel, Davis Bennett, Eric Perlman, Josh Moore (JAM), Sanket Verma, Jeremy Maitin-Shepard, Hailey Johnson, Erik Welch (Anaconda → NVIDIA), John Kirkham, Dennis Heimbigner, Dave Mellert, Greggory Lee, Matt McCormick

  • Davis: gdoc → hackmd? Sure!

    • Sanket: suggestions

  • Intros & various links here

  • Jeremy:

    • C(++) implementations: quick report to Dennis from

      https://imagesc.zulipchat.com

    • v3 support for non-zero origin

      (issue)

      • Dennis: would an attribute specifying the origin be enough? We have benefited from CF conventions, which standardize the meaning of attributes (in atmospheric/related domains).
      • JMS: benefit of allowing indexing to be affected.
      • DH: second system syndrome problem (link)
      • DB: see the utility (working with cutouts), but the workflow puts us outside what’s in the core spec. A zarr array shouldn’t have metadata that references another array. Perhaps this calls for a formalization of transforms, since an origin defines a coordinate space.
      • JMS: julia has offset arrays (numpy is always 0-origin’ed)
      • DB: meaning of 0-origin is that there’s a coordinate space
      • EP: like the functionality, but it can be at a different level.
      • DH: specify the array it came from as well as the origin?
      • JMS: have a translate-to-origin transform (see the sketch after this list)
      • DB: in xarray, a piece of data is used for coordinate-aware indexing; there are two methods of getting into an array.
      • JAM: prioritizing the many recent spec proposals
      • DB: suggest talking to other consumers and see what they want.
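
      A minimal sketch of the attribute-based origin idea, assuming
      zarr-python’s current (v2) API; the `origin` attribute name and the
      `read_world` helper are illustrative, not part of any spec:

      ```python
      import zarr

      # Hypothetical convention: record the non-zero origin as a plain,
      # JSON-serializable attribute (the name "origin" is illustrative).
      z = zarr.zeros((100, 100), chunks=(10, 10), dtype="f4")
      z.attrs["origin"] = [1000, 2000]  # where index (0, 0) sits in world space

      def read_world(arr, sel):
          # Translate world-coordinate slices into 0-based array indices.
          origin = arr.attrs.get("origin", [0] * arr.ndim)
          local = tuple(slice(s.start - o, s.stop - o) for s, o in zip(sel, origin))
          return arr[local]

      block = read_world(z, (slice(1000, 1010), slice(2000, 2020)))  # 10×20 cutout
      ```
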
  • Addition of sharding in spec v3 (issue/py → issue/spec)

    • Status: 2 prototype PRs (plus a [*minimal implementation*](https://github.com/alimanfoo/zarrita/pull/40) in zarrita)

    • Moving forward with v3 would be fine.

    • Try a PR on the v3 spec for a translation layer (i.e. generically; see the sketch below)

      • sharding, checksumming, IPFS, etc.
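
      As a rough illustration (not a spec proposal), a translation layer can
      be sketched as a store wrapper, assuming zarr’s store-as-MutableMapping
      model; the class name and the `translate` callback are hypothetical:

      ```python
      from collections.abc import MutableMapping

      class TranslationLayer(MutableMapping):
          """Hypothetical wrapper that rewrites keys before they reach the
          underlying store; sharding, checksumming, or IPFS addressing could
          each supply their own translation."""

          def __init__(self, inner, translate):
              self.inner = inner          # any zarr store (a MutableMapping)
              self.translate = translate  # e.g. chunk key -> shard key

          def __getitem__(self, key):
              return self.inner[self.translate(key)]

          def __setitem__(self, key, value):
              self.inner[self.translate(key)] = value

          def __delitem__(self, key):
              del self.inner[self.translate(key)]

          def __iter__(self):
              return iter(self.inner)  # reverse translation omitted for brevity

          def __len__(self):
              return len(self.inner)
      ```
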
    • JMS: don’t see the relationship between sharding & checksumming

      • JAM: due to content-addressable storage
      • DH: just a compressor/filter that attaches the checksum (see the codec sketch after this list)
      • JS: partial read would need to be handled somewhere that’s not in the compression
      • JAM: kerchunk API of `key → (uri, offset, length)`
      • JMS: for the write path it is more complicated
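
      A minimal sketch of the filter idea, assuming the numcodecs `Codec`
      interface; the class and `codec_id` are illustrative and not registered
      anywhere:

      ```python
      import struct
      import zlib

      from numcodecs.abc import Codec
      from numcodecs.compat import ensure_bytes, ndarray_copy

      class Crc32Checksum(Codec):
          """Illustrative filter: append a CRC32 on encode, verify and
          strip it on decode."""

          codec_id = "crc32-sketch"  # hypothetical id

          def encode(self, buf):
              data = ensure_bytes(buf)
              return data + struct.pack("<I", zlib.crc32(data))

          def decode(self, buf, out=None):
              data = ensure_bytes(buf)
              payload, stored = data[:-4], struct.unpack("<I", data[-4:])[0]
              if zlib.crc32(payload) != stored:
                  raise ValueError("chunk CRC32 mismatch")
              return ndarray_copy(payload, out)
      ```

      As JS notes above, this only covers attaching/verifying the checksum;
      partial reads would still need handling outside the codec.
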
    • DH: makes me nervous when we worry about limitations of the

      underlying store w.r.t. the specification. The spec should be storage-agnostic.

      • DH: v2 is agnostic of how chunks/metadata are laid out on disk; they don’t need to be “next” to one another. This proposal introduces pieces that must stay together.
      • Be sure you want to give up that independence property.
      • JK: there is a simpler way of describing it (an ordering).
      • DH: the proposal needs to specify the relationships between chunks that are supported.
      • DB: agreed complicated but 100% worth it for some domains.
    • re-writing methods?

      • disallowed

      • only if uncompressed?

        • JS: not yet; omitted from the current proposal for simplicity
      • rewrite index

  • v3 (Greg)

    • In terms of the dtypes supported, I have not worked on the related

      protocol extensions. Is that something I should spend time on? Alternatively, I could make WIP PRs to Dask and Xarray with minimal changes showing how they could support v3 as currently implemented in that branch.

  • zarr_implementations: remote implementations (http/s3/etc.; see the sketch below)

    • good point

    • EP to create an issue

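    One plausible way to exercise remote stores in the test matrix (the URL
    is a placeholder) is to drive implementations through fsspec mappers:

    ```python
    import fsspec
    import zarr

    # Sketch: open the same test data over HTTP or S3 instead of the local
    # filesystem ("s3://example-bucket/data.zarr" is a placeholder).
    store = fsspec.get_mapper("s3://example-bucket/data.zarr", anon=True)
    arr = zarr.open(store, mode="r")
    ```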

  • Tabled