2022-10-19

Attending: Sanket Verma (SV), Eric Perlman (EP), Erik Welch (EW), Hailey Johnson (HJ), Ward Fisher (WF), Jonathon Striebel (JS), Martin Durant (MD)

TL;DR:

The Outreachy contribution phase has started, and the contribution PRs are coming in hard! JS eagerly seeks feedback on storage transformers PR; check here. There was a visible concern from the Zarr Community around the progress of ZEP0001. SV assured everyone that he’d communicate with the author and resolve any pending issues to move forward as quickly as possible. After that, there was a discussion on ZEP0003 (authored by Martin and Isaac).

Updates:

Agenda:

  • JS: Made 2 PR for storage transfomers
  • SV
    • Working with Alistair to get the feedback
    • Moving things forward ASAP taking in account the feedback and comments on V3 PR
  • MD: https://github.com/fsspec/kerchunk/pull/237
    • JMS: What functionality is not in V2 that’s in V3?
    • MD: Zarr chunks can be made by concatenating - storage layer can be used to do this - similar to sharding storage layer - sharding is done at some levels
    • Kerchunk would really like to do this
  • JS: Sharding has no problem (in principle)
  • JMS: Maybe create a fork zarr-specs repo
    • Better way to work forward and make progress
    • JS: Opening PR against the repo is fine
    • MD: lot of changes would be hard to merge back in the main PR
    • JS: Don’t personally mind of merge things - ZIC has to decide
    • JMS: Valuable to make incremental changes
    • MD: JS can make changes to zarr-python
  • JK: Make PR on Alistair’s PR and merge them
    • MD: Sounds good!
  • JMS: Make a place/fork and merge things over there and have working draft
  • SV: Try to resolve things with Alistair and set a plan to move forward
  • JS: Triage the PR which are on Alistair’s fork
  • JK: Have a single place to have all the discussion to and not diverge things
  • MD: wants to have variable length chunking
    • It’s a draft for now
    • Edit it before merging if it doesn’t makes sense
  • JMS: no objections to Martin’s proposal
    • V2 is obvious and V3 has multiple chunk grid types - not example of having multiple chunk types
    • MD: complete not separate
    • JMS: do we want to have those special types?
    • MD: can’t imagine how easy to put up these things - can have overlapping thing - values can be duplicated over multiple chunks - different things can over relate like a surface of sphere - Dask may have a reference implementation for Martin’s ZEP -
    • MD: Do you have partial chunk?
    • JK: How would the append work on this ZEP?
    • JMS: Chunk should not size after it’s created - if you append you should append the same size chunk
    • MD: Merging and joining chunks would be easy to do - don’t really need to think about it for now - can rewrite a whole new array - ability to manipulate the chunks - once it’s in the spec we can work on the API
    • JMS: Hadn’t had use for variable size chunk
    • MD: streaming data - maybe in those cases you would want to have variable size chunks
    • JS: Address is same? MD: Right!
    • JS: may not have use for variable size chunks - current complexity is enough - other cases: move data, resize your chunk and write that chunk
    • MD: or delete an index or 0
    • JMS: Don’t want to implement non-zero index
    • MD: now have a way to do it! Voila!
    • JMS: We’re using a special structure at Google for non-zero origin - boxloses or something
    • JS: If you use Zarr API
    • EP: Using Jeremy’s implementation and then using Zarr’s metadata - can use same indexing for arrays
    • JS: Fair for many things - don’t know a lot bout offset - you start at 0, 0, 0 - can ignore
    • EP: involved with OME data and can fit into other use cases as well