2022-03-09

Attending: Josh Moore (JM), Dennis Heimbigner (DH), Alan Liddell (AL), Dieu My Nguyen (DMN), Davis Bennett (DB), Brianna Pagán (BP), Isaac Virshup (IV), Sanket Verma (SV), Martin Durant (MD), Jeremy Maitin-Shepard (JMS)

TL;DR:

SV started the meeting by announcing that the Outreachy phase was a success! 🚀 After this, we briefly discussed why sharding should be a codec, a topic JMS initiated. Then DMN and BP asked how continuously updated Zarr stores can be handled, and the community shared good insights. Next, DB showcased the pydantic classes for OME-NGFF he has been working on, and IV raised the question of strings as codecs or dtypes. Lastly, MD informed everyone about his recent PR to the numcodecs repo and showcased his work on Rust Python FileSystem.

Updates:

  • Outreachy ended on 3/9
  • ZEP0001 entering the review period soon!
  • Sharding to a codec (JMS)
    • MD: push it down to the storage layer
    • IV: Documentation for transformers
    • SV: open issue with PURL URLs
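
To make the "sharding as a codec" idea concrete, here is a minimal sketch of a codec-style function pair that packs several encoded chunks into one shard and reads a single chunk back through an index. The byte layout (payloads followed by a footer of offset/length pairs and a chunk count) and the names `encode_shard` and `decode_chunk` are assumptions for illustration only, not the layout proposed in the meeting or in any spec.

```python
import struct

# Hypothetical toy layout (an assumption, not the Zarr sharding spec):
#   [chunk payloads...][(uint64 offset, uint64 length) per chunk][uint64 count]

def encode_shard(chunks):
    """Concatenate chunk payloads and append a little-endian index footer."""
    body = bytearray()
    index = []
    for chunk in chunks:
        index.append((len(body), len(chunk)))  # where this chunk starts, and its size
        body += chunk
    footer = b"".join(struct.pack("<QQ", off, n) for off, n in index)
    footer += struct.pack("<Q", len(index))
    return bytes(body) + footer


def decode_chunk(shard, i):
    """Return chunk i from a shard by looking it up in the index footer."""
    (count,) = struct.unpack_from("<Q", shard, len(shard) - 8)
    if not 0 <= i < count:
        raise IndexError(i)
    index_start = len(shard) - 8 - 16 * count
    off, n = struct.unpack_from("<QQ", shard, index_start + 16 * i)
    return shard[off:off + n]


chunks = [b"aaaa", b"bb", b"cccccc"]
shard = encode_shard(chunks)
second = decode_chunk(shard, 1)
```

Because the shard is just bytes in and bytes out, a function pair like this could sit in a codec pipeline; pushing it down to the storage layer instead (MD's suggestion) would move the same index lookup behind the store interface.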

Open Agenda (add here 👇🏻):

  • Slow but some progress on geozarr spec (BP)
  • Does anyone have contacts we (NASA) can chat with about how continuously updated Zarr stores are handled? E.g. someone is in the middle of a calculation on a Zarr store when the store is updated… best practices? https://github.com/CCI-Tools/zarr-cache (BP + DMN)
  • DB: example of pydantic classes for OME-NGFF
  • IV: strings as codecs or dtypes
    • JM: v2 issue is because of a lack of extension points
    • JMS: implementation issue?
    • IV: potentially meta-array? object arrays as they are now
    • IV: dtype as the final codec (the buffer in memory)
    • JMS: different for sparse data types (when/if)
    • IV: sparse arrays aren’t compatible with v3 data type system (and that’s fine)
    • JMS: saw having each chunk as a sparse array
    • IV: may over-complicate what zarr is. probably going to do sparse array == zarr group
    • how to represent a sparse array is hard to agree on
    • JMS: was attempting to move v3 towards dtype being more of an abstract representation
    • IV: what gets written per dtype with no compression
      • JMS: default for integer is little-endian
      • IV: on big-endian machine would want to skip the final codec
      • JMS: if you have an array after codecs then you apply default codec for dtype
  • MD: Added a PR in numcodecs: https://github.com/zarr-developers/numcodecs/pull/422
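
As an illustration of the IV/JMS exchange above on what gets written per dtype with no compression, the following sketch serializes 32-bit integers as little-endian bytes and compares the result with the host's native layout. The function names are hypothetical and the use of `struct` is a stand-in for whatever an implementation would actually do; it only shows why the byte-order step is a no-op on little-endian machines and a real conversion on big-endian ones.

```python
import struct
import sys

def default_int32_encode(values):
    """Serialize int32 values as little-endian bytes, whatever the host order."""
    return struct.pack(f"<{len(values)}i", *values)


def default_int32_decode(data):
    """Inverse of default_int32_encode: little-endian bytes back to Python ints."""
    count = len(data) // 4
    return list(struct.unpack(f"<{count}i", data))


values = [1, 256, -2]
encoded = default_int32_encode(values)

# The same values in the host's native byte order ("=" means native order, standard size).
native = struct.pack(f"={len(values)}i", *values)

# On a little-endian host the in-memory buffer already matches the default encoding;
# on a big-endian host the default codec implies a byte swap.
needs_byteswap = sys.byteorder != "little"
```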