2022-11-30

Attending: Josh Moore (JM; aikido/books/coffee), John Kirkham (JK; hiking, reading), Sanket Verma (SV; video games & pixel art), Dennis Heimbigner (DH; drawing/guitar), Eric Perlman (EP; travel/bikes/trains), Martin Durant (MD; who has the time), Norman Rzepka (NR; kids & cooking), Ward Fisher (WF; video games/3d printers/wood & leather working), Isaac Virshup (IV; cooking < eating, bouldering), Davis Bennett (DB; birds & computer games), Brianna Pagán (BP; ultra running & my doggo) and welcoming: Dieu My Nguyen (DMN; NASA, replacing Brianna; CS in computational bio; travel, house plants, hiking)

TL;DR:

We have a new community member who joined today: Dieu My Nguyen from NASA. Thanks for attending the community meeting Dieu! 🙌🏻

Two Outreachy interns have been selected to work with Zarr for the next 3 months. BP asked if anyone is working on partial Zarr stores. After that, DB raised some opinions on the zarr-python slicing API, and lastly DH had some questions related to the V2 spec.

Updates:

Meeting Minutes:

  • BP: Question around if anyone is doing work with partial zarr stores or live archives in zarrs.
    • Growing in at least one dimension
    • Good chunking strategies but not
    • Pushing updates
    • MD: Zarr has been sort of archivey. Good to talk about “append, append, append”. Updating data is less common. See Ryan’s strategy.
    • BP: hackathon during AGU. Hub in LA if anyone wants to join.
    • DB: if you have a well-defined shape then it should be ok. Chunks are never partial on disk. (At least that’s not variable)
      • Changing origin would be expensive. (Rename everything)
    • JK: if you are only using part of a chunk, it will fill with fill_value. Resizing could lead to zeroes. (MD: yes, recent shock)
    • IV: ZEP3? BP: Yes. Should be contributing. Have done some work internally.
  • DB: zarr-python slicing API: aged well? alternatives?
    • critique points:
      • process of adding static types to PRs led to looking at the slicing API
      • https://github.com/zarr-developers/zarr-python/blob/main/zarr/indexing.py
      • numpy has expressive slicing (integer, tuple, slice object, tuple of slice object, tuple of arrays)
      • in Zarr, ever type in the polymorphism is a class. no interface in common
      • spit out 2 things: number of chunks and with same arity set of operations to do on them
        • maybe three levels of indexing, but perhaps order of operations depends on the compressor
      • MD: looked at numpy’s implementations or is it in C.
      • DB: assumed they were encumbered by organic growth
      • MD: likely, but lots of testing. Also: normalizing to in-memory indexing (set of) then might be just one implementation
      • DB: then imagined a package that solved the problem which could be shared with napari, etc.
      • DH: looked at HDF5? (difficult) DB: maybe h5py since they handle the polymorphism.
      • JM: possibly https://pypi.org/project/h5py-like/ too
      • JK, dask array slicing? np slicing does a bunch of stuff. led to vindex and oindex.
      • IV: views on arrays in Julia is really nice. JK: multidispatch helps (IV: and n-dim in the lang.)
      • JK: there was a performance issue on slicing, now fixed, but likely just a subset.
  • DH: questions about v2.
    • feedback for JS about the current nczarr extensions
    • shape of an array isn’t fixed? stated anywhere (JM: explained recent issues)
    • scalars (arrays of rank 0) are ok?
    • char type (in numpy)? (like a scalar string)
    • IV: strings? DH: fixed length is in, but varlength proposed and close to varlen arrays.