2022-11-30
Attending: Josh Moore (JM; aikido/books/coffee), John Kirkham (JK; hiking, reading), Sanket Verma (SV; video games & pixel art), Dennis Heimbigner (DH; drawing/guitar), Eric Perlman (EP; travel/bikes/trains), Martin Durant (MD; who has the time), Norman Rzepka (NR; kids & cooking), Ward Fisher (WF; video games/3d printers/wood & leather working), Isaac Virshup (IV; cooking < eating, bouldering), Davis Bennett (DB; birds & computer games), Brianna Pagán (BP; ultra running & my doggo) and welcoming: Dieu My Nguyen (DMN; NASA, replacing Brianna; CS in computational bio; travel, house plants, hiking)
TL;DR:
We have a new community member who joined today: Dieu My Nguyen from NASA. Thanks for attending the community meeting Dieu! 🙌🏻
Two Outreachy interns have been selected to work with Zarr for the next 3 months. BP asked if anyone is working on partial Zarr stores. After that, DB raised some opinions on the zarr-python slicing API, and lastly DH had some questions related to the V2 spec.
Updates:
- Outreachy Interns to work on Tutorials (AWA Brandon) and Testing Zip Stores (Weddy)! 🎉
- ZEP1 Project board and PRs: https://github.com/orgs/zarr-developers/projects/2/views/2
- PyData Global 2022:
Meeting Minutes:
- BP: Question around if anyone is doing work with partial zarr stores or live archives in zarrs.
- Growing in at least one dimension
- Good chunking strategies but not
- Pushing updates
- MD: Zarr has been sort of archivey. Good to talk about “append, append, append”. Updating data is less common. See Ryan’s strategy.
- BP: hackathon during AGU. Hub in LA if anyone wants to join.
- DB: if you have a well-defined shape then it should be ok. Chunks are never partial on disk. (At least that’s not variable)
- Changing origin would be expensive. (Rename everything)
- JK: if you are only using part of a chunk, it will fill with fill_value. Resizing could lead to zeroes. (MD: yes, recent shock)
- IV: ZEP3? BP: Yes. Should be contributing. Have done some work internally.
- DB: zarr-python slicing API: aged well? alternatives?
- critique points:
- process of adding static types to PRs led to looking at the slicing API
- https://github.com/zarr-developers/zarr-python/blob/main/zarr/indexing.py
- numpy has expressive slicing (integer, tuple, slice object, tuple of slice object, tuple of arrays)
- in Zarr, ever type in the polymorphism is a class. no interface in common
- spit out 2 things: number of chunks and with same arity set of operations to do on them
- maybe three levels of indexing, but perhaps order of operations depends on the compressor
- MD: looked at numpy’s implementations or is it in C.
- DB: assumed they were encumbered by organic growth
- MD: likely, but lots of testing. Also: normalizing to in-memory indexing (set of) then might be just one implementation
- DB: then imagined a package that solved the problem which could be shared with napari, etc.
- DH: looked at HDF5? (difficult) DB: maybe h5py since they handle the polymorphism.
- JM: possibly https://pypi.org/project/h5py-like/ too
- JK, dask array slicing? np slicing does a bunch of stuff. led to vindex and oindex.
- IV: views on arrays in Julia is really nice. JK: multidispatch helps (IV: and n-dim in the lang.)
- JK: there was a performance issue on slicing, now fixed, but likely just a subset.
- critique points:
- DH: questions about v2.
- feedback for JS about the current nczarr extensions
- shape of an array isn’t fixed? stated anywhere (JM: explained recent issues)
- scalars (arrays of rank 0) are ok?
- char type (in numpy)? (like a scalar string)
- IV: strings? DH: fixed length is in, but varlength proposed and close to varlen arrays.