2024-01-10
Attending: Sanket Verma (SV), Davis Bennett (DB), Alan Liddell (AL), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS), Gābor Kovācs (GB), Josh Moore (JM) very late
TL;DR:
Updates include holiday cleaning for Zarr-Python Core Devs, Norman Rzepka joining the team, progress on Zarr-Python Refactor and B&P WG, and plans for participation in GSoC. Discussion revolved around Zarr-Python V3.0 beta release, a new numcodecs release, and considerations for chunk layout in sharded arrays.
Updates:
- Happy New Year! 🥂
- Holiday cleaning for the Zarr-Python Core Devs teams - active, emeritus, and former
- Norman Rzepka will be joining Zarr-Python Core Devs 🎉
- Zarr-Python Refactor and B&P WG updates
- Davis is working on array API, Joe on storage layer and Norman on a new codec mechanism
- Aiming for late January release
- B&P group is looking to participate in GSoC this year - they have a couple of ideas
Meeting Minutes:
- NR: Zarr-Python V3.0 is going to be beta release
- NR: Trying to keep the momentum
- NR: Looking for feedback
- DB: Re-think the user facing API - currently the stuff is according to Zarr-Python V2.0
- AL: Is there a design doc?
- SV: https://github.com/zarr-developers/zarr-python/pull/1583/
- NR: Looking to add new features which are V3 specific
- New release before Zarr-Python V3.0 goes out?
- DB: Go for it!
- SV: https://github.com/zarr-developers/zarr-python/pull/1557?
- DB: Doesn’t break existing code
- JMS: Dropped 3.8 in Tensorstore
- NR: :thumbsup:
- DB: Also matters to maintain with the community
- New numcodecs release?
- DB: ZSTD bug was fixed, so getting a new release would be good
- AL: Plan to include Blosc2 in the next release?
- DB: Why do people want Blosc2?
- AL: People are asking for it
- JMS: At low-level Blosc1 and Blosc2 are similar
- JMS: We should not add new codecs into Zarr-Python V3.0
- NR: Open question on how numcodecs would operate with Zarr-Python V3.0?
- JMS: Numcodecs could serialise the metadata for V2 and V3 - share the compression code - add a new class separately
- NR: Reimplementing the codecs class in ZP V3.0 - should we keep numcodecs?
- JMS: Numcodecs use Zarr V3 metadata - pretty minimal - would be good to look at the existing dependencies of numcodecs
- SV: PyPi stats help?
- JMS: could help thinking through the JSON API
- DB: Would be good to clear some cobwebs on numcodecs
- JMS: No benefit of changing the V2 metadata
- NR: What ZP would do in the future with numcodecs?
- DB: How bad is it to have C compiled stuff with Python?
- JMS: Somewhat advantageous to have the C compiled code
- DB: Assuming numcodecs won’t break ZP in any way and numocodecs would follow along the changes in ZP in the future
- DB: Thinning stores from Zarr-Python - we could do something similar for numcodecs as well
- SV: https://github.com/zarr-developers/zarr-python/issues/1274
- DB: Martin suggested cramjam - a rust library
- SV: Visualise sharded array using Neuroglancer
- JMS: Yes!
- DB: Wanted to work on the documentation and wanted Janelia HHMI to pay for it
- SV: Any apetite for user facing API documentation?
- JMS: Yes! Is on to-do list
- NR: JMS - have you thought about doing configurable layout of how chunks are placed inside a shard?
- JMS: Mortan order is tricky and it’s one specific order - didn’t make significant progress here - would be interested in moving forward with this
- NR: Wanted to support a few orders - struggling how to design that in API? - have some presets could be a way to specify that - how to do runtime configuration?
- JMS: Putting it in metadata would be good way forward
- DB: Has anyone used Hilbert curve?
- NR: Has been using Mortan order
- JMS: May complicate things with Hilbert order
- NR: Would think about how the metadata chunk would look like - will write something at some point
- JMS: Multiple levels of sharding may help - can have extra level of index which could be complex
- DB: HDF5 has some filters which does data transformation - some filters have mask which tells which codec ran or not
- NR: It could be codec itself too! Could be useful or compactful
- JMS: Looking for fixed size or variable size chunks? NR: Fixed size.
- NR: In parallel writing you can’t update the index
- JMS: Blosc has something similar