2024-01-10

Attending: Sanket Verma (SV), Davis Bennett (DB), Alan Liddell (AL), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS), Gābor Kovācs (GB), Josh Moore (JM) very late

TL;DR:

Updates include holiday cleaning for Zarr-Python Core Devs, Norman Rzepka joining the team, progress on Zarr-Python Refactor and B&P WG, and plans for participation in GSoC. Discussion revolved around Zarr-Python V3.0 beta release, a new numcodecs release, and considerations for chunk layout in sharded arrays.

Updates:

  • Happy New Year! 🥂
  • Holiday cleaning for the Zarr-Python Core Devs teams - active, emeritus, and former
    • Norman Rzepka will be joining Zarr-Python Core Devs 🎉
  • Zarr-Python Refactor and B&P WG updates
    • Davis is working on the array API, Joe on the storage layer, and Norman on a new codec mechanism
    • Aiming for late January release
    • B&P group is looking to participate in GSoC this year - they have a couple of ideas

Meeting Minutes:

  • NR: Zarr-Python V3.0 is going to be a beta release
    • NR: Trying to keep the momentum
    • NR: Looking for feedback
    • DB: Re-think the user-facing API - it currently follows Zarr-Python V2.0
    • AL: Is there a design doc?
  • New release before Zarr-Python V3.0 goes out?
  • New numcodecs release?
    • DB: ZSTD bug was fixed, so getting a new release would be good
    • AL: Plan to include Blosc2 in the next release?
    • DB: Why do people want Blosc2?
    • AL: People are asking for it
    • JMS: At low-level Blosc1 and Blosc2 are similar
  • JMS: We should not add new codecs into Zarr-Python V3.0
    • NR: Open question on how numcodecs would operate with Zarr-Python V3.0?
    • JMS: Numcodecs could serialise the metadata for V2 and V3 - share the compression code - add a new class separately
    • NR: Reimplementing the codecs class in ZP V3.0 - should we keep numcodecs?
    • JMS: Numcodecs use Zarr V3 metadata - pretty minimal - would be good to look at the existing dependencies of numcodecs
    • SV: Would PyPI stats help?
    • JMS: could help thinking through the JSON API
    • DB: Would be good to clear some cobwebs on numcodecs
    • JMS: No benefit of changing the V2 metadata
    • NR: What would ZP do with numcodecs in the future?
    • DB: How bad is it to have C compiled stuff with Python?
    • JMS: Somewhat advantageous to have the C compiled code
    • DB: Assuming numcodecs won’t break ZP in any way and numcodecs would follow along with the changes in ZP in the future
  • DB: Thinning stores from Zarr-Python - we could do something similar for numcodecs as well
  • SV: Visualise sharded array using Neuroglancer
    • JMS: Yes!
    • DB: Wanted to work on the documentation and wanted Janelia HHMI to pay for it
    • SV: Any appetite for user-facing API documentation?
    • JMS: Yes! It is on the to-do list
  • NR: JMS - have you thought about a configurable layout for how chunks are placed inside a shard?
    • JMS: Morton order is tricky and it’s one specific order - didn’t make significant progress here - would be interested in moving forward with this
    • NR: Wanted to support a few orders - struggling with how to design that in the API - having some presets could be a way to specify that - how to do runtime configuration?
    • JMS: Putting it in the metadata would be a good way forward
    • DB: Has anyone used Hilbert curve?
    • NR: Has been using Morton order
    • JMS: May complicate things with Hilbert order
    • NR: Would think about what the metadata chunk would look like - will write something at some point
    • JMS: Multiple levels of sharding may help - can have extra level of index which could be complex
    • DB: HDF5 has some filters which do data transformation - some filters have a mask which tells which codec ran or not
    • NR: It could be codec itself too! Could be useful or compactful
    • JMS: Looking for fixed size or variable size chunks? NR: Fixed size.
    • NR: In parallel writing you can’t update the index
    • JMS: Blosc has something similar
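
For readers unfamiliar with the Morton (Z-order) layout mentioned above, here is a minimal sketch of how chunk coordinates inside a shard could be ordered by interleaving the bits of each coordinate. The function names `morton_key` and `morton_order` are hypothetical and are not part of Zarr-Python or the sharding spec, and the choice to make the last dimension vary fastest is an assumption made for illustration only.

```python
from itertools import product


def morton_key(coords, bits=16):
    """Interleave the bits of each coordinate into a single integer key.

    Assumption: the last dimension occupies the least significant bit of
    each interleaved group, so it varies fastest.
    """
    ndim = len(coords)
    key = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            # Take bit `bit` of coordinate `c` and place it in the
            # interleaved position for this dimension.
            key |= ((c >> bit) & 1) << (bit * ndim + (ndim - 1 - dim))
    return key


def morton_order(grid_shape):
    """Return all chunk coordinates of a grid sorted by Morton key."""
    # Enough bits to represent the largest index along any dimension.
    bits = max(1, max(n - 1 for n in grid_shape).bit_length())
    all_coords = product(*(range(n) for n in grid_shape))
    return sorted(all_coords, key=lambda c: morton_key(c, bits))


if __name__ == "__main__":
    # Chunk order for a shard holding a 4 x 4 grid of chunks.
    for coords in morton_order((4, 4)):
        print(coords)
```

A preset name for such an ordering could then be recorded in the array metadata, as suggested in the discussion, so that readers know how to map index entries to chunk positions.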