Attending: Davis Bennett (DB), Sanket Verma (SV), Ward Fisher (WF), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS), Norman Rzepka (NR), Eric Perlman (EP), Dieu My Nguyen (DMN), Virginia Scarlett (VS)


We started the meeting by discussing some possible performance improvements for sharding. Ryan Abernathey ran some benchmarks a few weeks ago, which turned out well. The community then offered a few solutions to make things even better, and Jonathan is working on a PR for them. After that, DB asked how one could store only the shard indexes in a cache using Zarr itself.

SV asked what everyone thinks about the all-contributors bot. He also asked whether the community would like to participate if Zarr were to organise a hack week. Both ideas received a 👍🏻 from everyone.

Lastly, DB showed us what he’s working on, and JMS discussed the zarr-specs issue he submitted recently.


  • Zarr-Python 2.14.1 release with sharding and new docs theme (PyData-Sphinx)
  • SciPy 2023 deadline next week (03/01), if you want to collaborate on a tutorial with us, now’s the time

Meeting Notes:

  • DB: Sharding is slow - according to https://github.com/zarr-developers/zarr-python/issues/1343
    • NR: Jonathan is working on a PR
    • DB: Using the MutableMapping store interface for getitems and setitems is the probable cause - the Zarr array API also has some batching logic, which may contribute to the slowdown
    • DH: What type of caching model is it using?
    • NR: There is no caching! Individual chunks are loaded through byte-range requests
    • DH: Having a cache would be fine
    • NR: maybe! depends on the use case
    • JMS: The Zarr array API does have support for batch reading and writing - if you're reading multiple chunks from a single shard, the overhead would be big - you need to tell the user to cache the index - otherwise it could mean twice the read requests, one for the index and one for the shard data
    • NR: Having index in cache makes sense
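The index-caching idea above can be sketched in a few lines. This assumes a deliberately simplified shard layout - chunk payloads followed by a trailing index of (offset, nbytes) uint64 pairs - as an illustration of the two-request problem and the cache that removes it, not the exact Zarr sharding codec format.

```python
import struct

# Assumed toy layout (illustration, not the Zarr sharding spec):
# concatenated chunk payloads, then an index of (offset, nbytes)
# little-endian uint64 pairs at the end of the shard.
def build_shard(chunks):
    """Pack chunk payloads and append an (offset, nbytes) index."""
    body = bytearray()
    index = []
    for payload in chunks:
        index.append((len(body), len(payload)))
        body += payload
    for offset, nbytes in index:
        body += struct.pack("<QQ", offset, nbytes)
    return bytes(body)

class ShardReader:
    """Reads chunks via byte ranges, caching the index so it is
    fetched only once per shard (avoiding the doubled reads
    mentioned above)."""
    def __init__(self, shard_bytes, n_chunks):
        self._data = shard_bytes
        self._n = n_chunks
        self._index = None  # populated lazily, then cached

    def _range(self, start, length):
        # Stand-in for an HTTP byte-range request to object storage.
        return self._data[start:start + length]

    def index(self):
        if self._index is None:
            raw = self._range(len(self._data) - 16 * self._n, 16 * self._n)
            self._index = [struct.unpack_from("<QQ", raw, 16 * i)
                           for i in range(self._n)]
        return self._index

    def chunk(self, i):
        offset, nbytes = self.index()[i]
        return self._range(offset, nbytes)

shard = build_shard([b"aaaa", b"bb", b"cccccc"])
reader = ShardReader(shard, 3)
assert reader.chunk(1) == b"bb"  # only one index fetch for any number of chunks
```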
  • DB: How to store the indexes in the cache? Maybe JSON? What would be a good storage format? How about SQLite? (for my use case)
    • JMS: Unrelated to the cache - you can list the chunks
    • DB: recursive and expensive
    • JMS: S3 can give you a flat list, may not solve the problem but the abstraction would help
    • JMS: are you doing re-encoding?
    • DB: yes
    • JMS: SQLite could be pretty reasonable solution
    • DB: Is there a clever way to do it using Zarr itself?
    • JMS: keys can be ordered lexicographically
    • DB: If your data is too big then your metadata becomes another type of data
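The SQLite suggestion can be sketched with a minimal chunk index. The schema here (one row per chunk key, with offset and size) is a hypothetical illustration, not anything from zarr-python; the point is that lookups become single queries instead of recursive store listings.

```python
import sqlite3

# Hypothetical schema (an assumption for illustration): one row per
# chunk, keyed by the chunk's store key, so finding a chunk never
# requires listing the store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunk_index (
        key    TEXT PRIMARY KEY,  -- e.g. 'array/0.1'
        offset INTEGER NOT NULL,
        nbytes INTEGER NOT NULL
    )""")
rows = [("array/0.0", 0, 4096), ("array/0.1", 4096, 4096)]
conn.executemany("INSERT INTO chunk_index VALUES (?, ?, ?)", rows)

# A chunk lookup is one indexed query, not a recursive listing.
offset, nbytes = conn.execute(
    "SELECT offset, nbytes FROM chunk_index WHERE key = ?",
    ("array/0.1",)).fetchone()
```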
  • SV: What do you think about using all-contributors and hosting a Zarr hack week to sync zarr-python implementation to V3?
    • All: Sounds good for all-contributors 👍🏻
    • EP: Having something online would be good and I’d participate
    • DB: Sounds good and I’d participate
    • WF and JMS: 👍🏻
  • DB: Defining the abstract structure of a Zarr array - keys, properties, metadata - OME-Zarr has a try/catch block - I've put something together with Pydantic - it's simple to generate a Zarr group by taking the abstract tree representation and running it backwards to create Zarr groups
    • Repo: https://github.com/JaneliaSciComp/pydantic-ome-ngff
      • SV: Could be something like pip install ztree or something similar
    • DB: Trying to define protocol for HDF5 as well - could dump the hierarchy into Zarr or HDF5 container
    • DB: Structural subtyping stuff
    • SV: Would it be good to show a demo?
    • DB: Yes!
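The "abstract tree run backwards" idea can be sketched without Pydantic. This is not the pydantic-ome-ngff API - just a small dataclass model of a hierarchy (groups and arrays) and a function that materializes it as Zarr-v2-style metadata keys in a dict-like store; all names here are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical spec types (not the pydantic-ome-ngff classes):
# a declarative tree describing the hierarchy before it exists.
@dataclass
class ArraySpec:
    shape: tuple
    dtype: str

@dataclass
class GroupSpec:
    attrs: dict = field(default_factory=dict)
    members: dict = field(default_factory=dict)  # name -> GroupSpec | ArraySpec

def materialize(spec, store, prefix=""):
    """Run the abstract tree 'backwards': walk it and write
    Zarr-v2-style metadata keys into a dict-like store."""
    if isinstance(spec, GroupSpec):
        store[f"{prefix}.zgroup"] = {"zarr_format": 2}
        if spec.attrs:
            store[f"{prefix}.zattrs"] = spec.attrs
        for name, child in spec.members.items():
            materialize(child, store, f"{prefix}{name}/")
    else:
        store[f"{prefix}.zarray"] = {"shape": list(spec.shape),
                                     "dtype": spec.dtype}

root = GroupSpec(attrs={"project": "demo"},
                 members={"img": ArraySpec((64, 64), "uint8")})
store = {}
materialize(root, store)
assert ".zgroup" in store and "img/.zarray" in store
```

The same tree could just as well be dumped into an HDF5 container, which is the portability DB described.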
  • JMS: https://github.com/zarr-developers/zarr-specs/issues/212
    • DB: Maybe similar to what vanilla N5 does!
    • JMS: Reasonable to add a metadata option - planning to add an extension ZEP for this