2024-05-15
Attending: Davis Bennett (DB), Sanket Verma (SV), Dennis Heimbigner (DH), Brianna Pagān (BP), Thomas Nicholas (TN), Jeremy Maitin-Shepard (JMS)
TL;DR:
The team discussed the new Zarr-Python 2.18.0 release, ongoing work on validators, and the potential integration of VirtualiZarr into zarr-developers. They also explored the challenges and future directions for supporting V2 and V3 implementations, codecs, and storage transformers.
Updates:
- Zarr-Python 2.18.0 out now: https://github.com/zarr-developers/zarr-python/releases/tag/v2.18.0
- One of the last few releases for Zarr Spec 2 - if there’s anything you want to get in, please reply/tag us in the PRs/issues
- New blog post by Joe Hamman: https://zarr.dev/blog/zarr-python-v3-update/
- Zarr-Python developers meeting new schedule - check here: https://zarr.dev/community-calls/
- Lachlan Deakin added support for sharding in his R implementation: https://github.com/LDeakin/zarrs/releases/tag/v0.13.1
Meeting Minutes:
- SV: How’s the implementation coming up?
- DH: ArraytoArray codecs is still needs to go there
- DB: Are you trying to support implicit groups?
- DH: Trying to add it
- DB: We’re trying to remove it
- DH: Focusing on adding support for V2 and V3
- DB: Poses a bit of problem than making things easier
- Zarr Validators - tool to check if the Zarr data conforms to the specification or not
- Similar to OME-NGFF Validators: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr
- Brianna for geozarr: https://github.com/briannapagan/geozarr-validator/ need to update and will abstract, but realization that most of this is just zarr data model validating as-is with a few lines of geozarr specific checks.
- BP: Work on GeoZarr spec continues - roadblocked by cf conventions - how much we want to utilise it - parallely working on creating a validator - don’t want to replicate the cf conventions
- DB: Pydantic-Zarr has a validator - you can give a hierarchy and it validates it -
- DB: https://github.com/JaneliaSciComp/pydantic-ome-ngff
- BP: Thanks for sharing it - we’ll try to adapt from Pydantic-Zarr
- DB: https://github.com/JaneliaSciComp/ztree is an outdated implementation of Pydantic-Zarr
- BP: Difference b/w ZOM and Pydantic-Zar?
- DB: Spec doesn’t define hierarchies - and folks care about groups - storing the hierarchy structures in form of
JSON
document - DB: Having ZOM ZEP implementation in initial V3 release would help with the tests a lot
- TN: ZOM ZEP can be useful for VirtualiZarr
- Codecov not reports lagging in recent PRs - any ideas how to fix it?
- DB: Maybe because of fixtures and long list of files
- DB: https://github.com/zarr-developers/zarr-python/pull/1813
- TN: Interested in having VirtualiZarr under zarr-developers!
- DB: Sounds good!
- SV: We can transfer the repo to zarr-developers and provide TN required privleges
- TN: Storage transformers discussion
- DH: recap of the past discussion would be appreciated
- TN: starts recapping
- DH: Can this be handled by codecs in V3? - you already have ArraytoArray, ArraytoBytes, BytestoBytes…
- JMS: It could be! - would be a interesting to see
- DH: Maybe a experimentation first would be a good idea?
- DB: Using codec might not be desirable if you want to change the data
- DH: Codec would only to take care of the existing dimensions and nothing else
- JMS: ArraytoByte covers endian
- DH: Perhaps the virtual concept of storage transformers can be defined as a codec
- DB: V2 codecs don’t know anything about the storage - only converting bytes to nd-array and vice versa
- DH: But in V3 the concept of codecs has been expanded
- DH: Shards cannot be moved around or compressed because the sharding codecs doesn’t have a mechanism to track the shards - needs to be looked at
- TN: All of the manifests of array, metadata, groups are combined when you use Kerchunk but VirtualiZarr separates it and makes it easier for various tasks like concatenation