2023-08-09

Attending: Davis Bennett (DB), Sanket Verma (SV), Ryan Williams (RW), Dennis Heimbginer (DH), Ward Fisher (WF), Norman Rzepka (NR), Martin Durrant (MD), Alfonso Ladino Rincon (ALR)

TL;DR:

SV started the meeting by giving important updates, which included the essential summary of the Zarr-Python working groups. Ryan Williams joined the meeting after a while, so we did a round of intros along with favourite tourist spots. NR shared the voting status for ZEP0002, and MD discussed ZEP0003. After this, DB had some bright ideas for a major version bump for Zarr-Python, which might involve breaking changes!

Updates:

Zarr-Python working groups
- Benchmarking and performance: https://github.com/zarr-developers/zarr-python/discussions/1479
- Refactoring: https://github.com/zarr-developers/zarr-python/discussions/1480
POC implementation of ZEP0003
- https://github.com/zarr-developers/zarr-python/pull/1483
SciPy 2023 proceedings
- Talk slides: https://doi.org/10.25080/gerudo-f2bc6f59-035
- Tools update slides: https://doi.org/10.25080/gerudo-f2bc6f59-038
MD: SciPy poster DOI: https://doi.org/10.25080/gerudo-f2bc6f59-00b
WF: Creating async training materials for NetCDF and Zarr with collaboration with sibling organisation having USGS funding - concrete information coming soon - would love feedback from the Zarr community - and potential collaboration

Meeting Minutes:

Introductions with favourite tourist destinations
- SV: Himalayas
- DB: Ashville, North Carolina
- NR: Mediterranean
- RW: Governor Island
- WF: Saint Francis Kansas
- DH: Puget Sound
NR: Please review and vote on ZEP2
- Please vote - we have 2.5 months left for the votes
- First vote already in - Constantine Pape
- Been using sharding for scalableminds datasets
- Tensorstore is next in line
NR: For the OME community: How to adopt Zarr V3 for OME-Zarr
MD: ZEP3 implementation in Zarrita
- MD: Actually it’s easy to do in Zarr-Python - happy to do in Zarrita too
- NR: Did some refactoring - maybe trying again makes sense
- MD: Not a lot of change - just few lines of code and it just works for V2 - but may break for V3
- SV: Next steps?
- MD: Figure out why the tests are broken! - Work on some existing chunk properties which would be different for when the chunks are variable sized
- MD: Documentation 📖
- MD: Also collaborators are welcome 🤝🏻
DB: Anyone thinking to do a major Zarr-Python bump? Like breaking things?
- MD: Refactoring working group will be focused on that
- DB: Been playing around with codebase and looking for improvement - getting bool in there would be a breaking change
- DB: Mutable mapping alternative - for store M.M. takes string and returns bytes but what if we change the mapping from tuple of strings to strings? And then iterate over collection of keys - could simplify a lot of key fetching stuff - is a breaking change
- MD: In favour of this proposal and we should have a discussion - also involves public facing API changes
- MD: You have nominal support - will be more engaged in benchmarking and performance
- DB: meta_array arguments - dask and xarray takes them and it types information - and what type to return values as
- MD: Requires a lot of effort! May not be aligned with other functions in the existing codebase and hence lot of effort for implementation
- DB: Creating array in storage and access with type information
- DB: Creating group for a single array - don’t know if it’s useful - need more opinions - also need to look at how the array access properties are handled
- DB: Xarray were using get_items to pluck array out of the group and wanted to change wrtie_empty_chunks on the array but the API doesn’t let them do that
- DB: Proposed solution: Make write_empty_chunks as arguments for group creation - found it distasteful
- MD: Facing similar situations in intake - Python syntax is not much help here
- DB: Not using mutable mapping can help with this
- MD: you could have a context as well
- SV: Any cons for removing the M.M.?
- DB: Extra arguments - a custom get() function which works similar to get_item()
- MD: You may face resitant on this!
- MD: A new array.configure() method can help you configure the array
- DB: I don’t like it!
- MD: It is something you’re not after! ;)
- DB: We have more than one ways to get array in a group - would be good to conclude on a single of limited options
DB: FSStore caches object via FSSPEC?
- MD: Not necessarily, you can have a local copy of files cache - not there by default, you have to request it
- DB: attrs maintain a cache of it’s attribute
- DB: Maybe caching should be a business for individual stores? - could be a performance optimisation for Zarr-Python
- MD: In FSSPEC it’s done in a layered approach - also the file based caches are going under a re-write atm
- MD: You have to read the file, cache it and then point it in FSSPEC