2023-08-09
Attending: Davis Bennett (DB), Sanket Verma (SV), Ryan Williams (RW), Dennis Heimbigner (DH), Ward Fisher (WF), Norman Rzepka (NR), Martin Durant (MD), Alfonso Ladino Rincon (ALR)
TL;DR:
SV started the meeting by giving important updates, which included the essential summary of the Zarr-Python working groups. Ryan Williams joined the meeting after a while, so we did a round of intros along with favourite tourist spots. NR shared the voting status for ZEP0002, and MD discussed ZEP0003. After this, DB had some bright ideas for a major version bump for Zarr-Python, which might involve breaking changes!
Updates:
- Zarr-Python working groups
- Benchmarking and performance: https://github.com/zarr-developers/zarr-python/discussions/1479
- Refactoring: https://github.com/zarr-developers/zarr-python/discussions/1480
- POC implementation of ZEP0003
- SciPy 2023 proceedings
- Talk slides: https://doi.org/10.25080/gerudo-f2bc6f59-035
- Tools update slides: https://doi.org/10.25080/gerudo-f2bc6f59-038
- MD: SciPy poster DOI: https://doi.org/10.25080/gerudo-f2bc6f59-00b
- WF: Creating async training materials for NetCDF and Zarr in collaboration with a sibling organisation that has USGS funding - concrete information coming soon - would love feedback from the Zarr community and potential collaboration
Meeting Minutes:
- Introductions with favourite tourist destinations
- SV: Himalayas
- DB: Asheville, North Carolina
- NR: Mediterranean
- RW: Governors Island
- WF: Saint Francis, Kansas
- DH: Puget Sound
- NR: Please review and vote on ZEP2
- Please vote - we have 2.5 months left for the votes
- First vote already in - Constantin Pape
- Been using sharding for scalableminds datasets
- Tensorstore is next in line
- NR: For the OME community: How to adopt Zarr V3 for OME-Zarr
- MD: ZEP3 implementation in Zarrita
- MD: Actually it’s easy to do in Zarr-Python - happy to do in Zarrita too
- NR: Did some refactoring - maybe trying again makes sense
- MD: Not a lot of change - just a few lines of code and it just works for V2 - but may break for V3
- SV: Next steps?
- MD: Figure out why the tests are broken! - Work on some existing chunk properties that would behave differently when the chunks are variable-sized
- MD: Documentation 📖
- MD: Also collaborators are welcome 🤝🏻
- DB: Anyone thinking of doing a major Zarr-Python bump? Like breaking things?
- MD: Refactoring working group will be focused on that
- DB: Been playing around with the codebase and looking for improvements - getting `bool` in there would be a breaking change
- DB: Mutable mapping alternative - for stores, the MutableMapping (M.M.) takes a string and returns bytes, but what if we change the mapping from tuple of strings to strings? And then iterate over a collection of keys - could simplify a lot of key fetching stuff - is a breaking change
- MD: In favour of this proposal and we should have a discussion - also involves public facing API changes
- MD: You have nominal support - will be more engaged in benchmarking and performance
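To make the store discussion above concrete, here is a minimal sketch that uses only the Python standard library. `DictStore` and `get_many` are hypothetical names, not part of Zarr-Python. The sketch shows the current contract (string key in, bytes out) next to one possible reading of DB's idea, where a collection of keys is fetched in a single call.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterable, Iterator


class DictStore(MutableMapping):
    """Hypothetical in-memory store: string keys map to bytes values."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def get_many(self, keys: Iterable[str]) -> Dict[str, bytes]:
        """Hypothetical batch fetch: return the keys that exist, skip the rest."""
        return {k: self._data[k] for k in keys if k in self._data}


store = DictStore()
store["arr/.zarray"] = b"{}"
store["arr/0.0"] = b"\x00\x01"

# One call for several chunk keys; the missing key "arr/0.1" is skipped.
chunks = store.get_many(["arr/0.0", "arr/0.1"])
```

A batch method like this would sit outside the `MutableMapping` protocol, which is one reason such a change would affect the public-facing API.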
- DB: `meta_array` arguments - Dask and Xarray take them, and they carry type information - and what type to return values as
- MD: Requires a lot of effort! May not be aligned with other functions in the existing codebase and hence a lot of effort for implementation
- DB: Creating array in storage and access with type information
- DB: Creating group for a single array - don’t know if it’s useful - need more opinions - also need to look at how the array access properties are handled
- DB: Xarray were using `get_items` to pluck an array out of the group and wanted to change `write_empty_chunks` on the array, but the API doesn’t let them do that
- DB: Proposed solution: Make `write_empty_chunks` an argument for group creation - found it distasteful
- MD: Facing similar situations in intake - Python syntax is not much help here
- DB: Not using mutable mapping can help with this
- MD: you could have a context as well
- SV: Any cons for removing the M.M.?
- DB: Extra arguments - a custom `get()` function which works similarly to `get_item()`
- MD: You may face resistance on this!
- MD: A new `array.configure()` method can help you configure the array
- DB: I don’t like it!
- MD: It is something you’re not after! ;)
- DB: We have more than one way to get an array in a group - would be good to settle on a single option or a limited set of options
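The trade-off in this thread can be illustrated with a toy sketch. `ToyArray`, `ToyGroup` and its `get()` method are hypothetical stand-ins, not Zarr-Python classes. The sketch only shows why subscript access cannot carry per-array options such as `write_empty_chunks`, while a `get()`-style method can.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical stand-in for an array handle with one access option."""

    name: str
    write_empty_chunks: bool = True


class ToyGroup:
    """Hypothetical group; not the Zarr-Python Group class."""

    def __init__(self) -> None:
        self._arrays: Dict[str, ToyArray] = {}

    def create_array(self, name: str) -> ToyArray:
        self._arrays[name] = ToyArray(name)
        return self._arrays[name]

    def __getitem__(self, name: str) -> ToyArray:
        # Subscript syntax has no room for extra keyword arguments.
        return self._arrays[name]

    def get(self, name: str, **options) -> ToyArray:
        # Same lookup as __getitem__, but access options can be overridden.
        return replace(self._arrays[name], **options)


group = ToyGroup()
group.create_array("temperature")

default_view = group["temperature"]
sparse_view = group.get("temperature", write_empty_chunks=False)
```

Here `get()` returns a new handle and leaves the stored default untouched, so the group still has two access paths, which is the duplication DB raised.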
- DB: FSStore caches object via FSSPEC?
- MD: Not necessarily, you can have a local cache of file copies - it is not there by default, you have to request it
- DB: `attrs` maintains a cache of its attributes
- DB: Maybe caching should be the business of individual stores? - could be a performance optimisation for Zarr-Python
- MD: In FSSPEC it’s done in a layered approach - also the file-based caches are undergoing a rewrite at the moment
- MD: You have to read the file, cache it and then point to it in FSSPEC
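As a rough illustration of caching as a layer around a store, here is a minimal read-through cache in plain Python. `CachingStore` and its `misses` counter are hypothetical and do not reflect the FSSPEC or Zarr-Python implementations. The sketch assumes the inner store is any mapping from string keys to bytes.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator


class CachingStore(MutableMapping):
    """Hypothetical read-through cache layered over a str -> bytes mapping."""

    def __init__(self, inner: MutableMapping) -> None:
        self._inner = inner
        self._cache: Dict[str, bytes] = {}
        self.misses = 0  # number of reads that had to go to the inner store

    def __getitem__(self, key: str) -> bytes:
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._inner[key]
        return self._cache[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        # Write through, and keep the cached copy consistent.
        self._inner[key] = value
        self._cache[key] = value

    def __delitem__(self, key: str) -> None:
        del self._inner[key]
        self._cache.pop(key, None)

    def __iter__(self) -> Iterator[str]:
        return iter(self._inner)

    def __len__(self) -> int:
        return len(self._inner)


backing = {".zattrs": b'{"units": "K"}'}
store = CachingStore(backing)

first = store[".zattrs"]   # read from the backing store and cached
second = store[".zattrs"]  # served from the cache
```

Because the cache wraps the store instead of living in `attrs`, the same layer could sit in front of any backend, which is roughly the layered approach MD described.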