2023-08-09

Attending: Davis Bennett (DB), Sanket Verma (SV), Ryan Williams (RW), Dennis Heimbigner (DH), Ward Fisher (WF), Norman Rzepka (NR), Martin Durrant (MD), Alfonso Ladino Rincon (ALR)

TL;DR:

SV started the meeting with some important updates, including a summary of the Zarr-Python working groups. Ryan Williams joined a little later, so we did a round of introductions, each sharing a favourite tourist spot. NR shared the voting status for ZEP0002, and MD discussed ZEP0003. After this, DB had some bright ideas for a major version bump for Zarr-Python, which might involve breaking changes!

Updates:

Meeting Minutes:

  • Introductions with favourite tourist destinations
    • SV: Himalayas
    • DB: Asheville, North Carolina
    • NR: Mediterranean
    • RW: Governors Island
    • WF: Saint Francis, Kansas
    • DH: Puget Sound
  • NR: Please review and vote on ZEP2
    • Please vote - we have 2.5 months left for the votes
    • First vote already in - Constantin Pape
    • Has been using sharding for scalable minds datasets
    • TensorStore is next in line
  • NR: For the OME community: How to adopt Zarr V3 for OME-Zarr
  • MD: ZEP3 implementation in Zarrita
    • MD: Actually it’s easy to do in Zarr-Python - happy to do in Zarrita too
    • NR: Did some refactoring - maybe trying again makes sense
    • MD: Not a lot of change - just a few lines of code, and it just works for V2 - but may break for V3
    • SV: Next steps?
    • MD: Figure out why the tests are broken! - Work on some existing chunk properties that would behave differently when chunks are variable-sized
    • MD: Documentation 📖
    • MD: Also collaborators are welcome 🤝🏻
  • DB: Is anyone thinking of a major Zarr-Python version bump? Like, breaking things?
    • MD: Refactoring working group will be focused on that
    • DB: Been playing around with the codebase and looking for improvements - getting bool in there would be a breaking change
    • DB: MutableMapping alternative - for stores, a MutableMapping takes a string and returns bytes, but what if we changed the mapping from tuple of strings to strings? We could then iterate over a collection of keys, which could simplify a lot of the key-fetching logic - this is a breaking change
    • MD: In favour of this proposal and we should have a discussion - it also involves public-facing API changes
    • MD: You have nominal support - will be more engaged in benchmarking and performance
    • DB: meta_array arguments - Dask and Xarray take them, and they carry type information: what type to return values as
    • MD: Requires a lot of effort! It may not align with other functions in the existing codebase, so implementing it would take a lot of work
    • DB: Creating an array in storage and accessing it with type information
    • DB: Creating a group for a single array - don’t know if it’s useful - need more opinions - also need to look at how array access properties are handled
    • DB: Xarray was using __getitem__ to pluck an array out of the group and wanted to change write_empty_chunks on that array, but the API doesn’t let them do that
    • DB: Proposed solution: make write_empty_chunks an argument to group creation - found it distasteful
    • MD: Facing similar situations in Intake - Python syntax is not much help here
    • DB: Moving away from MutableMapping can help with this
    • MD: You could have a context as well
    • SV: Any downsides to removing the MutableMapping?
    • DB: Extra arguments - a custom get() function that works similarly to __getitem__
    • MD: You may face resistance on this!
    • MD: A new array.configure() method can help you configure the array
    • DB: I don’t like it!
    • MD: It is something you’re not after! ;)
    • DB: We have more than one way to get an array from a group - would be good to settle on a single option or a limited set
  • DB: Does FSStore cache objects via FSSPEC?
    • MD: Not necessarily; you can keep a local copy of files as a cache, but it’s not there by default - you have to request it
    • DB: attrs maintains a cache of its attributes
    • DB: Maybe caching should be the business of individual stores? - could be a performance optimisation for Zarr-Python
    • MD: In FSSPEC it’s done in a layered approach - also, the file-based caches are being rewritten at the moment
    • MD: You have to read the file, cache it, and then point FSSPEC at it
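In the major-version discussion, DB floated an alternative to the MutableMapping store interface in which a store iterates over a collection of keys. The sketch below is one possible reading of that idea, not Zarr-Python's store API: `DictStore` and its `get_many()` method are hypothetical names. The class still implements the usual `MutableMapping` from `str` keys to `bytes`, and adds a batch method that takes a collection of keys (for example a tuple of strings) and returns the values that exist, so a caller fetching many chunks does not have to loop over `__getitem__` and catch `KeyError` itself.

```python
from collections.abc import Iterable, Iterator, MutableMapping


class DictStore(MutableMapping):
    """Hypothetical in-memory store: a MutableMapping from str keys to bytes."""

    def __init__(self) -> None:
        self._data = {}

    # The five abstract methods a MutableMapping subclass must provide.
    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    # Hypothetical batch method: iterate over a collection of keys and
    # return only the ones present, instead of one __getitem__ per key.
    def get_many(self, keys: Iterable[str]) -> dict:
        return {k: self._data[k] for k in keys if k in self._data}


store = DictStore()
store["foo/c/0/0"] = b"\x00\x01"
store["foo/c/0/1"] = b"\x02\x03"
print(store.get_many(("foo/c/0/0", "foo/c/0/1", "foo/c/1/0")))
# {'foo/c/0/0': b'\x00\x01', 'foo/c/0/1': b'\x02\x03'}
```

A real store backed by remote storage could implement `get_many()` with concurrent requests; this in-memory version only shows the shape of the interface.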
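DB's `write_empty_chunks` example shows why subscript access is limiting: `group[name]` can only carry a key, so there is nowhere to pass per-array settings. The toy sketch below places the options raised in the meeting side by side. Everything in it (`ToyGroup`, `ToyArray`, `array_config`) is hypothetical and is not Zarr-Python's API: a `get()` method that accepts extra keyword arguments (DB's custom `get()`), a `configure()` method that returns a reconfigured copy (MD's `array.configure()`), and an ambient context built on the standard-library `contextvars` module (MD's "context" suggestion).

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass, replace

# Hypothetical ambient default, illustrating the "context" option.
_WRITE_EMPTY_CHUNKS = contextvars.ContextVar("write_empty_chunks", default=True)


@contextmanager
def array_config(*, write_empty_chunks: bool):
    """Temporarily change the default applied when arrays are accessed."""
    token = _WRITE_EMPTY_CHUNKS.set(write_empty_chunks)
    try:
        yield
    finally:
        _WRITE_EMPTY_CHUNKS.reset(token)


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical array handle with one per-handle setting."""
    path: str
    write_empty_chunks: bool

    def configure(self, **config) -> "ToyArray":
        # "array.configure()" option: return a reconfigured copy.
        return replace(self, **config)


class ToyGroup:
    """Hypothetical group that hands out ToyArray handles by name."""

    def __init__(self, *names: str) -> None:
        self._names = set(names)

    def __getitem__(self, name: str) -> ToyArray:
        # Subscript syntax can only pass the key, so the setting must come
        # from somewhere else; here it comes from the ambient context.
        return self.get(name)

    def get(self, name: str, **config) -> ToyArray:
        # "Custom get()" option: a plain method can take extra arguments.
        if name not in self._names:
            raise KeyError(name)
        array = ToyArray(name, write_empty_chunks=_WRITE_EMPTY_CHUNKS.get())
        return array.configure(**config)


group = ToyGroup("temperature")
print(group["temperature"].write_empty_chunks)  # True
print(group.get("temperature", write_empty_chunks=False).write_empty_chunks)  # False
print(group["temperature"].configure(write_empty_chunks=False).write_empty_chunks)  # False
with array_config(write_empty_chunks=False):
    print(group["temperature"].write_empty_chunks)  # False
```

Offering all three at once is exactly the "more than one way to get an array from a group" situation DB wanted to avoid; the sketch is only meant to make the trade-offs concrete.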
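On the caching question, one way to make caching "the business of individual stores" is a wrapper layered over another store, loosely in the spirit of the layered approach MD described for FSSPEC. This is a minimal in-memory sketch under that assumption: `CachingStore` is a hypothetical name, and it does not model FSSPEC's file-based local caches. Reads go through a least-recently-used cache; writes and deletes go to the inner store and invalidate any cached copy.

```python
from collections import OrderedDict
from collections.abc import Iterator, MutableMapping


class CachingStore(MutableMapping):
    """Hypothetical read-through LRU cache layered over a str -> bytes store."""

    def __init__(self, inner: MutableMapping, max_items: int = 128) -> None:
        self._inner = inner
        self._cache = OrderedDict()
        self._max_items = max_items
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key: str) -> bytes:
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.misses += 1
        value = self._inner[key]  # a KeyError here leaves the cache untouched
        self._cache[key] = value
        if len(self._cache) > self._max_items:
            self._cache.popitem(last=False)  # evict the least recently used
        return value

    def __setitem__(self, key: str, value: bytes) -> None:
        self._inner[key] = value
        self._cache.pop(key, None)  # drop any stale cached copy

    def __delitem__(self, key: str) -> None:
        del self._inner[key]
        self._cache.pop(key, None)

    def __iter__(self) -> Iterator[str]:
        return iter(self._inner)

    def __len__(self) -> int:
        return len(self._inner)


backing = {".zattrs": b'{"units": "K"}'}
cached = CachingStore(backing, max_items=2)
first = cached[".zattrs"]   # miss: read from the backing store
second = cached[".zattrs"]  # hit: served from the cache
print(cached.hits, cached.misses)  # 1 1
```

Because the wrapper only relies on the `MutableMapping` interface, the same layer could sit on top of any store, which is the appeal of keeping caching separate from the array and attribute code.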