2023-05-03

Attending: Josh Moore (JM), Dennis Heimbigner (DMH), Alan Liddell (AL), Davis Bennett (DB), Hailey Johnson (HJ), Ward Fisher (WF), Mark Kittisopikul (MK), Jeremy Maitin-Shepard (JMS)

Apologies: Sorry I can’t make it, ping me if there’s anything I should follow up on (Martin D)

TL;DR:

To start off our meeting, we asked everyone to share their favourite songs and give an update on their current work. Here are the songs:

SV updated us on the voting progress for ZEP0001. DH had some questions about the must_understand flag in the V3 spec. MK shared some insights on ZEP0002 and their recent work with Zarr.jl. JM alerted us to performance issues with Zarr-Python that Ryan Abernathey brought up. DB talked about representing the Zarr group (tree) using data structures. Finally, SV ended the meeting by discussing the process of converting rOS data to the Zarr format.

Meeting Minutes:

  • MK: see https://github.com/fsspec/kerchunk/pull/331
    • “Accelerating 65,536 chunks in HDF5 to Zarr by 800x via H5Dchunk_iter” ❤️
  • AL: see https://github.com/acquire-project/acquire-driver-zarr
  • SV: ZEP1 review, rendered at https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html
    • WF: question re: voting – via comment? yes.
    • DMH: place for minority report, i.e. can we add reservations? “yes, but a really bad idea”
    • JMS: complaint is with the complexity of partial reading?
    • DMH: no, can live with that (though hard in an interpreted lang.)
      • main problem is the decision to default to “must-understand” as true
      • disallowing annotation of zarr file with additional information is problematic
      • burden on all implementors
    • JMS: if you add metadata, is the overhead of marking it “must-understand=false” too much?
    • DMH: predict that “must-understand” does not have sufficient scope to annotate everything. can’t prove it yet.
      • generally don’t understand why to read-narrowly rather than read-widely
    • JM: original discussions revolved around namespacing, internalling, …
    • JMS: that was zattrs but this is about zarray (v3 equivalents)
    • WF (summarizing JMS): user-defined metadata is free-form and doesn’t require the “must-understand” (correct)
    • DMH: goal is to define nczarr as extensions but that’s still ill- or under-defined. worried that we will disallow experimentation. not able to change later.
    • SV: would be good to have an issue
  • MK: ZEP2
  • MK: Julia / Zarr.jl
  • JM: zarr-python
    • performance issues (via Ryan A.)
    • 2.15.0 release on its way
    • includes GPU-
  • DB: higher-level question
    • interest in an API to generate the entire tree?
    • xarray-datatree project (“hierarchy of xarrays”)
    • keys are absolute paths
    • e.g. with ome-ngff, we expect a specific structure
    • requires dictionary representation
    • see for example, https://github.com/JaneliaSciComp/pydantic-ome-ngff
    • JM: only know of conslidated metadata (needed in v3)
    • DB: how do you represent the array data?
    • DB: more and more, think that need a lazy, non-dask array repr.
    • JMS: in tensorstore, have a JSON representation of an array (abstract URL-like)
      • DB: lightweight, fits in a dict, can pass it around
    • prototype forthcoming…
    • MK: namedtuple?
  • SV: at PyCon DE and PyData Berlin 2023 https://www.ros.org/ folks asked about Zarr
    • AL: Acquire project has been considering integrating with ROS (camera abstractions, etc.)
      • Haven’t come across anyone using it.