2023-05-03
Attending: Josh Moore (JM), Dennis Heimbigner (DMH), Alan Liddell (AL), Davis Bennett (DB), Hailey Johnson (HJ), Ward Fisher (WF), Mark Kittisopikul (MK), Jeremy Maitin-Shepard (JMS)
Apologies: Sorry I can’t make it, ping me if there’s anything I should follow up on (Martin D)
TL;DR:
To start off our meeting, we asked everyone to share their favourite songs and give an update on their current work. Here are the songs:
- SV: Meteora Album by Linkin Park
- JMS: Lullabys and sleep songs and Secret Garden by Bruce Springsteen
- MK: Black Magic Woman by Santana
- DB: Pop by Gas and In Dreams by Roy Orbison
- JM: Ode to Joy by Beethoven
- HJ: Songs to listen while running
- AL: Subdivisions by Rush and Godzilla by Blue Öyster Cult
- WF: Mid 90s songs or songs played by Ward’s teens
- DH: St. James Infirmary
SV updated us on the voting progress for ZEP0001. DH had some questions about the must_understand
flag in the V3 spec. MK shared some insights on ZEP0002 and their recent work with Zarr.jl. JM alerted us to performance issues with Zarr-Python that Ryan Abernathey brought up. DB talked about representing the Zarr group (tree) using data structures. Finally, SV ended the meeting by discussing the process of converting rOS data to the Zarr format.
Meeting Minutes:
- MK: see https://github.com/fsspec/kerchunk/pull/331
- “Accelerating 65,536 chunks in HDF5 to Zarr by 800x via H5Dchunk_iter” ❤️
- AL: see https://github.com/acquire-project/acquire-driver-zarr
- SV: ZEP1 review, rendered at https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html
- WF: question re: voting – via comment? yes.
- DMH: place for minority report, i.e. can we add reservations? “yes, but a really bad idea”
- JMS: complaint is with the complexity of partial reading?
- DMH: no, can live with that (though hard in an interpreted lang.)
- main problem is the decision to default to “must-understand” as true
- disallowing annotation of zarr file with additional information is problematic
- burden on all implementors
- JMS: if you add metadata, is the overhead of marking it “must-understand=false” too much?
- DMH: predict that “must-understand” does not have sufficient scope to annotate everything. can’t prove it yet.
- generally don’t understand why to read-narrowly rather than read-widely
- JM: original discussions revolved around namespacing, internalling, …
- JMS: that was zattrs but this is about zarray (v3 equivalents)
- WF (summarizing JMS): user-defined metadata is free-form and doesn’t require the “must-understand” (correct)
- DMH: goal is to define nczarr as extensions but that’s still ill- or under-defined. worried that we will disallow experimentation. not able to change later.
- SV: would be good to have an issue
- MK: ZEP2
- see https://github.com/zarr-developers/zarr-specs/pull/152#issuecomment-1533335795
- discussions around checksums for sharding as well as variable sized chunk size
- JMS: shards with different chunks, etc. would lead to variable sizes. would be unfortunate for latency. could add a field to specify the size.
- MK: Julia / Zarr.jl
- some async performance was not up to zarr-python levels
- not using openssl, updated HTTP stack
- https://github.com/JuliaWeb/HTTP.jl/pull/1034#issuecomment-1525624843
- single task faster for mbedTLS (?)
- tl;dr - SSL implementation can affect through-put
- JM: zarr-python
- performance issues (via Ryan A.)
- 2.15.0 release on its way
- includes GPU-
- DB: higher-level question
- interest in an API to generate the entire tree?
- xarray-datatree project (“hierarchy of xarrays”)
- keys are absolute paths
- e.g. with ome-ngff, we expect a specific structure
- requires dictionary representation
- see for example, https://github.com/JaneliaSciComp/pydantic-ome-ngff
- JM: only know of conslidated metadata (needed in v3)
- DB: how do you represent the array data?
- DB: more and more, think that need a lazy, non-dask array repr.
- JMS: in tensorstore, have a JSON representation of an array (abstract URL-like)
- DB: lightweight, fits in a dict, can pass it around
- prototype forthcoming…
- MK: namedtuple?
- SV: at PyCon DE and PyData Berlin 2023 https://www.ros.org/ folks asked about Zarr
- AL: Acquire project has been considering integrating with ROS (camera abstractions, etc.)
- Haven’t come across anyone using it.
- AL: Acquire project has been considering integrating with ROS (camera abstractions, etc.)