2023-01-11

Attending: Jeremy Maitin-Shepard (JMS), Josh Moore (JM), John Kirkham (JK), Sanket Verma (SV), Brianna Pagán (BP), Martin Durant (MD)

TL;DR:

Happy New Year, folks! 🎉 Welcome to the first Zarr community meeting of 2023! The discussion started with John’s briefing about his recent visit to the Allen Institute. After this, Brianna initiated a discussion on the Geo-Zarr spec, which led to discussing the various .zarr datasets NASA is using and how they are stored. Brianna has been working on .zarr datasets, and her team will publish them shortly. Finally, we concluded the meeting by discussing async-zarr, a new R implementation of Zarr (Rarr), and PR #1131 in Zarr-Python.

Updates:

Meeting Minutes:

  • Allen Institute Cell/Brain meeting (JK)
    • day of meetings (7)
    • deep learning, image processing (Zarr as input)
    • storage management for multiple groups (TIFFs to Zarrs)
    • invited people to this meeting
    • “what should (we) do with our workflow”
    • used poster from neuroscipy and some notebooks
    • also explanations of HDF5 (hierarchical storage)
    • interest in benchmarks
    • daily mouse brains! ergo throughput!
    • HHMI -> Allen -> CZI: write C++/Rust (typed, compiled) support
    • no single slide deck would have helped
      • maybe “basic getting started” from the poster
    • questions:
      • pyramidal / ome-zarr (extensions)
    • SV: showing Henning’s drawings at NASA
  • geo-zarr spec (BP)
    • want to push for a v1 in the next 6 months
    • organizing with Ryan, Chris Holmes (Planet) and folks from OGC
    • BP: had group email chain
    • branching / forking his repo
    • specs for making zarr stores compatible with e.g. xarray
    • MD: coordinate transforms. own interpretations in e.g. GDAL
    • BP: searchability of the zarr stores. keeping them in line with other collections
      • MD: including bounding box? yes. like STAC. QC units.
      • BP: worried about the spec not being done and needing to re-publish
      • MD: have a ZEP/discussion place about “should this be handled by Zarr or not?”
      • SWG: steering working group for OGC to write the proposal
      • MD: could start on top of affine transform (but exists in the CF conventions)
      • CRS as a shortfall of the CF conventions
    • SV: https://search.earthdata.nasa.gov/search ?
  • async-zarr: https://github.com/martindurant/async-zarr
    • Blog by Martin: http://martindurant.github.io/blog/async-zarr/
    • Steps needed to release it as a package
      • Transferring the repo under /zarr-developers?
      • Writing tests?
      • Adding GitHub Actions?
      • Testing in the browser is tricky and not something MD will take on (i.e. it requires effort from someone else), but it would be useful independently for the two use cases
  • FYI: https://github.com/grimbough/Rarr
  • getitems: https://github.com/zarr-developers/zarr-python/pull/1131
    • JK: might be useful for other types of arrays
    • JMS: worried that every line of code needs to change. do it as core?
    • JK: plugin pieces - store & compressors
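
Illustration for the geo-zarr discussion above (affine transform, bounding box): a minimal Python sketch, not taken from any spec draft or from the meeting itself. The `Affine` class, the `bounding_box` helper and the grid values are all hypothetical; the sketch assumes a 2-D transform that maps (col, row) array indices to (x, y) coordinates.

```python
from typing import NamedTuple, Tuple


class Affine(NamedTuple):
    """Hypothetical 2-D affine transform (illustration only).

    x = a * col + b * row + c
    y = d * col + e * row + f
    """
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, col: float, row: float) -> Tuple[float, float]:
        # Map an array index (col, row) to a coordinate (x, y).
        return (self.a * col + self.b * row + self.c,
                self.d * col + self.e * row + self.f)


def bounding_box(transform: Affine, n_cols: int, n_rows: int
                 ) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) covering the whole array."""
    # Transform the four outer corners of the grid and take the extremes.
    corners = [transform.apply(c, r) for c in (0, n_cols) for r in (0, n_rows)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Assumed example values: a global, north-up grid at 1-degree resolution.
transform = Affine(1.0, 0.0, -180.0, 0.0, -1.0, 90.0)
bbox = bounding_box(transform, n_cols=360, n_rows=180)
print(bbox)
```

A bounding box derived this way is the kind of value that could sit in a store's metadata to support the searchability BP raised; where it would actually live (Zarr itself, geo-zarr, or CF-style attributes) is the open question noted above.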