2023-01-11
Attending: Jeremy Maitin-Shepard (JMS), Josh Moore (JM), John Kirkham (JK), Sanket Verma (SV), Brianna Pagán (BP), Martin Durant (MD)
TL;DR:
Happy New Year, folks! 🎉 Welcome to the first Zarr community meeting of 2023! The discussion started with John’s briefing about his recent visit to Allen Institute. After this, Brianna initiated a discussion on Geo-Zarr SPEC, which led to discussing various .zarr datasets NASA is using and how they are storing them. Brianna has been working on .zarr datasets, and her team will publish them shortly. Finally, we concluded the meeting by discussing async-zarr, new Zarr’s R Implementation and PR #1131 in Zarr-Python.
Updates:
- Happy New Year! 🥂
- next zarr release
- release notes: https://hackmd.io/_P8Q0_cFT-6ymtYJSm9wnA?view#Final-Release-Notes-for-21342140
- for John: https://github.com/zarr-developers/zarr-python/pull/1285
- Mads added 2 commits
- followed by getting 1096 and 1111 out
Meeting Minutes:
- Allen Institute Cell/Brain meeting (JK)
- day of meetings (7)
- deep learning, image processing (Zarr as input)
- storage management for multiple groups (TIFFs to Zarrs)
- invited people to this meeting
- “what should (we) do with our workflow”
- used poster from neuroscipy and some notebooks
- also explanations of HDF5 (hierarchical storage)
- interest in benchmarks
- daily mouse brains! ergo throughput!
- HHMI -> Allen -> CZI: write c++/rust (typed compiled) support
- Nathan Clack: https://github.com/nclack
- no one slide deck would have helped
- maybe “basic getting started” from the poster
- questions:
- pyramidal / ome-zarr (extensions)
- SV: showing Henning’s drawings at NASA
- geo-zarr spec (BP)
- want to push for a v1 in the next 6 months
- organizing with Ryan, Chris Holmes (Planet) and folks from OGC
- BP: had group email chain
- branching / forking his repo
- specs for making zarr stores compatible with e.g. xarray
- MD: coordinate tranforms. own interpretations in e.g. gdal
- BP: searchability of the zarr stores. keeping them inline with other collections
- MD: including bounding box? yes. like stac. QC units.
- BP: worried about the spec not being done and needing to re-publish
- MD: have a ZEP/discussion place about “should this be handled by Zarr or not?”
- SWG: steering working group for OGC to write the proposal
- MD: could start on top of affine transform (but exists in cfconventions)
- CRS as a short fall of cfconventions
- SV: https://search.earthdata.nasa.gov/search ?
- BP: yes. where it’s in AWS.
- first time having duplicate datasets (by format)
- what needs to be updated in the Common Metadata Repository
- Jennifer Wei discussing with SV, don’t see Zarrs
- BP: don’t have the ability to make them available
- giovanni-in-the-cloud: on-prem caches converted to zarr this year
- earthdata is the GUI of CMR (looking at AWS)
- MD: doing auth signing? LONG CONVERSATION. (HTTP Tokens via proxy)
- MD: anaconda interesting in being conda frontend for data via intake
- do search, pull credentials, redirect, etc.
- BP: people are to moving to zarr stores with or without them. archives are alive. “static zarr store is not true”
- 14000 collections. tie into CMR API.
- https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
- MD: intake no listing
- BP: eartaccess is voluntary maintained (APIs across NASA: https://github.com/nsidc/earthaccess)
- “store” vs “collection”
- store is the .zarr directory
- collection is a product
- files under that are granules
- NC and TIFF can be archived via granule
- (pangeo-forge is calling this a store)
- want to associate the store with the collection
- STAC calls those … Need a glossary.
- JK: https://numpy.org/doc/stable/user/numpy-for-matlab-users.html
- async-zarr: https://github.com/martindurant/async-zarr
- Blog by Martin: http://martindurant.github.io/blog/async-zarr/
- Steps needed to release it as a package
- Transferring the repo under /zarr-developers?
- Writing tests?
- Adding Github actions?
- Testing the browser is tricky, but not something for MD (i.e. requires effort from someone else) but useful independently for the two use cases
- FYI: https://github.com/grimbough/Rarr
- getitems: https://github.com/zarr-developers/zarr-python/pull/1131
- JK: might be useful for other types of arrays
- JMS: worried that every line of code needs to change. do it as core?
- JK plugin pieces - store & compressors