10th March, 2021
Attendees: Josh Moore, Norman Fomferra, Matthias Bussonnier, Matt McCormick, Sabine Embacher, Haily Johnson, Ward Fisher, John Kirkham, Ryan Williams
-
Misc
- How many people on Python 3.9? Josh … Matthias: 3.10 is coming.
-
Introductions (noting new folks only)
-
Hailey (Unidata). Zarr for netcdf
-
Norman (tech lead, bcdev): working with pydata tech. stack,
incl. xarray. Using zarr all day.
-
-
Community
- Unidata java branch: didn’t have distributed file reading. Just
now working (Don’t follow Hailey’s branch). AWS S3, … and zip files.
- jzarr supports: blosc (native) and zlib (in java)
- https://maven-nar.github.io/
-
Javascript-land: experiments towards NGFF (Matt)
- working on supporting other dimension order
- supporting also ML libraries, …
- Found that it was also needed to pre-compute the range of values
- https://github.com/Kitware/itk-vtk-viewer
- https://github.com/gzuidhof/zarr.js
- https://github.com/ome/ngff
- Matt to create an ngff issue.
-
nczarr release (Ward) i.e. C-land
- non-zarr issues should be fixed. hard to find. easy to fix.
- few bit issues, but hopeful to have 4.8.0 released this week
- survey of computing & development environments where zarr is being used
- netcdf support linux/win/mac/32bit/64bit for archiving. file written on each needs to be readable by the others. (…arm!…)
- https://github.com/zarr-developers/zarr_implementations
- spreadsheet: https://docs.google.com/spreadsheets/u/2/d/1MpqyzIX_DIEiQwtpSg-9IntICip4ESCErrW_Dn5S7Qk/edit?usp=drive_web&ouid=103669694493794058959
-
Norman: zarr-specs / multiscales use-case
-
Matt: downsampling reduces the range so useful to have the original range above. xarray.to_zarr isn’t multiscale, but adding the representation. allows to use geospatial tools in a seamless way.
-
Norman: multi-uses cases for multiscale representations. developing xcube toolset. as well as xcube viewer.
-
Coupled problem (with ESA) – climate change initiative. 20 essential variables & ~500TB data. published by THREADSS – produce million of chunks, many HTTP requests. Would like to ideally have chunks that are 10s - 100s of MBs. But clients need max few MB.
- Matt: dealing with the same things. Would like to work
together on xcube.
- Josh:
- Matt: Create index files of all the different chunks. Access with HTTP range requests. (Running into the same problem with security URLs)
- https://twitter.com/rsignell/status/1338550348538138624
- https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.ZipStore
- https://zarr.readthedocs.io/en/stable/tutorial.html#consolidating-metadata
- https://zarr.readthedocs.io/en/stable/tutorial.html#io-with-fsspec
- Matthias:
https://github.com/zarr-developers/zarr-python/pull/667 partial read might help as well to read only part of a big tile.
- Matthias: Zarr v3 should also do less requests for deep
hierarchies. Also concurrent get()s: multiget(). Norman: async? Matthias: zarr isn’t completely async-compatible yet.
- https://github.com/zarr-developers/zarr-python/pull/606 (multiget, end of September.)
- Hailey: have a service to group all the requests
together.
- Matt: dealing with the same things. Would like to work
-
Norman: making use of the fact that chunks can be missing (v2). No change.
-
- Unidata java branch: didn’t have distributed file reading. Just
-
Core concerns
-
v3 spec status
- Matthias’ branches
- Alistair & editor’s draft
-
Funding (EOSS 4)
- Note that EOSS4 is now in 2 steps process with first a Letter of intent 1000 words. (deadline March 30)
-
-
European Meeting
-
Java: report on Java & zarr_implementations
-
Saalfeld: Composite Types
- Talked to Sabine. and someone else.
- All over the place. Completely broken.
- How to deal with variable length things.
- Josh: haven’t tested variable length
-
Hierarchy of specs
- Zarr
- (Extensions)
- Xarray
- (something else? like cfconventions.org)
- OME-NGFF
-