2024-02-22
Attending: Sanket Verma (SV), Ward Fisher (WF), Josh Moore (JM), Martin Durant (MD), Tom Nicholas (TN)
TL;DR:
The meeting covered the use of LLMs in training, feedback on the Zarr-Specs website redesign, progress on the V3 refactor, and discussions on integrating kerchunk into Zarr, focusing on chunk manifest standardization and virtualized array concatenation.
Updates:
- HTTP Extension meeting: https://docs.google.com/document/d/14TJfrjbfU1R2REjrZ35GjV74MJ18j_m6geWdj0oB83Y/edit?usp=sharing
Meeting Notes:
- LLMs and how WF is using them in trainings
- Feedback for new design for Zarr-Specs website (combines ZEP and Zarr-Specs together)
- MD: How’s V3 refactor work going on these days?
- JM: Quite good progress taking place these days
- SV: V3 PRs can be found here - https://github.com/zarr-developers/zarr-python/pulls?q=is%3Apr+is%3Aopen+label%3AV3
- MD: https://zarr.dev/zeps/draft/ZEP0003.html
- TN: Been discussing → https://github.com/zarr-developers/zarr-specs/issues/287
- interested in integrating kerchunk into zarr, especially two ZEPs
- (1) chunk manifest (Joe) - standardizing what chunk json files do
- (2) concatenation - https://github.com/zarr-developers/zarr-specs/issues/288
-
- manifest: opinion that it’s an incredible idea that is very popular
- fsspec relationship makes things complicated
- move to the zarr spec for other implementations?
- goal is readable in any language
- difficult position
- three things to think about
- read byte ranges
- write JSON
- combine module
- roadmap:
- standardize json for the chunks. manifest file?
- JM: storage in zarr array itself
- JM: log file anytime you read a full file into memory
- Josh: virtual zarr (access pattern)
- manifest: opinion that it’s an incredible idea that is very popular
-
- concatenation
- multi-zarr-to-zarr leads to a loop
- more sense to think of concat of virtualized arrays objects
- see kerchunk array notebook
- read in byte ranges with kerchunk. array class which only stores byte-offset arrays in memory
- can be done in xarray. concat-classes can be put into xarray and can use higher-order API
- JM: store that xarray as a zarr :smile: (but need additional metadata for realizing the array)
- TN: part of notebook that isn’t done. exactly.
- common case in geo. multiple NC files, concat those array.
- possibly compression options change over time.
- prevents it from being one zarr array
- JM: or just always serialize to the chunk manifest
- JM: i.e. where do we stop? (when does Zarr become Turing Complete?)
- TN: thought at concat (clear use case). but jeremy thought indexing (also clear use case)
- JM: starting to sound like transforms (https://github.com/ome/ngff/pull/138#issuecomment-1948424000)
- WF: periodically get requests for operations on the data
- no one has come close to making the argument for adding that into the storage
- so many math libraries that would do it better
- TN: no computation since you don’t need the values. can do some subset of concat & indexing without values.
- concatenation
- TN: have now become a zarr producer :tada:
- JM: cross-language motivation
- SV: pyramiding ZEP discussions