15th December, 2021
Attending: Josh Moore, Eric Perlman, Tobias Kölling, Ryan Williams, John Kirkham
- Hiring/money
  - no Outreachy
  - a dozen or so applications for the community manager position
  - scalable minds working on sharding
  - b-open for xarray, multiscales, and more extension-like work
- State of the world, Eric? “why doesn’t Zarr…”
  - (overall goal: replacing TIFF stacks with chunked Zarr)
  - axes (OME-Zarr)
  - sharding
  - EP: read-only? still open
  - EP: simple read-only support would be a great start (~20 files)
    - some overlap with HDF5 (Josh: some benchmarking)
  - JM: kerchunk in front of the 20 HDF5s (see sketch below)
  - EP: the reference JSON gets too large
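
A minimal sketch of the kerchunk idea, assuming kerchunk and fsspec are available; the file names and the "t" concat dimension are hypothetical. One reference set is generated per HDF5 file, then combined into a single virtual Zarr store — the per-file JSON references are the part EP notes can get too large.

```python
import fsspec
import zarr
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

paths = [f"data/file_{i:02d}.h5" for i in range(20)]  # hypothetical files

# One JSON reference set per HDF5 file (these are what can grow too large).
singles = []
for p in paths:
    with fsspec.open(p, "rb") as f:
        singles.append(SingleHdf5ToZarr(f, p).translate())

# Combine into one virtual dataset, concatenating along a hypothetical "t" dim.
combined = MultiZarrToZarr(singles, concat_dims=["t"]).translate()

# The references map Zarr keys to (path, offset, length); no data is copied.
fs = fsspec.filesystem("reference", fo=combined)
group = zarr.open(fs.get_mapper(""), mode="r")
```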
  - TK: a hierarchy of indices, so you don’t have to load all of them (sketched after this thread)
  - TK: and recursive shards/Zarr?
  - JM: Jeremy said it would be difficult for the client to make use of more than 2 levels
  - EP: dask lazy-arrays of dask lazy-arrays of…
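
A hedged sketch of TK’s hierarchy-of-indices idea (key names and layout are hypothetical): a small top-level index routes a logical chunk key to one sub-index, so a reader never loads the full reference set at once.

```python
import json

def hierarchical_lookup(store, logical_key, top_index_key=".zidx"):
    """Resolve a chunk through a two-level index instead of one big JSON."""
    top = json.loads(store[top_index_key])     # small; always loaded
    prefix = logical_key.split("/")[0]         # coarse routing, e.g. first axis
    sub = json.loads(store[top[prefix]])       # load only the relevant sub-index
    shard_key, offset, length = sub[logical_key]
    return store[shard_key][offset:offset + length]

# usage, with a plain dict standing in for a store:
store = {
    ".zidx": json.dumps({"0": "idx/0.json"}).encode(),
    "idx/0.json": json.dumps({"0/0": ["shards/0", 0, 4]}).encode(),
    "shards/0": b"abcdXXXX",
}
assert hierarchical_lookup(store, "0/0") == b"abcd"
```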
  - JM: worried that this ends up in the Python implementation rather than in the “protocol” (shape + chunks + dimension_separator determine all keys for chunks)
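
JM’s parenthetical made concrete: given only shape, chunks, and dimension_separator, every chunk key is enumerable with no knowledge of the Python implementation. A minimal sketch:

```python
import itertools
import math

def chunk_keys(shape, chunks, dimension_separator="."):
    """Enumerate all chunk keys implied by the array metadata alone."""
    counts = (math.ceil(s / c) for s, c in zip(shape, chunks))
    for coords in itertools.product(*(range(n) for n in counts)):
        yield dimension_separator.join(map(str, coords))

print(list(chunk_keys((100, 100), (50, 50), "/")))
# ['0/0', '0/1', '1/0', '1/1']
```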
  - EP: “logical encoding” versus the actual “path”
  - JK: indexing a point in an octree (paths on each dimension)
    - separate issue: (optimal) access patterns for a particular use
  - two different things going on:
    - how deeply to split
    - how it gets implemented
    - almost like a compression
  - EP: logical key vs. physical key (see the sketch below)
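
One way to read the logical-vs-physical distinction (a sketch, not an agreed design): the array keeps addressing chunks by logical key, and a store layer translates that into a physical shard path plus a byte range.

```python
from collections.abc import Mapping

class ShardedReadOnlyStore(Mapping):
    """Read-only view: logical chunk key -> byte range in a physical shard."""

    def __init__(self, inner, index):
        self._inner = inner   # physical store, e.g. a dict or an fsspec mapper
        self._index = index   # logical_key -> (physical_key, offset, length)

    def __getitem__(self, logical_key):
        physical_key, offset, length = self._index[logical_key]
        return self._inner[physical_key][offset:offset + length]

    def __iter__(self):
        return iter(self._index)

    def __len__(self):
        return len(self._index)

# usage:
inner = {"0.shard": b"aaaaBBBB"}
store = ShardedReadOnlyStore(inner, {"0/0": ("0.shard", 0, 4),
                                     "0/1": ("0.shard", 4, 4)})
assert store["0/1"] == b"BBBB"
```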
  - … lots of talking (unfortunately not recorded)
  - JM: love the hierarchical “smart-mutable mapping”, but the Zarr protocol avoids knowing how to split byte streams into mutable mappings
    - this gets us to the issue of (offset, length)
  - JK: spent lots of time with the dask serialization protocol
  - “have a header on the byte string”; “how many do I have?” (sketch below)
    - shift all the metadata somewhere else? .zidx (for the whole array?)
    - mostly thinking of archival data (frequent read)
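
A hedged sketch of the header idea (“how many do I have?”), with a purely illustrative layout: a count at the front of each shard, followed by fixed-size (offset, length) pairs.

```python
import struct

def parse_shard_header(blob):
    """Read an illustrative header: uint64 count, then (offset, length) pairs."""
    (n,) = struct.unpack_from("<Q", blob, 0)          # "how many do I have?"
    flat = struct.unpack_from(f"<{2 * n}Q", blob, 8)
    return list(zip(flat[0::2], flat[1::2]))

# a shard whose header says two chunks live at (40, 4) and (44, 4):
header = struct.pack("<5Q", 2, 40, 4, 44, 4)
print(parse_shard_header(header))  # [(40, 4), (44, 4)]
```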
  - JM: allowing Array._chunk_key to return (key, offset, length)
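
If _chunk_key (or a store hook) handed back (key, offset, length), a filesystem-backed store could serve it with a ranged read instead of fetching the whole shard. A sketch using fsspec’s existing byte-range support; the shard path and layout are hypothetical.

```python
import fsspec

fs = fsspec.filesystem("file")  # the same call works for s3, gcs, http, ...

def read_chunk(path, offset, length):
    """Fetch only the chunk's bytes via a ranged read."""
    return fs.cat_file(path, start=offset, end=offset + length)

# usage against a local shard file:
with open("/tmp/demo.shard", "wb") as f:  # hypothetical shard
    f.write(b"aaaaBBBB")
assert read_chunk("/tmp/demo.shard", 4, 4) == b"BBBB"
```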
- Tabled: netcdf-java (AGU permitting) - Josh