25th August, 2021
Attendees: Josh Moore, John K., Eric Perlman, Ward, Hailey, Greg Lee, Ryan A.
-
Josh: zarr-python 2.9 (without Python 3.6)
- (The peasants rejoice)
-
Greg & V3: going back to store PR (based on newly merged PRs)
-
recursive imports, etc. Needs some fixing up.
- Zarr test suite is largely working on v3 (fixed with changing
the defaults)
-
Edge case of treating root/data/ as a group
-
(PRs to be split up logically for opening)
-
Majority of uses are: zarr.open, dask.from_zarr, xarray, etc.
-
Ryan: less painful than Python 2 to 3 …
- Use configuration options for xarray at least
-
Ryan: also consolidated metadata is used a lot
- Boosted cloud performance strongly!
- https://github.com/zarr-developers/zarr-specs/issues/113
- Greg: also some tests are using datetime, but that’s not in the
base spec.
-
-
Ryan & Filecoin/IPFS
- protocol labs reached out to pangeo
(for climate data)
-
“distributed content-addressable storage layer for the internet”
-
all data is identified with a hash; decentralized
- filecoin incentives the miners to
provide public storage
-
potential alternative to S3 for internet-scale storage
-
creating fsspec interface to IPFS. should “just work”
- https://github.com/zarr-developers/zarr-specs/issues/82
- protocol labs reached out to pangeo
-
Ryan & Caterva
-
two levels of chunking like tiledb
-
looking into slicing into along n-dimensions.
-
https://github.com/zarr-developers/zarr-python/issues/713#issuecomment-903755601
-
Need a numcodecs wrapper (C-API)
-
No explicit cloud solution
-
Eric: similar to sharding in neuroglancer
- Ryan: could imagine (in 2 years) where this is the default codec
(builds on blosc)
- Eric: if you throw out general nature, like append-only, then
you can optimize.
- cf.
- Eric: blosc-1 v. blosc-2? John: e.g. 32-bit to 64-bit breaking
changes
- Josh: identifying the changes (also for IPFS) that Zarr needs to
add
-
John: invite Francesc to this meeting
- Ryan: need to update numcodecs interface that we want a slice
from an ND array. (needs understanding of caterva, blosc, cython, …)
- John: implement chunk store (?) on top of caterva and turn off
filtering, …
-
“Smarter chunk for Zarr” (deprecating partial selection, etc.)
-
Ryan: instead of reading bytes, decompressing, co-oercing to numpy array, caterva array would be our chunk.
-
might need to coerce less eagerly.
-
then slicing could propagate through.
-
-
- John: technical problem first then the social problem.
-
-
Ryan & hierarchical support in xarray
-
https://github.com/pydata/xarray/issues/4118#issuecomment-873179375
-
Tom Nicholas has something working
-
Good time to check in and see what’s going on.
-