25th August, 2021

Attendees: Josh Moore, John K., Eric Perlman, Ward, Hailey, Greg Lee, Ryan A.

  • Josh: zarr-python 2.9 (without Python 3.6)

    • (The peasants rejoice)
  • Greg & V3: going back to store PR (based on newly merged PRs)

    • recursive imports, etc. Needs some fixing up.

    • Zarr test suite is largely working on v3 (fixed with changing

      the defaults)

    • Edge case of treating root/data/ as a group

    • (PRs to be split up logically for opening)

    • Majority of uses are: zarr.open, dask.from_zarr, xarray, etc.

    • Ryan: less painful than Python 2 to 3 …

      • Use configuration options for xarray at least
    • Ryan: also consolidated metadata is used a lot

    • Greg: also some tests are using datetime, but that’s not in the

      base spec.

  • Ryan & Filecoin/IPFS

    • protocol labs reached out to pangeo

      (for climate data)

    • “distributed content-addressable storage layer for the internet”

    • all data is identified with a hash; decentralized

    • filecoin incentives the miners to

      provide public storage

    • potential alternative to S3 for internet-scale storage

    • creating fsspec interface to IPFS. should “just work”

    • https://github.com/zarr-developers/zarr-specs/issues/82
  • Ryan & Caterva

    • two levels of chunking like tiledb

    • looking into slicing into along n-dimensions.

    • https://github.com/zarr-developers/zarr-python/issues/713#issuecomment-903755601

    • Need a numcodecs wrapper (C-API)

    • No explicit cloud solution

    • Eric: similar to sharding in neuroglancer

    • Ryan: could imagine (in 2 years) where this is the default codec

      (builds on blosc)

    • Eric: if you throw out general nature, like append-only, then

      you can optimize.

    • cf.

      https://blosc.github.io/caterva-scipy21/#/14

    • Eric: blosc-1 v. blosc-2? John: e.g. 32-bit to 64-bit breaking

      changes

    • Josh: identifying the changes (also for IPFS) that Zarr needs to

      add

    • John: invite Francesc to this meeting

    • Ryan: need to update numcodecs interface that we want a slice

      from an ND array. (needs understanding of caterva, blosc, cython, …)

    • John: implement chunk store (?) on top of caterva and turn off

      filtering, …

      • “Smarter chunk for Zarr” (deprecating partial selection, etc.)

      • Ryan: instead of reading bytes, decompressing, co-oercing to numpy array, caterva array would be our chunk.

        • might need to coerce less eagerly.

        • then slicing could propagate through.

    • John: technical problem first then the social problem.
  • Ryan & hierarchical support in xarray