Skip to main content Link Search Menu Expand Document (external link)

15th December, 2021

Attending: Josh Moore, Eric Perlman, Tobias Kölling, Ryan Williams, John Kirkham

  • Hiring/money

    • no outreachy

    • dozen or so applications for comm. mgr

    • scalable minds working on sharding

    • b-open for xarray, multiscales, and more extension-like stuff

  • State of the world, Eric? “why doesn’t Zarr…”

    • (overall goal of replacing TIFF stacks with chunked zarr)

    • axes (OME-Zarr)

    • sharded

      • EP: read-only? still open

      • EP: simple read-only would be a great start. (~20 files)

        • some overlap with HDF5. (Josh: some benchmarking)
      • JM: kerchunk in front 20 HDF5s

      • EP: JSON too large

      • TK: hierarchy of indices so you don’t have to load all

      • TK: and recursive shards/zarr?

      • JM: Jeremy said it would be difficult for the client to make use of more than 2

      • EP: dask lazy-arrays of dask lazy-arrays of ….

      • JM: worried that something is in the python implementation as opposed to the “protocol” (shape + chunks + dimension_separator := all keys for chunks)

      • EP: “logical encoding” versus actual “path”

      • JK: indexing a point in an octree (paths on each dimension)

        • separate issues. (optimal) access patterns for

          particular use

        • two different things going on:

          • how deeply to split
          • how it gets implemented
        • almost like a compression
      • EP: logical key v. physical key

      • Image 5 Image 6

      • …. lots of talking (unfortunately not recorded)

      • JM: love the hierarchical “smart-mutable mapping” but Zarr protocol avoids knowing how to split byte streams into mutable mappings

        • This gets us to the issue of (offset, length)
      • JK: spent lots of time with dask serialization protocol

        • “have a header on the byte string” “how many do I have?”

        • shift all the metadata somewhere else? .zidx (for

          the whole array?)

        • mostly thinking of archival data (frequent read)
      • JM: Allowing Array._chunk_key to return (key, offset, length)

  • Tabled: netcdf-java (AGU permitting) - Josh

    • [*Maven

      Central*](https://search.maven.org/search?q=g:edu.ucar) v. Nexus, Filters,