29th July, 2020

Attending: John Kirkham, Matthias Bussonnier, Josh Moore, Ryan Abernathy,

Ward Fisher, Martin Durant, Nicholas Sofroniew, Alistair Miles,

Dennis Heimbigner, Andrew

Apologies:

  • Standing items

    • Introductions~~: new joinees should feel welcome to

      introduce themselves and their interest in zarr/n5.~~

    • Alistair time off, Report: All was good.

    • **Quantstack: **

      • Some administration work remaining. (John to ping, done)
      • Scope of project: C vs. C++
      • Which stores to implement
      • Some boundary issues between libraries (xarray+dask+zarr v xtensor+zarr)
      • Support not-multiple-of-8bit types?
      • Dynamic extensions should be find. All a question of tradeoffs (DLL vs. header-only library vs. …)
      • How much of numcodecs to implement? Cover next Friday?
  • Blosc project, applying for EOSS grant (Josh)

    • Blosc 2 codec

    • NDLZ codec

    • Caterva

      • What is the main motivation for separate development?
      • What functionalities do they offer (that zarr doesn’t)?
  • TileDB (Ryan)

    • Reached out for a conversation.

    • 30M Series-A funding round. 20 people, full-time.

    • Launching TileDB Cloud

    • Different motivations. Discussed avenues for collaboration.

    • Looking at xarray backend. Interested in netcdf compatibility.

    • Discussed how tiledb format works. Clever things. Don’t

      outsource compression. Have 2 tiers of chunking. 64KB at the low-level to align with L2 caches, etc. Also works around having too many files.

    • Strategy for the relationship.

    • Dennis: single implementation problem like HDF5?

      • Ryan: that’s the benefit of Zarr is the openness of the process and the multiple implementations.
      • Martin: simplicity is worth a lot if you are going to read all of your data in reasonable chunks in parallel
      • Alistair: would like to see benchmarks for different access patterns.
      • Ryan: to do it right requires dedicated funding (but would also like to see it done). CS PhD student? Worse to do it wrong rather than not at all.
      • To TileDB: “Go for it”
    • Nick: was also reached out to.

      • Perhaps reslicing is a benefit?
      • Martin: in place updating?
      • Ryan: Both analysis-optimized on-disk formats
      • Martin: look at some of their implementations, e.g. subranges. Possible, but depends on which compression formats.
      • Ryan: dask has dask v. spark page. Very honest. We should try to do the same: v HDF5 and v TileDB (+1)
      • Nick: see NASA comparison – https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20200001178.pdf
      • John: inclusivity of the spec. Engaging on spec development?
  • Add Community Topics Here