8th May, 2019

Time: 20:00 BST (15:00 EDT).

Joining instructions: see above.

Intend to attend: Alistair, Josh, Fabian, Ryan W

Attending: JohnK, RyanW, Alistair, Fabian, Ward, JoshM, MartinD

Standing items

  • Introductions: new joinees should feel welcome to introduce themselves and their interest in zarr/n5.

TileDB - 20:03 UK

  • Compare their spec structure, e.g., ragged versus non-ragged have been split
  • Perhaps similar to the relationship with HDF5. We’re interested in providing neat things, e.g. a potential backend.
  • Chatting or even meeting up at some venue.
  • Martin: extracting parts of chunks and updating.
  • Performance / indexing extension specs? (cf. consolidated metadata.) Also multiple methods of mapping chunks to cloud objects.
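
To illustrate the last point, here is a minimal sketch of two ways a chunk's grid index could be mapped to an object key. The function names and the `temperature` array path are hypothetical and were not discussed on the call. The flat form mirrors the dot-separated chunk keys used by the Zarr v2 spec; the nested form uses one path segment per dimension.

```python
from typing import Sequence


def flat_key(array_path: str, chunk_index: Sequence[int], sep: str = ".") -> str:
    """Join all chunk indices with one separator, e.g. 'temperature/0.1.2'.

    Hypothetical helper; the dot-separated form follows Zarr v2 chunk keys.
    """
    return f"{array_path}/{sep.join(str(i) for i in chunk_index)}"


def nested_key(array_path: str, chunk_index: Sequence[int]) -> str:
    """Use one path segment per dimension, e.g. 'temperature/0/1/2'.

    Hypothetical helper; shown only as an alternative mapping.
    """
    return "/".join([array_path, *(str(i) for i in chunk_index)])


if __name__ == "__main__":
    print(flat_key("temperature", (0, 1, 2)))
    print(nested_key("temperature", (0, 1, 2)))
```

Both functions address the same chunk; only the key layout in the object store differs, which is why the choice could plausibly live in an extension rather than in the core.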

Spec development - 20:17 UK

  • Core protocol v3.0 development process – have created a development branch called core-protocol-v3.0-dev, suggest we make PRs against that branch with specific sections
  • Core protocol v3.0 - conceptual model : https://github.com/zarr-developers/zarr-specs/pull/17 – has been merged into core-protocol-v3.0-dev branch but comments and discussion very welcome, can be revisited

  • Core protocol v3.0 - data types : https://github.com/zarr-developers/zarr-specs/pull/18 – comments and discussion welcome, particularly regarding division between core and extensions

    • Data types as extension point. (Nice)
  • Core protocol v3.0 - chunk grids : https://github.com/zarr-developers/zarr-specs/pull/22 – comments and discussion welcome, particularly regarding division between core and extensions

    • Grid as an extension point

    • E.g. rectilinear grid (i.e. log paper)

    • Or extension in the negative direction

    • Fabian: how do extensions interact? A: Probably can’t mix multiple grids (or multiple data types).

  • Extensions:

    • JohnK: how to manage them? Packaging them for distribution? A: imagined that e.g. zarr-py might choose to implement the core spec and some set of extensions. Josh: specifying in the specs a way to degrade (when possible)? A: “must understand” extensions as a field.

    • But also `pip install`-ing. A: Would hope that most are just in zarr-py (given numpy). Fabian: from a Julia perspective, it would be good to be able to document what is included and what is not. (The goal, of course, is to implement everything.)

    • Josh: also layering specs on top of each other. (Implementors should naturally start with the lowest-level spec.) cf. image format drawing. A: assuming NC spec is included as a protocol spec. (Currently working with the v2 spec?) Doing prep work for the C library. Would be good if v3 was available when code starts getting written. Try to minimize new API calls when adding new formats; instead use “file flags”. `nc_open` goes to a dispatch table, which is easier for programmers. Would like for this to be true with zarr (modulo a mechanism for passing credentials for accessing the cloud).

  • Keeping entire core fully implementable. (Satisfying)

    • Listing extensions, storage backends, codecs (complexity,


    • Fabian: are there core codecs? Ward: Unidata is dealing with this for netCDF. For the extended data model + storage (HDF5-based, using libhdf5), they have user-defined compressors via their codec support. Finding (opinion) that it’s great to give users that ability, but the less common the codec, the less portable the data. (Caused issues with e.g. the THREDDS server.) Need a way to query the required codec. (Part of the spec.)

    • Josh: agreed, the balance is between having data that will last and encouraging advancement in the compression space (without creating N new formats).

    • A: communication status, etc. (But blosc kicked all of this off


    • Perhaps meeting up at https://www.iscb.org/ismbeccb2019 ?

  • Core protocol v3.0 - memory layouts : https://github.com/zarr-developers/zarr-specs/pull/24 – as above, comments and discussion welcome. (Note: 1 hour warning.)

    • The layout within a chunk. E.g. dictionary-based encoding.
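
As an illustration of the dictionary-based encoding mentioned above, here is a minimal sketch of how a chunk's values could be laid out as a table of distinct values plus integer codes. This is only one possible reading of the note: the function names are hypothetical, the chunk is assumed to be already flattened to a sequence, and nothing here is taken from the memory layouts PR.

```python
from typing import Hashable, List, Sequence, Tuple


def dict_encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Split a flattened chunk into (dictionary, codes).

    Hypothetical helper: the dictionary holds each distinct value once,
    in order of first appearance; codes index into it.
    """
    dictionary: List[Hashable] = []
    position = {}  # value -> its index in `dictionary`
    codes: List[int] = []
    for value in values:
        if value not in position:
            position[value] = len(dictionary)
            dictionary.append(value)
        codes.append(position[value])
    return dictionary, codes


def dict_decode(dictionary: Sequence[Hashable], codes: Sequence[int]) -> List[Hashable]:
    """Rebuild the flattened chunk by looking up each code."""
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    chunk = ["sea", "land", "sea", "sea", "ice", "land"]
    dictionary, codes = dict_encode(chunk)
    print(dictionary, codes)
    assert dict_decode(dictionary, codes) == chunk
```

A layout like this pays off when a chunk holds few distinct values, since the codes can then be stored in a narrower integer type than the values themselves.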

Multiscale use case