26th August, 2020

Attending: Matthias, Alistair (on mobile internet), , Josh Moore, John Kirkham, Martin Durant, Ryan Williams

  • Standing items

    • Introductions: new joinees should feel welcome to introduce

      themselves and their interest in zarr/n5.

Josh: Spent last week refactoring our library on zarr. We’re defining conventions on top of zarr. When have links from one place to another, need to avoid recursion. Took some rewriting.

John: Did you see issue Alistair filed about extensions? Have writable consolidated […]

Josh: Use case, generate a segmented image, want to write it back into the zarr, with metadata saying it belongs to an image.

John: When creating array with segmentation, know some attributes?

Josh: I can make it work. But if testing five different segmentation routines, they’d need to sync to get metadata updated correctly.

Matthias: Two related issues. In v3 attributes file is merged into array/group metadata. Another issue about really large attributes.

Josh: What Martin says makes sense in lots of use cases, consolidated, be careful when to update.

… heading towards use .zattrs a lot, including for links between arrays. Alistair: may simply have to activate locking (or handle it in the application)

Matthias does that point to needing a server?

Josh: Need upsert. Like in Mongo.

Martin: At the moment, metadata is a file. Content is opaque. But if used something like a database, could do something more fine-grained.

Josh: Could attributes be a higher-level interface?

Matthias: If listable store, can explode attributes, have one file per attribute. V3 also has no assumption of listing of groups and arrays.

Josh: Some more things to add in comparisons with tiledb. They have query methods over all metadata. Searchable metadata store.

Martin: Great if could have efficient ways to transfer between storage systems. JSON blobs transfer well, e.g., some databases …

Matthias: What is zarr format at rest? What is it when it’s being used? Assumptions about zarr when it’s in use, may be different from when it’s archived? E.g., when it’s in use, maybe some different files could be present. When hierarchy is closed, something is responsible for collapsing metadata.

Josh: Ideas around importing zarr data into other systems with a richer interface.

Matthias: E.g., if you turn on consolidated metadata, zarr refuses to touch individual files. Some tensions, recognising they exist is useful, silo discussion. Give use cases names.

Alistair: spectrum of patterns – write-once-(parallel)-read-many (closed); continuous-update (accumulation); and those in between:

  • Linking between arrays, needs multiple metadata writes into a single node, possibly in parallel
  • Searching / querying metadata (keeping index in sync)

Matthias: Metadata/linking problem involves append. Don’t know in advance how many items.

Josh: User is looking for primitives, deal with all metadata as a tree. Dennis brought up a different use case, consolidate to a group level, could also see having sub-node (array/group) trees, breaking them on the top-level keys.

Matthias: Going back to v2 versus v3. v3 has a single root.

Alistair: v3 doesn’t say anything about synchronization. Left completely to the libraries/applications.

Matthias: zarr archive >> … >> zarr dynamic >> … . “Everything that is consistency-based must go through a separate API” would be a good thing to document.

zpath? e.g. “/foo/bar/baz@spam/eggs” refers to the attribute “spam/eggs” on the hierarchy node “/foo/bar/baz”

i.e., in python API, could do either: