2022-10-06

Attending: Ward Fisher (WF), Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Greg Lee (GL), Jonathan Striebel (JS)

TL;DR

JM shared that there were some good conversations around OME-Zarr yesterday. The summary is available here. WF shared that Kitware is looking for partners and a link to the sign-up form. GL shared that during the CZI Open-Science Summit 2022, he worked on writing tests for Xarray. After this, there was an extensive discussion on URL syntax initiated by JMS.

Updates:

Meeting Minutes:

  • URL syntax? (JMS)
    • helps to figure out the metadata location.
    • Josh: great idea. have several ongoing discussions at the NGFF level
      • current proposal would be to support URIs internally (relative, absolute, remote)
      • however, in V2
    • JMS: in v3 the root exists
      • though not entirely clear that the new metadata organization is necessary
      • designed for S3 where there’s no directory, but other problems exist
    • Josh: summarized previous discussions for Greg
    • GL, thoughts on the V3 situation?
    • GL: at the moment, you need helper methods to do that.
    • JM: one proposal was to have the metadata be the main directory which lets you then bootstrap the chunk loading
    • JMS: support multiple?
    • JM: conceivably. as extension or configuration.
    • JM: downside for consolidated metadata is that nothing exists in the metadata hierarchy
      • workaround of having a thin-hierarchy only with references to where the metadata exists
    • JS: losing the ability to be able to next any hierarchy. (everything is a root)
      • JM: are we proposing rolling it back completely
    • JMS: problem is the URI+rootpath metadata
    • JS: walking up the hierarchy would be an option (URL doesn’t actually point)
    • JMS: would be nicer if you don’t have to perform a search
    • Use case
      • URL case
      • Desktop double click on something
      • Similar issue: Zips :warning:
        • JMS: have an additional level
        • JM: except ZipStore v2 assumes the whole zip is a zgroup
        • JS: propose zip is a special case which is easier
          • JMS: unless you are mixing volumetric with a zarr then it wouldn’t be at the top-level
      • Btree (JMS): need to be able to compose multiple layers (similar to fsspec and double colons)
      • Remote chunk store (or point to V2 chunks)
      • Renaming folders (keep data with arrays)
    • Options
      • Keep “/meta”, clients must know
      • Drop “/meta”, direct URLs
      • ?param syntax
      • #param syntax
      • Separator syntax (e.g. “//”)
      • root dir ends in .zarr
      • fsspec :: separator
      • multiple protocols (git+ssh, zip+zarr)
    • further discussion
      • JS: without /meta and .zarr requirement, you still don’t know where the root is
      • JS: if you drop “/meta” then you can’t name anything “/data”
        • JMS: could use something more obfuscated
      • JS: why split?
        • JMS: if you are not using the filesystem (s3 or gcs) and you want to list all the metadata, it’s not (as) efficient
      • JM: “data” could be registered in the metadata so it’s a known (and configurable) thing
      • WF: NC anything with leading underscore is assumed reserved for the library
        • permitted to create them, but the spec says “please don’t”
        • JM: .z prefix
        • WF: utilities and tools can scrape everything with that
        • WF: also don’t have to put too much thought into new features
        • JMS: would prefer not a . prefix because of archiving tools, etc.
        • then _z?
      • JMS: root metadata file doesn’t really do anything
        • JS: creates ambiguity
        • JM: think it was largely for bootstrapping global plugins (e.g. transformers)
        • JS: perhaps V2 compatibility
        • JMS: not clear you would nest sharding with other transformers. it would be the thing applied to the chunks.
        • JS: the metadata needs to be somewhere, and for that can be at the array level
    • brief summary
      • zarrs are essentially a metadata hierarchy
      • that configure (possibly remote) chunk stores
      • and the root is identified with .zarr