2023-01-12

Attending: Jonathan Striebel (JS), Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Ryan Abernathey (RA), Sanket Verma (SV), Ward Fisher (WF), Dennis Heimbigner (DH)

TL;DR:

SV started the meeting by asking everyone to review his latest PR in the ZEPs repo, which consolidates the discussion venues for the draft ZEP and makes minor additions to the webpage. After this, the attendees discussed the open issues for V3 (ZEP0001). We discussed what the ideal /meta prefix should be; please see #177 for the extensive discussion. We then turned to #192, which considers removing the entry point metadata.

Updates:

Meeting Minutes:

  • ZEP 1 issues that need attention:
    • prefix: https://github.com/zarr-developers/zarr-specs/issues/177
      • open question is the prefix for the chunk directory
      • potentially to-be-used for json files, etc.
      • also useful for extensions that add new folders
      • Dennis: zarr.chunks? Good to identify them. (since there are arbitrary delimiters)
        • with HDF5 people have experimented with accessing chunks directly, so need easy identification
      • Options:
        • _
        • __
        • _z_
        • z.
        • zarr.
        • _zarr_
      • Ryan: what’s the goal of the prefix?
        • JS: preventing node-name collision
        • JM: preventing extension collision
        • DH: DAP4 rule attempts to have self-assigned namespace, piggybacking on DNS (.ucar.edu)
        • RA: work through use cases, e.g. nczarr files
        • RA: won’t the ability to offload metadata into a separate document
        • DH: also apply it to keys within the attributes
        • RA: see ZEP4. special attributes are another discussion.
      • WF: A convention that _zarr is reserved is longer, but feels less prone to collision than _z
      • JMS: In general I think we want to reserve a prefix such as “_” for zarr itself and extensions, and then perhaps a subset of that should be reserved for just zarr itself (not extensions).
      • NB/DH: would suggest a top-level group
        • DH: Do you have sufficient metadata?
        • What does it mean to access a raw variable?
        • JM: there’s still a directory. metadata+chunks (but a place to put extension files as well)
        • JM: bidirectionality would need some work to make sure that a group doesn’t magically appear
        • RA: cf. how GDAL and GeoTIFF (etc.) handle this
        • DH: two purposes of group – namespace and a place to store attributes that aren’t part of the variable
        • DH: example is group-level superblock marker
      • chat
        • RA: I worry that if we make _ a disallowed prefix, lots of datasets may not work
        • RA: I feel like there is plenty of data out there in the wild that has a _ as the first character in a variable
        • RA: Using ‘_’ as a convention in netCDF is a soft limit, not a hard one; it’s part of the convention that it’s reserved, but if users disregard that, they can use ‘_’ with their own attributes. Whatever convention we decide upon can be phrased as guidance, without necessarily breaking extant datasets.
      • deciding
      • JS: compatibility extension for v2 is valid or not
      • JM: could see zarr.json, zarr.chunks, zarr.extensions
    • no root group: https://github.com/zarr-developers/zarr-specs/issues/192
      • explicit groups can have transformers
      • do we always need to have a zarr.json for an array
      • if an array has none, how do we open it?
      • search up the hierarchy? too inefficient?
      • JMS: with a group-level transformer, you can’t make any assumptions about what’s underneath it (e.g. a redirect)
      • JMS: without searching, the group with the transform will become a “root”
      • RA: need URL syntax for group#path if we have transformers
      • JM: still need search up for desktop clients. no URL syntax for searching down
      • RA: formalize “entrypoint” what can and cannot be opened
        • shouldn’t be able to open a chunk (and figure out that it’s part of an array)
      • JS: alternatively, you MUST have a zarr.json
        • RA: like that one. for an entry point, there should be a zarr.json
        • That should be the one entrypoint definition.
        • DH: defines a leaf, everything below doesn’t exist externally.
        • RA: would help to look at hierarchies. (too abstract)
        • DH: a bit like posix mountpoints? driver is responsible for interpretation
    • URL syntax: https://github.com/zarr-developers/zarr-specs/issues/132
      • little bit via the previous conversation
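
The entry-point idea discussed under #192 can be illustrated with a minimal sketch. It assumes a local filesystem store and the zarr.json file name floated in the meeting, which had not been decided. The helper names is_entry_point and find_entry_point are hypothetical and are not part of any Zarr library or the spec. The first function reflects the "you MUST have a zarr.json" option; the second shows the "search up the hierarchy" alternative that was questioned as possibly too inefficient.

```python
from pathlib import Path
from typing import Optional

# Assumption: "zarr.json" is the metadata file name suggested in the notes;
# the discussion had not settled on it.
METADATA_FILE = "zarr.json"


def is_entry_point(path: Path) -> bool:
    """Return True if `path` is a directory that directly contains the metadata file.

    Under this rule a chunk file, or a directory without metadata,
    cannot be opened on its own.
    """
    return path.is_dir() and (path / METADATA_FILE).is_file()


def find_entry_point(path: Path) -> Optional[Path]:
    """Search upward from `path` for the nearest directory holding the metadata file.

    Returns None if no ancestor qualifies. Every level costs one existence
    check, which is cheap locally but may be slow on object storage.
    """
    current = path if path.is_dir() else path.parent
    for candidate in (current, *current.parents):
        if is_entry_point(candidate):
            return candidate
    return None
```

With the strict rule, a client calls only is_entry_point and refuses anything else. The upward search is what a desktop client might fall back to when the user points at a nested path.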