
29th June, 2022

Attending: Davis Bennett (DB), Sanket Verma (SV), Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Parth Tripathi (PT), Ward Fisher (WF), Hailey Johnson (HJ), Shivank Chaudhary (SC), Ryan Abernathey (RA) +30 min

Updates (SV):

Agenda:

  • ZEP1: https://github.com/zarr-developers/zeps/pull/1
    • Authored by Alistair and Jonathan
    • includes details on sharding & transformers
    • addresses pain points & lack of clarity in v2
    • Alistair to open spec changes against zarr-specs repo
    • see https://zarr.dev/zeps for these changes
    • comment on PR as desired
    • otherwise, merging very soon
    • further discussion to take place on the zarr-specs PR
  • Briefly (Josh), NFDI recommended for funding :tada:
    • https://twitter.com/notjustmoore/status/1541776908043567104
  • JMS spec discussions
    • NB: right forum? JM: just need to communicate thoughts back on the PR since there is no requirement to be at the community calls
    • Dimension labels
      • there seemed to be interest in writing it up as a spec
      • requirement that they are unique strings OR the empty string to say that they are unlabeled
      • DB: motivation for unlabeled? Currently all are unlabeled. DB: disagree, they are all labeled with integers.
      • JMS: then strings are optional/additional alternatives.
      • DB: see it leading to issues. potentially: “if you add labels then you must add all”
      • JMS: case of automating inputs to outputs could lead to inventing fake labels but perhaps that’s preferable to empty
      • DB: a drawback, from type theory, is that you want the unlabeled case to be a different type. JMS: use null? disallow "" anyway?
      • WF: are dimensions a label plus an ID in the parent? Or am I conflating this with NC/H5?
      • JMS: was just thinking within a given array. goal would be to not need to know it’s “dimension[2]”
      • DB: could see having arrays logically identical with different dimension ordering. want to enable use of, e.g., "z"
      • WF: “dim_$N” gets assigned automatically.
      • JM: need for buy-in from xarray and nczarr
      • JM: in .zattrs? .zarray? JMS: don’t really mind.
      • DB: err on the side of having zarrs more like numpy arrays
      • JM: names in numpy are part of the dtype
      • DB: backwards-compatible way to specify the defaults if they don’t exist
      • JMS: and added to the zarr-python library? Yes.
    • Single string to identify zarr root path + zarr array/dataset within root
      • SV: Greg left a comment today. See also shoyer issue
      • DB: an issue. problematic ergonomics
      • JMS: was hoping to find a resolution
      • JM: a couple have been proposed
        • sensible defaults
      • DB: reason for separate hierarchy
        • JM: possible extensions (like consolidated)
        • JMS: range-requests to see full listings
      • RA: strongly believe that V3 doesn’t introduce such a breaking change
      • RA: NC uses path/to/file.h5/path/to/group
        • JM: would require an increased number of lookups for the root JSON
        • WF: correction – NC uses two strings
      • JMS: neuroglancer has a data source URL. can make up a convention but it would be nice to preserve the single-string semantics
      • RA: xarray only opens groups. more complicated for arrays.
      • RA: good to formalize the URI/URL semantics (good to specify your data with a string)
      • JMS: applies to groups, too.
      • RA: xarray supports extra path to a sub-group. also gaining datatree functionality.
        • DB: going into mainline? RA: Yes. DB: super cool.
      • DB: couldn’t you just pass the absolute path?
        • JM: you don’t pass “data” or “meta”. only the logical group.
        • DB: that means that tab completion won’t work. Could irritate people.
        • DB: would pass the array. job of library is to find the array.
      • RA: use a hash (#) or a standardized file ending (.zarr) to parse the URL
        • DB: .zarr seems 100% reasonable (since slash is taken)
        • DB: recommendation for people who want to live their truth
        • JMS: would like to make this a MUST
      • DB: jpeg vs jpg vs …
        • RA: mimetype
      • JM: make the .json files the default?
        • RA: getting Zarr into STAC was problematic because a Zarr points to a URL rather than a file, i.e. it fundamentally has to become a JSON file. It becomes a catalog.
        • DB: like it. Directories are not real, files are real.
        • JMS: could define a different ending?
        • RA: .json is good
        • JM: it’s .zarr.json which isn’t bad
        • DB: natural when moving from local file system to a KVS
        • RA: opens up absolute paths to chunks potentially
        • JMS: with more changes to the spec, yeah.
        • JM: consolidated metadata will be problematic.
  • DB: PR
    • mypy issues
    • annotations break the linter
    • JM: generally :+1: for type annotations; also OK to start looking at dropping Python 3.7 now
  • Tabled
    • Support for inf/nan/binary data in attributes
    • Endianness
    • Zarr’s website
      • What do you feel about our current website?
      • What would you like to see in the new website?
      • Any ideas for good Jekyll/any static website generator themes?
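The dimension-label rule discussed under "JMS spec discussions" (each label is a unique string, or the empty string for an unlabeled dimension, with `dim_$N`-style defaults when no labels exist) can be sketched as follows. This is a minimal illustration, not part of any spec or of zarr-python: the function names are hypothetical, labels are assumed to be stored as a list of strings with one entry per dimension, and the defaults are assumed to be zero-based (`dim_0`, `dim_1`, ...).

```python
from typing import List, Optional, Sequence


def validate_dimension_labels(labels: Sequence[str], ndim: int) -> None:
    """Hypothetical check: each label is a unique non-empty string, or "" (unlabeled)."""
    if len(labels) != ndim:
        raise ValueError(f"expected {ndim} labels, got {len(labels)}")
    seen = set()
    for label in labels:
        if not isinstance(label, str):
            raise TypeError(f"dimension label {label!r} is not a string")
        if label == "":
            continue  # empty string marks an unlabeled dimension; it may repeat
        if label in seen:
            raise ValueError(f"duplicate dimension label {label!r}")
        seen.add(label)


def resolve_labels(labels: Optional[Sequence[str]], ndim: int) -> List[str]:
    """Return the stored labels, or assumed zero-based 'dim_N' defaults if none exist."""
    if labels is None:
        return [f"dim_{i}" for i in range(ndim)]
    validate_dimension_labels(labels, ndim)
    return list(labels)


def axis_for_label(labels: Sequence[str], name: str) -> int:
    """Look up an axis by label so callers need not know it is dimension[2]."""
    if name == "":
        raise ValueError("the empty string means 'unlabeled' and cannot be looked up")
    return list(labels).index(name)  # raises ValueError if the label is absent
```

For example, `axis_for_label(resolve_labels(None, 3), "dim_2")` returns `2`. Using `""` as the "unlabeled" sentinel follows the proposal as noted; DB's type-theory point would instead suggest something like `None`, which would only change the `label == ""` branch.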
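RA's suggestion under "Single string to identify zarr root path" (split one string into the store root and the path within the hierarchy using either a `#` or a standardized `.zarr` ending) can be sketched as below. This is a hypothetical convention for illustration only: the function name is invented, an explicit `#` is assumed to take precedence, and the first path segment ending in `.zarr` is assumed to mark the store root.

```python
from typing import Tuple


def split_zarr_url(url: str) -> Tuple[str, str]:
    """Split a single string into (store_root, node_path) under an assumed convention."""
    # Assumption: an explicit '#' separates the store root from the node path.
    if "#" in url:
        root, _, node = url.partition("#")
        return root, node.strip("/")
    # Otherwise, the first path segment ending in '.zarr' is taken as the root.
    parts = url.split("/")
    for i, segment in enumerate(parts):
        if segment.endswith(".zarr"):
            root = "/".join(parts[: i + 1])
            node = "/".join(parts[i + 1 :]).strip("/")
            return root, node
    raise ValueError(f"no '#' or '.zarr' segment found in {url!r}")
```

For example, `split_zarr_url("s3://bucket/data/example.zarr/group/array")` returns `("s3://bucket/data/example.zarr", "group/array")`. Requiring the `.zarr` ending (JMS's MUST) is what would let the split be made from the string alone, without the extra lookups for the root JSON that JM raised about NC-style paths.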