29th June, 2022
Attending: Davis Bennett (DV), Sanket Verma (SV), Josh Moore (JM), Jeremy Matiin-Shepard (JMS), Parth Tripathi (PT), Ward Fisher (WF), Hailey Johnson (HJ), Shivank Chaudhary (SC), Ryan Abernathey (RA) +30 min
Updates (SV):
- Zarr-Python 2.12.0 has been released! 🎉
Agenda:
- ZEP1: https://github.com/zarr-developers/zeps/pull/1
- Authored by Alistair and Jonathan
- includes details on sharding & transformers
- addresses pain points & lack of clarity in v2
- Alistair to open spec changes against zarr-specs repo
- see https://zarr.dev/zeps for these changes
- comment on PR as desired
- otherwise, merging very soon
- further discussion to take place on the zarr-specs PR
- Briefly (Josh), NFDI recommended for funding :tada:
- https://twitter.com/notjustmoore/status/1541776908043567104
- JMS spec discussions
- NB: right forum? JM: just need to communicate thoughts back on the PR since there is no requirement to be at the community calls
- Dimension labels
- there seemed to be interest in writing it up as a spec
- requirement that they are unique strings OR the empty string to say that they are unlabeled
- DB: motivation for unlabeled? Currently all are unlabeled. DB: disagree they are all labeled with integers.
- JMS: then strings are optional/additional alternatives.
- DB: see it leading to issues. potentially: “if you add labels then you must add all”
- JMS: case of automating inputs to outputs could lead to inventing fake labels but perhaps that’s preferable to empty
- DB: drawback from type theory is that you want the unlabel to be a different type. JMS: Use Null? disallow
""
anyway? - WF: dimensions are label and id parent? or conflating NC/H5
- JMS: was just thinking within a given array. goal would be to not need to know it’s “dimension[2]”
- DB: could see having arrays logically identical with different dimension ordering. want to enable use of, e.g.,
"z"
- WF: “dim_$N” gets assigned automatically.
- JM: need for buy-in from xarray and nczarr
- JM: in .zattrs? .zarray? JMS: don’t really mind.
- DB: err on the side of having zarrs more like numpy arrays
- JM: names in numpy are part of the dtype
- DB: backwards-compatible way to specify the defaults if they don’t exist
- JMS: and added to the zarr-python library? Yes.
- Single string to identify zarr root path + zarr array/dataset within root
- SV: Greg left a comment today. See also shoyer issue
- DB: an issue. problematic ergonomics
- JMS: was hoping to find a resolution
- JM: couple proposed
- sensible defaults
- DB: reason for separate hierarchy
- JM: possible extensions (like consolidated)
- JMS: range-requests to see full listings
- RA: strongly believe that V3 doesn’t introduce such a breaking change
- RA: NC uses path/to/file.h5/path/to/group
- JM: would require an increased number of lookups for the root JSON
- WF: correction – NC uses two strings
- JMS: neuroglancer has a data source URL. can make up a convention but it would be nice to preserve the single-string semantics
- RA: xarray only opens groups. more complicated for arrays.
- RA: good to formalize the URI/URL semantics (good to specify your data with a string)
- JMS: applies to groups, too.
- RA: xarray supports extra path to a sub-group. also gaining datatree functionality.
- DB: going into mainline? RA: Yes. DB: super cool.
- DB: couldn’t you just pass the absolute.
- JM: you don’t pass “data” or “meta”. only the logical group.
- DB: that means that could completion won’t work. could irritate people.
- DB: would pass the array. job of library is to find the array.
- RA: use hash tag or standardized file ending (.zarr) to parse URL
- DB: .zarr seems 100% reasonable (since slash is taken)
- DB: recommendation for people who want to live their truth
- JMS: would like to make this a MUST
- DB: jpeg vs jpg vs …
- RA: mimetype
- JM: make the .json files the default?
- RA: getting Zarr into STAC was problematic because it’s to a URL rather than a file. i.e. it fundamentally becomes a JSON file. Becomes a catalog.
- DB: like it. Directories are not real, files are real.
- JMS: could define a different ending?
- RA: .json is good
- JM: it’s .zarr.json which isn’t bad
- DB: natural when moving from local file system to a KVS
- RA: opens up absolute paths to chunks potentially
- JMS: with more changes to the spec, yeah.
- JM: consolidated metadata will be problematic.
- DB: PR
- mypy issues
- annotations breaks linter
- JM: generally :+1: for type annotations, also ok to start looking at dropping 3.7 now
- Tabled
- Support for inf/nan/binary data in attributes
- Endianness
- Zarr’s website
- What do you feel about our current website?
- What would you like to see in the new website?
- Any ideas for good Jekyll/any static website generator themes?