21st April, 2021
Attendees: Ward, Josh, Matthias, Dennis, Jackson, RyanW., Greg
-
EOSS 4: received invitation (May 19)
- Ward: chair Unidata DEI. Developer staff is above industry
standard.
-
Matthias: NumFOCUS overhead goes to inclusive ecosystem
- Quansight numbers? Unclear. Split overhead.
- Ward: chair Unidata DEI. Developer staff is above industry
-
Genomic data in Zarr → Alistair
- Josh: see
- Josh: see
-
Ward: nczarr 4.8 released (!)
-
Represents a different data model? Wasn’t the intent
-
But has been released. Kudos to Dennis.
-
4.8.1 to be released ASAP with xarray support.
-
Josh: similar to xarray.
-
Dennis: added notes to the documentation about compatibility
-
great lengths that existing zarr implementation should read any nczarr file. Works to some degree. Can also read any existing zarr file.
-
Pure superset for supporting netcdf 4 features
-
Limitation: zarr can read nczarr if willing to:
-
ignore keys that are unknown
-
ignore objects that are unknown
-
-
Treading on thin ice. Hopefully people will ignore
-
-
Josh: good to clarify these holes in the spec on the repo
- Dennis: looking at the extensions if there’s a way to adapt, but
good to have those conversations.
- The HDF Group (THG) is hosting a presentation on May 14 to lay
out their own cloud efforts. Registration is free, information here: https://zoom.us/meeting/register/tJYud-qgrjwjG91UF7SM7PiRU_QQQZ3jE6nG
-
-
(v2) Nested arrays
-
Dennis: trying to implement. One minor bug but working ok.
-
Those who chimed in were happy with it.
-
-
(v3) Matthias
-
Getting dimension_separator in
-
Rebasing base store work on that
-
Moving forward!
-
-
Ryan: next step on contract round 1
-
Report due at the end of the month
-
Ask for another extension? Yeah, let’s go for it.
-
-
European add on: Tobias Koelling, Stephan Saalfeld
-
Timezones
- Saalfeld: caching JSON is a huge win. Also think about file
listing.
Anything that incurs latency.- Containers are very full, multiscale.
- Tree discovery is very slow.
- Async helps, but caching metadata is really useful.
- Josh: tricky semantics that you
- S. for a read instance on one computer, changes will be consistent
- Josh: timeout? by size?
- S: not yet. Really small. Also assume that these readers are short-lived.
- Tobias: consolidating data in atmospheric sciences (EUREC4A
ATOMIC)
- want to store data distributed (e.g. IPFS)
- could imagine zarr on IPFS
- https://pypi.org/project/ipfsspec/
- https://github.com/d70-t/ipfszarr
- an ipfs zarr dataset!
-