11th August, 2021
Attendees: Ward Fisher, Josh Moore, Matt McCormick, Tobias Koelling, Hailey Johnson, Dennis Heimbigner, Erik Welch, Greg Lee, Ryan Williams
Misc.: abstracts submitted to AGU
- Erik? Interest in sparse? (Gitter) n-dimensional
Should be simple; needs C.
Would want to have a graph on top of sparse
- (both of those are hard)
Looking for a binary format for sparse dataset
Interest? Can always do own thing but …
- Timelines: meeting at HPC this year (connecting to other
- next spring will cover more of it
Josh: do any of the GitHub issues cover it?
- EW: still higher-level at the moment.
Greg: was looking at WSI slide data (75% background)
- There’s a PR to not save those chunks
- EW: scipy.sparse has block-compressed storage
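A minimal sketch of the idea behind the PR Greg mentions: chunks that contain only background are never written, and a read of a missing chunk falls back to a fill value. The store layout, the function names, and the 2x2 chunk size are hypothetical illustrations, not zarr-python's API.

```python
# Hypothetical sketch: skip chunks that contain only the background value.
FILL = 0  # assumed background / fill value


def write_chunks(image, chunk, store):
    """Split a 2-D list of lists into chunk x chunk tiles; keep only non-background tiles."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows, chunk):
        for c in range(0, cols, chunk):
            tile = [row[c:c + chunk] for row in image[r:r + chunk]]
            if any(v != FILL for row in tile for v in row):
                store[f"{r // chunk}.{c // chunk}"] = tile


def read_chunk(store, key, chunk):
    """Return the stored tile, or a tile of FILL if the chunk was never written."""
    return store.get(key, [[FILL] * chunk for _ in range(chunk)])


# 75% of this toy image is background, so only one of four chunks is stored.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 6],
    [0, 0, 7, 8],
]
store = {}
write_chunks(image, 2, store)
```

Block-compressed sparse storage rests on the same observation: only the blocks that hold data need to be kept, together with their block coordinates.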
Erik: start with v2 or v3?
- Josh: as with multiscales probably best to start with v3
screensharing poster
UCAR interns presented posters a week or two ago
- fsspec-reference-maker:
pre-processing HDF to make it look like Zarr
Rich Signell & Martin Durant
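Making an existing file "look like Zarr" can be sketched as a mapping from chunk keys to byte ranges inside the original file. The following is an illustration only: the file contents, the keys, and the `resolve` helper are made up, and the `[path, offset, length]` layout is an assumption about how such references might be written rather than a statement of the tool's actual format.

```python
import os
import tempfile

# Stand-in for a large binary file (e.g. HDF) that already holds the chunk bytes.
payload = b"HEADER" + b"chunk-zero" + b"chunk-one!"
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(payload)
tmp.close()

# Hypothetical references: key -> inline string, or [path, offset, length].
refs = {
    ".zarray": '{"shape": [2], "chunks": [1]}',
    "0": [tmp.name, 6, 10],
    "1": [tmp.name, 16, 10],
}


def resolve(refs, key):
    """Return the bytes for a key, reading only the referenced byte range."""
    ref = refs[key]
    if isinstance(ref, str):  # small metadata stored inline
        return ref.encode()
    path, offset, length = ref
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


chunk0 = resolve(refs, "0")
chunk1 = resolve(refs, "1")
os.unlink(tmp.name)  # clean up the temporary file
```

Against a remote file, the same lookup would turn into an HTTP range request for bytes `offset` through `offset + length - 1`.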
Josh: also presented to HUG. :thumbsup:
- Will be interesting to see what other languages we can get it implemented in
- Ward: resource-limited and focusing on compression, but glad others
are doing it.
Tobias: HTTP proxy server which does range requests
- See Trevor’s implementations
- reference maker is maybe an opportunity to standardize
some interesting calls with IPFS
IPFS devs are interested in getting involved.
Not really from the zarr-python side. (more fsspec)
but there will be a question for other languages.
Matt: thanks for IPFS package. (good to have better support)
looking at CAR (tar file for IPFS)
similar to reference spec. pointers to similar locations.
Tobias: CAR is serialization format for multiple blocks.
each block may not be larger than 1 MB.
but there is a backend (badgerds) to store multiple
blocks in one file
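To illustrate the block-size constraint Tobias describes, here is a rough sketch that splits a byte string into blocks of at most 1 MiB and addresses each block by its SHA-256 digest. This is not how IPFS computes its identifiers or how CAR files are laid out; the `split_blocks` helper and the reading of the limit as exactly 1 MiB are assumptions for illustration.

```python
import hashlib

MAX_BLOCK = 1024 * 1024  # assumed limit, taken as 1 MiB for illustration


def split_blocks(data, max_block=MAX_BLOCK):
    """Return ({sha256 hex digest: block}, ordered list of digests)."""
    blocks, order = {}, []
    for start in range(0, len(data), max_block):
        block = data[start:start + max_block]
        digest = hashlib.sha256(block).hexdigest()
        blocks[digest] = block  # identical blocks are stored only once
        order.append(digest)
    return blocks, order


data = bytes(range(256)) * 10240  # 2.5 MiB of repeating sample data
blocks, order = split_blocks(data)
restored = b"".join(blocks[d] for d in order)
```

Because the sample data repeats, the first two blocks are identical and share one entry in `blocks`, while `order` still records three blocks so the original bytes can be reassembled.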
Matt: number of inodes
Tobias: I currently have 120 GB of IPFS data on my laptop and am trying to ramp up the amount of data. 100+ TB should be fine according to the IPFS/Filecoin people.
Josh: https://webknossos.org/ wanting a sharded format
- but also Caterva (https://blosc.github.io/caterva-scipy21/#/16)
Tobias: block size limit is for blocks in flight (for
verification) but not for on disk.
looking into adding xarray & dimension separator
xarray not working
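For context on the dimension separator: a chunk's storage key is built by joining its grid indices with a separator, "." for a flat layout or "/" for a nested one. A small sketch (the `chunk_key` helper is hypothetical, not a zarr-python function):

```python
def chunk_key(indices, separator="."):
    """Join chunk grid indices into a storage key."""
    return separator.join(str(i) for i in indices)


flat = chunk_key((0, 3, 7))                    # flat layout key
nested = chunk_key((0, 3, 7), separator="/")   # nested (directory-like) key
```

A reader that assumes the wrong separator looks up keys that do not exist, which is one way a downstream tool can fail to find chunks.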
Ward: no specific CS expertise here but should be able to link to shared objs
Hailey: faster to get started with netcdf-c
Matt: good support for native loading. but needs managing.
starting on such a project. compile netcdf-c to WASM
helps with cross-platform issues.
- Dennis: piece missing is something like libc
(fileio, etc.)
- Matt: Emscripten is one library; system calls go through WASI
ideas? PR 725: https://github.com/zarr-developers/zarr-python/pull/725#issuecomment-894486877
- trying to make zarr-python more numpy like