11th August, 2021
Attendees: Ward Fisher, Josh Moore, Matt McCormick, Tobias Koelling, Hailey Johnson, Dennis Heimbigner, Erik Welch, Greg Lee, Ryan Williams
-
Misc.: abstracts submitted to AGU
-
Introductions
- Erik? Interest in sparse? (Gitter) ndimensional
Image 4
-
-
Should be simple; needs C.
-
Would want to have a graph on top of sparse
- (both of those are hard)
-
Looking for a binary format for sparse dataset
-
Interest? Can always do own thing but …
- Timelines: meeting at HPC this year (connecting to other
researchers)
- next spring will cover more of it
-
Josh: do any of the GitHub issues cover it?
- EW: still higher-level at the moment.
-
Greg: was looking at WSI slide data (75% background)
- There’s a PR to not save those chunks
- EW: scipy.sparse has block-compressed storage
-
Erik: start with v2 or v3?
- Josh: as with multiscales probably best to start with v3
-
-
Ward
-
screensharing poster
-
UCAR interns presented posters a week or two ago
- fsspec-reference-maker:
pre-processing HDF to make it look like Zarr
-
Rich Signell & Martin Durant
- <img src=”Pictures/10000201000005700000048F94793AAF08BAF2DD.png”
style=”width:3.8465in;height:3.2244in” />
-
Josh: also presented to HUG. :thumbsup:
- Will be interesting to see what other languages we can get it implemented in
- Ward: resource limited. focusing on compression. but glad others
are doing.
-
Tobias: HTTP proxy server which does range requests
- See Trevor’s implementations
- reference maker is maybe an opportunity to standardize
-
-
Tobias:
-
some interesting calls with IPFS
-
IPFS devs are interested in getting involved.
-
Not really from the zarr-python side. (more fsspec)
-
but there will be a question for other languages.
-
Matt: thanks for IPFS package. (good to have better support)
-
looking at CAR (tar file for IPFS)
-
similar to reference spec. pointers to similar locations.
-
Tobias: CAR is serialization format for multiple blocks.
-
each block may not be larger than 1 MB.
-
but there is a backend (badgerds) to store multiple
blocks in one file
-
-
Matt: number of inodes
-
Tobias: I’ve currently 120GB IPFS data on laptop, trying to ramp up amount of data. 100 TBs+ should be fine according to IPFS/filecoin people
-
-
Josh: https://webknossos.org/ wanting a sharded format
- but also Caterva (https://blosc.github.io/caterva-scipy21/#/16)
-
Tobias: block size limit is for blocks in flight (for
verification) but not for on disk.
-
-
Matt
-
looking into adding xarray & dimension separator
-
xarray not working
-
see
-
-
Josh:
-
CSharp?
-
Ward: no specific CS expertise here but should be able to link to shared objs
-
Hailey: faster to get started with netcdf-c
-
Matt: good support for native loading. but needs managing.
-
starting on such a project. compile netcdf-c to WASM
-
helps with cross-platform issues.
- Dennis: piece missing is something like libc
(fileio, etc.)
- Matt: emascripten is one library. system calls is WASI
-
-
-
[*deadlock
ideas?*](https://github.com/zarr-developers/zarr-python/pull/725#issuecomment-894486877) PR 725
- trying to make zarr-python more numpy like
-