Skip to main content Link Search Menu Expand Document (external link)

11th August, 2021

Attendees: Ward Fisher, Josh Moore, Matt McCormick, Tobias Koelling, Hailey Johnson, Dennis Heimbigner, Erik Welch, Greg Lee, Ryan Williams

Image 4

    • Should be simple; needs C.

    • Would want to have a graph on top of sparse

      • (both of those are hard)
    • Looking for a binary format for sparse dataset

    • Interest? Can always do own thing but …

    • Timelines: meeting at HPC this year (connecting to other

      researchers)

      • next spring will cover more of it
    • Josh: do any of the GitHub issues cover it?

      • EW: still higher-level at the moment.
    • Greg: was looking at WSI slide data (75% background)

      • There’s a PR to not save those chunks
      • EW: scipy.sparse has block-compressed storage
    • Erik: start with v2 or v3?

      • Josh: as with multiscales probably best to start with v3
  • Ward

    • screensharing poster

    • UCAR interns presented posters a week or two ago

    • fsspec-reference-maker:

      pre-processing HDF to make it look like Zarr

    • Rich Signell & Martin Durant

    • <img src=”Pictures/10000201000005700000048F94793AAF08BAF2DD.png”

      style=”width:3.8465in;height:3.2244in” />

    • Josh: also presented to HUG. :thumbsup:

      • Will be interesting to see what other languages we can get it implemented in
    • Ward: resource limited. focusing on compression. but glad others

      are doing.

    • Tobias: HTTP proxy server which does range requests

      • See Trevor’s implementations
      • reference maker is maybe an opportunity to standardize
  • Tobias:

    • some interesting calls with IPFS

    • IPFS devs are interested in getting involved.

    • Not really from the zarr-python side. (more fsspec)

    • but there will be a question for other languages.

    • Matt: thanks for IPFS package. (good to have better support)

      • looking at CAR (tar file for IPFS)

      • similar to reference spec. pointers to similar locations.

      • Tobias: CAR is serialization format for multiple blocks.

        • each block may not be larger than 1 MB.

        • but there is a backend (badgerds) to store multiple

          blocks in one file

      • Matt: number of inodes

      • Tobias: I’ve currently 120GB IPFS data on laptop, trying to ramp up amount of data. 100 TBs+ should be fine according to IPFS/filecoin people

    • Josh: https://webknossos.org/ wanting a sharded format

    • Tobias: block size limit is for blocks in flight (for

      verification) but not for on disk.

  • Matt

  • Josh:

    • CSharp?

      • Ward: no specific CS expertise here but should be able to link to shared objs

      • Hailey: faster to get started with netcdf-c

      • Matt: good support for native loading. but needs managing.

        • starting on such a project. compile netcdf-c to WASM

        • helps with cross-platform issues.

        • Dennis: piece missing is something like libc

          (fileio, etc.)

        • Matt: emascripten is one library. system calls is WASI
    • [*deadlock

      ideas?*](https://github.com/zarr-developers/zarr-python/pull/725#issuecomment-894486877) PR 725

      • trying to make zarr-python more numpy like