2024-02-08

Attending: Sanket Verma (SV), Josh Moore (JM), Jeremy Maitin-Shepard (JMS)

TL;DR:

Discussion covered refining the storage transformer interface within ZEP1, options for a unified JSON representation, and the possible integration of Parquet with Zarr. The ZEP0 discussion focused on streamlining the process while staying compatible with the existing bylaws.

Meeting Minutes:

  • https://github.com/zarr-developers/zarr-specs/issues/287
    • JM: Given the state we left ZEP1 and storage transformers in, where does this fit in?
    • JMS: Wrap the key-value interface in the existing implementation
    • JMS: The Kerchunk approach has 1 JSON file and the proposed approach has 2 JSON files
    • JMS: Specify any array in-line?
    • JM: May look like specifying kerchunk in Zarr which we may or may not want to do
    • JMS: Kerchunk approach has keys and values - not exactly readable
    • JMS: There are various flavours of JSON files; can we somehow unify them? Does it help to have a representation for inline arrays?
    • JM: Will comment on Joe’s issue
    • JMS: Would be good to get Martin’s POV
    • JMS: Kerchunk parquet format is worth looking at
    • JM: Parquet folks are looking to combine Parquet and Zarr - could look at the tabular data as a 2D array
    • JMS: Do you need to download the whole Parquet file to access it?
    • JM: I think the offset works in Parquet
    • JM: https://spatialdata.scverse.org/en/latest/
    • JMS: https://github.com/google/neuroglancer/blob/master/src/datasource/precomputed/annotations.md - created annotations in TensorStore - spatial query has a multi-index grid - somewhat like a sparse array
    • JMS: general missing feature of a cloud database (Josh: cf. work on a graph/zarr version in Spain)
    • JM: Will try to get together SpatialData and JMS for discussions to prevent duplicative efforts
    • JM: Have URLs as indices; if they do not exist, generate them on the fly, and if you have write access, write them back
    • JMS: Annotations don’t end up being too large
    • JM: DuckDB is worth looking at - https://duckdb.org/
    • JMS: Cloud databases need regular maintenance
    • JM: https://datamonkeysite.com/2022/08/27/running-a-serverless-duckdb-on-google-cloud/
    • JM: Building index on the cloud or locally?
  • ZEP0 discussions - https://github.com/zarr-developers/zeps/issues/55
    • JM: Let’s open a PR and go ahead!?
    • JMS: Yes!
    • JM: In favour of a lightweight process, but if we reach a point where we have contention then we should go back to the bylaws
    • JMS: If future ZEPs overlap then there would be a problem
    • JM: Footnote is useful for future records
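
Illustration for the issue 287 discussion: a minimal sketch of the "keys and values" layout JMS described for the Kerchunk approach, where each key maps either to an inline value or to a (URL, offset, length) byte range. It assumes the Kerchunk version 1 reference layout; the key names, the file name `data.bin`, the in-memory `fake_store`, and the helpers `read_range` and `resolve` are all hypothetical and are not taken from the issue or from any library.

```python
import base64
import json

# Hypothetical reference set in the style of a Kerchunk version 1 file.
# Every key, file name, and byte range here is made up for illustration.
reference_set = {
    "version": 1,
    "refs": {
        # Metadata stored inline as a JSON string.
        ".zgroup": json.dumps({"zarr_format": 2}),
        # A small chunk stored inline, base64-encoded.
        "x/0": "base64:" + base64.b64encode(bytes([1, 2, 3, 4])).decode("ascii"),
        # A chunk stored elsewhere: [url, offset, length].
        "x/1": ["data.bin", 4, 4],
    },
}

# Stand-in for remote storage so the sketch runs without network access.
fake_store = {"data.bin": bytes(range(8))}


def read_range(url, offset, length):
    """Return `length` bytes starting at `offset` from the fake store."""
    return fake_store[url][offset:offset + length]


def resolve(key, reference_set, read_range):
    """Return the bytes for `key`, from an inline value or a byte range."""
    entry = reference_set["refs"][key]
    if isinstance(entry, str):
        prefix = "base64:"
        if entry.startswith(prefix):
            return base64.b64decode(entry[len(prefix):])
        return entry.encode("utf-8")
    url, offset, length = entry
    return read_range(url, offset, length)


if __name__ == "__main__":
    for key in reference_set["refs"]:
        print(key, resolve(key, reference_set, read_range))
```

The byte-range entries are the part that resembles a storage transformer wrapping a key-value interface: the reader only fetches the requested range rather than the whole file.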