2023-02-08

Attending: Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Virginia Scarlett (VS), John Kirkham (JK), Dieu My Nguyen (DMN), Hailey Johnson (HJ), Isaac Virshup (IV), Martin Durant (MD)

TL;DR:

SV started the meeting by letting everyone know he is working on adding a page to the Zarr website to showcase Zarr adopters. If you’re using Zarr in any way and want to showcase your logo, please follow the instructions in this issue: https://github.com/zarr-developers/community/issues/60.

He also asked whether submitting a paper to JOSS for Zarr-Python would be a good idea, and the response was positive overall. After this, we discussed submitting tutorials for SciPy 2023 and adding sharding in the next release. The meeting ended with MD sharing what he built during the hack week at Anaconda and DB asking about the performance difference with and without sharding.

Updates:

Meeting Minutes:

  • JOSS (SV)
    • DB: takes time but can’t hurt.
    • IV: JOSS review took over a year for anndata
  • SciPy 2023 Tutorials (SV)
    • Deadline: Feb 22 (Conference in July)
    • Talk focuses on the evolution of the spec since 2019
    • Duration: 4 hours
    • Basics but also different domains (geospatial, bioimaging, …)
    • HJ: planning on being there
    • DMN: hybrid? Unsure. Hope to attend. Earth science data.
  • SV: New Zarr-Python release (2.14) including sharding and new theme?
    • Small issues to resolve before release
  • MD: https://github.com/martindurant/rfsspec
    • Hack week at Anaconda. Async fetching of a list of URLs, with start/stop ranges
    • Can load Zarr data using Rust-native concurrency (rather than asyncio, which carries Python overhead)
    • fsspec currently works around asyncio re-entrance issues by using a dedicated IO thread
    • DB: Performance difference? Worst case, the same. With lots of threads there could be a speedup, but it is unclear how much (maybe 2x)
      • JK: With sharding? Sure.
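
For readers unfamiliar with the dedicated-IO-thread workaround mentioned above, here is a minimal sketch of the general pattern using only the Python standard library. It is not fsspec's or rfsspec's actual code: `FAKE_STORE`, `fetch_range`, `fetch_many`, and `sync_fetch_many` are hypothetical names, and the "remote" data is an in-memory dict standing in for real URLs. The idea is that the coroutines run on an event loop owned by a background thread, so blocking callers can use them even when their own thread already has a running loop.

```python
import asyncio
import threading

# Hypothetical in-memory "remote" store standing in for real URLs (assumption).
FAKE_STORE = {
    "mem://a": b"0123456789",
    "mem://b": b"abcdefghij",
}


async def fetch_range(url, start, stop):
    """Pretend to fetch bytes [start, stop) of `url`."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_STORE[url][start:stop]


async def fetch_many(urls, starts, stops):
    """Fetch all requested ranges concurrently on the current event loop."""
    return await asyncio.gather(
        *(fetch_range(u, s, e) for u, s, e in zip(urls, starts, stops))
    )


# Dedicated event loop that lives in its own daemon thread.
_loop = asyncio.new_event_loop()
_thread = threading.Thread(target=_loop.run_forever, daemon=True)
_thread.start()


def sync_fetch_many(urls, starts, stops):
    """Blocking wrapper: submit the coroutine to the IO thread and wait."""
    future = asyncio.run_coroutine_threadsafe(
        fetch_many(urls, starts, stops), _loop
    )
    return future.result()


chunks = sync_fetch_many(["mem://a", "mem://b"], [0, 2], [4, 5])
print(chunks)
```

Because the work is handed to a separate loop, `sync_fetch_many` never calls `run_until_complete` on a loop that is already running in the caller's thread, which is the re-entrance problem the workaround avoids. The cost is that every call pays for cross-thread scheduling on top of normal asyncio overhead.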