2023-06-28

Attending: Josh Moore (JM), Sanket Verma (SV), Davis Bennett (DB), Ward Fisher (WF), Patty Fricke (PF), Jeremy Maitin-Shepard (JMS), Norman Rzepka (NR)

TL;DR:

We started the meeting by going around the table and asking what the last movie/TV show everyone watched was! Some excellent recommendations here.

PF asked about the ZEP0003 progress. SV has been working on a blog titled ‘Zarr vs HDF5’ and asked the community what they’d like to see in the blog. DB’s Pydantic-Zarr will be publicised shortly. WF shared the development at Unidata regarding NetCDF, and Zarr is one of the focus in the upcoming months.

Updates:

Meeting Minutes:

  • Introductions with the recent movie/tv show you watched 🍿🎬🥤
  • PF: ZEP0003 progress
    • SV: Not a lot
    • PF: reading in CSVs (each file a timepoint) with xarray
    • JMS: previous use cases were looking to move chunk size over time (t=1 to begin; then later t>1)
    • DB: delicate point – anticipating a change of the metadata. performing a rechunk live could be trickier. code samples would be useful.
    • SV: Please ping in the discussion post to get involved
  • SV: Zarr vs. HDF5 blog post
    • What would you like to have in there?
    • WF: Audience conflate NetCDF and HDF5
      • They are not same data model
      • Storage layers - cloud ready - big and cool community
      • Zarr doesn’t need a server to run - lot of good momentum
      • There isn’t no implementation and it would be complex to come up with one
      • NetCDF and Zarr has straightforward API
      • HDF5 parallel support with the help of DOE funding
      • Supercomputer - Uses HDF5 based MPI
    • DB: HDF5 has C implementation
      • Comparitively cheap to spin up implementations - maybe a bad thing?
      • Much more complicated to implement
      • Difficult to do massively HDF5 parallel writing
      • Zarr could have a HDF5 backend
    • JMS: Key point - parallel write support
      • Single threaded - fetching chunks from a single thread
    • WF: HPC/Supercomputing community almost exclusively using MPI
  • SV: DV: Pydantic-Zarr - ready to publicise it?
    • DV: Wait, until I polish the readme
    • DB: Any apetite for having this in Zarr-Python? May need to change the creation API
    • JM: There are some PRs which should be merged first in order to introduce the Pydantic change into Zarr-Python - we don’t want you to wait on another PR
    • DB: I think the code is computationaly cheap and it’d be easy to integrate it
  • JM: Looking forward for SciPy 2023 - if anyone around please come and meet us
  • WF: NetCDF roadmap
    • Working on high-level roadmapping for 5+ year cycle
    • Will be adopting V3 very soon