2023-06-28
Attending: Josh Moore (JM), Sanket Verma (SV), Davis Bennett (DB), Ward Fisher (WF), Patty Fricke (PF), Jeremy Maitin-Shepard (JMS), Norman Rzepka (NR)
TL;DR:
We started the meeting by going around the table and asking what the last movie/TV show everyone watched was! Some excellent recommendations here.
PF asked about the ZEP0003 progress. SV has been working on a blog post titled ‘Zarr vs HDF5’ and asked the community what they’d like to see in it. DB’s Pydantic-Zarr will be publicised shortly. WF shared developments at Unidata regarding NetCDF; Zarr is one of the focus areas in the upcoming months.
Updates:
- New blog post: https://zarr.dev/blog/zarr-talks/ 🎉
Meeting Minutes:
- Introductions with the recent movie/tv show you watched 🍿🎬🥤
- SV: Black Mirror
- DB: A Room with a View
- WF: RRR
- JM: Heartstopper
- PF: Extraction 2
- NR: Lidia Poet
- JMS: John Wick Chapter 4
- PF: ZEP0003 progress
- SV: Not a lot
- PF: reading in CSVs (each file a timepoint) with xarray
- JMS: previous use cases were looking to move chunk size over time (t=1 to begin; then later t>1)
- DB: delicate point – anticipating a change of the metadata. performing a rechunk live could be trickier. code samples would be useful.
- SV: Please ping in the discussion post to get involved
- SV: Zarr vs. HDF5 blog post
- What would you like to have in there?
- WF: Audiences conflate NetCDF and HDF5
- They are not the same data model
- Storage layers - cloud ready - big and cool community
- Zarr doesn’t need a server to run - a lot of good momentum
- There isn’t an implementation and it would be complex to come up with one
- NetCDF and Zarr have straightforward APIs
- HDF5 parallel support with the help of DOE funding
- Supercomputers - use MPI-based HDF5
- DB: HDF5 has C implementation
- Comparatively cheap to spin up implementations - maybe a bad thing?
- Much more complicated to implement
- Difficult to do massively parallel HDF5 writing
- Zarr could have an HDF5 backend
- JMS: Key point - parallel write support
- Single threaded - fetching chunks from a single thread
- WF: HPC/Supercomputing community almost exclusively using MPI
- SV: DB: Pydantic-Zarr - ready to publicise it?
- DB: Wait until I polish the README
- DB: Any appetite for having this in Zarr-Python? May need to change the creation API
- JM: There are some PRs which should be merged first in order to introduce the Pydantic change into Zarr-Python - we don’t want you to wait on another PR
- DB: I think the code is computationally cheap and it’d be easy to integrate it
- JM: Looking forward to SciPy 2023 - if anyone is around, please come and meet us
- WF: NetCDF roadmap
- Working on high-level roadmapping for 5+ year cycle
- Will be adopting V3 very soon
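
Following up on DB's note in the ZEP0003 item that code samples would be useful: below is a minimal sketch of the use case JMS described, where chunk size along the time axis starts at t=1 and later grows to t>1. It assumes ZEP0003 refers to the variable-chunking proposal. It uses only the Python standard library and is not the ZEP's specification or any Zarr API; the names `chunk_boundaries`, `locate`, and `time_chunks` are hypothetical, chosen for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_boundaries(chunk_sizes):
    """Return the cumulative end offset of each chunk along one axis."""
    return list(accumulate(chunk_sizes))


def locate(index, chunk_sizes):
    """Map a global index on the axis to (chunk number, offset within chunk)."""
    ends = chunk_boundaries(chunk_sizes)
    if not ends or not 0 <= index < ends[-1]:
        raise IndexError(f"index {index} is outside the axis")
    # The first chunk whose end offset is greater than the index holds it.
    chunk = bisect_right(ends, index)
    start = ends[chunk - 1] if chunk else 0
    return chunk, index - start


# Hypothetical time axis: three early timepoints stored one per chunk (t=1),
# then later timepoints grouped four per chunk (t>1).
time_chunks = [1, 1, 1, 4, 4]

if __name__ == "__main__":
    for t in (0, 2, 5, 10):
        print(t, locate(t, time_chunks))
```

With a regular chunk grid the lookup is a single integer division. With variable chunk sizes it becomes a search over cumulative offsets, which is part of why the metadata change DB mentioned needs care.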