20th April, 2022

Attending: Ryan Abernathey, Josh Moore, Eric Perlman, Sanket Verma, Jeremy Maitin-Shepard, Jonathan Striebel, Gregory Lee, Jim Pivarski, Ishan Bansal, Isaac Virshup, Parth Tripathi, Martin Durant, Ward Fisher, Dennis Heimbigner, Matthew McCormick

Updates:

GSoC proposal deadline has passed; we have 3 proposals this year! Between April 19 and May 12 we can decide how many slots to take.
Cloud Native Outreach Event went great. Videos will be live shortly!
If you have any videos to share, let us know!
- intros: https://www.youtube.com/playlist?list=PLvkeNUPrCU04Xvcph4ErxsRkZq28Oucr7
- applications: https://www.youtube.com/playlist?list=PLvkeNUPrCU05qHkZso_T74yoayqLFHzkI
Using https://github.com/orgs/zarr-developers/discussions
- org-level discussions, one level above the individual repos (they show up in the “community” repository)
ZEP final update!
- JM: implementation council to be invited
- JS: great to have the implementors on board to not fragment the landscape
- MD: some may not implement though, right? JM: true. multiple states of votes:
  - will-implement, may-implement, wont-implement, breaks-us-veto
- MD: no clear status of what’s up to date
- RA: veto power since would be bad to lead to forks. worth discussing that provision.
  - MD: since we aim for consensus anyway (and the veto is rarely used), should work fine
  - JS: don’t want to end in a place where the spec says something that will never be implemented.
  - JS: separation on veto for core or extension. JM: agreed, focus all ZEPs on core for the moment
  - SV: extensions are V3, which isn’t done, so it’s all core.
  - JMS: only V3? JM: what about C/F order? RA: don’t have to limit it (but we want to focus on V3)
    - MD: agreed, the place to expose breakages
- RA: core vs. extension
  - is core something that everyone must (eventually) implement?
  - MD: some things are already optional like filter
  - MD: extensions were originally synonymous with conventions, but a dataset is openable without them
  - RA: convention is distinct from optional extension (cf. variable length chunks)
  - JMS: another way of seeing extensions is the evolution of the spec. signalling to implementations that they are seeing new data. “must understand”
  - JP: agree about the distinction. can’t-read-data vs. might-need-a-library. have wanted to frame this as an extension that labels a convention, like an annotation.
  - IV: would be useful to specify convention. if you don’t have a way to store the metadata, then goes into the .zattrs
    - good to have a field in the structural metadata to specify conventions
    - JS: separate from convention. orthogonal questions.
  - RA: hierarchy (or ontology)
  - JP: A different example: I’ve seen HDF5 data files, from gravitational waves, that are valid HDF5 but can only be “understood” by the LIGO collaboration’s code. It would have been good to have a label on that HDF5 file warning haphazard users.
  - Unidata?
    - DH: is wont-implement a veto? or if they say wont-implement and breaks, then is veto? That’s a lot of power.
    - JMS: non-zero origin & data-orders other than C are both examples that cause issues (with e.g. Julia). potential vetoes.
    - DH: solvable, but they are saying the cost is high.
    - RA: take sharding. major enhancement but pita to implement. is it core? need to show in ZEP? higher bar for core proposal?
    - DH: have been looking at how to implement. it will be a challenge. decided with Ward that it’s worth doing.
    - RA: meta goal is to have that discussion before the ship has sailed.
    - WF: feels a lot like an internal NetCDF conversation. What is an NC file? vs. what’s in the doc
      - NC file has to be more than a file written by netcdf-c library (the first party implementation)
      - goal of tech. spec. is to take it and write software in any lang. that can write/read a NC
      - needs to specify permissible deviations
    - MD: how to go from v3 to v3.1? (sharding or variable length chunks)
      - WF: have made many mistakes… (e.g. always refer to specific versions, NetCDF 3…)
      - note: Unidata doesn’t yet have ironclad backwards compatibility for nczarr
      - v3 to v3.1 could potentially not be backwards compatible
      - behavior versus definition (this message may self-destruct…)
  - MD: parquet example. people mention v2 but that doesn’t really exist
    - still features that aren’t implemented!
  - JMS: similar in HTML, cf. https://caniuse.com/; we need the same
  - JS: agreed, important to know. needed to read the data or not.
    - Even more core, must-understand flag & warnings about not being supported. All MUST have this for V3.
    - sharding storage-transformer proposal: sharding could be an extension that uses transformers.
    - then impl. council decides
  - WF: NetCDF isn’t a great analog here; we have no forward-compatibility promise, and the solution when an old version cannot read a newer file is to suggest they upgrade to the latest version. But this is because there are not a lot of independent implementations in the wild.
    - NetCDF is also fortunate to have a number of independently-developed utilities and tools (NetCDF Operators (NCO) and pnetcdf spring to mind). Perhaps a zdump (similar to ncdump or h5dump), provided by the core project, could provide summary information for a file? This information could then be used to determine whether a specific dataset could be read by the implementation in question.
  - RA: useful concept here: netcdf is focused on interoperability & preservation. Parquet is for performance. Zarr is mainly a high-performance copy of data. But for sharing, might make different choices. Use different extensions then. i.e. have it both ways. Still need the minimal, most-interoperable version. And need to be clear & upfront about that.
  - MD: perhaps a “maximal-flag” setting? IV: perhaps flags. JMS: agreed.
    - MD: perhaps “conservative” to cover several of these
  - JM: https://xmpp.org/extensions/xep-0115.html
  - JP: version/flag-objects within spec, then could have storage-typed and performance-typed objects
  - IV: do that in AnnData. Sparse array v1 or v2. (i.e. at the object level)
  - RA: are we at the point that the core is the necessary stuff in v3 and we can go with that?
  - JMS: expect to add non-optional features in the future
  - JM: various data types are probably missing now
  - JS: storage transformer falls under this too
  - JMS: (…Josh missed a comment from JMS here…)
  - JS: not clear how optional the extensions are.
  - JM: strip extensions and add it back later? Agreed.
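
Illustrative sketch (not from the meeting): the “must-understand” idea discussed above, together with the zdump-style “can this implementation read the dataset?” check, might look roughly like the following. The metadata layout, the `extensions` / `extension` / `must_understand` field names, the extension names, and the `check_readable` helper are all hypothetical and are not taken from the Zarr spec. Treating a missing flag as must-understand is also an assumption (the conservative choice).

```python
from typing import Iterable, List, Tuple


def check_readable(metadata: dict, supported: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split unsupported extensions into blocking ones and ignorable ones.

    `metadata` is a hypothetical array-metadata document; the field names
    used here are assumptions made for illustration only.
    """
    supported = set(supported)
    blocking: List[str] = []   # unsupported and flagged must-understand
    warnings: List[str] = []   # unsupported but safe to ignore
    for ext in metadata.get("extensions", []):
        name = ext["extension"]
        if name in supported:
            continue
        # Assumption: a missing flag is treated as must-understand.
        if ext.get("must_understand", True):
            blocking.append(name)
        else:
            warnings.append(name)
    return blocking, warnings


# Hypothetical metadata with three extensions.
example = {
    "zarr_format": 3,
    "extensions": [
        {"extension": "sharding", "must_understand": True},
        {"extension": "variable-chunks", "must_understand": True},
        {"extension": "units-convention", "must_understand": False},
    ],
}

blocking, warnings = check_readable(example, supported={"sharding"})
if blocking:
    print("cannot read data; unsupported must-understand extensions:", blocking)
for name in warnings:
    print("warning: ignoring unsupported optional extension:", name)
```

In this sketch a reader refuses data only when a must-understand extension is unsupported, and merely warns otherwise, which mirrors the can’t-read-data vs. might-need-a-library distinction raised in the discussion.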