2023-06-01
Attending: Ward Fisher (WF), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Ryan Abernathey (RA), Norman Rzepka(NR), Josh Moore (JM)
TL;DR:
The meeting discussed ongoing work on ZEP 4, focusing on conventions for diverse domains like bio-imaging, geospatial, and genomics. There were considerations about whether these conventions should be general or domain-specific and how to handle legacy data. Additionally, discussions covered topics such as the organization of conventions on the website, the separation of codecs, challenges in introducing transformations to the codec pipeline, and the exploration of fallback data types, emphasizing the need for broader community discussions on certain aspects.
Updates:
Meeting Minutes:
- SV: ZEP 4 → https://github.com/zarr-developers/zeps/pull/28
- RA: Still working on it
- SV: How can we help?
- RA: Need to decide if it’s going to be a general convention (for bio-imaging, geospatial, genomics etc.), or a convention of Geospatial domain all
- RA: Also need to decide if the dataset are conforming to a convention or not - lot of legacy data out there which doesn’t conform to it
- RA: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html
- WF: Convention doesn’t need to be broad, cf convention are based on NetCDF model - but there’s nothing in the NetCDF library or code that mentions the cf conventions!
- RA: The existing conventions are broad, and it’s difficult to place cf in a specific place
- WF: Agree with you
- RA: Define what’s the process to put a new convention for the community
- JMS: You have group level attribute and array level attribute?
- RA: Mostly yes!
- WF: There
- RA: Getting all convention on the website would be a good way for cross domain and community collaboration - conventions can be composable - conventions could not be universal
- WF: Conventions move slowly - take time to adopt to new things - took a good amount of time to solve SST (Sea Surface Temperature)
- RA: Will get the another draft out soon
- JMS: Not feasible to namespace an attribute?
- RA: It would require a deep re-factoring for the software we use! - It would break Xarray, GDAL, NetCDF - Zarr-Python doesn’t care about the attributes
- WF: Namespacing would definitely break the NetCDF library!
- RA: JMS, how do you handle conventions?
- JMS: Generally, doesn’t invent new conventions, and implement existing conventions - the data formats I invented, I defined those conventions - these existing conventions doesn’t lack certain things
- NR: In the process of adopting Zarr V3, currently using
OME-Key
- WF: Attributes are strictly defined - defined to be interpretable not changeable
- JMS: Maybe the best idea is to say clearly that X datasets use the conventions and multiscales
- JMS: https://github.com/zarr-developers/zarr-specs/issues/242
- NR: No need to separate them - my opinion: to keep it as it is - maybe Chris’s comment is coming out of the Rust world and separating the codecs will be convenient for him
- JMS: https://github.com/zarr-developers/zarr-specs/pull/236#issuecomment-1539188066
- JM: Cares about the codec pipeline - adding some transformations in the middle would be tricky!
- NR: Feel the same! - Sharding as a single codec makes sense but adding anything would make it complicated -
- JM: Current partial codec — define the metadata format for shard codec would be great!
- NR: Adding 2 partial codecs would make it tricky to implement
- JMS: Blosc is kind-of sharding codec - not clear if using sharding as a partial read codec is good idea!
- JMS: How do you have partial writes for the codec?
- JMS: Fallback data types
- JMS: Need to have a broader discussion
- NR: Also, do extensions data type need to have a fallback?
- JMS: No, it’s optional
- NR: The definition of fallback - like a tuple - having a datatype and the fallback value
- JMS: Maybe it’s the way to go!