2022-08-24
Attending: Sanket Verma (SV), Jeremy-Maitin-Shephard (JMS), Davis Benett (DB), Eric Perlman (EP), Ward Fisher (WF), Martin Durrant (MD), Hailiang Zhang (HZ), Ryan Abernathy (RA), John A. Kirkham (JK)
TL;DR:
The Zarr community discussed two open PRs in the zarr-specs repo. Support a list of codecs
in place of compressor
and Change data type names and endianness. The discussion was extensive, covering many good points, and the overall community favours both of these PRs for getting merged. After this, Hailiang Zhang from NASA Goddard asked a few questions about the ZEP extension he and his colleagues are working on. They are making progress on the ZEP and will submit a draft in the upcoming weeks. At last, there was a discussion on working on and finishing the pending ZEP1. John A. Kirkham proposed an idea that everyone was in favour of. Also, the community would like to step up and help in the completion of ZEPs whenever and wherever needed.
Updates:
- Zarr is attending CZI and NumFOCUS Summit 2022, if you’re there feel free to say Hi! 👋🏻
- Jonathan Striebel is presenting poster on Zarr @ EuroScipy 2022 next week. If you’re attending the conference, please say, Hi 👋🏻!
- Final decision on ZEP1 most probably next week. Please leave your feedback now!
- Suggestions for themes/tech stack for website revamping
Meeting minutes:
- JMS: https://github.com/zarr-developers/zarr-specs/pull/153
- Does anyone has any feedback on this?
- RA: Hard to change across all the specs - changing filters is possible as it deals with NP arrays - and compressors don’t do that
- DB: we can look at unified API for these type of changes
- MD: Codec should take context - and all the info like position, size of array and this could potentially solve the problem - by the time you compress the array - the codec could do and know what you’ve told
- JMS: each codec should have byte as output
- DB: for numcodecs this mean to promote/change function signature - no reason this could not be done
- MD: it says buffer and it can be amended
- MD: it can tell you where you are in the array - chop things where it is specified - For e.g. I want this key because it is this key in this chunk - also biased because worked on storage and helps me out
- JMS: first you read the chunk and then data is read - codec wants to make partial read - then codec could decide what to do from there
- MD: blosc takes care of this - codec will itself won’t interact with the storage layer - use case - kerchunk example - netcdf file - needs to know what file and size we are - doesn’t need to think about storage layer
- JMS: https://github.com/zarr-developers/zarr-specs/pull/155
- MD: Seems good to me 👍🏻
- JMS: rename the data types and keep the endianness - minimal change - using different names makes senses - you can add other names - and it makes sense to make more conventional names
- DB: in favour 👍🏻
- JMS: if big endian array - this change will return array with big endian
- RA: lil’ experience with this - ocean model gives big endian data - sometimes they don’t work - never wanted to have those types - just accepted because it was there - trade off: computational cost and cannot convert it to fly
- MD: if you can do it in places, temporary duplication of the memory can be done - all astropy data is endian
- RA: row major vs. column major - something which Zarr should take care of 👀
- HZ: Ryan and HZ’s colleagues(Briana and Mahabal from NASA Goddard) had discussion in meeting - a proposal to build chunk level statistics for performance - idea is like: each chunk will have some statistics to decide their characteristics and how are they performing
- Planning to submit a extension ZEP soon! 🙌🏻
- Not a single value - will be along certain dimensions - allows to be vector instead of scalar - along which dimensions it should be done!?
- It’s been a month - what’s the timeline of release of next major version spec? - No time, we’re still working on it!
- Statistics need to have some knowledge - https://github.com/zarr-developers/zarr-specs/issues/73 - this is helpful for us - couldn’t find this in V3 - did I miss something about this?
- Dimensions could be lat. long. time - the statistics - the dimensions could be switched
- MD: adding a codec could solve this!
- Thank you!
- JK: Get Alistair to finish ZEP1!
- Use comments and make an individual PR for those comments - Nice idea!
- MD: Different uses and perspectives - move towards a same goal - other groups have structured issues
- MD: make a list of things we can include and solve them
- JK: break larger problem into simpler ones and then solve then!
- Discussion on community to take charge for ZEPs
- Everybody seems to be in favour and ready to step up
- Hoping to close ZEP1 soon and move forward!
Open Agenda (add below 👇🏻):
- Zarr v3 spec open issues:
- by JMS
- Tabled:
- fill value required: https://github.com/zarr-developers/zarr-specs/pull/145
- C order vs F order vs arbitrary order
- Rename of array metadata files
- Dimension labels
- NaN/inf/other special values in user-defined attributes: https://github.com/zarr-developers/zarr-specs/issues/141
- Storage transformations: sharding, consolidated metadata
- Can storage transformers operate on the entire store or just arrays? https://github.com/zarr-developers/zarr-specs/pull/149#pullrequestreview-1078722828
- Tabled:
- by JMS