15th June, 2022
Attending: Sanket Verma (SV), Ryan Abernathey (RA), John Kirkham (JK), Jackson Maxfield Brown (JMB), Dennis Heimbigner (DH), Martin Durrant (MD)
Updates (SV):
- ZEP1
- ZEP acceptance criteria: https://zarr.dev/zeps/active/ZEP0000.html#how-does-a-zep-become-accepted
- GSoC 2022 coding period has officially started! Check the progress for Registry Codecs and Benchmarks.
Agenda:
- MD: Weekly tracking of GSoC 2022 Kerchunk contributors here: https://github.com/fsspec/GSoC-kechunk-2022
- Non-zero origins: https://github.com/zarr-developers/zarr-specs/pull/144
- RA: Is JM’s proposal meant as a comment/suggestion on ZEP1?
- JM: Yes, it’s a comment rather than a full suggestion
- JM: Having a non-zero origin as an extension would be fine. Zarr doesn’t have a well-defined coordinate space - if you add a non-zero origin, you need extra handling when dealing with other types of arrays, e.g. when reading from or writing to other file systems or arrays
- RA: also uses Zarr as a lower-level array layer in the stack - comfortable with the idea of raw array space versus coordinate space, and fine with Zarr not knowing about coordinate space - Xarray can build the coordinate space, which fits the metadata concept - Zarr can’t dictate which index base Julia uses - this is not a coordinate space suggestion; here we are changing the array index itself
- JM: certainly sees Ryan’s argument - other libraries like Xarray can do the index manipulation - but there is also value in having an array where you can talk about position
- DH: the NetCDF coordinate system talks about latitude and longitude and introduces the notion of coordinate variables - agrees with Ryan - the index level needs to be pure and standardised - a whole variety of coordinate systems can be imposed later on - people use an arbitrary number of coordinate systems, and it would be bad to pick one here
- MD: agrees with Ryan - in Xarray we can define a coordinate system using other variables
- RA: JM also commented on the issue that the risk of not having this in the core is that a client opening the Zarr array would not be able to access it
- JM: unfortunate how Julia changes the index base - if you don’t talk about the base index, it doesn’t hurt anyone
- RA: the HDF group is used to this - zero-based indexing - the language determines how the array data is exposed - Xarray can do it because it has a data model - diffuse this out in Zarr - we have a primitive array storage system and on top of that we have various metadata conventions, and that’s the beauty - no explicit support is required for that - many tools can open that
- RA: We can add a convention to address the issue - a page of conventions on the website, something like https://zarr.dev/conventions, can document that - processing software can use those - mapping the Zarr ontology to other array ontologies - if we put it in the Zarr core, why are we catering to the microscopy group only and not the Geo community!?
- MD: The word "convention" is super useful - if you have tools which can leverage the indexing
- RA to JM: if we don’t support this in the core array, it’s also about the implementation - have you thought about implementation?
- JM: very simple if not in the core spec - there is a pretty clear boundary on how the transformation can be done in the dense integer space of a Zarr array - index by coordinate array and other methods - different data types where the indexes are latitude and longitude - having an extra level of translation array
- MD: the Zarr array core design would need to behave like every language
- JM: if the array is small, it’s in memory and you can do a lot with it: read it, store it and play around with it! - Zarr arrays and in-memory arrays work in different ways!
- MD: you can naively do it in any language - use the language’s rules - you’d do the selection as if the array were in memory
- JM: shifting the coordinate space - what about negative indexing? - How does Xarray handle it?
- MD: not possible - each variable has a unique set of coordinates - the NetCDF conventions would not allow it - NetCDF conventions are far more rigid than anything - Xarray could certainly implement a wide range of mappings - Xarray is born out of the NetCDF model
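A minimal sketch of the convention-based approach RA and MD describe above: the stored array stays zero-based and a metadata attribute carries the offset. NumPy stands in for a Zarr array here, and the attribute name `origin` and both helper functions are hypothetical, not part of any Zarr spec or agreed convention.

```python
import numpy as np

# Stand-in for a Zarr array: zero-based data plus user attributes.
data = np.arange(12).reshape(3, 4)
attrs = {"origin": [10, -2]}  # hypothetical convention key, not defined by Zarr


def to_storage_index(logical, origin):
    """Translate logical indices (relative to `origin`) to zero-based storage indices."""
    storage = tuple(i - o for i, o in zip(logical, origin))
    if any(s < 0 for s in storage):
        raise IndexError(f"logical index {logical} lies before origin {origin}")
    return storage


def read_logical(array, attrs, logical):
    """Read one element addressed in the logical (offset) index space."""
    return array[to_storage_index(logical, attrs["origin"])]


# Logical (11, 0) maps to storage (1, 2); a tool unaware of the
# convention can still read `data` with plain zero-based indices.
value = read_logical(data, attrs, (11, 0))
```

The translation lives entirely outside the array, which is the trade-off discussed: core readers need no changes, but only tools that know the convention see the shifted indices.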
- Negative Indexing
- JK: negative indexing - a logical indexing versus coordinate indexing problem - data exists somewhere, and the question is how we map that to meaningful coordinates
- MD: negative indexing is a problem in Python, where it means a different thing
- JK: big change - specifying changes by coordinate means having to list them in the metadata and update the metadata for all the previous arrays
- MD: the reference file systems could do the renaming - but it is complicated
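To make the ambiguity raised by MD and JK concrete, here is a small sketch contrasting the two meanings of a negative index. The `origin` value and the `logical_get` helper are assumptions for illustration only.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
origin = -2  # hypothetical: logical index -2 refers to the first stored element

# Python/NumPy meaning: -1 counts back from the end of the array.
from_end = data[-1]


def logical_get(array, origin, index):
    """Treat `index` as a coordinate relative to `origin`, not as 'from the end'."""
    storage = index - origin
    if not 0 <= storage < len(array):
        raise IndexError(f"logical index {index} is outside the array")
    return array[storage]


# Coordinate meaning: -1 is one step after the origin, i.e. storage index 1.
at_coordinate = logical_get(data, origin, -1)
```

The same literal `-1` selects the last element in one interpretation and the second element in the other, so a reader has to know which rule applies before indexing.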
- Discussion on: https://github.com/zarr-developers/zarr-specs/issues/141
- DH: is the issue about representing floating-point numbers?
- JM: the attribute model is .json - there needs to be some way to mark a value as intended to be a number
- DH: .json has that distinction
- JM: Python implements the extension - the extension is generally not supported elsewhere - in JS you need to write your own parser to take care of this - there is no way to represent this and we need to discuss it
- DH: binary and NaN would be represented as bit patterns
- JK: would love to see what a coordinate space stack would look like - interesting to have it in an extension - would Xarray be interested in that? - recasting the coordinates? - coordinate space extension? - does it change the metadata? - graduate the metadata and see how it behaves when the coordinate system is changed?
- JM: a few things that need to be discussed:
- Zarr attributes, .json, infinity values and binary data - wonder if there’s a solution to that in Zarr v3 (https://github.com/zarr-developers/zarr-specs/issues/141)
- Zarr V2 has an easy way to create arrays, whereas in V3 you need to specify a path; Zarr v3 array creation is a pain because of the path - this could be handled by having a .v3 extension, //, or any other special character
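On the attribute question from issue 141, a short sketch of the behaviour JM refers to: Python's standard `json` module writes `NaN` and `Infinity` tokens by default, which strict JSON parsers reject. The string-based `encode`/`decode` helpers are a hypothetical workaround for illustration, not something decided in the call.

```python
import json
import math

attrs = {"fill": float("nan"), "upper": float("inf")}

# Default behaviour: non-standard tokens that are not valid strict JSON.
lenient = json.dumps(attrs)

# Strict mode refuses to serialise the same values.
try:
    json.dumps(attrs, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

# Hypothetical workaround: spell the special values as strings.
SPECIAL = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}


def encode(value):
    """Replace non-finite floats with string markers; pass everything else through."""
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
    return value


def decode(value):
    """Turn the string markers back into floats."""
    return SPECIAL.get(value, value) if isinstance(value, str) else value


encoded = json.dumps({k: encode(v) for k, v in attrs.items()}, allow_nan=False)
decoded = {k: decode(v) for k, v in json.loads(encoded).items()}
```

The string encoding produces strict JSON but cannot distinguish the number NaN from a genuine string attribute "NaN", which is the kind of ambiguity the discussion points at.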