15th June, 2022

Attending: Sanket Verma (SV), Ryan Abernathey (RA), John Kirkham (JK), Jackson Maxfield Brown (JMB), Dennis Heimbigner (DH), John A. Kirkham (JK), Martin Durrant (MD)

Updates (SV):

  • ZEP1
    • ZEP acceptance criteria: https://zarr.dev/zeps/active/ZEP0000.html#how-does-a-zep-become-accepted
  • GSoC 2022 coding period has officially started! Check the progress for Registry Codecs and Benchmarks.

Agenda:

  • MD: Weekly tracking of GSoC 2022 Kerchunk contributors here: https://github.com/fsspec/GSoC-kechunk-2022

  • Non-zero origins: https://github.com/zarr-developers/zarr-specs/pull/144
    • RA: JM’s proposal like a comments/suggestions to the ZEP1?
    • JM: Yes, it’s like a comment but not a full suggestion
    • JM: Having a non-zero origin as extension will be fine. Zarr doesn’t have a well defined coordiante space - if you add non-zero origin you need to add stuff when dealing with other types of arrays like reading or writing it to other file systems or arrays
    • RA: uses Zarr also as a lower stack array - comfortable with the idea raw array space and coordinate space - and good with zarr doesn’t know about coordinate space - Xarray can build coordinate space - works with the metadata concept - Zarr can’t make Julia use index base - coordinate space is not suggestion, here we are changing the array index
    • JM: certainly see Ryan’s argument - can use other libraries like Xarray for doing the index manipulation - also value of having array where you can talk about position
    • DH: NetCDF coordinate system talks about latitude and longitude - introduce notion of coordinate variables - agree with Ryan’s - index level needs to be pure and standardised - whole variety of coordinate system that can be imposed later on - there are arbitrarily number of coordinate system that people use and bad to pick-up one here
    • MD: agree with Ryan - in Xarray we can define coordinate system using other variables
    • RA: JM also commented on the issue that the risk of not having in the core would be that client opening the Zarr arrays and would not able to access the array
    • JM: unfortunate how Julia changes index - if you don’t talk about base index it doesn’t hurt anyone
    • RA: HDF group is used to this - zero base indexing - language determines the exposition the array data - Xarray can do it because it has a data model in Xarray - diffuse this out in zarr - we have a primitive array storage system and on top of the we have various conventions of metadata and that’s the beauty - no explicit support is required for that - many tools can open that
    • RA: We can put a convention to address the issue - a page of conventions on the website, something like https://zarr.dev/conventions can document that - processing softwares can use those - Zarr ontology to other array ontology - if we put it up in a Zarr core why are we catering to microscopy group only and why not the Geo community!?
    • MD: The word convention is super useful - if you have tools which can leverage the indexing
    • RA to JM: if we don’t support in the core array - it’s also about the implementation - have you thought about implementation?
    • JM: very simple if not in core spec - pretty clear boundary on how transformation can be done in dense integer space of Zarr array - index by coordinate array and other method - different data types where indexes are latitude longitude - having a extra level of translational array
    • MD: Zarr array core design would need to behave like every language
    • JM: if array is small - it’s in the memory and you can do a lot of stuff like read it store it and play around with it! - Zarr array and memory works in other ways!
    • MD: naively do it in any language - use the language rules - you’d the do the selection as the array is in the memory
    • JM: shifting the coordinate space - what about negative indexing? - How does Xarray handles it?
    • MD: not possible - each variable has unique set of coordinate - the NetCDF conventions would not allow it - NetCDF conventions are far more rigid that anything - Xarray could certainly implement wide range of mapping - Xarray is born out of NetCDF model
  • Negative Indexing

    • JK: negative indexing - logical indexing and coordinate indexing problem - data exists somewhere and how we map that to meaningful coordinate
    • MD: negative indexing is problem in Python and means a different thing over there
    • JK: big change - specifying changes by coordinate having to list them in metadata and to update the metadata for all the previous arrays
    • MD: the reference file systems could do the renaming - but it is complicated
  • Discussion on: https://github.com/zarr-developers/zarr-specs/issues/141

    • DH: is the issue representing the floating point numbers?
    • JM: the attribute model is .json - some way to intended as the number
    • DH: .json has that distinction
    • JM: Python implements the extension - generally extension doesn’t support that - in JS you need to write your own parser to take care of this - no way to represent this and we need to discuss this
    • DH: binary and nan will be represent as bit pattern
    • JK: would love to see how coordinate space stack would look like - interesting to have it in extension - if Xarray would be interested in that - recasting the coordinate? - coordinate space extension? - changes the metadata? - graduate the metadata and see how to it behaves when the coordinate system is changed?
  • JM: a few things that needs to be discussed:
    • Zarr attributes and .json and infinity values binary things - wonder if there’s a solution to that in zarr v3 (https://github.com/zarr-developers/zarr-specs/issues/141)
    • Zarr V2 array creation has a easy way to create arrays - whereas you need to mention path in V3; Zarr v3 array creation is pain because of path - could be handled by the having .v3 extension - // or any other special character to handle it