Attending: Davis Bennett (DB), Sanket Verma (SV), Josh Moore (JM), Norman Rzepka (NR), Eric Perlman (EP), Ryan Williams (RW), Frederic Leclercq (FL), Brianna Pagan (BP), Ward Fisher (WF)


We started the meeting with a round of introductions, and folks shared the last TV show they watched as an ice-breaker. DB has sent a PR to refactor the synchronisation API. JM wondered how things were in C land, to which WF responded. DB presented thoughts on the Zarr group and the representation of Zarr chunks as dirty arrays.


Meeting Minutes:

  • Introductions with the last TV show you watched
  • DB: Refactor Sync API PR: https://github.com/zarr-developers/zarr-python/pull/1495
    • Contributions are welcome!
    • JM: lessons from Bio-Formats – I would have started with immutability
    • NR: Zarrita is designed with immutability. Only 2 methods mutate an array (apart from changing the data): resizing and writing attributes
    • DB: Should all stores have caching? https://github.com/zarr-developers/zarr-python/issues/1500
      • DB: Improve testing and performance - requires a good amount of work, but useful
      • NR: Current design is composable. The testing matrix might be avoided through mocking
      • DB: When do you not want a cache? - Turn off cache whenever you want
      • NR: Use-case specific - not always useful to turn it off
  • BP: https://github.com/NASA-IMPACT/zarr-lakefs
  • JM: Using the MOMO card: http://www.meeting-facilitation.co.uk/blog/files/move-on-move-on.html
  • JM: How’s the C land?
    • WF: Amazon S3 - solving a few issues
    • WF: Submitted an abstract about NCZarr to AGU w/ Ethan Davis - how can this help the CF community
    • WF: Also discussing ZarrCon at Unidata
    • JM: Considering good places to host the conference
    • WF: NASA (ROSES?) can help too!
    • BP: Thinking of having a GeoZarr hackathon during the AGU week
    • JM: Domain specific conferences or get everyone together?
    • NR and DB: Let’s bring everyone together and we can discuss important stuff like multiscales
  • DB: Why not allow child nodes of array? https://github.com/zarr-developers/zarr-python/discussions/1501
    • JM: Good to have this conversation with Ward and Dennis
    • DB: As an abstract question - What would break if we had this? (disruptive for the ecosystem)
    • JM: Third state apart from ON and OFF could be HAVE BOTH
  • DB: Formally represent Zarr as dirty arrays? - Using a compressor and roll it up with another compressor
    • JM: IPFS use case
    • DB: On-boarding someone to Zarr ecosystem - spending time to think about things - No record when writing Zarr w/ Dask fails
    • SV: Good to hear about the onboarding process - Is there anything critical that should be brought up with the refactor team?
    • DB: Convert N5 to Zarr - leave N5 behind - copying array method in Zarr breaks in N5 - any operation which is serial in Zarr is a trap (when data is in TB) - it would be good to have a warning for large sizes
    • DB: Lazy representation of array before copying can also help
    • DB: Creating a hierarchy - doing it procedurally is not efficient - can write a new ZEP - acronym: ZOM (Zarr Object Model)
    • DB: In a perfect world, Zarr-Python would use threads and there would be a no-GIL Python ;)
    • SB: Would be good to see what the performance group tackles in the upcoming months!
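
As an illustration of the store-caching discussion above (NR's point that the design is composable, and DB's point that a cache should be easy to turn off), here is a minimal sketch of a read-through cache wrapped around any dict-like store. It was not presented at the meeting. `CachedStore`, `open_store`, `max_items`, and the `"a/0.0"` key are hypothetical names chosen for the example; the sketch uses only the Python standard library and does not reproduce zarr-python's own `LRUStoreCache`.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class CachedStore(MutableMapping):
    """Hypothetical read-through LRU cache around any dict-like store."""

    def __init__(self, inner, max_items=128):
        self.inner = inner            # the wrapped store (any mutable mapping)
        self.max_items = max_items    # assumed eviction limit, counted in keys
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._cache[key]
        value = self.inner[key]            # raises KeyError if the key is absent
        self.misses += 1
        self._cache[key] = value
        if len(self._cache) > self.max_items:
            self._cache.popitem(last=False)  # evict the least recently used key
        return value

    def __setitem__(self, key, value):
        self.inner[key] = value
        self._cache.pop(key, None)         # invalidate any stale cached copy

    def __delitem__(self, key):
        del self.inner[key]
        self._cache.pop(key, None)

    def __iter__(self):
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


def open_store(inner, cache=True):
    """Return the store wrapped in a cache, or untouched when cache=False."""
    return CachedStore(inner) if cache else inner


base = {"a/0.0": b"chunk-bytes"}           # stand-in for a real chunk store
store = open_store(base, cache=True)
first = store["a/0.0"]                     # miss: read from the inner store
second = store["a/0.0"]                    # hit: served from the cache
```

Because the cache is a wrapper rather than a feature built into each store, turning it off means not applying the wrapper, and the wrapper can be tested once against a mock mapping instead of once per store type.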