2023-08-23

Attending: Davis Bennett (DB), Sanket Verma (SV), Josh Moore (JM), Norman Rzepka (NR), Eric Perlman (EP), Ryan Williams (RW), Frederic Leclercq (FL), Brianna Pagan (BP), Ward Fisher (WF)

TL;DR:

We started the meeting by going with a round of introductions. Also, folks shared the last TV show they watched as an ice-breaker. DB has sent a PR to refactor the synchronisation API. To which WF responded, JM wondered how things were in the C land. DB presented thoughts on the Zarr group and the representation of Zarr chunks as dirty arrays.

Updates:

Meeting Minutes:

  • Introductions with the last T.V show you watched
  • DB: Refactor Sync API PR: https://github.com/zarr-developers/zarr-python/pull/1495
    • Contributions are welcome!
    • JM: lessons from Bio-Formats – I would have started with immutability
    • NR: Zarrita is designed with immutability. only 2 methods for mutating an array (apart from changing the data): resizing and writing attributes
    • DB: Should all stores have caching: https://github.com/zarr-developers/zarr-python/issues/1500?
      • DB: Improve testing and performance - require a good amount of work but useful
      • NR: Current design is composable. testing matrix might be avoided through mocking
      • DB: When do you not want a cache? - Turn off cache whenever you want
      • NR: Use-case specific - not always useful to turn it off
  • BP: https://github.com/NASA-IMPACT/zarr-lakefs
  • JM: Using the MOMO card: http://www.meeting-facilitation.co.uk/blog/files/move-on-move-on.html
  • JM: How’s the C land?
    • WF: Amazon S3 - solving a few issues
    • WF: Submitted abstract about NCZarr at AGU w/ Ethan Davis - how can this help the cf community
    • WF: Also discussing about ZarrCon at Unidata
    • JM: Considering good places to host the conference
    • WF: NASA (Roses?) can help too!
    • BP: Thinking of having a GeoZarr hackathon during the AGU week
    • JM: Domain specific conferences or get everyone together?
    • NR and DB: Let’s bring everyone together and we can discuss important stuff like multiscales
  • DB: Why not allow child nodes of array? https://github.com/zarr-developers/zarr-python/discussions/1501
    • JM: Good to have this conversation with Ward and Dennis
    • DB: As an abstract questions - What would break if we have this? (disruptive for ecosystem)
    • JM: Third state apart from ON and OFF could be HAVE BOTH
  • DB: Formally represent Zarr as dirty arrays? - Using a compressor and roll it up with another compressor
    • JM: IPFS use case
    • DB: On-boarding someone to Zarr ecosystem - spending time to think about things - No record when writing Zarr w/ Dask fails
    • SV: Good to hear the onboarding process - Is there anything critical which should be brought up to refactor team?
    • DB: Convert N5 to Zarr - leave N5 behind - copying array method in Zarr breaks in N5 - any operation which is serial in Zarr is a trap (when data is in TB) - will be good to have a warning for large size
    • DB: Lazy representation of array before copying can also help
    • DB: Creating a hierarchy - using procedural way is not efficient - can write a new ZEP - acronym: ZOM (Zarr Object Model)
    • DB: In a perfect world, Zarr-Python would use threads and there would no-GIL Python ;)
    • SB: Would be good to see what the perfomance group tackles in the upcoming months!