2024-04-18

Attending: Josh Moore (JM), Vicent Immler (VI), Sanket Verma (SV), Ward Fisher (WF), Altay Sansal (AS), Jeremy Maitin-Shepard (JMS)

TL;DR:

The meeting covered the proposal to remove implicit groups in Zarr, progress on ZEP4 and ZEP3, and updates on V3 implementation. Additionally, discussions included async read optimizations for Zarr and the impact on performance, especially concerning large datasets and parallel data ingestion.

Updates:

Meeting Minutes:

  • Introductions w/ last gift you got
    • Sanket - cologne and clothes
    • Vincent - wooden board forged with family crescent
    • Ward - camping tent
    • Josh - pecan nuts
    • Altay - lead data scientist - lego
  • Removing Implicit groups
    • JM: Discussed at community meeting - needs to go back to root node to figure out the group
    • JM: Tensorstore doesn’t use Zarr groups at all
    • WF: Supposition from my side
    • WF: Dennis completed the V3 implementation!
    • JM: Are we closer to parity in V3 work - a question for Dennis!
    • VI: How does implicit groups affect performance?
    • JM: No, implicit groups means performance improvement
    • VI: Working on a new software implementation for students
    • JMS: No experience in working with groups
    • JM: Lot of callbacks
    • JMS: You’d definitely want to remove the looking upward
    • AL: Couldn’t see a use-case for parallel creation of groups
    • JMS: You’re ingesting lot of data in S3 and they read group metadata and have implicit groups
    • AL: .zattrs would have race condition?
    • JMS: Kind of a niche use-case
    • AL: Are Multi-processing locks concern metadata?
    • JMS: Multiple machine can leverage this!
    • AL: Removing would be a good idea!
  • AL: ZEP4 and ZEP3 progress
  • SV: AL, are you using V2 or V3?
    • AL: Using V2 and would love to move to V3 - have 20-30 PB data
    • AL: Want to work on dimension_names - what would be the best time to do it?
    • SV: After V3 release
  • VI: explains GSoC application
    • AS: Hacked Zarr to submit reads in a async manner to the machine to circumvent the problem
    • AS: Zarr V3 is going to be fully async so, it helps alleviates the problem
    • VI: Would be good to have a way to improve the read speeds for Zarr