2023-07-26

Attending: Davis Bennett (DB), Sanket Verma (SV), John Kirkham (JK), Ward Fisher (WF), Dennis Heimbigner (DH), Martin Durant (MD), Joe Kryzak (JK), Jeremy Maitin-Shepard (JMS), John Solly (JS)

TL;DR:

SV started the meeting by providing an update on how the SciPy 2023 went. MD is looking to start the ZEP0003 implementation soon. WF is looking for collaborations for the AGU session. JS initiated a discussion on IPLD, where everyone participated enthusiastically. After this, DB had some thoughts on formalising the Zarr data model, and JK asked what would be the best Zarr C++ implementation for his work at https://aligned.com.

Meeting Minutes:

  • SV: Discussions about SciPy 2023 and how it went and the weather! 🥵
  • MD: Variable chunking implementation
    • Zarrita or Zarr-Python
    • DB: Changing the data types?
    • MD: Yes!
    • DB: You just need to touch array logic
    • MD: Just cloned Zarrita and will start looking into it
    • SV: Zarr-Python will be having a refactoring soon!
    • MD: Yeah, don’t want to get in Ryan’s and anyone way
    • SV: Who’s going to work on the implementation? You? You and Isaac? Or someone else?
    • MD: Haven’t decided yet!
  • WF: AGU 2023 CF/NetCDF/Zarr Session
    • Session Description
    • Submission deadline: August 2nd
    • Conference: 2nd week December
    • Looking for collaborations - please reach out
  • JS: Zarr with IPLD - Anyone working on it?
    • https://discourse.pangeo.io/t/zarr-on-unixfsv1-vs-on-ipld/2500/2
    • MD: Zarr is predominant use case for Zarr
      • Check out the IPFS spec repo and that could be the best way to go about it
      • Immutability - on cloud storage you can write any file but IPFS doesn’t work like that
      • JS: Adding a complex overhead to solve the complications
      • JMS: Why you wanna build on IPLD not IPFS?
      • JS: IPLS breaks data into chunks - has DAG structure - maybe IPLD blocking (chunking) algorithm and Zarr chunking algorithm has similarity
      • MD: IPFS spec authors may have thought about this
      • JMS: You’re coming from a read/write store perspective? Or read only?
      • JS: Working at University of Maryland - funded by Filecoin and bridging the gap b/w geospatial with distributed file systems - building POC
      • MD: Interest by companies - one way is to contribute to OSS
      • JMS: Looking for read-write capability? - or looking to publish a large dataset?
      • JS: Having a native IPLD would make it much for efficient - You can download Zarr from IPFS
      • MD: Filecoin estuary mighe be useful
      • SV: Please open a issue here to continue the conversation
      • JS: https://pypi.org/project/ipfs-stac/
      • MD: Intake: https://github.com/intake/intake
  • DB: Formalizing the zarr data model
  • JK: Runs a company: https://www.aligned.com/
    • Want to connect Zarr and Python to OSX - Which C++ implementations should I use?
    • Tensorstore could be useful
    • JK: Will try it and report it back
  • WF: FYI → RSEs in eScience workshop is being held in Limassol, Cyprus, in October and the call for abstracts is still open: https://us-rse.org/rse-escience-2023/