Zarr EOSS4 Roadmap
Hola Zarr Community! 🙋🏻♂️
I hope my previous blog post was a good read and worth your time. To shed some light on ZEPs: ZEP1 was recently submitted by Alistair Miles and Jonathan Striebel and is currently under review by the Zarr Implementations Council and the Zarr community. Feel free to leave your thoughts on ZEP1 here. I’m pleased to see the ZEP process at work and hope it helps the Zarr community systematically achieve critical milestones.
In early 2021, we submitted a proposal to the Chan Zuckerberg Initiative’s (CZI) Essential Open Source Software for Science (EOSS) grant program. The proposal aimed to accelerate Zarr’s development on issues that are often too large to tackle through volunteer contributions alone. The high-level goals we focused on with the grant were API unification with open-source projects like NumPy, Dask and Xarray; project maturity; and efficient community engagement. The Zarr community, along with the Zarr Steering Council, spent almost a year working towards these goals, and we’re proud to say that we’ve made significant progress.
As promised in the last blog post, I will talk about what we’ve accomplished so far apart from ZEPs and what the upcoming months look like for the Zarr project and the community. I’ll also shed some light on the deliverables we’ve completed under the CZI EOSS4 grant.
CZI EOSS4 Accomplishments 📝
Technical Deliverables 🧑🏻💻
API Unification
The Zarr format lets you store large arrays as small compressed chunks, which makes collaboration with array-providing projects like NumPy and Dask a must. API unification plays a crucial role in interoperability. It will allow the OSS community to choose transparently between implementations, making algorithms more generalisable and scalable.
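To make the idea of "small compressed chunks" concrete, here is a minimal sketch that uses only NumPy and the standard library's `zlib`. It is not Zarr's implementation or API: the helper names `chunk_and_compress` and `read_chunk` are hypothetical, and the in-memory dict merely stands in for a real store. The dot-separated chunk keys mirror the default key style of Zarr V2.

```python
import zlib

import numpy as np


def chunk_and_compress(arr, chunk_shape):
    """Split a 2-D array into chunks and compress each chunk independently.

    Hypothetical helper for illustration only; not part of any Zarr library.
    """
    rows, cols = chunk_shape
    store = {}
    for i in range(0, arr.shape[0], rows):
        for j in range(0, arr.shape[1], cols):
            # Copy the block so its bytes are contiguous before compressing.
            block = np.ascontiguousarray(arr[i:i + rows, j:j + cols])
            key = f"{i // rows}.{j // cols}"
            store[key] = (block.shape, zlib.compress(block.tobytes()))
    return store


def read_chunk(store, key, dtype):
    """Decompress a single chunk without touching any other chunk."""
    shape, payload = store[key]
    return np.frombuffer(zlib.decompress(payload), dtype=dtype).reshape(shape)


data = np.arange(10_000, dtype="int32").reshape(100, 100)
store = chunk_and_compress(data, (25, 50))

# Only the chunk covering rows 25..49 and columns 0..49 is decompressed here.
block = read_chunk(store, "1.0", data.dtype)
print(len(store), block.shape)
```

Because every chunk is compressed on its own, a reader can fetch and decode just the part of the array it needs, which is what makes the layout attractive for parallel tools like Dask.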
We identified several discrepancies between Zarr and related projects (NumPy and Dask) and corrected them. Juan Nunez-Iglesias worked on adding support for fancy indexing, and Ben Jeffrey fixed indexing for scalar NumPy values. See zarr-python #725 and zarr-python #974 respectively.
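For readers unfamiliar with the terms, the sketch below shows in plain NumPy the two indexing behaviours mentioned above: fancy (coordinate) indexing and indexing with a NumPy scalar. It only demonstrates the NumPy semantics that the linked pull requests aim to match; the array contents are made up for illustration.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# Fancy (coordinate) indexing: the two lists are paired element-wise,
# so this selects a[0, 1] and a[2, 3].
picked = a[[0, 2], [1, 3]]

# Indexing with a NumPy integer scalar behaves like a plain Python int.
i = np.int64(2)
row = a[i]

print(picked.tolist(), row.tolist())
```

With a zarr-python release that includes these changes, the same two expressions are expected to work when `a` is a Zarr array, though that is an assumption to verify against the version you have installed.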
Mads R.B. Kristensen worked on adding support for multiple array types. See numcodecs #305. If you know of other ways that we could make Zarr work more cleanly with Dask, NumPy or other array APIs, please let us know.
Xarray / NetCDF Interoperability
NetCDF (a long-time provider of stable file formats) and Xarray (N-D labelled arrays) have been updated to support each other’s representation of named dimensions. Mattia Almansi added support on Xarray’s side (see xarray #6420), and Dennis Heimbigner did the same on NetCDF’s side (see netcdf-c #2257). Both projects have also agreed to discuss a common Zarr Specification, and a proposal is being drafted for a common standard for named dimensions.
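As a rough illustration of what a "representation of named dimensions" looks like, Xarray's Zarr convention records the dimension names of each array in an attribute called `_ARRAY_DIMENSIONS`. The sketch below uses only the standard library; the metadata values and the helper `named_shape` are hypothetical, and it shows Xarray's convention only, not NetCDF's.

```python
import json

# Hypothetical metadata for one 3-D Zarr V2 array, written as JSON text the
# way it might appear in the array's metadata and attribute documents.
zarray_text = '{"shape": [365, 180, 360], "chunks": [30, 90, 90], "dtype": "<f4"}'
zattrs_text = '{"_ARRAY_DIMENSIONS": ["time", "lat", "lon"]}'


def named_shape(zarray, zattrs):
    """Pair each dimension name with its length (illustrative helper)."""
    names = zattrs["_ARRAY_DIMENSIONS"]
    shape = zarray["shape"]
    if len(names) != len(shape):
        raise ValueError("expected one dimension name per array axis")
    return dict(zip(names, shape))


sizes = named_shape(json.loads(zarray_text), json.loads(zattrs_text))
print(sizes)
```

A shared standard for this kind of attribute is what would let different libraries read each other's labelled arrays without special-casing.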
Multiscale array representation
The ‘datatree’ library, written by Thomas Nicholas and supported by B-open, can now be used to represent a pyramid of related arrays and has been proposed as a standard data structure. Bioimaging users from ITK have also tested the data structure, and discussions have begun about integrating it into Napari.
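A "pyramid of related arrays" here means the same data stored at several resolutions. The sketch below builds one with plain NumPy by averaging 2x2 blocks. It does not use datatree's API: the helper `build_pyramid` and the dict keyed by level are assumptions made for illustration, standing in for the tree nodes a real multiscale structure would use.

```python
import numpy as np


def build_pyramid(base, levels):
    """Return {level: array}; each level halves the resolution of the last.

    Illustrative helper only; a tree-based library would hold each level
    as a named node instead of a dict entry.
    """
    pyramid = {0: base}
    current = base
    for level in range(1, levels):
        h, w = current.shape
        # Trim odd edges so the array divides evenly into 2x2 blocks.
        current = current[: h - h % 2, : w - w % 2]
        # Group pixels into 2x2 blocks and average each block.
        current = current.reshape(
            current.shape[0] // 2, 2, current.shape[1] // 2, 2
        ).mean(axis=(1, 3))
        pyramid[level] = current
    return pyramid


image = np.arange(64, dtype="float64").reshape(8, 8)
pyramid = build_pyramid(image, levels=3)
shapes = {level: arr.shape for level, arr in pyramid.items()}
print(shapes)
```

Viewers can then load a coarse level first and fetch finer levels only for the region being inspected.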
These goals mainly focus on Zarr’s technical development, which revolves around working collaboratively with critical open-source projects in the array storage ecosystem. We will continue working towards strengthening the bridges of interoperability with other projects in the upcoming months.
Community Engagement 👩🏻🤝👨🏼👨🏿🤝👨🏻👩🏿🤝👩🏼
In this section, I’ll mainly be talking about the community engagement part of Zarr. For my part, I’ve focused on:
- The first thing I did when I started my role was to relaunch the Zarr Blog at its new URL: https://zarr.dev/blog. The newly launched blog contains posts about releases, ZEPs and any other events or information vital for the Zarr community. I also worked on revamping Zarr’s webpage, which is at https://zarr.dev/. Currently, I’m asynchronously working on a new website for Zarr, and if you have any thoughts, feel free to share them with me.
- Zarr is participating in Google Summer of Code for the first time this year. We made an exhaustive list of potential projects, which can be seen here. After going through several applications, we shortlisted Shivank Chaudhary and Parth Tripathi to work on Building Codecs Registry and Benchmarking Zarr Implementations respectively. I believe participating in open-source programs led by organisations is an excellent way to invite and collaborate with new contributors.
- We also worked on increasing participation in conferences and meet-ups. For example, I spoke about Zarr at Open Geospatial Conference along with Ryan Abernathey. I also presented at my local PyData chapter and was elated to see the engaging interaction with the community.
- Zarr V2 is now an OGC Standard thanks to efforts led by Ryan Abernathey.
- Apart from physically reaching out to the community, we also worked on our social media presence by actively tweeting and blogging about Zarr.
- The Zarr community needed a structured process to handle incoming changes to the Zarr Specification and accelerate the development of Zarr Specification V3. This led to the inception of ZEPs and the ZIC.
- We made new stickers for the project, and I was thrilled when they were delivered. We’ve already distributed many of them and will give out more at future meetings.
We achieved a few high-level goals that will help strengthen the Zarr community and bring it closer together. Apart from these, I’ve also been assisting with Zarr-Python releases, managing community calls, maintaining the Zarr repositories and working closely with various Zarr implementations.
What does the future look like? 🔮
I’m very excited about Zarr’s future. Having a systematic process in place and a dedicated community manager has streamlined technical and community development for Zarr and its various implementations. Since ZEP0001 is in its initial review phase, we expect the implementation of Zarr V3 to be the next major change. In the upcoming months, we will be focusing on:
- Implementing Zarr Specification V3 across multiple programming languages
- Implementing Sharding with scalable minds GmbH
- Zarr User Survey 2022 to better understand the community’s needs
- Contracting with Python-based developers and organisations to add new features like IPFS and extensions like:
  - Fsspec kerchunk support in additional languages
  - Development of Sparse Arrays
- Improving visibility of the project by presenting at conferences and meet-ups
- Aggregating Zarr data from the community and showcasing them on our website
Conclusion 🙌🏻
In conclusion, our first year with the CZI EOSS4 grant has achieved some important milestones. We solved some crucial technical and community problems, which has paved a smooth path for further development. We believe the upcoming work will proceed in a streamlined and much more systematic manner.
As for me, it’s been six months since I started working with the wonderful humans of Zarr, and every day I get to learn something new, whether in community engagement, technical skills or something as simple as talking and teaching about Zarr to a group of humans. I believe that the future of Zarr looks promising and there are many more exciting things yet to come!
Thanks for reading this blog post. If you’d like to contribute to Zarr in any manner, feel free to ping me or drop a ‘Hi🙋🏻♂️’ over at our Gitter channel. Talk to you soon!
~Sanket Verma