Zarr Blog

Get Together OME Transforms 🧠

2024-07-15T00:00:00+00:00

Join the 3rd Get Your Brain Together Hackathon!

Join us for the third edition of the Get Your Brain Together Hackathon! This exciting event invites neuroimage data generators, image registration researchers, and neurodata compute infrastructure providers to come together for a hands-on, collaborative experience. Register now and be part of this vibrant community working towards creating reproducible, open-source resources that unlock the mysteries of brain structure and function.

This hackathon will focus on advancing OME-Zarr spatial transformations.

Overview of OME-Zarr

OME-Zarr is a cloud-optimized bioimaging file format that enjoys international community support and widespread adoption in neuroscience. It supports large-scale bioimages with spatial metadata, making it a critical tool for scientific research. The current OME-Zarr standard is enhanced by the coordinate transformations draft, which introduces robust support for spatial transformations. This is particularly important for neuroimaging and other scientific imaging practices, as it facilitates:

Reproducibility and Consistency: Explicit support for spatial transformations ensures consistent application across various platforms and applications. This feature aligns with the FAIR principles, enabling independent researchers to verify results.
Integration with Analysis Workflows: By treating spatial transformations as a first-class entity within file formats, OME-Zarr allows seamless integration with diverse image analysis workflows, eliminating the need for additional conversion steps.
Efficiency and Accuracy: Embedding transformations within the file format minimizes the need for re-sampling, thereby reducing sampling errors and preserving analysis accuracy. This standardization is crucial for handling the massive data volumes generated by modern microscopy techniques.
Flexibility in Analysis: Native support for spatial transformations provides researchers with the flexibility to apply, modify, or reverse transformations as needed, facilitating longitudinal studies, multi-modal imaging, and comparative analysis.
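
As a rough illustration of what embedding transformations in metadata means, the sketch below applies a scale followed by a translation to a voxel index to obtain physical coordinates, and then reverses the mapping, without resampling any pixel data. The `scale` and `translation` transformation types follow the released OME-Zarr metadata; the axis order, voxel sizes, offsets, and the helper names `apply_transforms` and `invert_transforms` are assumptions made for this example, and the coordinate transformations draft defines further types that are not shown here.

```python
# Minimal sketch: apply OME-Zarr style "scale" and "translation" metadata
# to a voxel index. All numeric values are made up for illustration.

transforms = [
    {"type": "scale", "scale": [2.0, 0.5, 0.5]},               # z, y, x voxel size
    {"type": "translation", "translation": [10.0, 0.0, 100.0]},  # z, y, x offset
]


def apply_transforms(index, transforms):
    """Map a voxel index to physical coordinates, applying transforms in order."""
    coords = [float(i) for i in index]
    for t in transforms:
        if t["type"] == "scale":
            coords = [c * s for c, s in zip(coords, t["scale"])]
        elif t["type"] == "translation":
            coords = [c + o for c, o in zip(coords, t["translation"])]
        else:
            raise ValueError(f"unsupported transform type: {t['type']}")
    return coords


def invert_transforms(coords, transforms):
    """Map physical coordinates back to (possibly fractional) voxel indices."""
    result = [float(c) for c in coords]
    for t in reversed(transforms):
        if t["type"] == "scale":
            result = [c / s for c, s in zip(result, t["scale"])]
        elif t["type"] == "translation":
            result = [c - o for c, o in zip(result, t["translation"])]
        else:
            raise ValueError(f"unsupported transform type: {t['type']}")
    return result


physical = apply_transforms((4, 20, 8), transforms)
print(physical)                                 # [18.0, 10.0, 104.0]
print(invert_transforms(physical, transforms))  # [4.0, 20.0, 8.0]
```

Because only the metadata changes, the same stored pixels can be reinterpreted under a different transformation simply by editing the list.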

Hackathon Agenda

The hackathon is structured into three key components:

Day 1: Tutorial sessions covering the application needs for coordinate transformations, mathematical principles, and current computational standards and tools available in the open-source ecosystem.
Day 2: Small working groups will review and propose enhancements to the current coordinate transformations draft and relevant neuroimaging additions.
Day 3: Hands-on activities where participants will implement and apply the proposed improvements to the standards.

Event Details

Dates: Friday, July 26th - Sunday, July 28th, 2024
Location: Hybrid event at the University of North Carolina-Chapel Hill, via Google Meet videoconferencing, Image.sc Island Gather.Town virtual space, and Image.sc Zulip Chat.
Cost: Registration is free!

Register now and add the event to your calendar!

~ Matt McCormick

NASA POWER 🤝🏻 Zarr

2024-06-11T00:00:00+00:00

Hi Zarr Community! 👋🏻

Zarr’s user, developer, and contributor base is growing every day across several scientific domains, including those responsible for mitigating climate change, solving complex biomedical issues, pushing the boundaries of AI development, and more.

The National Aeronautics and Space Administration (NASA) is a prominent Zarr user in the geospatial community and is deeply invested in the format. In this blog post, we’d like to highlight the NASA Prediction Of Worldwide Energy Resources (POWER) project, which has been using Zarr for its data storage needs. The POWER project is based at the NASA Langley Research Center (LaRC), which is located in Hampton, Virginia, USA.

Introduction to POWER and Zarr 🎙

The Prediction Of Worldwide Energy Resources (POWER) project is a cornerstone “Energy and Infrastructure” Earth Action Program project. The Project’s mission is to improve learning, decisions, and outcomes in the renewable energy, sustainable infrastructure, and agroclimatology user communities. For any location in the world, the project provides easily accessible, customized, and trusted NASA solar and meteorological data for past, current, and soon future climates. POWER improves the public and private capability for integrating these NASA Earth Observations (EO) and assimilation model data into their workflows by offering a diverse suite of tools and services to access this data. The project provides access to its Analysis Ready Data (ARD) via POWER’s Application Programming Interface (API), Data Access Viewer (DAV), geospatial services, and cloud-enabled data store. POWER offers no-cost, no-account-needed access to all of its tools, services, and data, which lowers the barrier to entry for many users across the globe. Additionally, through each of its tools and services, POWER’s multi-decadal, low-latency, high-accuracy, community-specific datasets are offered in user-customizable units and a wide variety of formats.

POWER currently uses Zarr as the backend data store for the API services that many of our users rely on in their own projects. We have made the backend data stores directly available and freely accessible to the public, with no use constraints, through Amazon Web Services® (AWS®). We also plan to work with digital twins to integrate POWER data directly into their data modelling. The POWER Project will continue leveraging Zarr to store and serve data, which includes dynamic updates in Near Real Time (NRT) by the POWER Project’s data processing code base. Zarr enables the POWER Project to more efficiently support its user communities and impact decision-making for government agencies, non-profit organizations, universities, and private companies around the world.

🗄️ Check POWER’s Zarrs on AWS® Registry of Open Data: https://registry.opendata.aws/nasa-power/

A Brief History of Data Archives 📚

POWER’s meteorological parameters, such as temperature, humidity, precipitation, or wind, are derived from NASA’s Global Modeling and Assimilation Office (GMAO) Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) assimilation model. MERRA-2 is a version of NASA’s Goddard Earth Observing System (GEOS) Data Assimilation System. This data is available starting in 1981.

The energy flux parameters, like solar irradiance and cloud properties, are derived from NASA’s GEWEX SRB archive and NASA’s CERES SYN1deg and FLASHFlux projects. This data is available dating back to 1984.

POWER’s ability to provide historical data allows users to assess variability, make decisions, and conduct analyses based on past and current information. For more information on POWER’s history, please see our documentation.

What format did POWER use before? 🔙

Previously, POWER used a NetCDF file that was structured to support temporal access by chunking along the time dimension, in conjunction with OPeNDAP software used as middleware to support the POWER API’s temporal requests efficiently. While this system met the initial requirement to provide daily data, the team had to explore new formats to meet the growing demand for hourly data. NetCDF was no longer efficient for hourly data because it is a condensed structure, which prompted us to assess Zarr, since it is segmented.

Why POWER switched? 🔍

The POWER Project selected the Zarr format as our Analysis Ready Data (ARD) format as we were transitioning from a monolithic server architecture to a microservice-based hybrid cloud hosted architecture environment, with the foresight of fully transitioning to a cloud environment. To support the key component of our services endpoints, the efficient and fast distribution of time-series data, we wanted to remove any middleware software, improve data access, and implement higher levels of data compression.

Benefits after switching 💪🏻

Switching to Zarr enabled complete and direct access to the POWER data archives in an Analysis-Ready Cloud Optimized (ARCO) data store. Zarr also allows for asynchronous writing, JSON metadata, and a folder-based structure, which lets us add to the datastore faster and keep the data neatly sorted. Furthermore, POWER is able to utilize a higher level of data compression, which reduces the time needed for data acquisition. Lastly, being able to load small parts of the data improves efficiency and relevance and reduces costs.

For POWER’s future datastore, Zarr’s enhanced compression was recently leveraged in initial testing, which reduced the total storage volume by orders of magnitude without losing any data precision. This compression saves on storage costs and increases the read/write performance of the system.

As a part of NASA’s Space Act Agreement with AWS®, this archive is hosted in S3 as part of the Open Data Registry, which provides the data freely to the public.

Future plans for Zarr 🔮

The POWER Project plans to include both spatially chunked and time-series chunked data structures to better meet user demand, fulfill data orders more quickly, and support more efficient access and analysis. To promote further understanding and enable effective search and discovery, the team will develop enhanced slice-based metadata.
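
The trade-off between spatially chunked and time-series chunked layouts can be made concrete by counting how many chunks a request has to read. The sketch below is a back-of-the-envelope calculation only: the grid size, the chunk shapes, and the helper `chunks_touched` are assumptions chosen for illustration and do not describe POWER’s actual data stores.

```python
def chunks_touched(selection, chunk_shape):
    """Count the chunks overlapped by a selection given as (start, stop) per axis."""
    count = 1
    for (start, stop), chunk in zip(selection, chunk_shape):
        first = start // chunk
        last = (stop - 1) // chunk
        count *= last - first + 1
    return count


# Hypothetical hourly archive with axes (time, lat, lon) = (8760, 361, 576).
time_series_chunks = (8760, 4, 4)    # long in time, small in space
spatial_chunks = (24, 361, 576)      # one day of full global maps

# Request A: one year of hourly values at a single grid cell.
point_series = ((0, 8760), (100, 101), (200, 201))
# Request B: one global map at a single hour.
global_map = ((5000, 5001), (0, 361), (0, 576))

print(chunks_touched(point_series, time_series_chunks))  # 1
print(chunks_touched(point_series, spatial_chunks))      # 365
print(chunks_touched(global_map, time_series_chunks))    # 13104
print(chunks_touched(global_map, spatial_chunks))        # 1
```

Each layout serves one access pattern with a single chunk read and the other with hundreds or thousands, which is why offering both layouts can be attractive.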

POWER plans to use Zarr spatial data stores for ArcGIS Image Services in future data versions to cater to the needs of the GIS community.

~ Zoe Waring, NASA POWER team

Toward Zarr-Python 3.0

2024-05-09T00:00:00+00:00

We released Zarr-Python 2.18.0 this week. Although this release was quite light in terms of user-facing changes, it represents the beginning of a new phase for the project. In this post, we’ll walk through our plan for Zarr-Python 3.0 and what users of the library can expect in the coming months.

Zarr-Python 2.18

Before we get into the 3.0 release, we’ll first cover a few details about the 2.18 release series. The first thing to know is that we will continue to support 2.18 with bug fixes up until the release of 3.0. Additionally, we expect to use the 2.18 series to communicate changes in the Zarr-Python API, which will come in 3.0. For example, this week’s release included a number of new deprecation warnings for parts of the Zarr-Python API that we expect to remove in 3.0 (e.g. exotic stores, experimental v3 API).
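
For maintainers of downstream libraries, one way to act on such warnings is to surface them during a test run. The sketch below uses only Python’s standard `warnings` module; `legacy_open` is a hypothetical stand-in for a deprecated function and is not part of the Zarr-Python API.

```python
import warnings


def legacy_open(path):
    """Hypothetical deprecated helper, standing in for an API slated for removal."""
    warnings.warn(
        "legacy_open is deprecated and will be removed in the next major release",
        DeprecationWarning,
        stacklevel=2,
    )
    return {"path": path}


# Record warnings so that deprecated call sites can be found before upgrading.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_open("example.zarr")

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print(len(deprecations))
print(deprecations[0].message)
```

Running a test suite with `python -W error::DeprecationWarning -m pytest` turns the same kind of warning into a failure, which makes deprecated usage hard to miss.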

What to expect with Zarr-Python 3.0

In mid-2023, we formed a working group to look at modernizing Zarr-Python and, crucially, adding support for the V3 specification. One of the early outcomes of this effort was a design document detailing the plan for a major refactor to the library. The goals for the refactor effort are to:

Provide a complete implementation of Zarr V3 through the Zarr-Python API,
Clear the way for exciting extensions / ZEPs (i.e. sharding, variable chunking, etc.),
Provide a developer API that can be used to implement and register V3 extensions,
Improve the performance of Zarr-Python by streamlining the interface between the Store layer and higher level APIs (e.g. Groups and Arrays),
Clean up the internal and user-facing APIs,
Improve code quality and robustness (e.g. achieve 100% type hint coverage), and
Align the Zarr-Python array API with the array API Standard.

In late 2023, we started working on the next version of the library, iterating on core concepts and restructuring the code base. While this effort continues today, here are a few highlights that we are particularly excited about:

New asynchronous APIs across the library, including at the Store, Group, Array, and Codec levels. The ability for Zarr-Python to leverage asynchronous computation will dramatically improve performance in the library, particularly for workloads that depend on data coming from high-latency stores. We expect most users will interact with these classes through a synchronous interface but the asynchronous alternatives will be available for users that can take advantage of them.
Complete spec-compliant implementation supporting both V2 and V3. Zarr-Python will support reading and writing in either format. Additionally, the V2 and V3 code paths will benefit from the new asynchronous interfaces as well as other performance improvements.
New plugin interface for codecs. Previously, codec support was required to run through Numcodecs. Going forward, additional codecs may be registered with Zarr-Python using the zarr.codecs Entry Point. While Numcodecs will continue to supply the Zarr-Python project with most codecs, the plugin interface will support the integration of codecs from other libraries.
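
To illustrate why asynchronous store access helps with high-latency stores, the sketch below fetches several chunks concurrently with `asyncio` and wraps the coroutine in a synchronous function. `FakeRemoteStore`, `get_chunks`, and `get_chunks_sync` are hypothetical names written for this example; they are not the Zarr-Python 3.0 API.

```python
import asyncio
import time


class FakeRemoteStore:
    """Hypothetical store that simulates network latency on every read."""

    def __init__(self, data, latency=0.05):
        self._data = data
        self._latency = latency

    async def get(self, key):
        await asyncio.sleep(self._latency)  # stand-in for a network round trip
        return self._data[key]


async def get_chunks(store, keys):
    """Request all chunks concurrently so their latencies overlap."""
    return await asyncio.gather(*(store.get(k) for k in keys))


def get_chunks_sync(store, keys):
    """Synchronous wrapper for callers that do not run an event loop."""
    return asyncio.run(get_chunks(store, keys))


keys = [f"0.{i}" for i in range(20)]
store = FakeRemoteStore({key: bytes([i]) for i, key in enumerate(keys)})

start = time.perf_counter()
chunks = get_chunks_sync(store, keys)
elapsed = time.perf_counter() - start

# Twenty sequential reads would take about 20 * 0.05 s; concurrent reads
# take roughly one latency period.
print(len(chunks), f"{elapsed:.2f}s")
```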

Release plan

We are still working hard on the 3.0 development branch. You can follow our progress on our GitHub Project Board. In the coming weeks, we expect to move our development to the main branch of the library and make a series of pre-releases.

Get involved

It’s not too late to get involved with the 3.0 effort. The GitHub Project Board provides an up-to-date summary of outstanding issues. If you maintain a library that depends on Zarr-Python, the 3.0.0-alpha release will be a great time to start testing against the upcoming release. Finally, we continue to hold bi-weekly developer meetings to discuss and coordinate work on Zarr-Python. This is an open meeting so please come if you are interested in getting involved. Check out the Zarr community calendar here.

~Joe Hamman

Zarr Sprint Recap

2024-04-04T00:00:00+00:00

A few weeks ago, a group of us met up in person, at LEAP in New York City, and virtually to hack on the Zarr specifications and ecosystem.

In this blog, I give a very brief overview of each of the topic areas. More importantly, I link out to the open issues, pull requests, discussions, and meeting opportunities for continued development. You can follow these links to both better understand each effort and also to contribute yourself.

Please keep this work going by adding reviews and comments to online conversations and joining any relevant meetings/working sessions.

Zarr Specification

Zarr Specification refers to the specification for a chunked, compressed, N-dimensional array format primarily designed for storing large numerical arrays efficiently. It is commonly used in scientific computing, geospatial, bioimaging, and data analysis contexts.

The specification defines how data should be organized within a Zarr store, including details on chunking, compression, metadata, and other attributes necessary to efficiently store and retrieve array data. This specification helps ensure interoperability between different software implementations that support the Zarr format.
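
As a concrete, simplified illustration of what the specification pins down, the sketch below builds array metadata in the style of a Zarr V2 `.zarray` document and computes which chunk key holds a given element. The array shape, chunk shape, compressor settings, and the helper `chunk_key` are illustrative assumptions; a real store would also hold the compressed chunk bytes under those keys.

```python
import json

# Metadata in the style of a Zarr V2 ".zarray" document (values are illustrative).
zarray = {
    "zarr_format": 2,
    "shape": [10000, 10000],
    "chunks": [1000, 1000],
    "dtype": "<f8",
    "compressor": {"id": "zlib", "level": 1},
    "fill_value": 0.0,
    "order": "C",
    "filters": None,
}


def chunk_key(index, chunks, separator="."):
    """Return the key of the chunk containing the element at `index`."""
    return separator.join(str(i // c) for i, c in zip(index, chunks))


print(json.dumps(zarray, indent=2))
print(chunk_key((2500, 9999), zarray["chunks"]))                 # 2.9
print(chunk_key((2500, 9999), zarray["chunks"], separator="/"))  # 2/9
```

Any implementation that agrees on this metadata and key layout can locate the same chunk, which is the basis for interoperability between implementations.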

The latest version of the specification is V2, but Version 3 is in the works.

Chunk Manifest / Virtual Concatenation

In this breakout session, the group engaged in a long technical discussion about a way to define arrays in a Zarr store as concatenations of other arrays in the store. You can read a full ZEP-like description of the discussion here. Shoutout to Tom Nicholas for documenting this so well!
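
The linked write-up is the authoritative description of what was discussed. Purely as an illustration of the general idea, the sketch below resolves a chunk of a virtual array, defined as the concatenation of two arrays along the first axis, to the source array and chunk that actually hold the data. It assumes every source length is a multiple of the chunk length; the array names and the `resolve_chunk` helper are hypothetical and are not taken from the proposal.

```python
# Hypothetical sources: (name, length along axis 0), all chunked in blocks of 100.
CHUNK_LEN = 100
sources = [("air_1981", 300), ("air_1982", 500)]


def resolve_chunk(virtual_chunk, sources, chunk_len):
    """Map a chunk index of the virtual array to (source name, source chunk index)."""
    if virtual_chunk < 0:
        raise IndexError("chunk index must be non-negative")
    offset = 0
    for name, length in sources:
        if length % chunk_len != 0:
            raise ValueError(f"{name}: length must be a multiple of chunk_len")
        n_chunks = length // chunk_len
        if virtual_chunk < offset + n_chunks:
            return name, virtual_chunk - offset
        offset += n_chunks
    raise IndexError(f"chunk {virtual_chunk} is out of range")


print(resolve_chunk(0, sources, CHUNK_LEN))  # ('air_1981', 0)
print(resolve_chunk(3, sources, CHUNK_LEN))  # ('air_1982', 0)
print(resolve_chunk(7, sources, CHUNK_LEN))  # ('air_1982', 4)
```

No chunk data is copied in this scheme: the virtual array is only a lookup from its own chunk grid to chunks that already exist.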

Zarr-Python

Joe Hamman led a group focusing on enabling support for V3 in Zarr-Python. This was part of an ongoing effort working toward Zarr-Python version 3.0 (roadmap). The focus of this group was on closing outstanding issues on the roadmap and testing the development branch in common geospatial applications. Zarr-Python has traditionally been the canonical implementation of Zarr, and it is therefore a current priority since this effort delivers immediate impact to the largest swath of users, including those that use Zarr through downstream libraries (e.g. Xarray, Dask, Anndata, etc.).

Geospatial Multi-Scales/Pyramids

In the Zarr pyramids breakout group, Thomas Maschler and Max Jones discussed the motivations for following the OGC TileMatrixSet 2.0 specification within the GeoZarr specification, which will be shared as a new issue to supersede GeoZarr Issue #30. They also discussed reading those TMS into rio-tiler using Xarray and started a refactor of ndpyramid to support the TMS specification.
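
For readers unfamiliar with tile matrix sets, the sketch below computes a few pyramid levels of the well-known WebMercatorQuad set, where each level doubles the number of tiles per side and halves the cell size. The field names are modelled on the TileMatrixSet 2.0 JSON encoding, and the `web_mercator_quad_level` helper is a hypothetical name; how GeoZarr will actually encode this is for the specification discussion to decide.

```python
import math

EARTH_RADIUS_M = 6378137.0   # WGS 84 semi-major axis, in metres
TILE_SIZE = 256              # tile width and height, in pixels


def web_mercator_quad_level(zoom):
    """Describe one pyramid level of the WebMercatorQuad tile matrix set."""
    extent = 2 * math.pi * EARTH_RADIUS_M   # width of the projected world
    tiles_per_side = 2 ** zoom
    return {
        "id": str(zoom),
        "cellSize": extent / (TILE_SIZE * tiles_per_side),  # metres per pixel
        "matrixWidth": tiles_per_side,
        "matrixHeight": tiles_per_side,
        "tileWidth": TILE_SIZE,
        "tileHeight": TILE_SIZE,
    }


levels = [web_mercator_quad_level(z) for z in range(4)]
for level in levels:
    print(level["id"], level["matrixWidth"], round(level["cellSize"], 2))
```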

Alternate backend for reading remote Zarr stores

Kyle Barron worked on a prototype for an alternate store for Zarr Python using new async Python bindings to Rust’s object-store project. You can see a prototype of object-store-based store implementation at zarr-python#1661.

GeoZarr Specification

Throughout the sprint, the GeoZarr focus group worked on examining the interoperability of GeoZarr and different existing tooling and store support. You can see the table here.

One of the biggest realizations was that ArcGIS has a lot of existing support for Zarr, which is really exciting news! For other tools, there is still work to be done, especially for GeoTIFF-like data being stored in Zarr, which translates to updates needed within the GeoZarr specification. For example, there are functionality issues tied to support or lack thereof for specific compression algorithms. The GeoZarr Steering Working Group is working on providing a list of supported compressions for commonly used tools. There is also work to be done on specifying the organizational structure of GeoZarr and understanding where requirements from CF diverge from the Zarr data model. For this, we are focusing efforts on involving folks with CF expertise to guide these conversations.

If you are interested in helping out, please join the next bi-weekly GeoZarr meeting every other Wednesday at 11 EST. The next will be April 17th and you can find the invite on the Zarr calendar or join directly from this link. Check out the notes from past meetings at the hackmd.

HTTP Extension

A final priority of the Zarr Sprint was to get efforts rolling on how to better visualize Zarr on the web.

Kevin Booth is the lead on this effort. Currently, he has added some sidecar files with links that reference parent, child, and root relationships in the Zarr. The goal is to be able to use something like traverzarr, a first attempt at traversing a Zarr JSON as if it were a filesystem, developed by Xavier Nogueira during the sprint, to navigate a Zarr in a manner similar to the Spatio-Temporal Asset Catalog (STAC). A more detailed blog post with updates on this work will follow in the next week.
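
To give a sense of what traversing Zarr metadata like a filesystem can look like, the sketch below lists the immediate children of a group from a dictionary shaped like Zarr V2 consolidated metadata (`.zmetadata`). This is an illustration under assumptions: the store contents are invented, `list_children` is a hypothetical helper, and it is neither the traverzarr implementation nor the sidecar-file layout from the sprint.

```python
# Dictionary shaped like Zarr V2 consolidated metadata; contents are invented.
consolidated = {
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "climate/.zgroup": {"zarr_format": 2},
        "climate/temperature/.zarray": {"zarr_format": 2, "shape": [365, 180, 360]},
        "climate/precipitation/.zarray": {"zarr_format": 2, "shape": [365, 180, 360]},
        "elevation/.zarray": {"zarr_format": 2, "shape": [180, 360]},
    },
}


def list_children(consolidated, group=""):
    """Return {name: "group" | "array"} for the direct children of `group`."""
    prefix = f"{group}/" if group else ""
    children = {}
    for key in consolidated["metadata"]:
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("/")
        if len(parts) != 2:
            continue  # skip the group's own metadata and deeper descendants
        name, meta_file = parts
        if meta_file == ".zarray":
            children[name] = "array"
        elif meta_file == ".zgroup":
            children[name] = "group"
    return dict(sorted(children.items()))


print(list_children(consolidated))
# {'climate': 'group', 'elevation': 'array'}
print(list_children(consolidated, "climate"))
# {'precipitation': 'array', 'temperature': 'array'}
```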

This work has continued after the sprint. The Cloud-Native Geospatial Foundation has started holding bi-weekly meetings to hack on this work. If you would like to be involved in this, email hello@cloudnativegeo.org to be added to the meeting invite, or find the meeting link at the Zarr calendar here.

More efforts to come!

It was great to get a group of people together to spend some dedicated time on Zarr, and the work is nowhere near done. Please help keep the momentum of these efforts going by responding to any GitHub Pull Requests, Issues, or Discussions that you have opinions on and joining any of the established Zarr meetings that are of interest to you. Again, the Zarr calendar can be found here.

~ Michelle Roby

Levelling Up: Zarr Community Transitions to Zulip

2024-02-27T00:00:00+00:00

Hi, Zarr Community! 👋🏻

We’ve got an exciting announcement for you all! 😄

Community Update: Moving to Zulip 💬

We’re excited to announce that the Zarr community has made a move from Gitter to Zulip as our primary chat platform. This transition marks a new chapter for our community and offers several advantages for our members.

Join here → https://ossci.zulipchat.com/

Why Zulip? 🤔

Zulip offers a robust and versatile platform for communication and collaboration. Its threading model allows for organized and focused discussions, making it easier for community members to follow and participate in conversations effectively. Additionally, Zulip provides powerful search capabilities, ensuring that valuable information shared in the past remains accessible to all.

Zulip’s unique message sharing feature allows conversations to be easily shared around the web via unique links. In addition, Zulip’s indexing of all content by search engines ensures that the knowledge base is easily accessible to all users.

Hosting Thanks 🙏🏻

We extend our sincere gratitude to the good humans at the Open Source Science Initiative (OSSCi) for generously hosting the Zulip server. Their commitment to supporting open science and collaborative research is commendable, and we’re thrilled to partner with them on this endeavour.

Shoutout to Jonathan Starr for helping us! 🙌🏻

Building a Hub for Open Science 🧬

The OSSCi Zulip server will serve as a hub for various projects in the scientific Python ecosystem, starting with Zarr. By centralising communication within this platform, we aim to foster greater collaboration, knowledge sharing, and community building among like-minded individuals passionate about open science and research.

Official Chat Platform ™️

With this migration, the OSSCi Zulip server becomes the official chat platform for the Zarr community. We encourage all Zarr users, contributors, and enthusiasts to join us on Zulip to stay updated on the latest developments, seek assistance, and engage with fellow community members.

Your Feedback Matters 🔁

At Zarr, we value the input and ideas of our community members. We’re committed to continuously improving our platform and user experience. Therefore, we welcome any feedback, suggestions, or ideas you may have regarding the Zulip migration or any other aspect of our community. Your input helps us better serve the needs of our users and advance our shared goals.

Please create an issue in zarr-developers/community or join one of our community meetings if you’d like to chat with us!

Join Us on Zulip! 🔗

Ready to join the conversation? Head over to the OSSCi Zulip server and dive into discussions surrounding Zarr and other exciting projects in the scientific Python ecosystem.

With our shift from Gitter to Zulip, it’s worth mentioning that the majority of discussions on Zulip have involved the core developers of Zarr. Now, we’re extending our warm invitation to the wider community to join us on Zulip. Your involvement is crucial as we foster a more inclusive and vibrant community.

We look forward to connecting with you there! ✌🏻

~Sanket Verma

Zarr, as seen in the public 📣

2023-06-23T00:00:00+00:00

Hi Zarr Community! 👋🏻

Recently, several community members and I have been speaking at various conferences and events. There have been exciting developments in the Zarr ecosystem, such as finalising the V3 specification, submitting new ZEPs, initiating new implementations, etc.

While I’m mostly giving beginner talks on Zarr, which answer the how, why, and what, the enthusiastic community members have been talking about other exciting stuff!

In this blog post, I highlight a few talks which were delivered in the past two months. Also, we’re maintaining a playlist on YouTube, which has a more extensive collection of talks from various domains and diverse speakers. Check the playlists: Zarr: Introductory Talks and Zarr: Projects, Uses, Research and Workflows.

PyCon DE and PyData Berlin 2023 🇩🇪

I went to Berlin, Germany, in April to speak at PyCon DE and PyData Berlin 2023. My talk was titled “The Beauty of Zarr”, where I emphasised the inner workings using some neat illustrations by Trevor Manz. I highlighted how simple, convenient and hackable it is to use Zarr. After going through various explanations, I focused on some critical issues that Zarr eradicates because of its design and workings, i.e. chunking, compression, cloud-enabled etc.

Towards the end, I prepared a Jupyter notebook where I walked through Zarr 101 code to create, read, write and manipulate arrays. I also converted the Zarr pixelated logo from .png to .zarr format, which was a neat closing for my talk.

The slides and notebook can be accessed here.

Please watch the video here: 👇🏻

ESIP Meetings 🌏

Earth Science Information Partners (ESIP) is a community of data and information technology practitioners working together to coordinate earth science interoperability efforts. ESIP has various collaboration areas. ESIP Collaboration areas are made up of administrative committees and small working groups that are called clusters. Some of them are:

Agriculture & Climate
Open Science
Cloud Computing
Soil Ontology and Informatics
Data Management Training Clearinghouse
Council of Data Facilities

And many more.

The ESIP Cloud Computing Cluster organised a three-part series on Zarr titled “Zarr: The Next Generation”. In every part, the Zarr Community members talked about several things ranging from V3 to conventions to ZEPs.

The first part took place on March 27th where:

Ryan Abernathey presented on the Zarr V3 Specification, i.e. ZEP0001
Martin Durant presented on the variable chunking, i.e. ZEP0003

The video recording of the session can be seen here: 👇🏻

The second part took place on April 24th where:

Briana Pagán spoke about the current state of GeoZarr specification and working group
Norman Rzepka spoke about the Sharding specification, i.e. ZEP0002

The video recording of the session can be seen here:

The third part took place on May 22nd where:

Hailiang Zhang presented the accumulation proposal, i.e. ZEP0005
Max Jones spoke about Kerchunk and Pangeo-Forge recipes developments

The video recording of the session can be seen here:

These meetings covered a great deal of recent developments in the Zarr ecosystem. The ZEPs mentioned above explained the V3 specification, sharding, and a couple of new exciting features the community is working on. The interesting thing to note here is that ZEP0003 and ZEP0005 were written by community members to support use cases in their own domains. This shows the openness and flexibility of the Zarr open-source community and how we support everyone. Though these ZEPs are still in the draft state, they’ll be finalised soon for adoption.

I will discuss the V3 specification in a separate blog post, so I won’t go into the details here. But it’s worth noting the GeoZarr specification and what Briana presented. GeoZarr is one of the conventions on top of the Zarr specification, which supports various use cases of the geospatial community for how they store their data and metadata. The GeoZarr SWG (Steering Working Group) has been working quickly despite the roadblocks (as mentioned by Briana). The progress and specification can be seen here.

Conclusion

These are some of the public engagements done by the Zarr Community members in the past months. If you spoke on Zarr recently or in the past and would like me to highlight your talk, please don’t hesitate to contact me. If you’re working on something interesting which involves Zarr and want to share it with the community, please say ‘Hi’ to me!

I’ll be talking to you all soon.

Until next time, peace! ✌🏻

~Sanket Verma

Summarising OME-Zarr Java @ OME2022

2023-03-14T00:00:00+00:00

Namaste Zarr Community! 🙏🏻

I hope you are doing great. Recently, there has been a lot of exciting development on the Zarr front. Some of them are new collaborations, conventions, research publications, submission of new ZEPs, etc. I’ll cover those in separate blog posts. Meanwhile, I want to talk about something interesting for the Java community in the Zarr ecosystem. ;)

OME hosted a 4-day event last November, and on one of the days, they discussed the future of the Java implementation of Zarr, i.e. Zarr-Java, extensively. The discussion was centred around the needs, current state and future work needed to have a solid foundational Zarr implementation in Java which is much needed by the community. This blog post aims to summarise the important sections of the meetings, which could be used as a reference for future work in the development of Zarr-Java.

Disclaimer: This blog post is a naive attempt by me to understand the vast Java and Zarr ecosystem and summarise it in a few sentences, so if you think I misinterpreted something, feel free to point it out. I’m more than happy to be told that I was wrong. :)

Thanks to Chris, Sebastien, and Norman on behalf of Glencoe Software, OME and Scalable Minds for putting the slides together, delivering and moderating both sessions.

The reason for bringing everyone together @ OME2022 👩🏻‍🤝‍👨🏼👨🏿‍🤝‍👨🏻👩🏿‍🤝‍👩🏼

First, I’m going to focus on the reason why this meeting took place. There is no single Java implementation of Zarr on which the whole Java community could rely. It might be too big of an ask, but having something like Zarr-Python for the Java community would be perfect. The JVM Zarr Community is fragmented, which increases friction in the community and further affects and delays the adoption of a single OSS for the whole community. Until now, the developers/research groups/companies of the Java ecosystem who need the Zarr package have been forking various implementations of Zarr and trying to get them to work according to their use case. It might help a single cause, but it certainly doesn’t help the larger community. Moreover, these forked libraries are unmaintained when the desired use case is achieved due to the lack of resources and developer support. Having multiple similar libraries also affects confidence and trust and puts the community in a state of ambiguity on which project to rely on.

As it is evident from above, there is a strong need for a community-wide accepted Zarr-Java project which covers all the essential baseline features from the Zarr specification. This also creates room to strengthen and improve the existing Zarr specification by introducing better cross-language engagement and participation.

There’s also a need to define baseline features for Zarr-Java that the reference implementations should support. The requirements, as shared during the sessions, are:

Baseline requirements:

Java 8+
Support for Zarr V2 Specification (including dimension separator)
Inspired by Zarr Python API foundational concepts (store, compression, chunk)
Data types: signed/unsigned integers 1 -> 8 bytes, 4 and 8-byte floating point
Stores: Filesystem, in-memory, HTTP, Amazon S3
Extensible compression: blosc, zstd, lz4, zlib, bzip2, lzma at least
Chunk API
Basic Slice API

Nice to have features:

Synchronous and asynchronous API options
Framework to support sharding and Zarr V3 Specification

These requirements are a fair ask and, if/when developed, will serve as a solid foundational block for the Zarr-Java ecosystem.
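
The three foundational concepts named in the baseline list (store, compression, chunk) fit together in a small amount of code. The sketch below is written in Python for brevity rather than Java, and it is a toy model: `MemoryStore`, `write_chunk`, and `read_chunk` are hypothetical names, zlib is used because it ships with the standard library, and none of this reflects the API of an existing Zarr-Java library.

```python
import struct
import zlib


class MemoryStore:
    """Toy key-value store: keys are strings, values are bytes."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = bytes(value)

    def get(self, key):
        return self._items[key]


def write_chunk(store, key, values):
    """Encode values as little-endian 4-byte signed integers, compress, and store."""
    raw = struct.pack(f"<{len(values)}i", *values)
    store.set(key, zlib.compress(raw))


def read_chunk(store, key):
    """Fetch a chunk from the store, decompress it, and decode the integers."""
    raw = zlib.decompress(store.get(key))
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


store = MemoryStore()
write_chunk(store, "0.0", [0] * 1000)
write_chunk(store, "0.1", list(range(1000)))

print(read_chunk(store, "0.1")[:5])  # [0, 1, 2, 3, 4]
print(len(store.get("0.0")))         # far fewer than the 4000 uncompressed bytes
```

Swapping the in-memory dictionary for a filesystem, HTTP, or S3 backend changes only the store class, which is why the requirements list stores and compression separately.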

Moving on, let’s see what Zarr’s history of development in the Java sphere has been like for the past years.

What’s the history been like? 🕥🕣🕡

As you can see from the above timeline,

The earliest development started in October 2011 by scalableminds on webknossos which lets you annotate, visualise and share N-dimensional arrays. After that, folks at Janelia began working on Java NGFF (Next Generation File Format) via N5. Finally, the first conversation for having JVM Zarr Implementation started in 2018 in the zarr-developers/community, which can be seen here.

This led Ryan Williams from the Zarr Steering Council to work on lasersonlab/ndarray.scala. Zarr’s first pure Java implementation was not seen until 2019 by Brockmann Consult, which lives here at bcdev/jzarr. The efforts from the Brockmann group are commendable as jzarr is one of the precise adoptions of the Zarr specification. Even though it’s been almost a year since the last commit, no other Zarr Java implementation comes close to what jzarr can achieve.

After this, various interesting projects showed up, as seen on the timeline, which included the adoption of Zarr Specification in some manner. Chris did an excellent job explaining these various projects in the morning session, which can be seen here. I’d highly recommend listening to him before going further.

Despite these outstanding efforts by exceptional groups and individuals, the community remained somewhat fragmented, and there is a strong need to unite and work on a collaborative project.

Current state of work 🗂️

The jzarr 0.3.5, jblosc 1.0.1 and Amazon S3 JSR-203 Java 7 NIO2 Implementation are the most stable, well-documented and cohesive OSS projects. The jzarr adaption of the Zarr specification is quite good, and most of the community is using it. But despite its merits, there are certain limitations; they are:

The S3 anonymous access is somewhat broken and doesn’t play nicely with S3-compatible storage
The project hasn’t been maintained properly in a long time
Feature support in JZarr, and community support for adding new features like Sharding and V3, is also limited

There are other options the community could look at, like N5+N5-Zarr, Z5, ndarray.scala or NetCDF-Java. But when we dive deep into their codebases, existing frameworks, learning curves, and adoption of the Zarr specification, every one of these projects falls short on one or more critical features. Again, Chris did a fantastic job explaining those, and you can listen to it here.

If you like great visuals instead, Josh Moore prepared a neat matrix of the current state of projects.

Here S denotes fully supported, P denotes partially supported, and N denotes not supported.

This clearly shows that the community needs a Zarr implementation that ticks all the boxes mentioned above. So let’s have a look at what is being proposed.

What’s coming next? 🔮

The proposal for moving forward looks something like this:

Work with Blosc, Ryan Williams and the Laserson Lab to bring lasersonlab/JBlosc under zarr-developers
Continue work providing object code for multiple platforms from native code
- https://github.com/glencoesoftware/c-blosc-windows-x86_64
- https://github.com/glencoesoftware/c-blosc-macos-x86_64
Work with Ryan Williams and the Laserson Lab to bring lasersonlab/Amazon-S3-FileSystem-NIO2 under zarr-developers
Start zarr-developers/zarr-java and bring the best ideas, concepts, and code from N5-Zarr, Zarr, and NetCDF-Java into a reference library

The idea here is to assemble people with specific skill sets and bring them to work together under the zarr-developers umbrella. There was a Q&A session after the presentation. Some of the crucial insights from the Q&A are:

There are many opinions on several important topics like design choices, what compression to use, consolidating arguments etc., and we’d like to hear from you and get as many hands as we can to work together. We welcome community participation and contributions
Forming a consensus on some critical design choices for Zarr-Java
Building momentum for Zarr-Java is not only a developer’s task but also a matter of community participation and engagement
The Java community needs to participate in the discussions related to the spec to help their cause, and if they don’t, they are going to be left behind
We’re mostly going to learn things by getting our hands dirty

Since this blog post aims to summarise the meetings, I can only cover a tiny portion of the Q&A sessions. So I’d encourage you to listen to the Q&A sessions from the morning and afternoon sessions to see what the community thinks of this effort and how excited they are.

Session recordings 🎬

You can watch the full recording of the morning session here 👇🏻:

And the afternoon session here 👇🏻:

That’s it from my side. I hope this post was helpful and summarised the discussions well. If anything is unclear, criticism is welcome!

Keep watching this space as I try to cover the advancements of what we’ve discussed. As always, if you’d like to get involved, feel free to drop ‘Hi’ on our Gitter channel. Until next time. Peace! ✌🏻

~Sanket Verma

Welcoming Outreachy 2022 Interns

2022-12-12T00:00:00+00:00

Hi Zarr Community! 🙋🏻‍♂️

The holidays are just around the corner, and we wanted to share the good news about the Outreachy participation, which shows the drive and motivation of contributors, the trust and strength of the open-source community, and the resiliency of the Zarr project.

The Initiation 🏁

It was the first week of October, and our community Gitter channel saw a sudden wave of incoming messages from Outreachy participants (i.e. Outreachies). Initially, the messages mainly stated that they were excited to find the Zarr project and looked forward to contributing. I was happy to see the initial response from the Outreachies. I thought I had done a good job writing the project descriptions on the Outreachy portal, since everyone seemed to like them and was now considering working with us for three months.

Speed-bumps 🚧

But after a few days, the tone of the messages shifted from “Hi, I’m excited to be here…” to “Hi, I have this technical issue…” and their intensity increased manifold. It became difficult to manage the Zarr community and incoming Outreachies on a single Gitter channel, so we decided to create a separate Gitter channel for the Outreachies. We started interacting with all the participants and helping them out late at night.

🤝🏻

Everything was going fine until we realised that there should be a central place/guide that every aspiring intern could refer to when starting their open-source contribution journey to Zarr with Outreachy. We also thought it’d save us from answering the same questions multiple times in multiple places (Gitter, Emails, Twitter etc.). After this, I wrote two blogs on helping the Outreachies during their contribution phase, which can be seen here:

The strength of open-source 💪🏻

I remember that in one of the Zarr community meetings, all of us were happy to see the participation by Outreachies during the contribution phase. During the meeting, I mentioned that Josh and I were the ones primarily engaging with the participants, and that we’d love to see others from the community helping us. After that meeting, I could see some of the long-time contributors of Zarr jumping in and helping us support the Outreachies by answering their questions, reviewing pull requests, providing suggestions and finally merging their PRs. I believe that’s the real spirit of Open-Source, and I was delighted to see that.

After four weeks of the contribution phase, here are some stats:

30+ closed PRs across zarr-developers
15+ new issues created on GitHub to highlight hidden flaws
100+ new users are now part of the Zarr Community
200+ queries resolved via Gitter, Email, Twitter etc.

Finally… 🥺

It took us some time to pick the best of the many qualified applicants, and after going through the immensely talented pool of Outreachies, we finally selected two interns to work with Zarr for the December 2022 cohort. They are:

Brandon Awa from Cameroon; they’ll be working on Creating Tutorials for Zarr 🎉
Weddy Gikunda from Kenya; she’ll be working on Testing the support and interoperability of Zarr Zip Stores 🎉

Though it’s only been a week since they started their work, they’re off to an excellent start. Weddy’s blog can be seen here, where she’ll be posting her updates. Similarly, Awa’s blog can be seen here, and I like the blog’s theme. Both of them have posted their first introductory blog posts, and I was motivated to learn about the core values which drive them to work hard with integrity.

I’ll be cross-posting their upcoming blog posts regularly here, so please keep an eye on this page. I’m excited to help them build new things for Zarr and look forward to working with them. Until next time. Peace! ✌🏻

~Sanket Verma

State of Zarr V3 and ZEP0001

2022-12-02T00:00:00+00:00

The Zarr community is working on a new version of Zarr, V3, which was drafted and proposed via ZEP0001 by Alistair Miles. Due to time constraints, we, Jonathan Striebel and Jeremy Maitin-Shepard, are taking over the lead authorship of the V3 proposal.

The specification is now being finalized and needs your feedback before the Zarr Steering Council and the Zarr Implementation Council vote on it. In two weeks (on the 19th of December), the spec will go into feature-freeze, meaning only changes (issues) that have been previously discussed will be incorporated. The documents in question are:

Zarr Enhancement Proposal for V3 (ZEP0001)

Motivation and Context
Current draft of the V3 Specification

The Actual Spec

If you are familiar with V2, you might want to start at the comparison with V2 section of the specs.

Please feel free to:

Add to the current discussions in issues on the project board,
Open pull requests for changes, or
Add new issues if your topic does not fit any existing ones.

~Jonathan Striebel, scalableminds and Jeremy Maitin-Shepard, Google

Outreachy Contributor Guide 2022 - Part 2

2022-10-20T00:00:00+00:00

Hi again, Outreachies! 🙋🏻‍♂️

It’s been more than a week since the Outreachy Contribution Phase opened, and we’ve been actively engaging with the applicants through our Gitter chat, GitHub issues, PRs etc.

If you haven’t read the first blog post, please check it here: https://zarr.dev/blog/outreachy-contributor-guide/.

We’re so happy and excited to see the enthusiasm of the applicants, and sometimes it’s been hard to keep up with all of the queries and messages (sort of a good thing). :)

We were going through the existing PRs and thought of finding an additional way for applicants to contribute to Zarr and to engage them better with the Zarr community. So we came up with the idea of #beautifulzarr.

What is #beautifulzarr?

It is a repository under the zarr-developers GitHub organisation, which will host beautiful use cases, visualisations, code snippets, screenshots and Twitter/social media mentions of #zarr and Zarr data.

We couldn’t think of a better time to start the compilation of fantastic work the community is doing with the help of Zarr. We plan to build and evolve this repository over time and curate its contents on our homepage.

How can you contribute to #beautifulzarr?

Fork the repository, create a new folder inside the _data folder and name it your GitHub username. It should look like this _data/. Ex. _data/MSanKeys963/
Browse the vast internet 🌏 (WWW) and find how the community uses Zarr for their work. Try to capture their work in screenshots, code snippets (along with visualisations), use cases, and mentions on Twitter/social media (#zarr or similar). Add your result in a .md file in the folder created above. Ex. _data/MSanKeys963/README.md
Submit a GitHub Pull Request. The maintainers for this repo will review your PR and then merge it.
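If you'd rather script the folder setup from the first two steps, here is a minimal Python sketch. The helper name scaffold_entry, the repository path you pass in, and the placeholder README text are all assumptions made for illustration; only the _data/ folder named after your GitHub username and the README.md inside it come from the steps above.

```python
import tempfile
from pathlib import Path


def scaffold_entry(repo_root, username, body):
    """Create _data/<username>/README.md under repo_root and return its path.

    Hypothetical helper: it is not part of the #beautifulzarr repository.
    """
    folder = Path(repo_root) / "_data" / username
    # parents=True also creates _data if a fresh clone lacks it;
    # exist_ok=True makes re-running the script harmless.
    folder.mkdir(parents=True, exist_ok=True)
    readme = folder / "README.md"
    readme.write_text(body, encoding="utf-8")
    return readme


if __name__ == "__main__":
    # Demo in a throwaway directory; point repo_root at your fork instead.
    with tempfile.TemporaryDirectory() as tmp:
        path = scaffold_entry(tmp, "MSanKeys963", "# Zarr in the wild\n")
        print(path.relative_to(tmp).as_posix())
```

After running it against your local fork, fill the README.md with what you found, commit, and open the pull request as described in the last step.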

Here are some tips to get you started:

→ Tip 1: A few code snippets show how easy it is to visualise Zarr data. Check here. 💡

→ Tip 2: Check out MSanKeys963’s folder and README.md to get some inspiration! 🌳

Once your PR is accepted and merged, you’ve successfully passed the Outreachy contribution phase. 🎉

Now you have to wait for the final results, or you can start working on additional issues to increase your chances of selection. Please refer to the first blog here for additional work. 🤞🏻

If you have any queries during the contribution phase, don’t hesitate to ask them on Gitter. We’re here to help you!

Happy Contributing! ✌🏻

~Sanket Verma