This blog post gives an overview of the new features in this pre-release,
especially the newly added experimental support for reading and writing
Zarr V3, the upcoming format for storing N-dimensional chunked compressed data.
It also highlights other enhancements, such as creating an
FSStore from an existing fsspec filesystem and a performance
improvement for Zarr arrays when appending data to S3, along with bug fixes,
documentation and a maintenance fix.
Add support for reading and writing Zarr V3
Zarr Python 2.12 provides experimental infrastructure for reading and writing
the upcoming V3 spec of the Zarr format. Users wishing to prepare for the
migration can set the environment variable ZARR_V3_EXPERIMENTAL_API=1 to begin
experimenting; however, data written with this API should not yet be considered final.
The zarr._storage.v3 package has the necessary classes and functions for
evaluating Zarr V3. Since the design is not finalised, the classes and
functions are not automatically imported into the regular Zarr namespace.
The pre-release can be installed via:
pip install --pre zarr
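If you would rather switch the flag on from Python than from your shell, a minimal sketch could look like the one below. It assumes that Zarr reads ZARR_V3_EXPERIMENTAL_API when it is imported, so the variable has to be set first. The helper v3_flag_enabled is hypothetical and only illustrates how such a flag might be interpreted; it is not part of Zarr.

```python
import os

# Assumption: the flag is read when zarr is imported, so set it beforehand.
os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"


def v3_flag_enabled(environ=os.environ):
    """Hypothetical helper: treat anything except '0' or 'false' as enabled."""
    value = environ.get("ZARR_V3_EXPERIMENTAL_API", "0")
    return value.lower() not in ("0", "false")


print("V3 experimental API requested:", v3_flag_enabled())

# Only import zarr after the variable is set, for example:
# import zarr
```

Setting the variable inside the script keeps the experiment local to that process and leaves your shell configuration unchanged.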
How to create arrays using Zarr V3:
- First, export ZARR_V3_EXPERIMENTAL_API=1 in your shell by typing this in your terminal:
export ZARR_V3_EXPERIMENTAL_API=1
- Here’s a small code snippet for creating V3 arrays:
>>> import zarr
>>> z = zarr.create((10000, 10000), chunks=(100, 100), dtype='f8', compressor='default', path='path-where-you-want-zarr-v3-array', zarr_version=3)
- Further, you can use z.info to see details about the array you just created:
>>> z.info
Name               : path-where-you-want-zarr-v3-array
Type               : zarr.core.Array
Data type          : float64
Shape              : (10000, 10000)
Chunk shape        : (100, 100)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr._storage.v3.KVStoreV3
No. bytes          : 800000000 (762.9M)
No. bytes stored   : 557
Storage ratio      : 1436265.7
Chunks initialized : 0/10000
You can also check the Store type in this output: zarr._storage.v3.KVStoreV3 indicates Zarr V3.
We see Chunks initialized: 0/10000 because we haven’t written anything to our arrays yet. The chunks will be initialized when we start writing data to the arrays.
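The numbers in the z.info output can be reproduced with a little arithmetic. The sketch below uses only the shape, chunk shape and item size shown above, plus the reported 557 stored bytes; the variable names are illustrative.

```python
import math

shape = (10000, 10000)   # array shape from the example above
chunks = (100, 100)      # chunk shape from the example above
itemsize = 8             # float64 ('f8') uses 8 bytes per element
bytes_stored = 557       # "No. bytes stored" as reported by z.info

# Number of chunks along each dimension, then the total chunk count.
chunks_per_dim = [math.ceil(s / c) for s, c in zip(shape, chunks)]
total_chunks = math.prod(chunks_per_dim)

# Uncompressed size in bytes and the resulting storage ratio.
nbytes = math.prod(shape) * itemsize
storage_ratio = nbytes / bytes_stored

print(total_chunks)                 # 10000
print(nbytes)                       # 800000000
print(f"{nbytes / 2**20:.1f}M")     # 762.9M
print(round(storage_ratio, 1))      # 1436265.7
```

The storage ratio is so large because no chunk has been written yet, so the stored bytes are metadata only.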
There have been significant changes to Zarr’s Python codebase to implement V3 functionality. Highlights of the main changes include:
- A new function is added in store.py, which verifies that a key conforms to the V3 specification.
- Added function in store.py to ensure internally that Zarr stores are always a class with a specific interface derived from Store, which is slightly different from
metadata files from the data (arrays). Previously metadata and arrays were stored together in a consolidated group known as
- Changes in convenience.py to use Zarr V3. The default value is None; it will attempt to infer the version from store if possible; otherwise, it will fall back to V2.
- Consolidating all metadata for groups and arrays within the given store into
a single resource and putting it under the given key. The changes can be seen
- Modification in creation.py, which enables the creation of an array using Zarr V3. If None, it will be inferred from chunk_store; otherwise defaults to V2.
meta.py with the new V3 data type links. The V3 data types are listed here.
- New tests added for all the new and modified features!
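The version fallback described in the list above (use an explicit version if one is given, otherwise infer it from the store, otherwise default to V2) can be sketched roughly as follows. This is an illustration, not Zarr's actual code: the function name resolve_zarr_version, the _store_version attribute and the two stand-in store classes are all assumptions made for the example.

```python
DEFAULT_ZARR_VERSION = 2  # fallback when nothing else is known


class V2LikeStore:
    """Stand-in for a V2 store (hypothetical)."""
    _store_version = 2


class V3LikeStore:
    """Stand-in for a V3 store (hypothetical)."""
    _store_version = 3


def resolve_zarr_version(store=None, zarr_version=None):
    """Pick a format version: explicit value, then store hint, then V2."""
    if zarr_version is not None:
        if zarr_version not in (2, 3):
            raise ValueError(f"unsupported zarr_version: {zarr_version!r}")
        return zarr_version
    # Assumption: a store advertises its version via a class attribute.
    return getattr(store, "_store_version", DEFAULT_ZARR_VERSION)


print(resolve_zarr_version(V3LikeStore()))          # inferred from the store
print(resolve_zarr_version({}))                     # plain dict: falls back to V2
print(resolve_zarr_version({}, zarr_version=3))     # explicit value wins
```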
If you’re interested in browsing through all of the code changes, please refer to PR #898.
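To give a feel for what checking a key against the V3 specification might involve, here is a minimal sketch. The rules it applies are assumptions based on the draft V3 spec at the time (ASCII letters, digits and the characters / . - _ only; keys start with meta/ or data/, or are exactly zarr.json; no trailing slash). The function is_valid_v3_key is hypothetical and is not the function added in store.py.

```python
import string

# Assumption: characters allowed in V3 keys under the draft spec.
VALID_KEY_CHARS = set(string.ascii_letters + string.digits + "/.-_")


def is_valid_v3_key(key):
    """Hypothetical check that a store key looks like a V3 key."""
    if not isinstance(key, str) or not key.isascii():
        return False
    if set(key) - VALID_KEY_CHARS:
        return False  # contains a character outside the allowed set
    if key.endswith("/"):
        return False  # keys name resources, not directories
    # Assumption: metadata and data live under separate prefixes.
    return key == "zarr.json" or key.startswith(("meta/", "data/"))


for candidate in ["zarr.json", "meta/root/foo.array.json",
                  "data/root/foo/c0/0", "foo/.zarray", "meta/root/"]:
    print(candidate, is_valid_v3_key(candidate))
```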
Appending performance improvement
The old implementation iterated through all the old chunks and removed those
that didn’t exist in the new chunks. As a result, appending data to Zarr arrays
in cloud services like S3 incurred significant delays.
The new implementation iterates through each dimension and only finds and
removes the chunk slices that exist in the old data but not in the new data.
It also introduces a mutable list to dynamically adjust the number of chunks
along the already-processed dimensions to avoid duplicate chunk removal.
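The difference between the two strategies can be sketched in a few lines. This is a simplified illustration that works on chunk grid shapes (number of chunks per dimension) and returns chunk indices instead of deleting keys from a store; the function names are made up for this example.

```python
import itertools


def stale_chunks_old(old_grid, new_grid):
    """Old approach: visit every old chunk, keep those outside the new grid."""
    visited, stale = 0, []
    for cidx in itertools.product(*[range(n) for n in old_grid]):
        visited += 1
        if not all(i < n for i, n in zip(cidx, new_grid)):
            stale.append(cidx)
    return stale, visited


def stale_chunks_new(old_grid, new_grid):
    """New approach: per dimension, visit only slices in old but not in new."""
    working = list(old_grid)  # mutable list, shrunk as dimensions are processed
    visited, stale = 0, []
    for dim in range(len(working)):
        ranges = [
            range(n_new, n_old) if d == dim else range(n_old)
            for d, (n_old, n_new) in enumerate(zip(working, new_grid))
        ]
        for cidx in itertools.product(*ranges):
            visited += 1
            stale.append(cidx)
        # Shrinking the processed dimension avoids removing a chunk twice.
        working[dim] = min(working[dim], new_grid[dim])
    return stale, visited


# Appending along the first axis: the grid grows from 100 to 101 chunk rows.
print(stale_chunks_old((100, 10), (101, 10))[1])  # visits all 1000 old chunks
print(stale_chunks_new((100, 10), (101, 10))[1])  # visits 0 chunks
```

For a pure append nothing has to be removed, so the per-dimension approach does no work at all, while the old approach still walks every existing chunk. On a store like S3, where each lookup can be a network request, that difference adds up.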
- If you have created an fsspec filesystem outside of Zarr, you can now pass it
as a keyword argument to FSStore.
- Added number encoder for json.dumps to support NumPy integers in
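To illustrate the idea behind a number encoder for json.dumps, here is a minimal sketch. It is not necessarily the encoder that ships with Zarr. To stay free of third-party imports, it uses a hypothetical FakeInt64 class registered as numbers.Integral as a stand-in for a NumPy integer, which json.dumps cannot serialise by default.

```python
import json
import numbers


class NumberEncoder(json.JSONEncoder):
    """Convert integer-like and real-like objects to plain int and float."""

    def default(self, o):
        if isinstance(o, numbers.Integral):
            return int(o)
        if isinstance(o, numbers.Real):
            return float(o)
        return super().default(o)  # raises TypeError for unsupported types


class FakeInt64:
    """Hypothetical stand-in for a NumPy integer scalar."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value


# Make isinstance(FakeInt64(...), numbers.Integral) return True.
numbers.Integral.register(FakeInt64)

payload = {"values": [FakeInt64(100), FakeInt64(100)]}
encoded = json.dumps(payload, cls=NumberEncoder)
print(encoded)  # {"values": [100, 100]}
```

Without cls=NumberEncoder, the same json.dumps call raises a TypeError because the stand-in objects are not JSON serialisable.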
Bugs, Documentation and Maintenance
Details on these features, as well as the full list of all changes in
2.12.0a1, are available in the release notes.
Before the pre-release version 2.12.0a1, there were releases
2.11.3 of the Zarr
Python package. A special shout-out to all the contributors who made previous
Also, a huge thanks to the contributors who made the current version
2.12.0a1 possible! 🙌🏻
If you find the above features useful and end up using them, please mention @zarr_dev on Twitter and tweet using #ZarrData, and we’ll make sure to get it featured! ✌🏻