
This vignette demonstrates pizzarr reading a Zarr V3 store written by zarr-python 3.x, showing that the two implementations interoperate.

The test store includes arrays with various data types, compression codecs, chunk layouts, fill values, and nested groups. It was generated with the script bundled in this project at: inst/extdata/fixtures/v3/generate-v3-zarr-python.py.

Summary

  • Data types: bool, uint8, int16, int32, float32, float64
  • Compression: uncompressed, gzip, zstd, blosc
  • Layouts: 1D, 2D, 3D; contiguous and ragged chunks
  • Fill values: default (0) and custom (-9999)
  • Groups: nested groups with attributes
  • Array metadata: per-array attributes (units, long_name)

Open the store

library(pizzarr)

store_path <- pizzarr_sample("fixtures/v3/zarr_python_v3.zarr")
root <- zarr_open(store_path)
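
For orientation, a Zarr V3 hierarchy keeps the metadata of each group or array in a file named zarr.json, whereas V2 uses .zgroup and .zarray files. The base R sketch below guesses which layout a local directory uses. Note that `zarr_format_of()` is a hypothetical helper written for this illustration, not part of pizzarr, and it only looks at file names in throwaway directories rather than at the real fixture.

```r
# Hypothetical helper (not part of pizzarr): guess the Zarr format of a
# local directory node from the metadata files it contains.
zarr_format_of <- function(path) {
  if (file.exists(file.path(path, "zarr.json"))) {
    return(3L)  # V3 keeps group/array metadata in zarr.json
  }
  if (any(file.exists(file.path(path, c(".zgroup", ".zarray"))))) {
    return(2L)  # V2 uses separate .zgroup / .zarray files
  }
  NA_integer_   # not recognisable as a Zarr node
}

# Demonstrate on throwaway directories so the sketch runs without a fixture.
v3_dir <- file.path(tempdir(), "demo_v3.zarr")
v2_dir <- file.path(tempdir(), "demo_v2.zarr")
dir.create(v3_dir, showWarnings = FALSE)
dir.create(v2_dir, showWarnings = FALSE)
writeLines('{"zarr_format": 3, "node_type": "group"}',
           file.path(v3_dir, "zarr.json"))
writeLines('{"zarr_format": 2}', file.path(v2_dir, ".zgroup"))

zarr_format_of(v3_dir)
zarr_format_of(v2_dir)
```

Assuming store_path points at an unpacked directory store, calling zarr_format_of(store_path) would be expected to report 3 for the fixture used here.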

Root-level attributes

zarr-python embedded these attributes when it created the store:

root$get_attrs()$to_list()
#> $generator
#> [1] "zarr-python"
#> 
#> $zarr_python_version
#> [1] "3.1.5"
#> 
#> $description
#> [1] "V3 test data for pizzarr interop testing"

Reading different data types

Integer (int32, gzip compressed)

a <- root$get_item("int32_1d")
a$get_shape()
#> [1] 6
a$as.array()
#> [1] 10 20 30 40 50 60

Float64 (zstd compressed)

a <- root$get_item("float64_1d")
a$as.array()
#> [1] 1.1 2.2 3.3 4.4 5.5

Boolean (uncompressed)

a <- root$get_item("bool_1d")
a$as.array()
#> [1]  TRUE FALSE  TRUE FALSE

Unsigned integer (uint8, chunked)

a <- root$get_item("uint8_1d")
a$as.array()
#> [1]   0 127 128 255

Float32 (blosc compressed)

a <- root$get_item("float32_1d")
a$as.array()
#> [1] -1.50  0.00  1.50  3.14

Multi-dimensional arrays

2D int16 (gzip)

a <- root$get_item("int16_2d")
a$get_shape()
#> [1] 4 3
a$as.array()
#>      [,1] [,2] [,3]
#> [1,]    0    1    2
#> [2,]    3    4    5
#> [3,]    6    7    8
#> [4,]    9   10   11

3D float64 (zstd)

a <- root$get_item("float64_3d")
a$get_shape()
#> [1] 2 3 4
a$as.array()
#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    0    4    8
#> [2,]   12   16   20
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    1    5    9
#> [2,]   13   17   21
#> 
#> , , 3
#> 
#>      [,1] [,2] [,3]
#> [1,]    2    6   10
#> [2,]   14   18   22
#> 
#> , , 4
#> 
#>      [,1] [,2] [,3]
#> [1,]    3    7   11
#> [2,]   15   19   23
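
The printed slices are consistent with the values 0 to 23 laid out in C (row-major) order, the default in zarr-python, whereas R arrays are column-major. The base R sketch below reproduces that index mapping by hand. It is an illustration only: the flat buffer is assumed rather than read from the store, and it does not describe how pizzarr decodes chunks internally.

```r
# Flat buffer as a C-order (row-major) writer would lay out a 2 x 3 x 4
# array holding 0..23: the last index varies fastest.
flat <- as.numeric(0:23)
shape <- c(2L, 3L, 4L)

# R fills arrays column-major (first index fastest), so fill with the
# dimensions reversed and then reverse the axes with aperm().
r_array <- aperm(array(flat, dim = rev(shape)))

dim(r_array)      # 2 3 4
r_array[, , 1]    # first slice along the last axis: rows 0 4 8 and 12 16 20
```

Filling with reversed dimensions and then permuting the axes is one common way to reinterpret a row-major buffer in R; it copies the data once during aperm().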

Fill values

The with_fill array was created with fill_value = -9999.0, and only its first chunk was written. Reading the unwritten second chunk yields the fill value for every element:

a <- root$get_item("with_fill")
a$as.array()
#> [1]     1     2     3 -9999 -9999 -9999
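
The behaviour above can be sketched in base R: start from an output filled with the fill value, then overlay whichever chunks exist. The chunk length of 3 and the `stored_chunks` list are assumptions made for this illustration (the list stands in for chunk files on disk), and the code is not pizzarr's implementation.

```r
# Assumed layout for illustration: 6 elements, chunks of 3, fill value -9999.
shape <- 6L
chunk_len <- 3L
fill_value <- -9999

# Only chunk 0 exists in this pretend store; chunk 1 was never written.
stored_chunks <- list("0" = c(1, 2, 3))

# Start from an array full of fill values, then overlay the chunks we have.
out <- rep(fill_value, shape)
n_chunks <- ceiling(shape / chunk_len)
for (i in seq_len(n_chunks) - 1L) {
  chunk <- stored_chunks[[as.character(i)]]
  if (is.null(chunk)) next                 # missing chunk: keep the fill value
  idx <- i * chunk_len + seq_along(chunk)  # 1-based positions of this chunk
  out[idx] <- chunk
}
out
```

Because missing chunks are simply skipped, a sparsely written array costs no storage for its empty regions.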

Ragged (non-aligned) chunks

When the array shape is not evenly divisible by the chunk shape, the chunks along the trailing edges cover only part of the array, and a reader must clip them to the array bounds. This 5x7 array uses 3x4 chunks:

a <- root$get_item("ragged_2d")
a$get_shape()
#> [1] 5 7
a$as.array()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    0    1    2    3    4    5    6
#> [2,]    7    8    9   10   11   12   13
#> [3,]   14   15   16   17   18   19   20
#> [4,]   21   22   23   24   25   26   27
#> [5,]   28   29   30   31   32   33   34
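
The chunk arithmetic for this case can be worked through in base R. The sketch below computes how many chunks each dimension needs and how much of each chunk lies inside the array; `chunk_extents()` is a hypothetical helper introduced for this illustration, not a pizzarr function.

```r
shape <- c(5L, 7L)
chunk_shape <- c(3L, 4L)

# Number of chunks along each dimension (rounding up for the partial edge).
grid <- as.integer(ceiling(shape / chunk_shape))

# In-bounds extent of every chunk along one dimension.
chunk_extents <- function(n, chunk) {
  starts <- seq(0L, n - 1L, by = chunk)  # 0-based start offsets
  pmin(chunk, n - starts)                # clip the last chunk to the array
}

row_extents <- chunk_extents(shape[1], chunk_shape[1])
col_extents <- chunk_extents(shape[2], chunk_shape[2])

grid         # 2 2
row_extents  # 3 2
col_extents  # 4 3
```

So the 5x7 array is covered by a 2x2 grid of chunks, with in-bounds regions of 3x4, 3x3, 2x4 and 2x3 elements.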

Nested groups and array attributes

The store contains a sub-group with attributes and two arrays that carry their own metadata:

g <- root$get_item("var_group")
g$get_attrs()$to_list()
#> $description
#> [1] "Group with multiple variables"
#> 
#> $source
#> [1] "zarr-python 3.x test generation"
temp <- g$get_item("temperature")
temp$get_attrs()$to_list()
#> $units
#> [1] "K"
#> 
#> $long_name
#> [1] "Temperature"
temp$as.array()
#>       [,1]  [,2]  [,3]  [,4]
#> [1,] 280.1 281.2 282.3 283.4
#> [2,] 284.5 285.6 286.7 287.8
#> [3,] 288.9 290.0 291.1 292.2
pres <- g$get_item("pressure")
pres$get_attrs()$to_list()
#> $units
#> [1] "Pa"
#> 
#> $long_name
#> [1] "Pressure"
pres$as.array()
#>        [,1]   [,2]   [,3]   [,4]
#> [1,] 101325 101320 101315 101310
#> [2,] 101305 101300 101295 101290
#> [3,] 101285 101280 101275 101270