pizzarr ships in two tiers. The CRAN build is pure R — no Rust
compilation, no system dependencies. It handles local and HTTP Zarr
stores with sequential chunk I/O via lapply. The r-universe
build additionally compiles the zarrs Rust crate via extendr,
adding parallel decompression, cloud-native store backends (S3, GCS),
and codecs beyond what R packages provide.
The split exists because CRAN’s macOS build machines ship a Rust
toolchain (rustc 1.84) that is too old for zarrs, which requires rustc
>= 1.91. r-universe builds against the latest stable toolchain, so it
can compile zarrs and distribute pre-built binaries. End users on either
tier install with install.packages() — no Rust toolchain
needed.
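Assuming the usual r-universe distribution model, the install takes the additional-repos form sketched below. The universe URL is a placeholder; pizzarr_upgrade() prints the exact command for the current release.

```r
# Sketch only: the universe URL is a placeholder, not the real repository.
# Run pizzarr_upgrade() to see the exact command.
install.packages(
  "pizzarr",
  repos = c("https://<universe>.r-universe.dev", getOption("repos"))
)
```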
Checking availability
pizzarr_compiled_features() lists the feature flags
compiled into the zarrs backend. On the CRAN tier it returns
character(0) with a message; on the r-universe tier it
returns the compiled capabilities:
pizzarr_compiled_features()
#> [1] "zarrs" "filesystem" "http_sync" "gzip" "blosc"
#> [6] "zstd" "object_store" "s3" "gcs"
The internal flag .pizzarr_env$zarrs_available is a
logical scalar set once at package load. Dispatch logic throughout
pizzarr checks this flag to decide whether to call into Rust or fall
through to the R-native path:
pizzarr:::.pizzarr_env$zarrs_available
#> [1] TRUE
Upgrading to the zarrs tier
pizzarr_upgrade() prints the r-universe install command
when zarrs is not compiled in, or confirms that the backend is already
present:
pizzarr_upgrade()
#> zarrs backend is already available.
The startup message that CRAN users see on
library(pizzarr) can be silenced with
options(pizzarr.suggest_runiverse = FALSE).
Probing store metadata
The examples below require the zarrs backend. When this vignette is built without it, the code chunks are not evaluated.
zarrs_node_exists() opens a filesystem store via the
Rust backend, probes for V2 and V3 metadata keys at a given path, and
returns a list with three fields: exists (logical),
node_type (character), and zarr_format
(integer or NULL). The store handle is cached on the Rust side —
subsequent calls to the same store path reuse it without re-opening.
V2 store
v2_root <- pizzarr_sample("fixtures/v2/data.zarr")
# Root group
zarrs_node_exists(v2_root, "")
#> $exists
#> [1] TRUE
#>
#> $node_type
#> [1] "group"
#>
#> $zarr_format
#> [1] 2
# An array within the store
zarrs_node_exists(v2_root, "1d.contiguous.lz4.i2")
#> $exists
#> [1] TRUE
#>
#> $node_type
#> [1] "array"
#>
#> $zarr_format
#> [1] 2
# A path that does not exist
zarrs_node_exists(v2_root, "does_not_exist")
#> $exists
#> [1] FALSE
#>
#> $node_type
#> [1] "none"
#>
#> $zarr_format
#> NULL
V3 store
V2 and V3 detection is automatic. zarrs probes for
zarr.json first (V3), then falls back to
.zarray / .zgroup (V2):
v3_root <- pizzarr_sample("fixtures/v3/data.zarr")
zarrs_node_exists(v3_root, "")
#> $exists
#> [1] TRUE
#>
#> $node_type
#> [1] "group"
#>
#> $zarr_format
#> [1] 3
Store cache management
The Rust backend holds open store handles in a process-global cache
keyed by normalized path. zarrs_close_store() removes a
handle from the cache and returns TRUE. A second call to
the same path returns FALSE — it was already removed:
zarrs_close_store(v2_root)
#> [1] TRUE
zarrs_close_store(v2_root)
#> [1] FALSE
zarrs_close_store(v3_root)
#> [1] TRUE
Array metadata
zarrs_open_array_metadata() opens a zarrs array and
returns its metadata as a named list. The store handle is cached, so
repeated calls to the same store are fast. The returned list contains
shape, chunks, dtype,
r_type, fill_value_json,
zarr_format, and order.
V2 array
v2_root <- pizzarr_sample("fixtures/v2/data.zarr")
zarrs_open_array_metadata(v2_root, "1d.contiguous.raw.i2")
#> $shape
#> [1] 4
#>
#> $chunks
#> [1] 4
#>
#> $dtype
#> [1] "int16 / <i2"
#>
#> $r_type
#> [1] "integer"
#>
#> $fill_value_json
#> [1] "[0, 0]"
#>
#> $zarr_format
#> [1] 2
#>
#> $order
#> [1] "C"
V3 array
V3 arrays work the same way. The zarr_format field
distinguishes V2 from V3:
v3_root <- pizzarr_sample("fixtures/v3/data.zarr")
zarrs_open_array_metadata(v3_root, "1d.contiguous.gzip.i2")
#> $shape
#> [1] 4
#>
#> $chunks
#> [1] 4
#>
#> $dtype
#> [1] "int16 / <i2"
#>
#> $r_type
#> [1] "integer"
#>
#> $fill_value_json
#> [1] "[0, 0]"
#>
#> $zarr_format
#> [1] 3
#>
#> $order
#> [1] "C"
Data type classification
The r_type field maps zarrs data types to R-compatible
type families. zarrs numeric types are classified as
"double", "integer", or "logical"
based on what R can represent natively:
- double: float64 (zero-cost), float32 (widened), uint32/int64/uint64 (widened, precision risk > 2^53)
- integer: int32 (zero-cost), int8/int16/uint8/uint16 (widened)
- logical: bool
Unsupported types (strings, complex) report
"unsupported" and fall back to the R-native code path.
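The precision caveat for the widened 64-bit types can be seen directly in base R, where doubles cannot distinguish consecutive integers beyond 2^53:

```r
# Doubles carry 53 bits of mantissa, so 2^53 + 1 rounds back to 2^53 ...
(2^53 + 1) - 2^53
#> [1] 0
# ... while 2^53 + 2 is still exactly representable.
(2^53 + 2) - 2^53
#> [1] 2
```

This is why int64/uint64 (and uint32 near its upper range) values round-trip safely only below 2^53 once widened to double.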
zarrs_close_store(v2_root)
#> [1] TRUE
zarrs_close_store(v3_root)
#> [1] TRUE
Runtime info and tuning
zarrs_runtime_info() reports the current zarrs
configuration — the codec concurrency target, thread pool size, how many
store handles are cached, and which features were compiled in:
zarrs_runtime_info()
#> $codec_concurrent_target
#> [1] 4
#>
#> $nthreads
#> [1] 4
#>
#> $store_cache_entries
#> [1] 0
#>
#> $tokio_active
#> [1] FALSE
#>
#> $compiled_features
#> [1] "zarrs" "filesystem" "http_sync" "gzip" "blosc"
#> [6] "zstd" "object_store" "s3" "gcs"
pizzarr_config()
pizzarr_config() is the main interface for viewing and
changing concurrency settings. Called with no arguments it returns the
current state; with arguments it sets the specified values:
# View current settings
pizzarr_config()
#> $nthreads
#> [1] 4
#>
#> $concurrent_target
#> [1] 4
#>
#> $http_batch_range_requests
#> [1] TRUE
# Set codec concurrency to 2 parallel operations per read/write
pizzarr_config(concurrent_target = 2L)
zarrs_runtime_info()$codec_concurrent_target
#> [1] 2
Three settings are available:
- nthreads — rayon thread pool size. Set once per R session (the thread pool can only be initialised once). For reliable session-level control, set the PIZZARR_NTHREADS environment variable before starting R.
- concurrent_target — how many codec operations zarrs runs in parallel within a single read or write call. Can be changed at any time.
- http_batch_range_requests — whether HTTP stores use multipart range requests (default TRUE). Set to FALSE for servers with incomplete multipart support. Takes effect on the next store open.
All three settings can also be configured via environment variables
(PIZZARR_NTHREADS, PIZZARR_CONCURRENT_TARGET,
PIZZARR_HTTP_BATCH_RANGE_REQUESTS) or R options
(pizzarr.nthreads, etc.), which are read at package load
time. Environment variables persist across sessions without needing
.Rprofile edits.
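For example, a sketch of pinning the thread pool for a single session. This must run in a fresh session, since the environment variable is read at load time:

```r
# Hypothetical fresh-session setup: PIZZARR_NTHREADS is read when the
# package loads, so it must be set before library(pizzarr).
Sys.setenv(PIZZARR_NTHREADS = "2")
library(pizzarr)
pizzarr_config()$nthreads
```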
The lower-level zarrs_set_codec_concurrent_target()
function is still available for direct use:
zarrs_set_codec_concurrent_target(2L)
#> NULL
zarrs_runtime_info()$codec_concurrent_target
#> [1] 2
Reading data via zarrs
When the zarrs backend is available and the selection is a contiguous
slice (step == 1), ZarrArray$get_item() dispatches reads to
zarrs automatically. zarrs handles chunk identification, parallel
decompression, and codec execution internally, bypassing pizzarr’s
R-native chunk loop. Scalar integer selections (e.g., selecting a single
row of a matrix) are also eligible — they become length-1 ranges on the
Rust side. Unsupported selections (step > 1 slices, fancy indexing,
MemoryStore) fall through to the R-native path transparently.
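A small sketch of this fall-through behaviour, assuming slice() accepts a step argument as its third parameter:

```r
d <- tempfile("dispatch_demo_")
z <- zarr_create(store = d, shape = c(10L), chunks = c(5L), dtype = "<f8")
z$set_item("...", array(as.double(1:10), dim = c(10L)))

z2 <- zarr_open(store = d)
# Contiguous slice (step 1): eligible for the zarrs fast path
z2$get_item(list(slice(1L, 4L)))$data
# Step-2 slice: handled by the R-native path instead, with no code change
z2$get_item(list(slice(1L, 9L, 2L)))$data

zarrs_close_store(d)
unlink(d, recursive = TRUE)
```

Both calls return correct data; only the execution path differs.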
Basic read
d <- tempfile("zarrs_vignette_")
z <- zarr_create(store = d, shape = c(100L, 50L), chunks = c(10L, 10L),
dtype = "<f8")
z$set_item("...", array(as.double(seq_len(5000)), dim = c(100, 50)))
#> NULL
# Re-open and read a subset --- zarrs handles the chunk I/O
z2 <- zarr_open(store = d)
result <- z2$get_item(list(slice(1L, 10L), slice(1L, 5L)))
dim(result$data)
#> [1] 10 5
Direct zarrs_get_subset call
For lower-level access, zarrs_get_subset() reads a
contiguous subset directly via the Rust backend. Ranges are 0-based with
exclusive stop, matching zarrs conventions:
result <- zarrs_get_subset(d, "", list(c(0L, 10L), c(0L, 5L)), NULL)
str(result)
#> List of 2
#> $ data : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
#> $ shape: int [1:2] 10 5
Concurrency control
The optional concurrent_target parameter (or the
pizzarr.concurrent_target R option) controls how many
parallel codec operations zarrs uses within a single read call. Setting
it to 1L disables parallel decompression:
result <- zarrs_get_subset(d, "", list(c(0L, 10L), c(0L, 5L)), 1L)
length(result$data)
#> [1] 50
zarrs_close_store(d)
#> [1] TRUE
unlink(d, recursive = TRUE)
Creating arrays via zarrs
When the zarrs backend is available and the store is a writable
filesystem path, zarr_create() dispatches array creation to
zarrs instead of building metadata JSON in R. zarrs validates the
metadata structure, writes it to the store, and the array is ready for
data. The dispatch is transparent — the same zarr_create()
call works on both tiers, and unsupported configurations (MemoryStore,
object dtypes, custom filters) fall through to the R-native path.
Transparent dispatch
The zarr_create() examples earlier in this vignette
already use this path when zarrs is available. The zarrs backend handles
V2 and V3 formats, all 11 numeric data types, and four codec
presets:
# V3 array with gzip compression
d <- tempfile("zarrs_create_vignette_")
z <- zarr_create(store = d, shape = c(20L, 10L), chunks = c(10L, 10L),
dtype = "<f8", zarr_format = 3L)
z
#> <ZarrArray> /
#> Shape : (20, 10)
#> Chunks : (10, 10)
#> Data type : <f8
#> Fill value : 0
#> Order : C
#> Read-only : FALSE
#> Compressor : ZstdCodec
#> Store type : DirectoryStore
#> Zarr format : 3
# Confirm V3 metadata was written
file.exists(file.path(d, "zarr.json"))
#> [1] TRUE
zarrs_close_store(d)
#> [1] FALSE
unlink(d, recursive = TRUE)
Direct zarrs_create_array call
zarrs_create_array() provides lower-level access to the
Rust creation path. It accepts V3-style data type names
("float64", "int32", "bool",
etc.) and a codec preset string ("none",
"gzip", "blosc", or "zstd"). The
return value is the same metadata list as
zarrs_open_array_metadata():
d <- tempfile("zarrs_create_direct_")
dir.create(d)
meta <- zarrs_create_array(
store_url = d,
array_path = "",
shape = c(100L, 50L),
chunks = c(10L, 10L),
dtype = "float64",
codec_preset = "gzip",
fill_value = 0.0,
attributes_json = "{}",
zarr_format = 3L
)
str(meta)
#> List of 7
#> $ shape : int [1:2] 100 50
#> $ chunks : int [1:2] 10 10
#> $ dtype : chr "float64"
#> $ r_type : chr "double"
#> $ fill_value_json: chr "[0, 0, 0, 0, 0, 0, 0, 0]"
#> $ zarr_format : int 3
#> $ order : chr "C"
The array is immediately usable for reads and writes:
zarrs_set_subset(d, "", list(c(0L, 10L), c(0L, 5L)),
as.double(1:50), NULL)
#> [1] TRUE
result <- zarrs_get_subset(d, "", list(c(0L, 10L), c(0L, 5L)), NULL)
head(result$data)
#> [1] 1 2 3 4 5 6
zarrs_close_store(d)
#> [1] TRUE
unlink(d, recursive = TRUE)
Codec presets
The zarrs creation path supports four named codec presets. Custom codec configurations fall through to the R-native path.
| Preset | V2 compressor | V3 codec chain | Notes |
|---|---|---|---|
| "none" | null | bytes only | No compression |
| "gzip" | gzip, level 1 | bytes + gzip(1) | Fast, reasonable ratio |
| "blosc" | blosc, lz4, clevel 5 | bytes + blosc(lz4, 5) | Requires blosc feature |
| "zstd" | — | bytes + zstd(3) | V3 only; requires zstd feature |
One difference from the R-native path: zarrs uses the
"gzip" compressor id for V2 arrays, while zarr-python uses
"zlib". Both produce gzip-compatible output, and zarrs
reads either id when opening existing arrays.
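This can be observed by creating a V2 array with the gzip preset and inspecting the raw .zarray document. A sketch, assuming jsonlite is installed and the zarrs backend is active:

```r
library(jsonlite)

d <- tempfile("compressor_id_")
dir.create(d)
zarrs_create_array(
  store_url = d, array_path = "",
  shape = c(10L), chunks = c(10L),
  dtype = "float64", codec_preset = "gzip",
  fill_value = 0.0, attributes_json = "{}",
  zarr_format = 2L
)
# zarrs writes "gzip" here, where zarr-python would write "zlib"
fromJSON(file.path(d, ".zarray"))$compressor$id
zarrs_close_store(d)
unlink(d, recursive = TRUE)
```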
Writing data via zarrs
The write path mirrors the read path. When the zarrs backend is
available and the selection qualifies (contiguous slices,
filesystem-backed store), ZarrArray$set_item() dispatches
writes to zarrs instead of iterating over chunks in R. zarrs encodes the
data, splits it across the affected chunks, and writes them to disk —
using its internal thread pool for parallel compression when multiple
chunks are involved.
Data type narrowing happens on the Rust side. R doubles narrow to the array’s stored type (float32, int64, uint32, etc.) and R integers narrow to smaller integer types (int16, int8, uint8, uint16) with range checking. An out-of-range value produces an error rather than silent truncation.
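A sketch of that range check, assuming an int8 array created via the direct API:

```r
d <- tempfile("narrow_demo_")
dir.create(d)
zarrs_create_array(
  store_url = d, array_path = "",
  shape = c(4L), chunks = c(4L),
  dtype = "int8", codec_preset = "none",
  fill_value = 0.0, attributes_json = "{}",
  zarr_format = 3L
)
# In-range integers narrow from R's int32 to int8 cleanly
zarrs_set_subset(d, "", list(c(0L, 4L)), c(1L, 2L, 3L, 4L), NULL)
# 300 exceeds the int8 range, so this errors instead of truncating silently
try(zarrs_set_subset(d, "", list(c(0L, 4L)), c(300L, 0L, 0L, 0L), NULL))
zarrs_close_store(d)
unlink(d, recursive = TRUE)
```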
Basic write
d <- tempfile("zarrs_write_vignette_")
z <- zarr_create(store = d, shape = c(20L, 10L), chunks = c(10L, 10L),
dtype = "<f8")
# set_item dispatches to zarrs when eligible
z$set_item("...", array(as.double(1:200), dim = c(20, 10)))
#> NULL
# Read back to confirm
z2 <- zarr_open(store = d)
result <- z2$get_item(list(slice(1L, 5L), slice(1L, 3L)))
result$data
#> [,1] [,2] [,3]
#> [1,] 1 21 41
#> [2,] 2 22 42
#> [3,] 3 23 43
#> [4,] 4 24 44
#> [5,] 5 25 45
Partial overwrite
Writing to a subset of an existing array works the same way. zarrs reads the affected chunks, merges the new data, and writes them back:
# Overwrite rows 3-7, columns 1-2
z$set_item(list(slice(3L, 7L), slice(1L, 2L)),
array(rep(-1.0, 10), dim = c(5, 2)))
#> NULL
result <- z2$get_item(list(slice(1L, 10L), slice(1L, 3L)))
result$data
#> [,1] [,2] [,3]
#> [1,] 1 21 41
#> [2,] 2 22 42
#> [3,] -1 -1 43
#> [4,] -1 -1 44
#> [5,] -1 -1 45
#> [6,] -1 -1 46
#> [7,] -1 -1 47
#> [8,] 8 28 48
#> [9,] 9 29 49
#> [10,] 10 30 50
Direct zarrs_set_subset call
zarrs_set_subset() provides lower-level access to the
Rust write path. Data is a flat vector in R’s native F-order
(column-major) — the Rust backend handles the F-to-C order conversion
internally. The function returns TRUE on success:
# Write 10 values to the first row (0-based range [0, 1) x [0, 10))
zarrs_set_subset(d, "", list(c(0L, 1L), c(0L, 10L)),
as.double(101:110), NULL)
#> [1] TRUE
result <- zarrs_get_subset(d, "", list(c(0L, 1L), c(0L, 10L)), NULL)
result$data
#> [1] 101 102 103 104 105 106 107 108 109 110
zarrs_close_store(d)
#> [1] TRUE
unlink(d, recursive = TRUE)
HTTP reads via zarrs
When the http_sync feature is compiled in, the zarrs
backend can read directly from HTTP/HTTPS Zarr stores using the zarrs_http crate. This
bypasses pizzarr’s R-native crul-based chunk loop, giving
parallel chunk decode on remote data.
HTTP stores are read-only in zarrs — write dispatch
(set_item) falls through to the R-native path
automatically.
Transparent dispatch
The zarrs fast path activates automatically when an
HttpStore-backed array is read with a contiguous selection.
No code changes are needed compared to the R-native path:
url <- "https://raw.githubusercontent.com/DOI-USGS/rnz/main/inst/extdata/bcsd.zarr"
z <- zarr_open(store = HttpStore$new(url))
# zarrs handles the HTTP reads + parallel decompression
pr <- z$get_item("pr")
pr
#> <ZarrArray> /pr
#> Shape : (12, 33, 81)
#> Chunks : (12, 33, 81)
#> Data type : <f4
#> Fill value : 1.00000002004088e+20
#> Order : C
#> Read-only : TRUE
#> Compressor : ZstdCodec
#> Store type : HttpStore
#> Zarr format : 2
Direct zarrs_get_subset from HTTP
zarrs_get_subset() also works with HTTP URLs. The store
handle is cached on the Rust side, so repeated reads to the same URL
reuse the connection:
meta <- zarrs_open_array_metadata(url, "pr")
str(meta[c("shape", "dtype", "zarr_format")])
#> List of 3
#> $ shape : int [1:3] 12 33 81
#> $ dtype : chr "float32 / <f4"
#> $ zarr_format: int 2
# Read a single element (first along each dimension)
ranges <- lapply(seq_along(meta$shape), function(i) c(0L, 1L))
result <- zarrs_get_subset(url, "pr", ranges, NULL)
result$data
#> [1] 159.08
zarrs_close_store(url)
#> [1] TRUE
Feature detection
Check whether HTTP support is compiled in with
pizzarr_compiled_features(). When "http_sync"
is present, zarrs can open http:// and
https:// URLs. When it is absent, HTTP reads fall through
to the R-native crul-based path:
"http_sync" %in% pizzarr_compiled_features()
#> [1] TRUE
S3 reads via zarrs
When the s3 feature is compiled in, the zarrs backend
can read from Amazon S3 buckets using the object_store crate with an
async-to-sync adapter. Public buckets work without credentials (unsigned
requests). Authenticated access uses standard AWS environment variables
(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
AWS_REGION).
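For authenticated buckets, a sketch of setting the standard variables before the first S3 open (placeholder values):

```r
# Placeholder credentials; in practice prefer your existing AWS profile,
# an instance role, or a secrets manager over hard-coded values.
Sys.setenv(
  AWS_ACCESS_KEY_ID     = "<access-key-id>",
  AWS_SECRET_ACCESS_KEY = "<secret-access-key>",
  AWS_REGION            = "us-east-1"
)
```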
S3 stores are currently read-only via zarrs — write operations fall through to the R-native path.
# OME-Zarr bonsai dataset on AWS Open Data (V2, zstd, uint8)
s3_url <- "s3://ome-zarr-scivis/v0.4/64x0/bonsai.ome.zarr"
# Read array metadata
meta <- zarrs_open_array_metadata(s3_url, "scale0/bonsai")
str(meta[c("shape", "dtype", "zarr_format")])
#> List of 3
#> $ shape : int [1:3] 256 256 256
#> $ dtype : chr "uint8 / |u1"
#> $ zarr_format: int 2
# Read a small subset (first 4x4x4 corner)
result <- zarrs_get_subset(s3_url, "scale0/bonsai",
list(c(0L, 4L), c(0L, 4L), c(0L, 4L)), NULL)
str(result)
#> List of 2
#> $ data : int [1:64] 40 40 40 41 40 40 40 41 40 40 ...
#> $ shape: int [1:3] 4 4 4
zarrs_close_store(s3_url)
#> [1] TRUE
GCS and other cloud stores
GCS data hosted on Google Cloud Storage is publicly accessible via HTTPS endpoints. The zarrs HTTP backend reads these directly:
# Pangeo ECCO ocean basins (V2, blosc/lz4, float32)
gcs_url <- "https://storage.googleapis.com/pangeo-data/ECCO_basins.zarr"
meta <- zarrs_open_array_metadata(gcs_url, "basin_mask")
cat("Shape:", paste(meta$shape, collapse = " x "), "\n")
#> Shape: 13 x 90 x 90
cat("Dtype:", meta$dtype, "\n")
#> Dtype: float32 / <f4
# Read a single basin mask slice
result <- zarrs_get_subset(gcs_url, "basin_mask",
list(c(0L, 1L), c(0L, 90L), c(0L, 90L)), NULL)
cat("Slice dimensions:", paste(result$shape, collapse = " x "), "\n")
#> Slice dimensions: 1 x 90 x 90
zarrs_close_store(gcs_url)
#> [1] TRUE
Authenticated GCS access via gs:// URLs requires the
gcs compiled feature and GCP credentials (environment
variables or application default credentials). The S3Store
and GcsStore R6 classes provide URL wrappers for high-level
use with zarr_open().
C/F order handling
zarrs stores data in C-order (row-major), while R uses F-order (column-major). The Rust backend handles this conversion transparently:
- Reads: zarrs_get_subset() returns data in F-order, ready for array(data, dim = shape) with no aperm() needed.
- Writes: zarrs_set_subset() accepts F-order data and converts it to C-order internally before writing to the store.
The transpose uses cache-blocked tiling for 2D arrays and
output-order iteration with incremental index tracking for higher
dimensions, matching or exceeding the performance of R’s C-level
aperm().
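The F-order contract can be checked with a tiny round trip (a sketch assuming the zarrs backend is available):

```r
d <- tempfile("order_demo_")
z <- zarr_create(store = d, shape = c(2L, 3L), chunks = c(2L, 3L),
                 dtype = "<f8")
m <- matrix(as.double(1:6), nrow = 2)   # R fills matrices column-major
z$set_item("...", m)

res <- zarrs_get_subset(d, "", list(c(0L, 2L), c(0L, 3L)), NULL)
# The flat vector is already F-ordered: array() reassembles it directly
identical(array(res$data, dim = res$shape), m)

zarrs_close_store(d)
unlink(d, recursive = TRUE)
```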
