I. Abstract
The GeoZarr Unified Data Model and Encoding Standard specifies a conceptual and implementation framework for representing multidimensional, geospatial datasets using the Zarr format. This Standard builds upon the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, and introduces interoperable constructs for tiling, georeferencing, and metadata integration.
The model defines core elements—dimensions, coordinate variables, data variables, attributes—and optional extensions for multi-resolution overviews, affine geotransforms, and STAC metadata. Encoding guidance is provided for Zarr Version 2 and Zarr Version 3, including chunking, group hierarchy, and metadata conventions.
GeoZarr aims to bridge scientific and geospatial communities by enabling round-trip transformations with formats such as NetCDF and GeoTIFF, and supporting compatibility with tools in the scientific Python and geospatial ecosystems. This Standard enables scalable, standards-compliant, and semantically rich data structures for cloud-native Earth observation applications.
II. Keywords
The following are keywords to be used by search engines and document catalogues.
ogcdoc, OGC document, API, openapi, html
III. Preface
The GeoZarr Unified Data Model and Encoding Standard defines a layered, standards-based framework for representing and encoding geospatial and scientific datasets in the Zarr format. It integrates foundational specifications such as the Unidata Common Data Model (CDM), the CF Conventions, and selected OGC and community standards to enable semantic, structural, and operational interoperability across Earth observation platforms and geospatial ecosystems.
This Standard introduces a unified model that harmonises metadata structures, array-based data representations, coordinate referencing, and multiscale tiling semantics. It provides a coherent framework that facilitates encoding into Zarr v2 and v3, supporting scalable, cloud-native workflows.
The purpose of this document is to provide implementation guidance and normative structure for consistent, interoperable adoption of GeoZarr across tools, platforms, and services. This work extends prior standardisation efforts within the OGC, including OGC API – Tiles, the Tile Matrix Set Standard, and EO metadata conventions, and anticipates integration with catalogue systems such as STAC.
This Standard has been developed in collaboration with contributors from Earth observation, climate science, geospatial analysis, and cloud-native geodata infrastructure communities. Future work may extend this model to additional storage formats, API services, and semantic layers.
IV. Security considerations
No security considerations have been made for this document.
V. Submitting Organizations
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- Organization One
- Organization Two
VI. Submitters
All questions regarding this submission should be directed to the editor or the submitters:
Table — Table of submitters
Name | Affiliation |
---|---|
Christophe Noël (editor) | Spacebel |
Brianna Pagán (editor) | DevSeed |
Ryan Abernathey | EarthMover |
TBD | TBD |
1. Scope
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic unified data model, the specification of its encoding into Zarr Version 2 and Version 3, and the establishment of extension points to support interoperability with external metadata and tiling standards.
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models, such as the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, with operational encoding formats suitable for cloud-native storage and analysis.
Typical use cases include the storage, transformation, discovery, and processing of raster and gridded data, data cubes with temporal or vertical dimensions, and catalogue-enabled datasets integrated with metadata standards such as STAC and OGC Tile Matrix Sets.
2. Conformance
The GeoZarr Unified Data Model is structured around a modular set of requirements classes. These classes define the conformance criteria for datasets and implementations adopting the GeoZarr specification. Each class provides a distinct set of structural or semantic expectations, facilitating interoperability across a broad spectrum of geospatial and scientific use cases.
The Core requirements class defines the minimal compliance necessary to claim conformance with the GeoZarr Unified Data Model. It is intentionally open and permissive, supporting incremental adoption and broad compatibility with existing Zarr tools and data models based on the Unidata Common Data Model (CDM).
Additional requirements classes are defined to support enhanced functionality, semantic richness, and interoperability with established geospatial conventions and systems. These include extensions for time series, coordinate systems, affine transformations, and multiscale tiling.
Table 1 — Requirements Classes Overview
Requirements Class | Description | Identifier |
---|---|---|
Core Model | Specifies minimum conformance for encoding multidimensional datasets in Zarr using CDM-aligned constructs. Includes dimensions, variables, attributes, and groups. | http://www.opengis.net/spec/geozarr/1.0/conf/core |
Time Series Support | Defines conventions for temporal dimensions and time coordinate variables to support time-aware arrays. | http://www.opengis.net/spec/geozarr/1.0/conf/time |
Coordinate Reference Systems | Specifies use of CF-compliant CRS metadata, including grid_mapping, standard_name, and EPSG codes. | http://www.opengis.net/spec/geozarr/1.0/conf/crs |
GeoTransform Metadata | Enables affine spatial referencing via GDAL-compatible GeoTransform metadata and optional interpolation hints. | http://www.opengis.net/spec/geozarr/1.0/conf/geotransform |
Multiscale Overviews | Specifies multiscale tiled layout using zoom levels and Tile Matrix Sets as per OGC API – Tiles. | http://www.opengis.net/spec/geozarr/1.0/conf/overviews |
STAC Metadata Integration | Allows embedding or referencing of STAC Collection/Item metadata for discovery and indexing. | http://www.opengis.net/spec/geozarr/1.0/conf/stac |
Projection Coordinates | Supports encoding of data in projected coordinate systems and association with spatial reference metadata. | http://www.opengis.net/spec/geozarr/1.0/conf/projected |
Spectral Bands | Defines conventions for encoding multi-band imagery, including band identifiers, wavelengths, and metadata attributes. | http://www.opengis.net/spec/geozarr/1.0/conf/bands |
Each requirements class is independently defined. Implementations may declare conformance with any subset of classes appropriate to their use case. All classes build upon the Core model.
Associated conformance tests for each class are detailed in Annex A.
3. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
Miles, A., et al.: Zarr Specification Version 2. Zarr Developers. https://zarr.readthedocs.io/en/stable/spec/v2.html
Zarr Community: Zarr Specification Version 3. https://zarr-specs.readthedocs.io/en/latest/v3.0
Unidata: The Common Data Model. https://docs.unidata.ucar.edu/netcdf-java/5.0/userguide/common_data_model_overview.html
Rew, R., Davis, G.: NetCDF: An Interface for Scientific Data Access. IEEE Computer Graphics and Applications, 10(4), 76–82 (1990). https://doi.org/10.1109/38.56302
CF Community: Climate and Forecast (CF) Metadata Conventions, Version 1.10. https://cfconventions.org/
GDAL Developers: GDAL/OGR Version 3.8 Documentation. Open Source Geospatial Foundation. https://gdal.org
Open Geospatial Consortium: OGC Two Dimensional Tile Matrix Set and Tile Pyramid (OGC 17-083r2). https://docs.ogc.org/is/17-083r2/17-083r2.html
STAC Community: STAC Specification v1.0.0. https://stacspec.org/en/
Open Geospatial Consortium: OGC Compliance Testing Policies and Procedures, OGC 08-134r10. https://portal.ogc.org/files/?artifact_id=55184
4. Terms, definitions and abbreviated terms
This document uses the terms defined in OGC Policy Directive 49, which is based on the ISO/IEC Directives, Part 2, Rules for the structure and drafting of International Standards. In particular, the word “shall” (not “must”) is the verb form used to indicate a requirement to be strictly followed to conform to this document, and OGC documents do not use the equivalent phrases in the ISO/IEC Directives, Part 2.
This document also uses terms defined in the OGC Standard for Modular specifications (OGC 08-131r3), also known as the ‘ModSpec’. The definitions of terms such as standard, specification, requirement, and conformance test are provided in the ModSpec.
For the purposes of this document, the following additional terms and definitions apply.
4.1. Terms and definitions
array
A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band.
chunk
A sub-array representing a partition of a larger array, used to optimise data access and storage. In Zarr, data is stored and accessed as a collection of independently compressed chunks.
coordinate variable
A one-dimensional array whose values define the coordinate system for a dimension of one or more data variables. Typical examples include latitude, longitude, time, or vertical levels.
data variable
An array containing the primary geospatial or scientific measurements of interest (e.g., temperature, reflectance). Data variables are defined over one or more dimensions and associated with metadata.
dimension
An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., time, x, y, band).
group
A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections).
metadata
Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.
multiscale dataset
A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set.
tile matrix set
A spatial tiling scheme defined by a hierarchy of zoom levels and consistent grid parameters (e.g., scale, CRS). Tile Matrix Sets enable spatial indexing and tiling of gridded data.
geotransform
An affine transformation used to convert between grid coordinates and geospatial coordinates, typically defined using the GDAL GeoTransform convention.
unified data model
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations.
4.2. Abbreviated terms
API
Application Programming Interface
CDM
Common Data Model
CF
Climate and Forecast Conventions
CRS
Coordinate Reference System
EPSG
European Petroleum Survey Group
GDAL
Geospatial Data Abstraction Library
GeoTIFF
Georeferenced Tagged Image File Format
JSON
JavaScript Object Notation
OGC
Open Geospatial Consortium
STAC
SpatioTemporal Asset Catalog
UDM
Unified Data Model
URI
Uniform Resource Identifier
URL
Uniform Resource Locator
Zarr
Chunked, compressed, N-dimensional array storage format (the name is not an acronym)
5. Conventions
This section describes the conventions used throughout this Standard, including identifiers, metadata schemas, and referencing mechanisms relevant to the GeoZarr Unified Data Model.
5.1. Identifiers
The normative provisions in this Standard are denoted by the base URI:
http://www.opengis.net/spec/geozarr/1.0
All requirements, recommendations, permissions, and conformance tests that appear in this document are assigned relative URIs anchored to this base.
For example:
http://www.opengis.net/spec/geozarr/1.0/conf/core — refers to the Core Requirements Class of the GeoZarr Unified Data Model.
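As an illustration of how relative identifiers are anchored to the base URI, the following minimal sketch composes conformance class URIs. The helper name `conformance_uri` is hypothetical and not defined by this Standard; the class suffixes are those listed in Table 1.

```python
BASE_URI = "http://www.opengis.net/spec/geozarr/1.0"

# Conformance class suffixes as listed in Table 1.
CONFORMANCE_CLASSES = (
    "core", "time", "crs", "geotransform",
    "overviews", "stac", "projected", "bands",
)


def conformance_uri(name: str) -> str:
    """Build the absolute URI of a conformance class (hypothetical helper)."""
    if name not in CONFORMANCE_CLASSES:
        raise ValueError(f"unknown conformance class: {name!r}")
    return f"{BASE_URI}/conf/{name}"


print(conformance_uri("core"))
# http://www.opengis.net/spec/geozarr/1.0/conf/core
```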
5.2. Data Encoding
This Standard specifies the encoding of geospatial data in the Zarr format. Zarr is a chunked, compressed binary format for n-dimensional arrays. This Standard covers both the Zarr Version 2 and Zarr Version 3 encodings.
The specification makes extensive use of:
- zarr.json metadata documents (Zarr v3)
- .zgroup, .zattrs, .zarray metadata files (Zarr v2)
- JSON-compatible structures for metadata, attributes, and conformance declarations
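The difference between the two layouts can be sketched for a group as follows. This is a minimal illustration that only builds the JSON documents in memory; the function name `group_metadata_docs` is hypothetical, and the attribute content is an arbitrary example rather than a required GeoZarr structure.

```python
import json


def group_metadata_docs(attributes: dict, zarr_format: int) -> dict:
    """Return {file name: JSON text} for one group (illustrative helper)."""
    if zarr_format == 2:
        # Zarr v2 keeps the group marker and the user attributes in separate files.
        return {
            ".zgroup": json.dumps({"zarr_format": 2}),
            ".zattrs": json.dumps(attributes),
        }
    if zarr_format == 3:
        # Zarr v3 stores node type and attributes in a single zarr.json document.
        return {
            "zarr.json": json.dumps(
                {"zarr_format": 3, "node_type": "group", "attributes": attributes}
            )
        }
    raise ValueError("zarr_format must be 2 or 3")


for name, text in group_metadata_docs({"title": "demo"}, 3).items():
    print(name, text)
```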
5.3. Schemas
Metadata schemas referenced in this Standard are represented using JSON-compatible objects and may be defined formally using JSON Schema. Metadata structures for tile matrix sets, STAC properties, or CF metadata may be embedded inline or referenced externally via URI.
5.4. URI Usage
URIs used in this Standard shall comply with [RFC3986] (URI Syntax). Reserved characters that are used as data rather than as delimiters shall be percent-encoded. Dataset identifiers, metadata links, and STAC references should use persistent and canonical forms to support reproducibility and catalogue integration.
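Percent-encoding of a single path segment can be sketched with the Python standard library. The segment value and the helper name `encode_path_segment` are illustrative assumptions.

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment: str) -> str:
    """Percent-encode one URI path segment (hypothetical helper)."""
    # safe="" ensures that "/" and other reserved characters are escaped too.
    return quote(segment, safe="")


original = "Sentinel-2 L2A/tile #12"
encoded = encode_path_segment(original)
print(encoded)  # Sentinel-2%20L2A%2Ftile%20%2312
assert unquote(encoded) == original
```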
6. Overview
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing multidimensional geospatial data using the Zarr format. Developed under the guidance of the OGC GeoZarr Standards Working Group (SWG), the Standard establishes conventions for encoding scientific and Earth observation datasets in a way that promotes scalability, interoperability, and compatibility with cloud-native infrastructure.
GeoZarr is built on widely adopted community standards, including the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions. It introduces additional extensions and structural constructs to support multi-resolution tiling, geospatial referencing, and catalogue-enabled metadata integration (e.g., STAC).
This Standard provides both:
- Core requirements, which define minimal compliance to represent array-based datasets using CDM constructs in Zarr, supporting open and permissive adoption across use cases.
- Modular extension classes, which define additional capabilities such as time series support, affine geotransform referencing, multi-resolution overviews, and projection coordinates, in line with OGC and community practices.
These modular components enable GeoZarr to serve a wide range of applications—from basic EO data storage to high-performance, cloud-native visualisation and analytics workflows.
6.1. Encodings
GeoZarr supports encoding in both Zarr Version 2 and Zarr Version 3. Each version defines how arrays, groups, and metadata are stored within a directory-based structure. All metadata is encoded in JSON-compatible formats, ensuring both human readability and machine interoperability.
Encoding guidelines include:
- Hierarchical grouping of datasets via Zarr groups.
- Dimension indexing and binding via dimension metadata.
- Attribute-based metadata compliant with CF conventions.
- Multi-resolution overviews aligned with OGC Tile Matrix Sets.
- Optional integration of STAC metadata for discovery and cataloguing.
JSON is the primary format for metadata, attributes, and structural declarations. Implementations are encouraged to support standardised naming conventions, EPSG code references, and structured metadata to facilitate search, validation, and transformation across platforms.
GeoZarr does not prescribe a single interface for data access. Instead, it enables serverless and cloud-native data access strategies by aligning its model with chunked, parallelisable storage patterns that are optimised for use in object stores and analytical environments.
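The binding of dimensions to an array can be sketched for both Zarr versions. The sketch assumes that Zarr v2 arrays carry dimension names in the `_ARRAY_DIMENSIONS` attribute (the convention used by xarray) and that Zarr v3 arrays use the `dimension_names` field of zarr.json; whether GeoZarr mandates exactly these mechanisms is an assumption here. The function name, data type, and codec choice are illustrative.

```python
import json


def array_metadata_docs(shape, chunks, dims, attrs, zarr_format):
    """Return {file name: JSON text} for a float32 array (illustrative only)."""
    if not (len(shape) == len(chunks) == len(dims)):
        raise ValueError("shape, chunks and dims must have the same length")
    if zarr_format == 2:
        zarray = {
            "zarr_format": 2,
            "shape": list(shape),
            "chunks": list(chunks),
            "dtype": "<f4",
            "compressor": None,
            "fill_value": "NaN",
            "order": "C",
            "filters": None,
        }
        # Assumption: dimension names are bound through _ARRAY_DIMENSIONS.
        zattrs = {**attrs, "_ARRAY_DIMENSIONS": list(dims)}
        return {".zarray": json.dumps(zarray), ".zattrs": json.dumps(zattrs)}
    if zarr_format == 3:
        doc = {
            "zarr_format": 3,
            "node_type": "array",
            "shape": list(shape),
            "data_type": "float32",
            "chunk_grid": {
                "name": "regular",
                "configuration": {"chunk_shape": list(chunks)},
            },
            "chunk_key_encoding": {
                "name": "default",
                "configuration": {"separator": "/"},
            },
            "fill_value": "NaN",
            "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
            "dimension_names": list(dims),
            "attributes": dict(attrs),
        }
        return {"zarr.json": json.dumps(doc)}
    raise ValueError("zarr_format must be 2 or 3")


docs = array_metadata_docs(
    (365, 720, 1440), (1, 256, 256), ("time", "lat", "lon"),
    {"standard_name": "air_temperature", "units": "K"}, 3,
)
print(json.loads(docs["zarr.json"])["dimension_names"])
```

Keeping one chunk per time step, as in this example, favours reading a full spatial slice in few requests; other chunk shapes may suit time-series access better.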
7. Unified Data Model
7.1. Scope and Purpose
This Standard defines a unified data model (UDM) that provides a conceptual framework for representing geospatial and scientific data in Zarr. The purpose of this model is to support standards-based interoperability across Earth observation systems and analytical environments, while preserving compatibility with existing data models and software ecosystems.
The unified data model incorporates and extends the following established specifications and community standards:
- Unidata Common Data Model (CDM) – Provides the foundational resource structure for scientific datasets, encompassing dimensions, coordinate systems, variables, and associated metadata elements.
- CF (Climate and Forecast) Conventions – Defines a widely adopted metadata profile for describing spatiotemporal semantics in CDM-based datasets.
- Selected constructs from related Standards and practices, including:
  - The OGC Tile Matrix Set Standard, which enables multi-resolution representations of gridded data.
  - GDAL geotransform metadata, used to express affine transformations and interpolation characteristics.
  - SpatioTemporal Asset Catalog (STAC) metadata elements for resource discovery and cataloguing (Collection and Item constructs).
The unified model is format-agnostic and describes the abstract structure of resources independently of the physical encoding. It does not redefine the semantics of the CDM or CF conventions, but introduces integration and extension points required to support tiled multiscale data, geospatial referencing, and metadata for discovery.
This clause specifies the logical composition of the unified model, the external standards it leverages, and the conformance points that facilitate harmonised implementation within the GeoZarr framework.
7.2. Foundational Model and Standards Reuse
The unified data model described in this Standard is derived from established community specifications to maximise interoperability and to enable the reuse of mature tools and practices. The model is grounded in the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, which together provide a robust framework for representing scientific and geospatial datasets.
7.2.1. Common Data Model (CDM)
The CDM defines a generalised schema for representing array-based scientific datasets. The following constructs are reused directly within the unified model:
- Dimensions – Integer-valued, named axes that define the extents of data variables.
- Coordinate Variables – Variables that supply coordinate values along dimensions, establishing spatial or temporal context.
- Data Variables – Multidimensional arrays representing observed or simulated phenomena, associated with dimensions and coordinate variables.
- Attributes – Key-value metadata elements used to describe variables and datasets semantically.
- Groups – Optional hierarchical containers enabling logical organisation of resources and metadata.
The unified data model adopts these CDM components without modification, except that user-defined types are excluded. Semantic interpretation remains consistent with the original CDM specification. GeoZarr structures are mapped to CDM constructs to ensure compatibility and clarity.
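The CDM constructs listed above can be sketched as plain data structures. The class and method names below are illustrative and not defined by this Standard; the sketch assumes the classic CDM notion that a coordinate variable is one-dimensional and shares the name of its dimension.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    dimensions: tuple[str, ...]
    attributes: dict = field(default_factory=dict)


@dataclass
class Group:
    name: str
    dimensions: dict[str, int] = field(default_factory=dict)  # name -> length
    variables: dict[str, Variable] = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)
    groups: dict[str, "Group"] = field(default_factory=dict)

    def coordinate_variables(self) -> list[Variable]:
        # Assumption: a coordinate variable is 1-D and named after its dimension.
        return [v for v in self.variables.values() if v.dimensions == (v.name,)]

    def data_variables(self) -> list[Variable]:
        # Everything that is not a coordinate variable is treated as data.
        return [v for v in self.variables.values() if v.dimensions != (v.name,)]


root = Group(name="/", dimensions={"time": 12, "y": 100, "x": 200})
root.variables["time"] = Variable("time", ("time",), {"units": "days since 2000-01-01"})
root.variables["temperature"] = Variable("temperature", ("time", "y", "x"), {"units": "K"})

print([v.name for v in root.coordinate_variables()])  # ['time']
print([v.name for v in root.data_variables()])        # ['temperature']
```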
7.2.2. CF Conventions
The CF Conventions specify standardised metadata attributes and practices to describe spatiotemporal context within CDM-compliant datasets. These conventions support consistent interpretation of:
- Coordinate systems
- Grid mappings
- Physical units
- Standard variable naming
The unified data model supports CF-compliant metadata, including attributes such as standard_name, units, and grid_mapping. The unified data model does not prescribe CF compliance but enables it through permissive design. Partial adoption of CF attributes is supported, and non-compliant datasets may selectively adopt CF metadata as needed.
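Partial adoption of CF attributes can be illustrated with a small check that reports which commonly used attributes are absent without rejecting the dataset. The variable names, attribute values, and the helper `missing_cf_attributes` are illustrative assumptions; `grid_mapping_name`, `semi_major_axis`, and `inverse_flattening` are standard CF grid mapping attributes.

```python
# Attributes of a data variable that refers to a grid mapping variable named "crs".
data_attrs = {
    "standard_name": "air_temperature",
    "units": "K",
    "grid_mapping": "crs",
}

# Attributes of the grid mapping variable itself (geographic coordinates on WGS 84).
grid_mapping_attrs = {
    "grid_mapping_name": "latitude_longitude",
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
}


def missing_cf_attributes(attrs, expected=("standard_name", "units", "grid_mapping")):
    """List the expected CF attributes that are absent (hypothetical helper)."""
    return [key for key in expected if key not in attrs]


print(missing_cf_attributes(data_attrs))      # []
print(missing_cf_attributes({"units": "K"}))  # ['standard_name', 'grid_mapping']
```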
7.2.3. Standards-Based Extensions
To support additional capabilities, the model defines optional extension points referencing external OGC and community standards:
- OGC Tile Matrix Set – Facilitates the definition of multiscale grid hierarchies for raster overviews.
- GDAL Geotransform – Enables geospatial referencing through affine transformations and optional interpolation specifications.
- STAC Metadata (Collection and Item) – Provides linkage to SpatioTemporal Asset Catalogs for resource discovery and indexing.
These extensions are integrated in a modular fashion and do not alter the core semantics of the CDM or CF structures. Implementations may selectively adopt these extensions based on their application requirements.
7.3. Model Extension Points
The unified data model specifies a series of optional, standards-aligned extension points to support functionality beyond the base CDM and CF constructs. These extensions enhance applicability to Earth observation and spatial analysis use cases without imposing additional mandatory requirements.
Each extension is defined as an independent module. Implementation of any given extension does not necessitate support for others.
7.3.1. Multi-Resolution Overviews (OGC Tile Matrix Set)
Support for multi-resolution imagery is enabled via integration with the OGC Tile Matrix Set Standard:
- Tile matrix sets define spatial tiling schemes with consistent resolutions and coordinate reference systems across zoom levels.
- Overviews may be represented as separate Zarr arrays or groups, each aligned to a specific tile matrix level.
- Metadata includes identifiers for tile matrices, spatial resolution, and spatial alignment.
This approach aligns with the OGC API – Tiles and enables efficient access to large gridded datasets.
7.3.2. GeoTransform Metadata (GDAL Interpolation and Affine Transform)
Geospatial referencing can be further refined through the inclusion of metadata consistent with GDAL conventions:
- Affine transformation is specified via the GeoTransform attribute or equivalent structures.
- Interpolation methods may be declared to indicate sampling behaviour or sub-pixel alignment strategies.
This extension augments CF grid mappings by providing precise control over grid placement and coordinate transformations.
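The affine mapping can be sketched using the six-coefficient ordering of the GDAL GeoTransform convention, where coefficients 0 and 3 give the origin of the top-left pixel corner, 1 and 5 the pixel sizes, and 2 and 4 the rotation terms. The function names and the example coefficients are illustrative; that a GeoZarr attribute stores the coefficients in exactly this order is an assumption.

```python
def pixel_to_geo(gt, col, row):
    """Map (col, row) grid coordinates to (x, y) using a GDAL-style geotransform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def geo_to_pixel(gt, x, y):
    """Invert the affine transform; fails if the transform is degenerate."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row


# North-up grid with 10 m pixels; the negative y pixel size means rows run southwards.
gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)

print(pixel_to_geo(gt, 2, 3))                  # (500020.0, 4599970.0)
print(geo_to_pixel(gt, 500020.0, 4599970.0))   # (2.0, 3.0)
```

Integer grid coordinates address pixel corners in this convention; adding 0.5 to `col` and `row` yields the pixel centre.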
7.3.3. STAC Collection and Item Integration
To enable discovery of resources within the hierarchical structure of the data model, this Standard supports the inclusion of STAC metadata elements at appropriate locations within the group hierarchy.
A STAC extension consists of embedding or referencing STAC Collection and Item metadata within the data model:
- Each dataset resource MAY reference a corresponding STAC Collection or Item using an identifier or embedded object.
- STAC properties such as datetime, bbox, and eo:bands MAY be included in the metadata to enable spatial, temporal, and spectral filtering.
- The structure is compatible with external STAC APIs and metadata harvesting systems.
STAC integration is non-intrusive and modular. It does not impose changes on the internal organisation of datasets and MAY be adopted incrementally by implementations requiring catalogue-based discovery capabilities.
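A fragment of embedded STAC metadata and a simple spatial and temporal filter might look as follows. The fragment is deliberately incomplete (a full STAC Item also requires geometry, links, and assets), the identifier and values are invented for illustration, and the helper names are hypothetical. In STAC, `bbox` is a top-level Item field, while `datetime` and `eo:bands` sit under `properties`.

```python
from datetime import datetime, timezone

# Illustrative, incomplete STAC Item fragment embedded in dataset metadata.
stac_fragment = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "bbox": [5.0, 45.0, 6.0, 46.0],  # west, south, east, north
    "properties": {
        "datetime": "2024-06-01T10:30:00Z",
        "eo:bands": [{"name": "B04", "common_name": "red"}],
    },
}


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches(item, bbox, start, end):
    """Filter an item by bounding box and by a timezone-aware datetime range."""
    stamp = item["properties"]["datetime"].replace("Z", "+00:00")
    when = datetime.fromisoformat(stamp)
    return bbox_intersects(item["bbox"], bbox) and start <= when <= end


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 12, 31, tzinfo=timezone.utc)
print(matches(stac_fragment, [5.5, 45.5, 7.0, 47.0], start, end))  # True
```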
7.3.4. Modularity and Interoperability
Each extension point is specified independently. Implementations may advertise support for one or more extensions by declaring conformance to corresponding extension modules. This modularity facilitates incremental adoption, promotes reuse, and enhances interoperability across varied implementation environments.
7.4. Unified Model Structure
This clause defines the structural organisation of datasets conforming to the unified data model (UDM). It consolidates the foundational elements and optional extensions into a coherent architecture suitable for Zarr encoding, while remaining format-agnostic. The model establishes a modular and extensible framework that supports structured representation of multidimensional, geospatially-referenced resources.
The model represents datasets as abstract compositions of dimensions, coordinate variables, data variables, and associated metadata. This abstraction ensures that applications and services can reason about the content and semantics of a dataset without reliance on storage layout or specific serialisation.
7.4.1. Dataset Structure
A dataset conforming to the Unified Data Model (UDM) is structured as a hierarchy rooted at a top-level dataset entity. This design enables modularity and facilitates the representation of complex, multi-resolution, or thematically partitioned data collections.
Each dataset node comprises the following core components, aligned with the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions:
- Dimensions – Named, integer-valued axes defining the extent of data variables. Examples include time, x, y, and band.
- Coordinate Variables – Arrays that supply coordinate values along dimensions, providing spatial, temporal, or contextual referencing. These may be scalar or higher-dimensional, depending on the referencing scheme.
- Data Variables – Multidimensional arrays representing physical measurements or derived products. Defined over one or more dimensions, these variables are associated with coordinate variables and annotated with metadata.
- Attributes – Key-value pairs attached to variables or dataset components. Attributes convey semantic information such as units, standard names, and geospatial metadata.
The hierarchy is implemented through groups, which function as containers for variables, dimensions, and metadata. Groups may define local context while inheriting attributes from parent nodes. This supports the logical subdivision of datasets by theme, resolution, or processing stage, and enhances the clarity and reusability of complex geospatial structures.
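Attribute inheritance along the group hierarchy can be sketched as a merge from the root down to the node of interest. The rule that a child's value overrides the parent's value for the same key is an assumption made for this sketch; the helper name and attribute values are illustrative.

```python
def effective_attributes(chain):
    """Merge attribute dictionaries ordered from root group to target node.

    Assumption: attributes declared closer to the target override inherited ones.
    """
    merged = {}
    for attrs in chain:
        merged.update(attrs)
    return merged


root_attrs = {"license": "CC-BY-4.0", "title": "Example product"}
child_attrs = {"title": "10 m resolution bands"}

print(effective_attributes([root_attrs, child_attrs]))
# {'license': 'CC-BY-4.0', 'title': '10 m resolution bands'}
```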
The diagram below represents the structural layer of the unified data model. This layer is derived from the Unidata Common Data Model and serves as the foundation for all overlying model layers.
Figure 1 — Conformance-class model
Note that, conceptually, any node within this hierarchy may be treated as a self-contained dataset.
7.4.2. Coordinate Referencing
Coordinate systems are defined using:
- CF Conventions – Including attributes such as standard_name, units, axis, and grid_mapping to express spatiotemporal semantics and coordinate system properties.
- Affine Transformation Extensions – Optional support for georeferencing via affine transforms and interpolation metadata (e.g., as defined in GDAL practices), providing enhanced flexibility for irregular grids and grid-aligned imagery.
The model accommodates both standard CF-compatible definitions and extended referencing mechanisms to support use cases that span scientific analysis and geospatial mapping.
7.4.3. Metadata Integration
Metadata may be declared at various levels within the model structure:
- Global Metadata – Attributes describing the dataset as a whole, including elements such as title, summary, and license.
- Variable Metadata – Attributes associated with individual data or coordinate variables, conveying descriptive or semantic information.
- Extension Metadata – Structured metadata linked to optional model extensions (e.g., multiscale tiling, catalogue references, geotransform properties).
All metadata follows harmonised naming and semantics consistent with the CDM and CF standards, enabling machine and human interpretability while supporting metadata exchange across diverse systems.
7.4.4. Overviews
The Overviews construct defines a formal, interoperable abstraction for multiscale gridded data. It ensures structural consistency across zoom levels and provides a semantic model for integration with tiled representations such as GeoTIFF overviews, OGC API – Tiles, and STAC Tiled Assets.
7.4.4.1. Purpose
The Overviews construct provides a general mechanism for associating a single logical data variable with a collection of resampled representations, referred to as zoom levels. Each zoom level holds a reduced-resolution version of the original variable, with progressively decreasing spatial resolution from the base (highest detail) to the coarsest level.
Overviews enable:
- Fast access to summary representations for visualisation
- Progressive transmission and downsampling
- Multi-resolution analytics and adaptive processing
7.4.4.2. Conceptual Structure
An Overviews construct is defined as a hierarchical set of multiscale representations of one or more data variables. It comprises the following components:
- Base Variable
The original, highest-resolution variable to which the overview hierarchy is anchored. It is defined using the standard DataVariable structure in the model.
- Overview Levels
A sequence of variables representing the same logical quantity as the base variable, but sampled at coarser spatial resolutions.
- Zoom Level Identifier
A unique identifier associated with each level, ordered from finest (e.g. "0") to coarsest resolution (e.g. "N").
- Tile Grid Definition
A mapping that associates each zoom level with a spatial tiling layout, defined in alignment with a TileMatrixSet.
- Spatial Alignment
Each overview variable SHALL be spatially aligned with the base variable using a consistent coordinate reference system and compatible axis orientation.
- Resampling Method
A declared method indicating the technique used to derive coarser levels from the base variable (e.g. nearest, average, cubic).
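The relationship between zoom level identifiers and array shapes can be sketched as follows. The sketch assumes a constant decimation factor of 2 between consecutive levels and rounds partial pixels up; neither choice is mandated by the text above, and the function name is hypothetical.

```python
import math


def overview_shapes(base_shape_yx, n_levels, factor=2):
    """Return {zoom level id: (height, width)}, with "0" as the finest level."""
    shapes = {}
    height, width = base_shape_yx
    for level in range(n_levels + 1):
        shapes[str(level)] = (height, width)
        # Each coarser level covers the same extent with fewer, larger pixels.
        height = math.ceil(height / factor)
        width = math.ceil(width / factor)
    return shapes


for level, shape in overview_shapes((10980, 10980), 3).items():
    print(level, shape)
# 0 (10980, 10980)
# 1 (5490, 5490)
# 2 (2745, 2745)
# 3 (1373, 1373)
```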
7.4.4.3. Model Components
The Overviews construct is represented in the unified data model using the following logical elements:
Table 2
Element | Definition |
---|---|
OverviewSet | A logical grouping of variables at multiple zoom levels associated with a single base variable. |
OverviewLevel | A single resampled variable at a specific resolution, identified by a zoom level string. |
TileMatrixSetRef | A reference to the tile grid specification applied across all overview levels. May refer to a well-known identifier, a URI, or an inline object. |
TileMatrixLimits | (Optional) Constraints on the tile coverage per zoom level. |
resampling_method | A string indicating the uniform method used to downsample data across all levels. |
All overview levels SHALL preserve:
- The data variable’s semantic identity (standard_name, units, etc.)
- The coordinate reference system
- The axis order and dimension semantics
Only the resolution and extent (through tiling and shape) may differ across levels.
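A consistency check across overview levels could be sketched as below. Each level is described by a plain dictionary; the keys `crs` and `dimensions`, the helper name, and the example values are assumptions made for illustration rather than names defined by this Standard.

```python
# Properties that are expected to be identical at every overview level.
PRESERVED_KEYS = ("standard_name", "units", "crs", "dimensions")


def differing_fields(base, level):
    """List the preserved properties whose values differ from the base variable."""
    return [key for key in PRESERVED_KEYS if base.get(key) != level.get(key)]


base = {
    "standard_name": "surface_bidirectional_reflectance",
    "units": "1",
    "crs": "EPSG:32631",
    "dimensions": ("y", "x"),
    "shape": (10980, 10980),
}
level_1 = {**base, "shape": (5490, 5490)}          # only the shape changes
level_bad = {**level_1, "crs": "EPSG:4326"}        # CRS changed: not allowed

print(differing_fields(base, level_1))    # []
print(differing_fields(base, level_bad))  # ['crs']
```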
7.4.4.4. Relationship to Tile Matrix Set
The Overviews construct is structurally aligned with the OGC Tile Matrix Set concept. Each zoom level is mapped to a TileMatrix, and the chunk layout for the corresponding data variable SHALL match the tile grid’s tileWidth and tileHeight.
The OverviewSet MAY constrain tile matrix limits using TileMatrixSetLimits, which restrict tile indices to actual data coverage, consistent with the spatial extent of the overview variable.
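The two rules above can be sketched with a check of the chunk shape against the tile size and a derivation of tile limits from the array shape. The tile matrix is given as a dictionary with the keys `id`, `tileWidth`, and `tileHeight`, and the limits use the `minTileRow`, `maxTileRow`, `minTileCol`, and `maxTileCol` members of the OGC Tile Matrix Set limits structure. The sketch assumes that the array origin coincides with the tile matrix origin; the function names are hypothetical.

```python
import math


def chunks_match_tiles(chunk_shape_yx, tile_matrix):
    """True if the (rows, cols) chunk shape equals (tileHeight, tileWidth)."""
    expected = (tile_matrix["tileHeight"], tile_matrix["tileWidth"])
    return tuple(chunk_shape_yx) == expected


def tile_matrix_limits(tile_matrix, shape_yx):
    """Derive tile index limits covering an array anchored at the matrix origin."""
    rows = math.ceil(shape_yx[0] / tile_matrix["tileHeight"])
    cols = math.ceil(shape_yx[1] / tile_matrix["tileWidth"])
    return {
        "tileMatrix": tile_matrix["id"],
        "minTileRow": 0,
        "maxTileRow": rows - 1,
        "minTileCol": 0,
        "maxTileCol": cols - 1,
    }


tile_matrix = {"id": "2", "tileWidth": 256, "tileHeight": 256}

print(chunks_match_tiles((256, 256), tile_matrix))      # True
print(tile_matrix_limits(tile_matrix, (2745, 2745)))
# {'tileMatrix': '2', 'minTileRow': 0, 'maxTileRow': 10, 'minTileCol': 0, 'maxTileCol': 10}
```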
7.4.4.5. Usage Context
The Overviews construct is applicable to any gridded data variable with at least two spatial dimensions. It is primarily designed for:
- Raster imagery (e.g. reflectance, temperature)
- Data cubes with spatial slices (e.g. time-series of spatial grids)
- Multi-band products with consistent spatial structure across levels
The structure may be extended for N-dimensional datasets in future revisions, provided that two spatial axes can be unambiguously identified.
7.5. Conformance and Extensibility
The GeoZarr data model is designed with an open conformance approach to support a wide range of use cases and implementation contexts. Its core model is permissive, allowing partial implementations, while optional extensions and compliance profiles can define stricter requirements for interoperability.
7.5.1. Core Conformance
Datasets conforming to the core model shall:
- Represent data using CDM-compatible constructs (dimensions, variables, attributes).
- Follow attribute conventions where applicable.
- Be parsable as valid Zarr with structured metadata following this specification.
CF compliance is not mandatory but is recommended for semantic interoperability.
7.5.2. Extension Conformance
Implementations may optionally support one or more extension modules:
Multi-resolution overviews (Tile Matrix Set)
GeoTransform metadata (GDAL)
STAC metadata integration
Each extension defines its own requirement class with validation rules and expected metadata structures.
Tools may advertise which extensions they support and validate datasets accordingly.
7.5.3. Conformance Classes
Conformance Classes may be defined to specify required components and extensions for specific application domains (e.g., visualisation clients, EO archives, catalogue indexing).
Conformance Classes enable selective validation without constraining the general model.
7.5.4. Extensibility Principles
All extensions must preserve compatibility with the core model and avoid redefining existing CDM or CF semantics.
New extensions should be documented with clear identifiers, schemas, and conformance criteria.
The model encourages interoperability by allowing tools to read datasets that contain unknown extensions without failure.
This extensibility framework supports both minimum-viable use and high-fidelity metadata integration, enabling incremental adoption across the geospatial and scientific data communities.
7.6. Interoperability Considerations
Interoperability is a core objective of the GeoZarr unified data model. The model is designed to bridge diverse Earth observation and scientific data ecosystems by enabling structural and semantic compatibility with established formats and standards, while providing a forward-looking foundation for scalable, cloud-native workflows.
This section outlines the principles and mechanisms supporting interoperability across formats, tools, and communities.
7.6.1. Format Mapping and Alignment
The data model is explicitly aligned with foundational standards including the Unidata Common Data Model (CDM), the CF Conventions, and established practices in formats such as NetCDF and GeoTIFF. Where applicable, GeoZarr datasets may be derived from or transformed into these formats using consistent mappings.
NetCDF (classic and enhanced models):
GeoZarr shares a common conceptual structure with NetCDF via CDM.
Variables, dimensions, coordinate systems, and attributes follow directly mappable patterns.
Metadata expressed in CF conventions in NetCDF can be preserved in GeoZarr without loss of fidelity.
GeoTIFF:
Raster-based datasets in GeoZarr can map to GeoTIFF by interpreting spatial referencing (via CF or GeoTransform) and band structures.
Overviews aligned to OGC Tile Matrix Sets may correspond to TIFF image pyramids.
Projection metadata and resolution information can be mapped via standard tags.
These mappings facilitate round-trip transformations and enable toolchains that consume or produce multiple formats without reengineering semantic models.
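As one illustration of such a mapping, the sketch below derives 1D coordinate values from a GDAL-style affine geotransform, which is how a GeoTIFF-like raster could be given explicit coordinate variables. It assumes a north-up grid without rotation and pixel-centre coordinates; the function name `coords_from_geotransform` is hypothetical and not part of this Standard.

```python
def coords_from_geotransform(geotransform, height, width):
    """Return (y_centres, x_centres) for a north-up grid.

    `geotransform` follows the GDAL six-element order:
    (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height).
    """
    x0, dx, rot_x, y0, rot_y, dy = geotransform
    if rot_x != 0 or rot_y != 0:
        # A rotated grid cannot be described by two independent 1D axes.
        raise ValueError("rotated grids need 2D coordinates, not 1D axes")
    # Offset by half a pixel to obtain cell centres rather than corners.
    x_centres = [x0 + (col + 0.5) * dx for col in range(width)]
    y_centres = [y0 + (row + 0.5) * dy for row in range(height)]
    return y_centres, x_centres


# A global one-degree grid whose top-left corner is at (-180, 90).
lat, lon = coords_from_geotransform((-180.0, 1.0, 0.0, 90.0, 0.0, -1.0), 180, 360)
print(lat[0], lat[-1], lon[0], lon[-1])  # 89.5 -89.5 -179.5 179.5
```

Whether coordinates refer to pixel centres or corners differs between conventions, so a real converter would need to make that choice explicit.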
7.6.2. Semantic Interoperability
Semantic interoperability is supported through adherence to CF conventions, use of standardised attribute names (e.g., standard_name, units), and alignment with metadata vocabularies used in other ecosystems (e.g., STAC, EPSG codes, ISO 19115 keywords).
The model does not prescribe specific vocabularies beyond CF but encourages reuse and recognition of widely accepted descriptors to promote cross-domain understanding.
7.6.3. Metadata and Discovery Integration
STAC compatibility enables integration with catalogue services for discovery and indexing. Datasets can expose STAC-compliant metadata alongside core metadata, supporting federated search and filtering via STAC APIs.
This approach enables seamless integration into modern data catalogues and platforms that support EO discovery standards.
7.6.4. Tool and Ecosystem Support
The unified data model facilitates interoperability with tools and libraries across the following domains:
Scientific computing: NetCDF-based libraries (e.g., xarray, netCDF4), Zarr-compatible clients.
Geospatial processing: GDAL, rasterio, QGIS (via Zarr driver extensions or translations).
Cloud-native infrastructure: support for parallel access, chunked storage, and hierarchical grouping compatible with object storage.
Tooling support is expected to grow via standard-conformant implementations, easing adoption across domains and infrastructures.
8. GeoZarr Conformance Classes
Datasets can include many different types of data, including rasters and combinations of additional dimensions (such as time, height, or wavelength), and can use either a projected or a geographic coordinate system.
This Standard identifies conformance classes that offer clear, testable building blocks as a standardised approach for representing different data types when converting to the GeoZarr Unified Data Model (e.g. for encoding RGB bands from a GeoTIFF source).
This is a very preliminary draft. The content is primarily for demonstrating the purpose of the proposed sections.
9. Unified Data Model Encoding for Zarr
This clause defines the encoding of the unified data model into the Zarr format. The encoding supports both Zarr Version 2 and Zarr Version 3.
This is a very preliminary draft. The content is primarily for demonstrating the purpose of the proposed sections.
9.1. Hierarchical Structure
A dataset conforming to the unified data model is represented as a hierarchical structure of groups, variables (arrays), dimensions, and metadata. The dataset is rooted in a top-level group, which may contain:
Arrays representing coordinate or data variables
Child groups for modular organisation, including logical sub-collections or resolution levels
Metadata attributes at group and array levels
Each group adheres to a consistent structure, allowing recursive composition. This reflects the CDM’s use of groups and is supported by both Zarr v2 and v3 with differing implementations.
Table 3
Model Element | Zarr v2 Encoding | Zarr v3 Encoding |
---|---|---|
Root Dataset | Directory with .zgroup and .zattrs | Directory with zarr.json, with node_type: group |
Child Group | Subdirectory with .zgroup and .zattrs | Subdirectory with zarr.json, with node_type: group |
Array | Subdirectory with .zarray and .zattrs | Subdirectory with zarr.json, with node_type: array |
Metadata Key | .zattrs file | attributes field in zarr.json |
Zarr v3 requires zarr_format: 3 and stores all metadata (including user-defined attributes) in the zarr.json document. Each node includes a node_type field: either "group" or "array".
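The mapping in Table 3 can be applied to tell the two encodings apart. The following Python sketch inspects a directory on a local file system and reports the Zarr version and node type; the function name `describe_node` is hypothetical, and object stores would need a different access layer than `pathlib`.

```python
import json
import tempfile
from pathlib import Path


def describe_node(path):
    """Return (zarr_format, node_type) for a directory, or None if it is not a Zarr node."""
    path = Path(path)
    v3_doc = path / "zarr.json"
    if v3_doc.is_file():
        # Zarr v3: a single document carries the format and the node type.
        meta = json.loads(v3_doc.read_text(encoding="utf-8"))
        return meta.get("zarr_format"), meta.get("node_type")
    # Zarr v2: the node type is implied by which marker file is present.
    if (path / ".zarray").is_file():
        return 2, "array"
    if (path / ".zgroup").is_file():
        return 2, "group"
    return None


# Build a tiny throwaway hierarchy to exercise both encodings.
with tempfile.TemporaryDirectory() as tmp:
    v2_root = Path(tmp) / "v2"
    v2_root.mkdir()
    (v2_root / ".zgroup").write_text(json.dumps({"zarr_format": 2}), encoding="utf-8")
    v3_root = Path(tmp) / "v3"
    v3_root.mkdir()
    (v3_root / "zarr.json").write_text(
        json.dumps({"zarr_format": 3, "node_type": "group"}), encoding="utf-8"
    )
    results = [describe_node(v2_root), describe_node(v3_root), describe_node(tmp)]

print(results)  # [(2, 'group'), (3, 'group'), None]
```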
9.2. Dimensions
Dimensions define the axes along which variables are indexed.
In Zarr v2, dimensions are inferred from array shape and declared in _ARRAY_DIMENSIONS within .zattrs.
In Zarr v3, dimensions are stored using the dimension_names field in zarr.json.
Example for a 2D array with dimension names ["lat", "lon"]:
{
"zarr_format": 3,
"node_type": "array",
"shape": [180, 360],
"dimension_names": ["lat", "lon"],
...
}
Listing 1
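A reader that supports both versions needs to look in two different places for dimension names. The sketch below shows one way this could be done on already-parsed metadata documents; the function name and its arguments are assumptions of this example.

```python
def dimension_names(zarr_format, array_meta, attrs=None):
    """Return the dimension names of an array, or None if none are declared.

    `array_meta` is the parsed .zarray (v2) or zarr.json (v3) document;
    `attrs` is the parsed .zattrs document and is only consulted for v2.
    """
    if zarr_format == 3:
        names = array_meta.get("dimension_names")
    elif zarr_format == 2:
        names = (attrs or {}).get("_ARRAY_DIMENSIONS")
    else:
        raise ValueError(f"unsupported zarr_format: {zarr_format}")
    if names is None:
        return None
    # Every axis of the array needs exactly one name.
    if len(names) != len(array_meta["shape"]):
        raise ValueError("one dimension name is required per array axis")
    return list(names)


v3_meta = {"zarr_format": 3, "node_type": "array",
           "shape": [180, 360], "dimension_names": ["lat", "lon"]}
v2_meta = {"zarr_format": 2, "shape": [180, 360]}
v2_attrs = {"_ARRAY_DIMENSIONS": ["lat", "lon"]}

print(dimension_names(3, v3_meta))            # ['lat', 'lon']
print(dimension_names(2, v2_meta, v2_attrs))  # ['lat', 'lon']
```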
9.3. Coordinate Variables
Coordinate variables (excluding GeoTransform Coordinates) define the geospatial or temporal context of data. They are represented as named 1D arrays, each aligned with its corresponding dimension and carrying metadata attributes.
Table 4
Feature | Zarr v2 | Zarr v3 |
---|---|---|
Storage | Zarr array with .zarray, .zattrs | Zarr array with zarr.json |
Dimension Binding | _ARRAY_DIMENSIONS in .zattrs | dimension_names in zarr.json |
CF Metadata | standard_name, units, axis in .zattrs | Under attributes in zarr.json |
Example zarr.json for a coordinate array:
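{
"zarr_format": 3,
"node_type": "array",
"shape": [180],
"dimension_names": ["lat"],
"data_type": "float32",
"chunk_grid": {
"name": "regular",
"configuration": {
"chunk_shape": [180]
}
},
"chunk_key_encoding": {
"name": "default",
"configuration": {
"separator": "/"
}
},
"fill_value": "NaN",
"codecs": [
{
"name": "bytes",
"configuration": {
"endian": "little"
}
}
],
"attributes": {
"standard_name": "latitude",
"units": "degrees_north",
"axis": "Y"
}
}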
Listing 2
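A tool may need to decide whether a given array is a coordinate variable. The sketch below applies the CDM/CF rule that a coordinate variable is a 1D array named after its single dimension, using a parsed Zarr v3 zarr.json document. The function name is hypothetical, and the array name is assumed to be known from the node's path.

```python
def is_coordinate_variable(array_name, zarr_json):
    """True if a Zarr v3 array is 1D and named after its single dimension."""
    names = zarr_json.get("dimension_names") or []
    return (
        zarr_json.get("node_type") == "array"
        and len(zarr_json.get("shape", [])) == 1
        and list(names) == [array_name]
    )


lat_meta = {"zarr_format": 3, "node_type": "array",
            "shape": [180], "dimension_names": ["lat"]}
temperature_meta = {"zarr_format": 3, "node_type": "array",
                    "shape": [180, 360], "dimension_names": ["lat", "lon"]}

print(is_coordinate_variable("lat", lat_meta))                  # True
print(is_coordinate_variable("temperature", temperature_meta))  # False
```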
9.4. Data Variables
Data variables represent measured or derived quantities. They are stored as multidimensional arrays with metadata attributes.
Table 5
Feature | Zarr v2 | Zarr v3 |
---|---|---|
Storage | Multidimensional array with .zarray and .zattrs | Multidimensional array with zarr.json; v3 supports additional chunk storage formats |
Dimension Association | _ARRAY_DIMENSIONS attribute in .zattrs | dimension_names field in zarr.json |
CF Metadata | standard_name, units, long_name, _FillValue, etc. | Same as v2; v3 may support typed attributes |
Example Zarr v2 .zattrs for a data variable:
{
"_ARRAY_DIMENSIONS": ["time", "lat", "lon"],
"standard_name": "air_temperature",
"units": "K",
"long_name": "Surface air temperature",
"_FillValue": -9999.0
}
Listing 3
9.5. Global Metadata
Metadata associated with the dataset as a whole is stored at the root group level.
Table 6
Field | Zarr v2 | Zarr v3 |
---|---|---|
Location | .zattrs file of root .zgroup | attributes field in root zarr.json |
Group Identification | .zgroup file | node_type: group in zarr.json |
CF Conformance | Conventions attribute (e.g., CF-1.10) | Same, under attributes |
Example Zarr v3 root zarr.json:
{
"zarr_format": 3,
"node_type": "group",
"attributes": {
"title": "Example Dataset",
"summary": "Multidimensional Earth Observation data",
"institution": "Example Space Agency",
"Conventions": "CF-1.10"
}
}
Listing 4
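Table 6 implies a simple translation of group-level metadata between versions. The following sketch wraps the contents of a Zarr v2 .zattrs document into a Zarr v3 group document; the function name `v2_group_to_v3` is an assumption of this example, and converting arrays would require considerably more work than shown here.

```python
import json


def v2_group_to_v3(zattrs):
    """Build a Zarr v3 group document from the contents of a v2 .zattrs file."""
    return {
        "zarr_format": 3,
        "node_type": "group",
        # Copy so that later edits to the v3 document do not alter the input.
        "attributes": dict(zattrs),
    }


v2_zattrs = {"title": "Example Dataset", "Conventions": "CF-1.10"}
v3_doc = v2_group_to_v3(v2_zattrs)
print(json.dumps(v3_doc, indent=2))
```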
9.6. Variables Metadata
All metadata attributes (for groups, coordinate variables and data variables) are recommended to conform to CF naming and typing conventions. Supported attributes include:
standard_name, units, axis, grid_mapping (CF)
_FillValue, scale_factor, add_offset
long_name, missing_value
In all cases:
Attribute names are case-sensitive and encoded as UTF-8 strings
Values shall conform to JSON-compatible types (string, number, boolean, array)
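The packing attributes listed above follow the CF rule that the unpacked value equals the stored value multiplied by scale_factor plus add_offset. The sketch below applies that rule to a plain list of stored values and replaces fill or missing sentinels with None. It assumes scalar sentinels and uses Python lists instead of an array library to stay self-contained; the function name `decode_cf` is hypothetical.

```python
def decode_cf(raw_values, attrs):
    """Unpack stored values using CF scale_factor, add_offset and fill sentinels."""
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    # Sentinels are compared against the stored (packed) values.
    sentinels = [attrs[key] for key in ("_FillValue", "missing_value") if key in attrs]
    return [None if raw in sentinels else raw * scale + offset for raw in raw_values]


packed = [0, 4, -9999]
attrs = {"scale_factor": 0.5, "add_offset": 10.0, "_FillValue": -9999}
print(decode_cf(packed, attrs))  # [10.0, 12.0, None]
```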
9.7. Encoding of Multiscale Overviews in Zarr
This clause specifies how multiscale tiling (also known as overviews or pyramids) is encoded in Zarr-based datasets conforming to the unified data model. The encoding supports both Zarr Version 2 and Version 3 and is aligned with the OGC Two Dimensional Tile Matrix Set Standard.
Multiscale datasets are composed of a set of Zarr groups representing multiple zoom levels. Each level stores coarser-resolution resampled versions of the original data variables.
9.7.1. Hierarchical Layout
Each zoom level SHALL be represented as a Zarr group, identified by the Tile Matrix identifier (e.g., "0", "1", "2"). These groups SHALL be organised hierarchically under a common multiscale root group. Each zoom-level group SHALL contain the complete set of variables (Zarr arrays) corresponding to that resolution.
Table 7
Structure | Zarr v2 | Zarr v3 |
---|---|---|
Zoom level groups | Subdirectories with .zgroup and .zattrs | Subdirectories with zarr.json, node_type: group |
Variables at each level | Zarr arrays (.zarray, .zattrs) in each group | Zarr arrays (zarr.json, node_type: array) in each group |
Global metadata | multiscales defined in parent .zattrs | multiscales defined in parent group zarr.json under attributes |
Each multiscale group MUST define chunking (tiling) along the spatial dimensions (X, Y, or lon, lat). Recommended chunk sizes are 256×256 or 512×512.
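To give a sense of what the recommended chunk sizes imply, the following sketch counts the chunks needed along each axis for a given array shape. The 10980 by 10980 shape is an arbitrary example value, and `chunk_grid_shape` is a hypothetical helper.

```python
def chunk_grid_shape(array_shape, chunk_shape):
    """Return the number of chunks along each axis (edge chunks count as whole chunks)."""
    if len(array_shape) != len(chunk_shape):
        raise ValueError("array_shape and chunk_shape must have the same rank")
    # Integer ceiling division avoids floating-point rounding.
    return tuple(-(-size // chunk) for size, chunk in zip(array_shape, chunk_shape))


print(chunk_grid_shape((10980, 10980), (512, 512)))  # (22, 22)
print(chunk_grid_shape((10980, 10980), (256, 256)))  # (43, 43)
```

Halving the chunk edge roughly quadruples the number of stored objects per level, which is one factor to weigh when choosing between the two recommended sizes.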
9.7.2. Metadata Encoding
Multiscale metadata SHALL be defined using a multiscales attribute located in the parent group of the zoom levels. This attribute SHALL be a JSON object with the following members:
tile_matrix_set – Identifier, URI, or inline JSON object compliant with OGC TileMatrixSet v2
resampling_method – One of the standard string values (e.g., "nearest", "average")
tile_matrix_set_limits – (optional) Zoom-level limits following the STAC Tiled Asset style
9.7.2.1. Zarr v2 Encoding Example (.zattrs)
{
"multiscales": {
"tile_matrix_set": "WebMercatorQuad",
"resampling_method": "nearest"
}
}
Listing 5
9.7.2.2. Zarr v3 Encoding Example (zarr.json)
{
"zarr_format": 3,
"node_type": "group",
"attributes": {
"multiscales": {
"tile_matrix_set": "WebMercatorQuad",
"resampling_method": "nearest"
}
}
}
Listing 6
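The two examples differ only in where the multiscales object is located. A minimal sketch of a reader that handles both encodings and checks the required members is given below; the function name `read_multiscales` and its error messages are assumptions of this example.

```python
REQUIRED_MEMBERS = ("tile_matrix_set", "resampling_method")


def read_multiscales(group_doc, zarr_format):
    """Extract the multiscales object from a parsed .zattrs (v2) or zarr.json (v3) document."""
    # In v3 the user attributes are nested under "attributes"; in v2 the
    # .zattrs document is the attribute mapping itself.
    attrs = group_doc.get("attributes", {}) if zarr_format == 3 else group_doc
    multiscales = attrs.get("multiscales")
    if not isinstance(multiscales, dict):
        raise ValueError("multiscales must be a JSON object")
    missing = [name for name in REQUIRED_MEMBERS if name not in multiscales]
    if missing:
        raise ValueError(f"multiscales is missing required members: {missing}")
    return multiscales


v2_doc = {"multiscales": {"tile_matrix_set": "WebMercatorQuad",
                          "resampling_method": "nearest"}}
v3_doc = {"zarr_format": 3, "node_type": "group", "attributes": v2_doc}

print(read_multiscales(v2_doc, 2) == read_multiscales(v3_doc, 3))  # True
```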
9.7.3. Tile Matrix Set Representation
The tile_matrix_set member MAY take one of the following forms:
A string referring to a well-known identifier (e.g., "WebMercatorQuad")
A URI pointing to a JSON document describing the tile matrix set
An inline JSON object (CamelCase, OGC TMS 2.0 compatible)
Zoom level identifiers in the tile matrix set MUST match the names of the child groups. The coordinate reference system declared by the tile matrix set (crs in OGC TMS 2.0, supportedCRS in the earlier version 1.0 encoding) MUST match the one declared in the corresponding grid_mapping of the data variables.
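The identifier rule can be checked against an inline tile matrix set. The sketch below assumes the OGC TMS 2.0 JSON layout, in which levels are listed under tileMatrices and each carries an id. Both directions of mismatch are reported, because the text above does not settle whether a dataset may omit some levels of a tile matrix set; the function name is hypothetical.

```python
def compare_levels(tile_matrix_set, child_group_names):
    """Return (levels_without_group, groups_without_level) as sorted lists."""
    declared = {str(tm["id"]) for tm in tile_matrix_set["tileMatrices"]}
    present = set(child_group_names)
    return sorted(declared - present), sorted(present - declared)


# A deliberately incomplete inline object: only the ids matter for this check.
tms = {"id": "ExampleTMS", "tileMatrices": [{"id": "0"}, {"id": "1"}, {"id": "2"}]}

print(compare_levels(tms, ["0", "1", "2"]))  # ([], [])
print(compare_levels(tms, ["0", "1", "3"]))  # (['2'], ['3'])
```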
9.7.4. Chunk Layout Alignment
At each zoom level, chunking SHALL match the tile layout defined by the TileMatrix:
Chunks MUST be aligned with the tile grid (1:1 mapping between chunks and tiles)
Chunk sizes MUST match the tileWidth and tileHeight declared in the TileMatrix
Spatial dimensions MUST be clearly identified using dimension_names (v3) or _ARRAY_DIMENSIONS (v2)
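A minimal sketch of the chunk size check follows. It locates the spatial axes by name and compares their chunk sizes with the tileHeight and tileWidth of a TileMatrix. The default axis names "y" and "x" and the function name are assumptions of this example; a real validator would identify the spatial axes from the dataset's metadata.

```python
def chunks_match_tiles(dim_names, chunk_shape, tile_matrix, y_dim="y", x_dim="x"):
    """True if the spatial chunk sizes equal the tile size of the TileMatrix.

    Raises ValueError if a spatial dimension name is not present.
    """
    y_chunk = chunk_shape[dim_names.index(y_dim)]
    x_chunk = chunk_shape[dim_names.index(x_dim)]
    return (y_chunk == tile_matrix["tileHeight"]
            and x_chunk == tile_matrix["tileWidth"])


tile_matrix = {"id": "3", "tileWidth": 256, "tileHeight": 256}

print(chunks_match_tiles(["time", "y", "x"], [1, 256, 256], tile_matrix))  # True
print(chunks_match_tiles(["time", "y", "x"], [1, 512, 256], tile_matrix))  # False
```

Non-spatial axes such as time are ignored here, since the alignment requirement applies only to the tile grid.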
9.7.5. Tile Matrix Set Limits
The tile_matrix_set_limits object MAY define the extent of actual data coverage for each zoom level. This follows the style of the STAC tiled-assets extension rather than the full OGC JSON encoding.
Example:
"tile_matrix_set_limits": {
"1": {
"min_tile_col": 0,
"max_tile_col": 1,
"min_tile_row": 0,
"max_tile_row": 1
}
}
Listing 7
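A client can use these limits to enumerate the tiles, and therefore the chunks, that may contain data at a given zoom level. The sketch below treats the minimum and maximum indices as inclusive, as in OGC tile matrix limits; the function name `tiles_in_limits` is hypothetical.

```python
def tiles_in_limits(limits):
    """Return all (row, col) tile indices covered by one zoom level's limits."""
    rows = range(limits["min_tile_row"], limits["max_tile_row"] + 1)
    cols = range(limits["min_tile_col"], limits["max_tile_col"] + 1)
    return [(row, col) for row in rows for col in cols]


tile_matrix_set_limits = {
    "1": {"min_tile_col": 0, "max_tile_col": 1, "min_tile_row": 0, "max_tile_row": 1}
}

print(tiles_in_limits(tile_matrix_set_limits["1"]))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```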
9.7.6. Resampling Method
The resampling_method MUST indicate the method used for downsampling across zoom levels. The value MUST be one of:
nearest, average, bilinear, cubic, cubic_spline, lanczos, mode, max, min, med, sum, q1, q3, rms, gauss
The same method MUST apply across all levels.
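The permitted values can be enforced with a simple membership test. The following sketch assumes an exact, case-sensitive match against the list above; the names `RESAMPLING_METHODS` and `validate_resampling_method` are hypothetical.

```python
# Permitted values, copied from the list in this clause.
RESAMPLING_METHODS = frozenset({
    "nearest", "average", "bilinear", "cubic", "cubic_spline", "lanczos",
    "mode", "max", "min", "med", "sum", "q1", "q3", "rms", "gauss",
})


def validate_resampling_method(value):
    """Return the value unchanged if it is permitted, otherwise raise ValueError."""
    if value not in RESAMPLING_METHODS:
        raise ValueError(f"unsupported resampling_method: {value!r}")
    return value


print(validate_resampling_method("average"))  # average
```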
10. Unified Data Model Encoding for GeoTIFF
This is a very preliminary draft. The content is primarily for demonstrating the purpose of the proposed sections.
Annex A
(normative)
Conformance Class Abstract Test Suite (Normative)
NOTE: Ensure that there is a conformance class for each requirements class and a test for each requirement (identified by requirement name and number)
A.1. Conformance Class A
Example
label
http://www.opengis.net/spec/name-of-standard/1.0/conf/example1
subject
Requirements Class “example1”
classification
Target Type: Web API
A.1.1. Example 1
Subject | /req/req-class-a/req-name-1 |
---|---|
Label | /conf/core/api-definition-op |
Test purpose | Validate that the API Definition document can be retrieved from the expected location. |
Test method |
|
A.1.2. Example 2
Subject | /req/req-class-a/req-name-2 |
---|---|
Label | /conf/core/http |
Test purpose | Validate that the resource paths advertised through the API conform with HTTP 1.1 and, where appropriate, TLS. |
Test method |
|
Annex B
(informative)
Title
NOTE: Place other Annex material in sequential annexes beginning with “B” and leave final two annexes for the Revision History and Bibliography
Annex C
(informative)
Revision History
Table C.1
Date | Release | Editor | Primary clauses modified | Description |
---|---|---|---|---|
2016-04-28 | 0.1 | G. Editor | all | initial version |