1st July, 2020
Attending: Josh, Dennis, Ryan W, Ward, Matthias, Ryan A, John, Alistair, Matt,
Andrew, Stephan Hoyer
Apologies: Martin (Happy Canada Day!)
- Standing items
    - Introductions: new joiners should feel welcome to introduce themselves and their interest in zarr/n5.
- Add Community Topics Here
- Ward: travis-ci had been stuck, but … now there's support for nczarr in master!
- (https://github.com/Unidata/netcdf-c/pull/1769 merged)
- Dennis: some more work needed on CMake (esp. Windows)
- Josh: https://github.com/zarr-developers/zarr-specs/issues/41
- Or “Pure C implementation” https://github.com/zarr-developers/community/issues/9
- Dennis: and to the netcdf developers group
- FYI / 1 minute update
- Josh: still trying to convert hdf2zarr without too much success; also fighting with bool encodings and 403s for remote chunks
    - Ryan A: some contributors use a compressor for the boolean issue
    - Nothing yet on the zarr side; it's a numpy problem.
    - Was able to use delta filters/compressors (1000x)
    - Matthias: do we want to support data types that are less than 8 bits in v3 without a compressor/filter?
    - Alistair/John: there is a packbits filter that can be chained with a codec (see the sketch below)
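- A minimal sketch of the filters mentioned above, assuming the zarr v2 / numcodecs API (array sizes and chunking here are illustrative):

```python
import numpy as np
import zarr
from numcodecs import Blosc, Delta, PackBits

# Boolean data: PackBits stores 8 booleans per byte before compression.
bools = np.random.rand(1_000_000) > 0.5
z_bool = zarr.array(bools, chunks=100_000, filters=[PackBits()])

# Smoothly varying integers: Delta encodes differences, which compress well.
ints = np.cumsum(np.random.randint(0, 10, size=1_000_000)).astype("i8")
z_int = zarr.array(
    ints,
    chunks=100_000,
    filters=[Delta(dtype="i8")],
    compressor=Blosc(cname="zstd"),
)

print(z_bool.info)
print(z_int.info)
```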
- HDF5Zarr: viewing HDF5 files as Zarr
    - Josh: there was an issue of supporting types
    - General question: should every HDF5 file be convertible?
    - John: attributes need to be JSON-serializable, so arrays aren't supported.
        - Could define a mapping, but there are likely performance tradeoffs (see the sketch after this discussion).
    - John: similar issues in xarray? Ryan A: haven't looked at that part of the code base; Stephan Hoyer (who wrote backends for xarray) knows.
        - Relevant xarray code: https://github.com/pydata/xarray/blob/master/xarray/backends/zarr.py#L247-L252
    - Dennis: what's the use case? Unsure, but generally a small numpy array that's been stored as an attribute rather than as a dataset.
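- One possible mapping for small arrays in attributes, sketched with the zarr v2 API (the attribute names are illustrative): since zarr attributes are stored as JSON, the array has to be converted to a JSON-friendly form and reconstructed by hand on read, which is where the tradeoffs come from.

```python
import numpy as np
import zarr

z = zarr.open("example.zarr", mode="w", shape=(10,), dtype="f8")

small = np.array([1.5, 2.5, 3.5])
# zarr attributes must be JSON-serializable, so store the values as a plain
# list and keep the dtype separately to allow a faithful round trip.
z.attrs["coefficients"] = small.tolist()
z.attrs["coefficients_dtype"] = str(small.dtype)

# Reading it back means re-applying the mapping by hand.
restored = np.array(z.attrs["coefficients"], dtype=z.attrs["coefficients_dtype"])
print(restored)
```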
- Stephan joins (repeat)
- Questions to Stephan:
    - Should we be able to convert any HDF5 file to Zarr? Is there an isomorphism?
    - Are there any changes to the zarr spec that would make that easier?
    - Strings are one of the harder datatypes, particularly unicode strings.
- SH: handling strings, especially unicode strings, is one of the trickier things. HDF has a sane model for strings (an encoding is always attached).
- Options:
    - ASCII
    - UTF-8
    - Fixed-width & variable-width data types, in terms of the encoded string.
    - Xarray can write any of that. There is no native way to write an encoded UTF-8 string in Python: no numpy datatype for it; numpy unicode strings are 4 bytes per character, or you use object strings.
    - Xarray writes into (variable-width) numpy object arrays. No type safety; similar to pandas. Works reasonably well in practice (see the sketch below this discussion).
    - In Zarr, xarray needs to choose an encoding. Need to review, but it uses the default Python encoding. Doesn't use …
- Matthias: isn't UTF-8 variable width?
    - Stephan: yes, but it's the max width of the encoded size. Cf. the Julia model (whereas Python is more UTF-32-like).
    - Dennis: can also null-terminate (even if not legal).
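- A small sketch of the two approaches discussed, assuming the zarr v2 / numcodecs API: numpy's fixed-width 'U' dtype (4 bytes per character, UTF-32-like) versus variable-width object arrays with an explicit text codec.

```python
import numpy as np
import zarr
from numcodecs import VLenUTF8

# Fixed-width: numpy 'U' strings reserve 4 bytes per character (UTF-32-like).
fixed = zarr.array(np.array(["café", "zarr"], dtype="U10"))

# Variable-width: object arrays of str, with an explicit object codec.
texts = np.array(["short", "a much longer unicode string ✓"], dtype=object)
variable = zarr.array(texts, dtype=object, object_codec=VLenUTF8())

print(fixed[:])
print(variable[:])
```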
- Josh: worth trying H5Zarr [bypasses the netcdf library]? SH: the challenge is that xarray requires named dimensions. As long as you have dimension scales, it should be able to convert (see the sketch below).
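- For reference, a minimal sketch of how named dimensions reach xarray from a plain zarr store via the "_ARRAY_DIMENSIONS" attribute convention used by xarray's zarr backend (paths and names here are illustrative):

```python
import numpy as np
import xarray as xr
import zarr

store = zarr.storage.DirectoryStore("with_dims.zarr")
root = zarr.group(store=store, overwrite=True)
temp = root.create_dataset("temperature", data=np.zeros((4, 3)), chunks=(2, 3))
# xarray's zarr backend reads dimension names from this attribute.
temp.attrs["_ARRAY_DIMENSIONS"] = ["time", "station"]

ds = xr.open_zarr(store, chunks=None)
print(ds)
```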
- SH: having a built-in string type would be worth thinking about (only at the numcodecs layer)
- Alistair: certainly thinking that would be a high-priority extension (Text). Copy the HDF5 model? SH: for fixed-width data, it's a pretty good model. Alistair: and for variable-width strings? SH: also good, but more complex; it depends on the complexity of variable-width data in zarr in general.
- John: does storing arrays in attributes happen in xarray? SH: yes, but I don't think it's very useful.
- Dennis:
    - Attributes are one-dimensional arrays of variable length
    - Attributes are typed (not in Zarr; hence the extra metadata)
- Add Core Topics Here
- Matthias' questions:
    - Which approach do you prefer for zarr v3?
        - A completely separate import submodule and utility functions?
        - Try to weave v3 into the current classes and utility functions?
        - Some refactoring of v2 will likely be needed (and has started); in particular, being a bit stricter on internal types would help, also around attributes, which are currently their own key and would not be.
        - Ryan A: in xarray, things are passed to `open_store` (it never opens groups, though that may be changing to use introspection).
        - PR in progress:
    - Josh: any use of namespaces with {zarr, zarr2, zarr3}?
        - Matthias: concerned about the isinstance() uses (see the sketch after this list), but it depends on what API usage you allow.
        - Matthias: almost have xarray loading the v3 spec without touching xarray.
    - Ryan A: could add a new CI branch to pull the v3 spec
        - Matthias: cool, but currently choking on attributes. Does work with a v2 adapter.
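- A toy illustration of the isinstance() concern; the store classes below are hypothetical stand-ins, not real zarr-python types. Version checks via isinstance() force every utility to know about each store class, whereas relying only on the mapping interface (duck typing, possibly through an adapter) keeps a single code path.

```python
# Hypothetical classes, purely to illustrate the point above.
class V2Store(dict):
    """Toy stand-in for a zarr v2 store."""

class V3Store(dict):
    """Toy stand-in for a zarr v3 store (keys under a 'data/' prefix)."""

def array_keys_isinstance(store):
    # Version-specific branches tend to multiply as new store classes appear.
    if isinstance(store, V3Store):
        return [k for k in store if k.startswith("data/")]
    elif isinstance(store, V2Store):
        return list(store)
    raise TypeError(f"unsupported store type: {type(store)!r}")

def array_keys_duck(store):
    # Duck typing: rely on the mapping interface only, and let the store
    # (or an adapter wrapping it) present a uniform layout.
    return list(store)

v2 = V2Store({"foo/.zarray": b"{}"})
v3 = V3Store({"data/root/foo": b"..."})
print(array_keys_isinstance(v2), array_keys_isinstance(v3))
print(array_keys_duck(v2), array_keys_duck(v3))
```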
- [Image 3]
    - Lots of magic to support bytes / bytearray / memoryview / …
    - Could we have a single internal type more often? (see the sketch below)
    - (e.g. ConsolidatedStore doesn't do one thing …)
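- A toy sketch of the "single internal type" idea: normalize every bytes-like value to plain bytes at the store boundary so the rest of the code handles only one type. The helper name and behavior are illustrative, not zarr's actual implementation.

```python
def ensure_bytes(value):
    """Normalize the bytes-like values a store might receive to plain bytes."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, (bytearray, memoryview)):
        return bytes(value)
    # numpy arrays and other buffer-protocol objects also show up in practice.
    try:
        return bytes(memoryview(value))
    except TypeError:
        raise TypeError(f"expected a bytes-like object, got {type(value)!r}")

print(ensure_bytes(bytearray(b"abc")), ensure_bytes(memoryview(b"xyz")))
```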
- From Martin in absentia: "*I can report that fsspec's mapper class has acquired the methods "getitems", "setitems" and "delitems", which will be implemented in async/concurrent when possible. This already works for HTTP (read-only), and I am doing GCS/S3 next.*"
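- For context, a sketch of how zarr typically consumes such a mapper (the URL is a placeholder); batched getitems should speed up the many small chunk reads zarr issues over HTTP.

```python
import fsspec
import zarr

# fsspec's mapper exposes a remote location as a key/value store that zarr
# can read directly; HTTP is read-only.
mapper = fsspec.get_mapper("https://example.com/data.zarr")  # placeholder URL
z = zarr.open(mapper, mode="r")
print(z)
```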
- SH: been using Zarr recently. Very enjoyable. Kudos!