22nd May, 2019
Time: 20:00 BST (15:00 EDT).
Joining instructions: see above.
Intend to attend: Josh, Alistair, …
**Attending**: Josh, Stephan, Dennis, JohnK, Alistair, RyanW (20:08), Tony Tung (20:11), MartinD (20:19)
Standing items
- Introductions: new joinees should feel welcome to introduce themselves and their interest in zarr/n5.
CZI funding call
We should invite Jeremy to this meeting https://ai.google/research/people/JeremyMaitinShepard
https://chanzuckerberg.com/rfa/essential-open-source-software-for-science/
Do we want to apply? What would we ask for?
Dennis: Biological application?
- Alistair: Certainly all of imaging. Single cell. Genomics. (Plus physical science.)
- Josh: Stephan Preibisch, BDV, N5, …
- Alistair: (stable) spec → (multiple core) implementations → applications.
- For the community, nice to accelerate the development but also build out the current set of developers.
- Stephan: could you use the money now? Alistair: not the best person to host someone. Better to spread out to other teams. (Stephan: but not too thin…)
- Stephan: tensor-store? Want to support something like zarr/n5, but with better “tricks” for GCS. Already supports zarr/n5 natively (C++). Interested in writing in parallel to hyperrectangles that only partially overlap with the boundaries (without locking). Open source. Want to release by summer.
- Discussed at Connectomics meeting in Berlin.
- Stephan: suggest focusing on getting new spec out. (E.g. fast list operations)
- JohnK: throwing this spec/API out to DevRels, SAs at Google, Amazon, etc. and saying, “this is what we want supported”
- Alistair: thoughts about funding/driving this
- Could focus on getting spec done as the basis
- Or continue stimulating community
- Stephan: new spec - has the number of parallel responses been considered?
- BigDataViewer seems slow during large upload. Only using 10 threads. GCS latency is quite high. Web-based viewers have no constraint on the number of threads that are being used.
- Josh: Eventual consistency, what are you allowed to get at the same time, what is the priority of different things (metadata, chunks, etc.)
- Stephan: re: funding - can only have visitors at Janelia. Others should email Alistair with interest.
- …additional GPU constraints? E.g. compressors in/on the GPU? (Future call)
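The point above about thread counts and GCS latency can be illustrated with a small sketch. It assumes a hypothetical `fetch_chunk` function standing in for a high-latency object-store read (simulated here with `time.sleep`); the key names and the latency figure are made up for illustration. When each request is dominated by latency, the number of requests in flight largely determines the total time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed per-request latency; illustrative only


def fetch_chunk(key):
    """Hypothetical stand-in for a high-latency object-store read."""
    time.sleep(LATENCY_S)  # simulate the network round trip
    return key, b"\x00" * 8  # dummy chunk payload


def fetch_all(keys, max_workers):
    """Fetch all keys using at most `max_workers` concurrent requests."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `keys`
        return dict(pool.map(fetch_chunk, keys))


def timed_fetch(keys, max_workers):
    """Return the fetched chunks and the elapsed wall-clock time."""
    start = time.perf_counter()
    chunks = fetch_all(keys, max_workers)
    return chunks, time.perf_counter() - start


keys = [f"chunk/{i}" for i in range(40)]  # hypothetical chunk keys
chunks_10, t10 = timed_fetch(keys, max_workers=10)
chunks_40, t40 = timed_fetch(keys, max_workers=40)
print(f"10 workers: {t10:.2f}s, 40 workers: {t40:.2f}s")
```

With 10 workers the 40 requests need at least four latency-bound rounds, whereas 40 workers can issue them all at once. Real stores add bandwidth limits and rate limiting that this sketch ignores.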
v3.0 spec - update and discussion - 20:37 UK
Current state: https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html
Possible discussion points:
- Front matter; persistent URLs for the spec; copyright statement; license.
- Node names, too restrictive? Case insensitive uniqueness of siblings.
- Happy with core data types? Everything else defined via a protocol extension? Extension point mechanism OK?
- Happy with description of chunk grids? OK to only define regular grid here, leave irregular grids and grids which can “grow” in the negative direction to protocol extensions? Extension point mechanism OK?
- Happy with description of memory layouts? Extension point mechanism OK?
- Happy with description of codecs?
- Happy with group and array metadata? Attributes within the group and array metadata documents, not separate. Zarr format versioning and use of URI.
- Lots to discuss around store interface and storage protocol, see open PR #30.
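As a rough illustration of the regular chunk grid item above: on a regular grid, the chunk holding an array element is found by integer-dividing each coordinate by the chunk length along that dimension. This is a minimal sketch with hypothetical function names; it does not follow the draft spec's key encoding.

```python
def grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    # -(-s // c) is integer ceiling division, exact for arbitrarily large ints
    return tuple(-(-s // c) for s, c in zip(array_shape, chunk_shape))


def locate(index, chunk_shape):
    """Map an element index to (chunk index, offset within that chunk)."""
    chunk_index = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_index, offset


print(grid_shape((10000, 1000), (1000, 300)))  # (10, 4)
print(locate((2500, 950), (1000, 300)))        # ((2, 3), (500, 50))
```

Python's floor division also gives a consistent answer for negative coordinates (`-1 // 1000 == -1`). That is one way a grid growing in the negative direction could be addressed, though the discussion points leave such grids to protocol extensions.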
Notes:
- Stephan: spec starts at a relatively complicated level (close to v2). Implementations of subsets, e.g. of types. (128-bit integers, … boolean as bytes, …)
- Alistair: previous discussions suggested keeping it as minimal as possible, with as many complete implementations of that minimum as possible. The rest are protocol extensions.
- Stephan: complex types already in the minimum? Would think it would cover all primitive types (but difficult).
- JohnK: complex isn’t even in core C (cf. C99).
- Dennis: have struggled with this in netcdf. Looking at v2, were going to subset the spec somehow. But would like to be able to read all zarr files (read broadly, write narrowly), i.e. agree with the minimal spec as discussed. Haven’t read the v3 spec yet, but may need to undo some of the current development. Currently include a (“user-defined”) type definition for complex.
- Biggest issue is the variable-length string one. (Character type is also tough, because we want everything to be UTF-8, but UTF-8 characters aren’t single 8-bit bytes. Certainly in values.)
- AM: Position of v3 spec is that all variable length types are protocol extensions. Perhaps one of the first extensions, but simultaneously in all languages / platforms.
- Dennis: “standardized extended types” such that “if you choose to implement, then this is how it is expected to be done” (Prepare initially for chaos. Minimally NetCDF will be proposing several.)
- Alistair: extensions defined globally for the whole hierarchy. Stephan: but a global/static definition (rather than URIs)
- Stephan: shape elements as 8-byte integers (longs). Dennis: good point. Anything representing memory/storage lengths needs to go to 64-bit (cf. ZFS is 128-bit). Stephan: not chunks.
- Alistair: metadata is currently JSON. Dennis: MUST the metadata be JSON? Alistair: currently the defined canonical form. Stephan: too tight? Alistair: thinking about extensions, someone could specify a new format for the metadata. But then how to declare that for an implementation… Stephan: HDF5 backend already stores metadata in the HDF5 storage.
- Josh: would like to discuss piecing apart N5 API & storage specs. Dennis: all of these systems have what amounts to 2 specs (HDF5, NetCDF): layout on disk & API. Current spec is closer to on-disk or in-cloud storage. Alistair: currently looking at the protocol, but the (abstract) store interface is coming (in open PR).
- Dennis: tree structure seems fairly key to the spec. Would go to a level of abstraction roughly equivalent to “a tree of objects”. Alistair: trying to provide for storage systems which are more or less aware of hierarchy. E.g. if the storage is aware of the containment, it can provide further methods. Dennis: is there an implied tree based on the keys? Giving the game away. Perhaps UUID model or explicit tree model? Alistair: needs thinking. Considering implications for when you are doing things in parallel, i.e. minimizing synchronization between workers.
- Briefly back to extension points:
- Josh: ZeroC/Ice and slicing (to be written up on GitHub)
- Alistair: have only thought about fallbacks
- Martin: base type and logical type (e.g. complex as array of bytes)
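Martin's base type / logical type split could look like the following sketch. A complex value is stored under a base type of 16 raw bytes, and the logical type says how to interpret them. The layout (real part then imaginary part, each a little-endian float64) and the function names are assumptions for illustration, not something the draft spec defines.

```python
import struct

# Assumed layout: real part then imaginary part, each little-endian float64.
COMPLEX128 = struct.Struct("<dd")


def encode_complex(value):
    """Logical type -> base type: complex to 16 raw bytes."""
    return COMPLEX128.pack(value.real, value.imag)


def decode_complex(raw):
    """Base type -> logical type: 16 raw bytes to complex."""
    real, imag = COMPLEX128.unpack(raw)
    return complex(real, imag)


raw = encode_complex(1.5 - 2.0j)
print(len(raw))             # 16
print(decode_complex(raw))  # (1.5-2j)
```

An implementation that does not know the logical type could still copy the 16-byte values around as opaque data, which is one possible reading of the fallback idea mentioned above.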
Done: 21:18 UK