6th May, 2020

Attending: Matthias Bussonnier, Josh Moore, Dennis, RyanW, RyanA, Martin Durant,

Ward Fisher, James B., John K., Alistair, Ralf, Nicholas S.

Apologies:

  • Standing items

    • Introductions: new joinees should feel welcome to introduce

      themselves and their interest in zarr/n5.

    • Matthias: Python developer for ~8yrs. **Joined Quansight 5 days

      ago**. Started looking at Zarr. Focusing on the v3 spec. Catching up. Questions coming on GitHub. (French in California)

      • RyanA: previous experience with Zarr? Heard about it. But haven’t played with it. Seen talks. Etc. cf. Dask
      • Martin: Plans for Zarr at Quansight. Ralf: someone at Q. is starting to use & contribute (Andrew) for a client project. Generally, Zarr is and will become more a part of the core Python ecosystem. So long-term. James: one client is interested in Zarr and supporting it.
  • (Pre-item) Dennis: ok to share a gdoc with spec comments? Sure.

  • Ryan A: OGC Zarr Community Standard Proposal:

    • Application:

      Community_Standard_Justification_Zarr_16-113r2

    • Essentially certify standards and their implementors. Also in

      the web map area.

    • Proposing Zarr to OGC as a community standard (i.e. outside the main

      process)

    • Propose to sign off by the steering council (minimally). But by

      all means, suggest changes.

    • John: how simple to update to the next version? Presumably

      easier.

      • Standards are open. Membership is closed (voting members support the body)
      • Any issues yet? Nope. Feedback from a few people. Drop in replacement for HDF5 makes it a pretty clear case. But do need supporting members.
      • Josh: any experience with standards bodies? Ralf: not NumFOCUS. Some experience in semiconductor world. Painful. Lots of money at stake. Alistair: W3C consortium and a few WGs. Though not likely in scope. Josh: might be interesting to come back to PROV/json-ld in the future.
      • Josh: ISO standards body for biobanks/bioimaging
  • Josh on bioimaging, time permitting. (napari, dask, DICOM, …)

    • https://forum.image.sc/t/publication-of-idr0083-sars-cov-2-up-close-in-a-human-organoid/37163

    • https://twitter.com/notjustmoore/status/1256232842755014656

    • https://github.com/ome/ome-zarr-py

    • Alistair: few thoughts. 2 dimensions:

      • Good to separate protocol extensions for usage conventions. Multiscale likely falls under convention. Need a recommended approach. Ok to use docs repo, but shouldn’t be necessary.
    • Martin: would have included link to the specification? Yes.

    • RyanA: sounds similar to xarray. NetCDF into zarr 2 years ago,

      just by defining metadata. … don’t want all domain specifics into zarr

      • Josh: agreed, but do we want a general “image” in zarr space.

      • Nicholas: are you thinking ome-xarray?

      • Dennis: from NetCDF POV, have tried to be as domain-agnostic as possible. But have cfconventions.org which are attributes which are domain-specific. (Largely atmospheric.) Otherwise, find yourself in an endless chain of adding things to the spec.

      • Nicholas: microscopy community is crying out for something, even just cfconventions. Dennis: likely follow the same path.

      • RyanA: a lot of fields can use the NetCDF model with cfconventions. CDM enhances HDF model by defining dimensions. Beyond just arrays with names. Encode the relationships between arrays which can get you pretty far. Good generic model. Add conventions on top of it. Goal is to achieve NetCDF support in Zarr. Easy to implement at the xarray level. Primarily needed to add what the array dimensions are. @Dennis? Ward: have looked, yes.

      • Alistair: is that the root of the question? Does the new attribute need documenting? A short document might suffice. “We’ve added this into zarr to …” Shortest spec in a tweet… RyanA: opening a document against xarray documentation.

      • RyanA (from chat): https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_data_model.html

      • Nick: happy (as a user) if the answer is OME on NetCDF or whatever.

      • I.e. Alistair: need to

      • Martin: not clear that this convention is necessarily tied to zarr.

        • Not opposed to having a convenient place in zarr-specs

          to add things. But not everyone needs to use it.

        • One caveat would be to not specify things overly

          zarr-centric.

      • (Ward left)

      • Alistair: corollary is that any docs we host (like the code) are going to need some process … i.e. another governance PR. (protect us in case not everyone is friendly)

        • Martin: could see being less restrictive for conventions

          than core/extensions likely, but we could _start_ off just as strict.

        • RyanA: slippery slope. Will Zarr become a standards

          body?

        • Martin: perhaps on conda-forge? Make sure things are

          sound. (Years down the line)

        • Alistair: “zarr-forge”! :-)

        • Dennis: a problem that NetCDF has had with the HDFGroup

          is the number of changes that are made that are not publicized. Just be as open as possible.

        • Alistair: → process! (cf. W3C: editors’ draft → working

          draft → … → recommendation track)

        • Dennis: cf cfconventions group - process has become time

          consuming. Need to be able to relatively say yes or no. Nick: why? Too many committees? Process isn’t well defined whether or not something will be adopted. Wait until all say go. I.e. who is going to be the decider? Trust is critical.

        • Alistair: fundamental tension – moving quickly but with

          consensus… Key: Finding a critical number of people who can work together and having a good process. Need to get into the details of that.

        • Matthias: have experience with too big steering councils

          and death by decision making. A question on the process: what about companies that are willing to give money to influence decisions.

        • Josh: recommendation from NumFOCUS on driving commercial

          money via the roadmap.

        • Matthias: but what about having payers see drafts

          beforehand? (even without decision making powers)

        • Alistair: would prefer to always work in the open
  • Dennis: if you are interested in broader use, perhaps consider a single file format for Zarr beyond drivers for zips. Standardize that representation.

    • Allows zarr to leave the cloud community and be used by anyone.

    • Microscopy e.g.? NS: certainly issues have been raised.

    • Matthias: zarr in TIFF ….

    • Alistair: envisaged with v3 having storage specs

      • First was the DirectoryStore, then sqlite, dbm, …
      • Considered picking one of the single file ones early on.
    • Nicholas: often difference isn’t understood, so good to have

      someone else chosen.

    • Martin: place for fsspec to have an opinion (wants to regard

      everything as a filesystem).

    • Matthias: you will have issues no matter what with FAT32 limits.

      Martin: good to have a mixture.

    • Dennis: have also experimented with streaming services, take

      some representation of dataset and stream it in a compact, serialized form. Another issue to consider. E.g. tried Protobufs.

    • Matthias: BNL has unique identifiers pointing to files…

    • Dennis: people assume this is only for the cloud…

    • Nicholas: would love to get access to icons…

    • Dennis: would be good if that format weren’t too difficult to

      implement…

    • Martin: good to note down the tension between one big and many

      small

    • John: previously used transforming e.g. to zip, then back and

      forth generally

    • Josh: tree of drivers perhaps?

    • Dennis: reimplementing a file system. Interpret a node in the

      tree with this software.

    • Matthias: an appeal is to open the directory store and things

      are self-descriptive. A communication issue with the end-user. Especially with non-power-users. If it’s hard for them to get all of a dataset, then they are unhappy.

    • Alistair: understanding the use cases…

    • Matthias: sqlite on NFS is a nightmare (corruption)
  • Matthias: yearly user survey. Good idea. Never done that before. Publicize everywhere that is possible. NetCDF mailing list. Image.sc. confocal listserv. Gets better and better over time. Matthias can try to do it. Alistair to create an issue to capture this.

  • Governance at 20:30 UK … oh well. See the above.
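
Illustration for RyanA's point above that NetCDF-style support mostly needs each array to record what its dimensions are. This is a minimal sketch, not something shown on the call. It assumes the attribute key `_ARRAY_DIMENSIONS`, which xarray writes for Zarr v2 stores; the array names, shapes, `units` value and helper functions are invented for illustration. It uses plain dictionaries in place of a real store, so no Zarr API is exercised.

```python
import json

# Hypothetical per-array attributes, as they might appear in each array's .zattrs.
# "_ARRAY_DIMENSIONS" is assumed to be the convention key; all values are made up.
array_attrs = {
    "temperature": {"_ARRAY_DIMENSIONS": ["time", "lat", "lon"], "units": "K"},
    "time": {"_ARRAY_DIMENSIONS": ["time"]},
    "lat": {"_ARRAY_DIMENSIONS": ["lat"]},
    "lon": {"_ARRAY_DIMENSIONS": ["lon"]},
}

# Hypothetical array shapes, standing in for the shape stored in array metadata.
array_shapes = {"temperature": (4, 2, 3), "time": (4,), "lat": (2,), "lon": (3,)}


def dimension_sizes(attrs, shapes):
    """Map each dimension name to its length, checking that all arrays agree."""
    sizes = {}
    for name, meta in attrs.items():
        dims = meta["_ARRAY_DIMENSIONS"]
        shape = shapes[name]
        if len(dims) != len(shape):
            raise ValueError(f"{name}: {len(dims)} dimension names for {len(shape)} axes")
        for dim, length in zip(dims, shape):
            # setdefault records the first length seen; later arrays must match it.
            if sizes.setdefault(dim, length) != length:
                raise ValueError(f"{name}: dimension {dim!r} has conflicting lengths")
    return sizes


def coordinate_arrays(attrs):
    """Return arrays that are 1-D and named after their own dimension."""
    return sorted(
        name for name, meta in attrs.items() if meta["_ARRAY_DIMENSIONS"] == [name]
    )


sizes = dimension_sizes(array_attrs, array_shapes)
coords = coordinate_arrays(array_attrs)
print(json.dumps(sizes, sort_keys=True))
print(coords)
```

The shared dimension names are what encode the relationships between arrays: `temperature` is tied to `time`, `lat` and `lon` only through the names it lists, which is the sense in which a convention on top of plain attributes can carry a NetCDF-like model.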