8th April, 2020

Attending: Josh M, Ward F, Nick S, Ryan A, Martin D, Ryan W, James B., Matt M., Alistair (21:00)

Apologies: John K,

  • Standing items

    • Introductions: new joinees should feel welcome to introduce

      themselves and their interest in zarr/n5.

  • Community (max. 30min)

    • Ryan A: checking in, encouraging Martin’s fsspec PR. Question of

      how to handle all the possible backends. What’s in the library. What’s an optional dep. Fan of fsspec approach – uniform API for many storage layers. Question of design.

    • Martin: PoC

      (#546) working, but haven’t touched it in a couple of weeks. Interest is in abstracting out backends. Various issues like nested v. non, consolidated or not. Tricky for someone to build a new backend. Possibly to be driven by kwargs. Or a layered approach. Simplifies usage. Cf. HTTP backends. Either make them unnecessary or let them live elsewhere.

      • Ryan A. because of the resolving logic. But multi-language support is at question: might make other languages harder. James: concerns about other languages? Ryan: if can write into mongo DB using fsspec, then what does that mean for other libraries. Josh: seems it’s if you need a specific library for reading out of a particular backend (if the URI itself doesn’t get put into the metadata itself). Martin: each backend has its own arguments. Not sure how to get the other language communities to care about that spec. Ryan W: feels ok. Different than having pickled objects in Zarr.

        • Ryan A: propose to depend on fsspec

        • Martin: and deprecate stories that are covered by fsspec

        • Ryan A: for example, remove mongo store – why do we

          have it but not s3 backend (because s3fs solves that problem)

      • Josh: next steps? Martin: wasn’t sure about the behavior/tests of nested storage. Tests pass locally but not on CI. Trying to reverse engineering the tests. (Can handle consolidation). Need tests that can run in CI. Josh: can take a look at the N5 issue

      • James: just fsspec store or multiple? Martin: the former, just reading the URI.

      • Martin: Fsspec doesn’t currently have any async functionality (cf. #536).

    • James: v3 spec status. Several issues/PRs opened. Would be a

      good time if anyone has concerns. Like: node names, partial chunk reading, … Trying to get some consensus.

    • Matt: nice block webassembly example (says Josh)

    • Josh: workin on a Java microservice *a la *xpublish

    • Ward: C library slotted for this month. Writes to S3 on local

      abstraction. Dennis got access to ncar s3 store testing remotely. Won’t be in the next release, but a maintenance release (4.7) comes first. Linking against s3 C++ library. TBD. Some issues when installing.

      • Supporting files? Yes. Zip files? Don’t know. Will find out.
      • Ryan A: would be good to include in the spec section so that implementers can implement.
      • Ward: if it were in the spec, it would be clearer what needs to be implemented.
      • Martin: currently outside of the spec
      • James: should definitely discuss
  • Governance

    • See:

      https://github.com/zarr-developers/governance/pull/9

    • James: seems like you’re in the org and core or not. Can we add

      people to the organization but without commit rights. At least took away from the document. Martin: that’s how dask does it. Nick: discussed this. Unfortunate that teams aren’t public. In dask, do they have a GH team? James: there is not.

      • Tl;dr – Josh to add “contributor” team / status to the governance (unless someone else does it first)

      • James: open question of what the criteria. For example

        • (dask) 2-3 core-devs have to agree

        • Anyone with a commit…

        • All of the above …

  • Any other business

    • Nick: held off on inviting PIMS developer

    • Arewemeetingyet.com

      link is off by an hour (indicates this meeting should start an hour later than we started today)

      • Josh: ACK! Time zone. Worrisome. I’ll stop using that link. Does anyone have a good way of reliably linking to something that resolves all the timezone issues?
    • Nick/Matt: spatial image work (on rectilinear grids…)

      • Discussed how to share images with xarray

      • Common issue in scientific imaging (bio/med) space

      • Having some definitions would be good.

      • Concepts like: x,y,z,t,c Also: scale/translate/rotate

      • Ryan A: sounds like http://cfconventions.org

        • Spec sits on top of netcdf

        • Nick: how does netcdf sit with zarr/xarray

        • Ryan A: xarray has confused things abit. Allows us to

          experiment with things in the netcdf model that aren’t netcdf. We can write netcdf to zarr but it’s not netcdf. Been a several year process to re-sync. (“breaking to innovate”)

        • Cfconventions isn’t about file formats. About metadata.

          Nice document. Lucky to have it.

        • Nick: where does it come from? Ward: older than 8 years

          (To the question of how old is CF, version 1.0 is dated October 28, 2003
). Very active mailing list. Not intrinsically linked to netcdf but is the set of conventions. Very strictly defined terms.

        • Josh: agreed

        • Ryan A: some sociology about how geoscience got here

          • Big “CMIP” … [Ryan can add this]
          • Internal Panel on Climate Change process
          • Before 2000, just as much heterogeneity
          • Then there was an agreement to align models and store data in a central repository.
          • Galvanized the community
        • Nick: bio/medical is so fragmented. What’s the value add

          for individual users. Napari can play a roll. Josh: hard to

        • Ryan A: zarr is about arrays and numbers. Martin: even

          if convenient place to put the metadata.

        • Josh: was previously thinking of using zarr specs to get

          zarr to

        • Nick: need to know dimensions for analysis interop.

        • Ryan A: before xarray didn’t have much in-memory

          exchange. It was all, read a NC and write a NC.

        • Matt: having well-defined metadata representations is

          critical, but others will have things beyond the specification

        • Josh: have LOTS of thoughts. Table for another

          conversation.

        • Martin: something like mimetypes at a single key?

          Possibly.

    • … catching up Alistair … (Ryan A & Ward leave)

      • Alistair: extension is mentioned with a “must_understand” flag of whether or not something can be interpreted without understanding the extension. Domain-specific bits could live anywhere. Just needs publishing. Might be nice to work it through with the multiscale use case. Martin: another example would be a simple “type” attribute.
      • Ryan W: here’s a big writeup of an approach to “plugins” that came out of discussions i had at the CZI event with Josh and John: zarr-specs#49 It’s a bit lower-level while Alistair’s is higher-level, might be nice to work toward the middle between them. Alistair & Martin to read through it.
      • Alistair: good to have a large set of concrete examples for polishing out the spec.
      • Ryan W: …walk through of #49… sparse-arrays are a good case.
      • Josh: …walk through of the multiscale use case for turning a group into a type of array
      • Alistair: some kind of “hook” so a specialist library can return something more useful. Graceful degradation. Actual plugins that hook into machinery.
      • Ryan W: minimal onus on writers. No need to register. Maximally flexible. Alistair: good if every can publish, but good to collect some/all into some registry.
      • Alistair: re: types of extensions – spare arrays are similar to multiscale in that you can still provide some functionality. N5 case is different. If you don’t understand it, you can’t process it. Declared in the entrypoint metadata to give the user feedback on what needs doing.