25th March, 2020

Planning to attend: Josh

Attending: Ward (NetCDF), Josh, Nicholas (CZI/napari), Talley (HMS), RyanW, Matt M. (Kitware), James B., Mike Grauer (Kitware), MartinD, John Kirkham, Josh [notes]

  • Standing items

  • Martin (anaconda): PoC PR for storing zarr on fsspec backend, consolidating the various backends. Rather than various directory/nested things, you can write a FSSpec backend. Brings the possibility of chaining filesystems (just went in): e.g. remote zip and cache it locally for your dask workers. URLs may be complicated, so may be best to do it as part of intake to not need to remember how to construct the parameters.

      • Nick: related to google drive? Yes. M: gdrive has an implementation. One problem is the authentication layer, e.g. if you need to verify the user via the browser. [if you don’t have developer accounts, etc. etc]

      • I.e. getting into a political issue and need to talk to the right people.

      • M: And those libraries keep changing. Unsure who to talk to.

      • May need to have dask see the external system even if the local client can’t.

      • https://filesystem-spec.readthedocs.io/en/latest/features.html (documentation coming, dependent filesystems need releasing)

      • Work on intake has started again, e.g. visualization layer. See xrviz

      • N: did napari look at intake as a reader plugin? Not yet.

      • M: intake has been discussing a reader plugin architecture that could be refactored out (as an interface). Consolidate? N: currently using pluggy.

      • https://blog.danallan.com/posts/2020-03-07-reader-protocol/ - RFC

        • N: Napari might be game. Talley: return value may be

          difficult.

        • M: Metadata is part of this. Arrays are just another

          type of thing.

        • N: need for call about the next generation file loading

          / api / format?

        • M: would think many people would want to attend, not

          sure how to reach them. A link with zarr is seeking protocols for what it does. Will try to do something with the blog post by next week.

        • N: to invite.
  • Update on spatial coordinates: Nick (Napari) and Matt (ITK) - 20:32 UTC

    • See:

      https://github.com/thewtex/spatial-image/tree/spatial-image

      • Summary: A multi-dimensional spatial image data structure for scientific Python.

        • X-array + extra constraints

        • This package defines spatial image metadata, provides a

          function, is_spatial_image, to verify the expected behavior of a spatial image instance, and provides a reference function, to_spatial_image to convert an array-like, e.g. a NumPy ndarray or a Dask array, to a spatial image.

      • pyimagej equivalent merged

      • ITK Python equivalent in released in release candidate

      • Drafted for first PR:

        • Purpose

        • Motivation

        • Data dimensions

        • Coordinates

      • Todo for first PR

        • Direction

        • Equation and diagrams for world coordinates

      • Next PR’s

        • Multiscale

        • Other metadata, e.g. units

    • See:

      https://github.com/napari/napari/projects/10 “World coordinates”

      • Add scale translate transforms, transform chain class. Discussion of generality of transform system, affine, deformable, complex chains
      • For now focus on nD affine, but with possibility to generalize.
      • How would you store this information in a file / zarr?
    • Notes:

      • (above all kindly provided by Matt)

      • Following on from discussions in multiscale PR #50

      • spatial-image PR should be coming soon.

      • N: do we need to specify all the “components” or just a bunch of stuff (after the XYZT)

      • Martin: as astronomer, other things may be important.

        • N: expand? Some don’t want those and want other.

        • Geo/earth-science don’t have x/y but spherical

          coordinate system. T is probably the only thing that everyone can agree on, though astronomers can disagree there, too. E.g. red-shift.

        • Josh: the bridge to zarr is perhaps what will we define

          in terms for describing a ND array

        • Matt: Additionally, this is Euclidean. I.e.

          common-constraints so you know what you’re dealing with so you can make optimizations.

        • Martin: if restricting to rectilinear, then xyzt &

          freq/energy make sense. Something like ultrasound needs translation to rectilinear.

      • Matt: in ITK, other dimensions can be non-vector-based, e.g. rate tensors, etc.

      • Lesson-learned: coordinates are stored and compressed separately as a 1D array. Another option to be storing it as metadata. Could potentially be done better in the zarr spec.

      • Feedback from CZI guys at the dask workshop: Tony, thinking about using this for standards, thinks that filters may be more of a bug than a feature. Other languages (JS) have difficulty reading such sets. Martin: would be good to describe each filter in a language agnostic way. Matt: perhaps constraining which filters in the spec?

  • AOB

    • Ward: netcdf working with zizzle (?) for testing against S3.

      • community-led organization with meetings, etc.
      • Hope to have something soon and feedback and fixes for within ~ 2 mon. (… last mile concerns)
    • Goverance

      • Josh: Please take a look at the governance repository if you haven’t
      • John: can we talk next week?
      • John: dividing meeting in half. Use-cases 50%; core zarr 50%? Sounds good.
      • Josh: where should we post? GitHub is fine.