2023-04-05

Attending: Sanket Verma (SV), Ethan Davis (ED), Norman Rzepka (NR), Marting Durrant (MD), Dennis Heimbigner (DH)

TL;DR:

SV started the meeting by giving a few updates and asking if any Zarr-Python core devs would like to assist him with the releases. After this, he started a random discussion on the metrics for defining the health of an open-source project which turned out to be a fascinating discussion with good insights!

Updates:

Zarr Office hours at: https://zarr.dev/office-hours
Proposed new sitemap: https://github.com/zarr-developers/zarr-developers.github.io/issues/80
GeoZarr-Spec: https://github.com/zarr-developers/geozarr-spec

Meeting Minutes:

SV: Assistance with the zarr-python releases
- MD: Happy to help! - expect it not too much complicated
- SV: Will take the responsibility of updating the release notes
- MD: Is there a guide to do it? - I don’t think I’m the member of Zarr-Python PyPi team
- SV: Can look into it and help you!
- MD: Conda forge should be able to do it itself
ED: Are GeoZarr meetings open?
- SV: Probably yes! I joined my first meeting last week and it seemed open.
SV: What does everyone think about the metrics for an open-source project looks like? Stars? Forks? No. of downloads? Activity on social media etc.?
- MD: Comes up a lot on anaconda - we have our own download metrics - conda downloads are much smaller than PyPi downloads - not because it has more users because PyPi is used more in automated systems particularly CI - like to watch PyPi downloads because it’s cool
- MD: Stars - may not be obvious thing as people who generally stars your project may or may not use it
- MD: Also depend on how many blogs your write - how much talks you give - also on how many PRs are open and how quickly and in what way you respond to the issues - number of GitHub repo depends on you - maybe these factors individually doesn’t give a good idea but if you combime them, I think yes!
- DH: NetCFD4 has it’s own system - we track downloads through our own page on the website - we also track clones on GitHub - we think star system doesn’t play a rule
- SV: Do you also keep track of NetCDF4 public datasets?
- DH: No, it’s so many of them!
- ED: Try to track some of them - we use that metric for funding and similar stuff - are you asking for a metric for reporting or for the health of the open-source project?
- SV: Mostly about the health but you raised a good point on using these metrics for grants and stuff - but once you reach a point where you think tracking the users/datasets is too much, then I think you’re in a good place!
- DH: Curious to know how many folks are using Zarr through NetCDF API? - I can put trackers in our code - refrained to do that because of privacy and stuff
- MD: You track the no. of requests
- DH: We tag datasets created through our API with an attribute - in theory if we have a repository for .zarr data, we can go and look over there
- MD: This was talked earlier to have a extra metadata field - to track the API usage
- DH: But most people aren’t going over the trouble for doing this - will force you to create a .zgroup at the top
- MD: Wonder if Zarr has academic publication and if we can look at that?
- ED: NetCDF has a DOI - for library and the code
- MD: Once a data format is accepted by the wider audience, they don’t usually cite it because it works!
- ED: Same case with Unidata - people have been using our stack and they forgot to cite/mention us
- SV: Have you seen someone citing your DOI?
- ED: Lil’ hard to track it - in terms of attribute for the library - OGC has something called Provenance
- SV: Has been thinking about publishing a Zarr paper over at JOSS
- ED: Something similar to OGC provenance is W3C Prov
- SV: Also see this: https://www.biorxiv.org/content/10.1101/2023.02.17.528834v2
- DH: Citing software used is lacking in academic papers - it is overlooked - you can remind people
- ED: Think there are different levels of the health - no. of downloads/users/stars and then we have the active developer community of the project - NetCDF is fine on the downloads/users but lacks the latter - Amazed at how many users contribute to Zarr code and SPEC and how much outreach is there in terms of representation! - There are different levels of where you are in the lifecycle of the project
- SV: Sometimes, it feels it it’s never enough and you can always improve - in almost all of the aspects
- SV: I think we have identified a few good metrics to define the health of your OS project in terms of software development and in the research/academic space as well!
- ED: Good to have guidance on how to engage with the contributors especially the new ones
- DH: Think it’s gonna be an issue - something which is hard for me is the coding rule/conventions - existing contributors knows this but it’s comparitively difficult to teach the new ones as there is technical debt and some dark corners which we haven’t looked at in a long time - for newcomers it important that they see them before they start contributing
- SV: Would having a code standard guide would be helpful?
- DH: Yes! But we’ve also been lucky that most of our long time contributors understand our style