Attending: Sanket Verma (SV), Ethan Davis (ED), Norman Rzepka (NR), Marting Durrant (MD), Dennis Heimbigner (DH)


SV started the meeting by giving a few updates and asking if any Zarr-Python core devs would like to assist him with the releases. After this, he started a random discussion on the metrics for defining the health of an open-source project which turned out to be a fascinating discussion with good insights!


Meeting Minutes:

  • SV: Assistance with the zarr-python releases
    • MD: Happy to help! - expect it not too much complicated
    • SV: Will take the responsibility of updating the release notes
    • MD: Is there a guide to do it? - I don’t think I’m the member of Zarr-Python PyPi team
    • SV: Can look into it and help you!
    • MD: Conda forge should be able to do it itself
  • ED: Are GeoZarr meetings open?
    • SV: Probably yes! I joined my first meeting last week and it seemed open.
  • SV: What does everyone think about the metrics for an open-source project looks like? Stars? Forks? No. of downloads? Activity on social media etc.?
    • MD: Comes up a lot on anaconda - we have our own download metrics - conda downloads are much smaller than PyPi downloads - not because it has more users because PyPi is used more in automated systems particularly CI - like to watch PyPi downloads because it’s cool
    • MD: Stars - may not be obvious thing as people who generally stars your project may or may not use it
    • MD: Also depend on how many blogs your write - how much talks you give - also on how many PRs are open and how quickly and in what way you respond to the issues - number of GitHub repo depends on you - maybe these factors individually doesn’t give a good idea but if you combime them, I think yes!
    • DH: NetCFD4 has it’s own system - we track downloads through our own page on the website - we also track clones on GitHub - we think star system doesn’t play a rule
    • SV: Do you also keep track of NetCDF4 public datasets?
    • DH: No, it’s so many of them!
    • ED: Try to track some of them - we use that metric for funding and similar stuff - are you asking for a metric for reporting or for the health of the open-source project?
    • SV: Mostly about the health but you raised a good point on using these metrics for grants and stuff - but once you reach a point where you think tracking the users/datasets is too much, then I think you’re in a good place!
    • DH: Curious to know how many folks are using Zarr through NetCDF API? - I can put trackers in our code - refrained to do that because of privacy and stuff
    • MD: You track the no. of requests
    • DH: We tag datasets created through our API with an attribute - in theory if we have a repository for .zarr data, we can go and look over there
    • MD: This was talked earlier to have a extra metadata field - to track the API usage
    • DH: But most people aren’t going over the trouble for doing this - will force you to create a .zgroup at the top
    • MD: Wonder if Zarr has academic publication and if we can look at that?
    • ED: NetCDF has a DOI - for library and the code
    • MD: Once a data format is accepted by the wider audience, they don’t usually cite it because it works!
    • ED: Same case with Unidata - people have been using our stack and they forgot to cite/mention us
    • SV: Have you seen someone citing your DOI?
    • ED: Lil’ hard to track it - in terms of attribute for the library - OGC has something called Provenance
    • SV: Has been thinking about publishing a Zarr paper over at JOSS
    • ED: Something similar to OGC provenance is W3C Prov
    • SV: Also see this: https://www.biorxiv.org/content/10.1101/2023.02.17.528834v2
    • DH: Citing software used is lacking in academic papers - it is overlooked - you can remind people
    • ED: Think there are different levels of the health - no. of downloads/users/stars and then we have the active developer community of the project - NetCDF is fine on the downloads/users but lacks the latter - Amazed at how many users contribute to Zarr code and SPEC and how much outreach is there in terms of representation! - There are different levels of where you are in the lifecycle of the project
    • SV: Sometimes, it feels it it’s never enough and you can always improve - in almost all of the aspects
    • SV: I think we have identified a few good metrics to define the health of your OS project in terms of software development and in the research/academic space as well!
    • ED: Good to have guidance on how to engage with the contributors especially the new ones
    • DH: Think it’s gonna be an issue - something which is hard for me is the coding rule/conventions - existing contributors knows this but it’s comparitively difficult to teach the new ones as there is technical debt and some dark corners which we haven’t looked at in a long time - for newcomers it important that they see them before they start contributing
    • SV: Would having a code standard guide would be helpful?
    • DH: Yes! But we’ve also been lucky that most of our long time contributors understand our style