Author: Reshama Shaikh
High Level Summary
Number of participants who:
- Registered: 76
- Attended: 38
- Submitted >= 1 pull request: 24
- Countries represented: 10
Sprint Background
The PyMC open source working sessions were organized by Data Umbrella to increase the participation of underrepresented persons in open source, Python and data science.
This report summarizes the Data Umbrella PyMC Open Source Working Sessions, their impact and the lessons learned.
Pre-Series Office Hours
Photo not available.
Session 1
Session 2
Session 3
Post-Series Office Hours
Event Sponsors
This event was supported by:
This is a 3-minute video by Mariatta Wijaya of Google with inspirational tips on contributing to open source.
Schedule of Sessions
- 02-Jul-2022: Pre-series Office Hours (13-14:00 UTC) (1 hr)
- 09-Jul-2022: Session #1 (13-16:00 UTC) (3 hrs)
- 22-Jul-2022: Session #2 (16-19:00 UTC) (3 hrs)
- 4/5-Aug-2022: Session #3 (23-2:00 UTC) (3 hrs)
- 18-Aug-2022: Post-series Office Hours (23-24:00 UTC) (1 hr)
Number of Attendees
Session | Data Umbrella Organizers | PyMC Mentors | Community Contributors | Note |
---|---|---|---|---|
Pre-series Office Hours | 3 | 2 | 24 | |
Session #1 | 3 | 4 | 20 | |
Session #2 | 3 | 4 | 12 | |
Session #3 | 1 | 4 | 6 | Asia-Pacific (a) |
Post-series Office Hours | 1 | 3 | 4 | Asia-Pacific (a) |
(a) Session 3 and post-series office hours were for Asia-Pacific time zone.
Contributions Statistics
The contributions made during the working sessions were tracked in this PyMC OS-WS spreadsheet. Contributions included both pull requests submitted and issues opened when a contributor noticed a problem.
We worked on a few different repositories for the PyMC project:
- video-timestamps: this is a beginner-friendly list of issues where contributors watch a video from the PyMCon 2020 conference and add timestamps
- pymc-data-umbrella: this is the event website. Contributors could submit PRs to fix typos or clarify the contributing guide, as well as add their information to the list of participants
- pymc-dev/pymc: this is the main code repository for PyMC
- pymc-dev/pymc-examples: this is the repo that holds notebook examples for PyMC
As of the date of this report (28-Aug-2022), these are the PR stats:
- Open: 2
- Merged: 56
- Issues opened: 6
Timestamps
Timestamps were added for 16 videos.
Event website
A number of PRs were submitted to update contributor information.
Updating Jupyter Notebooks
This was a more intermediate task for new contributors: updating notebooks so that they carry consistent information for Sphinx rendering.
PyMC documentation
These contributions were in the main code repository.
Demographics
Of the 74 people who registered, 38 attended. Of the 38 who attended, 24 submitted a pull request. This funnel graph shows the breakdown, by gender.
A total of 38 contributors attended the sprint. 14 of 38 (37%) identified as she/her. 24 of 38 (63%) identified as he/him.
Contributors joined from 10 different countries. Country information was provided based on where participants were joining from.
- United States of America: 13
- India: 6
- Ghana: 4
- Kenya: 4
- Germany: 3
- United Kingdom: 2
- Canada: 2
- Brazil: 2
- Colombia: 1
- Ireland: 1
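The demographic figures above can be cross-checked with a short script. This is only a sketch: the numbers are copied from this report, while the variable names and the `pct` helper are made up for illustration and are not part of any Data Umbrella or PyMC tooling.

```python
# Figures copied from this report; all names here are illustrative only.
attended = 38
submitted_pr = 24
pronouns = {"she/her": 14, "he/him": 24}
countries = {
    "United States of America": 13,
    "India": 6,
    "Ghana": 4,
    "Kenya": 4,
    "Germany": 3,
    "United Kingdom": 2,
    "Canada": 2,
    "Brazil": 2,
    "Colombia": 1,
    "Ireland": 1,
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100 * part / whole)


# Both breakdowns should account for every attendee.
assert sum(pronouns.values()) == attended
assert sum(countries.values()) == attended

# Share of attendees who submitted at least one pull request.
pr_rate = pct(submitted_pr, attended)

# Share of attendees by stated pronouns.
pronoun_pct = {label: pct(count, attended) for label, count in pronouns.items()}

print(f"Attendees who submitted a PR: {pr_rate}%")
print(f"Pronoun breakdown: {pronoun_pct}")
print(f"Countries represented: {len(countries)}")
```

Keeping the raw counts in one place makes it easy to confirm that the pronoun and country breakdowns each sum to the number of attendees before the percentages are quoted.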
Returning Contributors
There were 3 “returning” contributors. These contributors had participated in a previous scikit-learn sprint.
Spoken Languages
The event was run in English. Participants were asked on their registration forms to indicate if they needed a translator. No translators were requested.
This bar plot shows the primary languages spoken by the sprint participants.
Impact Report for Data Umbrella PyMC Open Source Working Sessions
Non-measurable Impact
Aside from the number of PRs that were merged and issues that were opened, there is non-quantifiable impact of the open source working sessions. Some examples include:
- learning to set up a virtual environment
- using Git (fork, clone, branch, fetching another’s PR)
- introduction to tests such as: flake8 (linting, formatting), pytest, “continuous integration”
- learning about sphinx and documentation
- learning about NumPy validation
- navigating through the codebase structure of pymc
- digging into functions, learning about errors
- interacting with contributors on GitHub
- learning, in general
- networking, meeting people from around the world
- building confidence (making a dent in “imposter syndrome”)
- having fun
Finding out About the Working Sessions
For those who attended the working sessions, this is how they learned of the event. The main avenues were by invitation from Data Umbrella, Meetup, Twitter, LinkedIn and their network (“word of mouth”).
Next Steps
Explore options to continue momentum of contributions.
Sessions Feedback
Feedback has been shared in a number of ways:
- Event survey
- Social media (Twitter, LinkedIn)
- Casually, in conversation during the office hours and working sessions
Survey
We received 5 responses to the survey. The primary reason the response rate was so low is that these events were spread over a 7-week period and different people attended different events.
Overall, the feedback on the surveys was positive.
In response to the question “What are your favorite parts about the sessions?”
- Interacting with Mr. Christian and getting to know more about the community and workings.
- Working with other people - a lot of time spent alone when learning usually so it’s a nice change and good to be exposed to other people’s ideas
- Meeting core PyMC team and other contributors, networking, learning to contribute to open source project
Suggestions for Improvement
In response to the question “What could have worked better at the sessions?”
- I had (and still have) difficulty finding certain pages and links - between pymc contributing section and dataumbrella/pymc website I get confused, since the websites look similar but have different URLs
- Call out need to fork both pymc and pymc-examples (or whichever one you plan to contribute to)
Challenges
Challenge 1: Emails going to spam
We communicated with registrants via email and Discord. For a number of people, the emails went to spam and they missed them. We do have a reminder on the registration form to keep an eye on the spam folder, but emails were still missed.
Challenge 2: Preparing by reading
The event had a comprehensive website, and the events were posted on Meetup with instructions. The process (join Discord, go through the website, submit a registration form) was also described in multiple places: the event website, Discord, newsletters and emails. Despite numerous reminders, a number of people did not join Discord, some joined Discord only at the start of the event (which might indicate they missed the reminders), and some participants did not submit a registration form.
It is important that participants submit a registration form for these reasons:
- They have read and agreed to the code of conduct.
- They understand how the event will go.
- Many participants have anonymous Discord profiles, so this information is needed to track who is joining the event and to add them to the private channel.
- We need to connect participants to their GitHub pull requests.
- We need participants' email addresses to communicate with them about the event.
Challenge 3: Discord
Some participants had technical issues with Discord. We have a 10-minute video on how to navigate Discord, though it is not apparent that all participants watched the video.
What’s Next
We hope to maintain the momentum by holding casual monthly “study groups” to continue contributing to PyMC.
Sessions: Social Media Shares
Carlo of Brazil
Pablo of Brazil
Igor of USA
Dustin of USA
Prince of Ghana
Rowan of Tennessee, USA
Made my first open source contributions today with @CarolBasknRobns! Watch out world 💪🤓 Thanks for the great event #DataUmbrellaPyMCSprint @DataUmbrella @pymc_devs pic.twitter.com/BKRPZcLETC
— rowan schaefer (@rowan_________) August 5, 2022
Benjamin
Really enjoyed the first working session of the #DataUmbrellaPyMCSprint! Thank you @DataUmbrella for organizing the event.
— Benjamin Datko (@BenDatko) July 9, 2022
Chris Fonnesbeck, PyMC Team Member
The @DataUmbrella PyMC sprint on Friday was fantastic. It's a great way to get involved with the project and with the open source data science community in general. https://t.co/pj3s8PNUas
— Chris Fonnesbeck (@fonnesbeck) July 24, 2022
Zoe
I’m proud to join the @pymc_devs contributors team, thanks to the leadership or @reshamas and the @DataUmbrella community. https://t.co/ns017TCvsC#DataScience #OpenSource #Statistics
— Zoe Braiterman (@zbraiterman) July 10, 2022
Social Media Promotion
We created a social media kit for the Data Umbrella PyMC Open Source working sessions to provide content for our community partners to share.
Twitter (English)
🧵
— Data Umbrella (@DataUmbrella) June 20, 2022
📣Join us: *online* working sessions to contribute to @pymc_devs #oss
👉🏽with a focus on underrepresented persons in #DataScience
🗓️ Jul/Aug 2022: office hrs + 3 sessions
We thank our sponsors @Google @cziscience @pymc_labs
Submit a registration form:https://t.co/WFLPuy6rts pic.twitter.com/UyptFHrPav
LinkedIn (English)
Acknowledgments
We thank the Data Umbrella & PyMC organizers who created the website, conducted outreach, marketing and so much more!
- Reshama Shaikh
- Beryl Kanali
- Sandra Meneses
- Sandy Weng
- Cristina Mulas Lopez
- Christian Luhmann
- Oriol Abril Pla
- Thomas Wiecki
We thank the PyMC team who mentored at the sessions and those who were online during the weekend afterwards to promptly review the submitted pull requests, particularly:
- Christian Luhmann
- Oriol Abril Pla
- Ravin Kumar
- Dan Phan
- Chris Fonnesbeck
- Alex Andorra
- Michael Osthege
- Fernando Irarrázaval
References
- PyMC sprints organized by Data Umbrella
- Interview with Sandra Meneses: Contributing to PyMC
- Reflections on the Data Umbrella PyMC February 2022 Sprint
- Data Umbrella scikit-learn Sprint Reports
Addendum
- [no addendums or updates at the time of publication]