open source Posts
by phildini on June 13, 2017
There are many challenges to running an Open Source organization, but the one that I have personally felt the pain of again and again is that our tooling is awful. GitHub (and realistically we’re all using GitHub at this point) still feels in many ways like a tool designed around the idea that all the action is going to happen in one repo. This may not be entirely the fault of GitHub. Git itself is very tightly coupled to the idea that anything you care about for a particular action is going to happen in one, and only one, repository.
When GitHub released Organizations, the world rejoiced, because we could now map permissions and team members in our source repository the way they were mapped in the real world. Every new feature GitHub adds to its Organizations product causes more rejoicing, because so many teams work across multiple repos, and the tooling around multiple repos is still awful.
The awfulness of this tooling is probably a strong factor in the current trend towards “microservice, monorepo” code organization, but that’s another post.
I’ve been the equivalent of a core contributor for a half dozen GitHub organizations, and I’ve noticed that one area where the tooling is especially lacking is around labels. I’ve seen labels used to designate team or individual ownership, indicate the status of pull requests, signal that certain issues are friendly for beginners, and even serve as deploy targets for chunks of code. It’s fair to say that labels form a core tool in the infrastructure of every team I’ve seen using GitHub, and yet the tooling GitHub exposes for labels is painfully lacking.
I could go on and on about this, but my goal here isn’t necessarily to make GitHub feel bad. I hope they’re working on better label tooling, and if they want ideas, boy am I willing to give them. But there is one label-specific wall I kept banging my head against, and that is label consistency across all the repos of an Organization.
Some of you read that and feel remembered pain. I feel that pain with you, and we are here for each other. Some of you might have no idea what I’m talking about, so I’ll explain a bit more.
Let’s say you want to add a “beginner-friendly” label to all the repos in your Open Source Organization, so that new contributors can find issues to start with. Right now on GitHub, you would need to go into every repo, click into the Issues page, click into the Labels tab, and manually create that label. There are no “Org-wide labels”, and no tool for easily creating and updating labels across all the repos of an organization.
Introducing Epithet, a Python-based command line tool for managing labels across an organization. You give it a GitHub key, organization, and label name, and it will make sure that label exists across all the repos in your org. Give it a color, and it’ll make the color of that label consistent across all repos as well. Have you decided you’re done with a particular label? Epithet can delete it from all your repos for you. Are you using GitHub Enterprise? Epithet supports that too.
Epithet exists to fill a very particular need in open (and closed) source GitHub organizations, and it’s still pretty alpha. We use it for the BeeWare project, and it might be used soon for syncing labels in the Ragtag organization. You can start using it today by checking out the (sadly small) documentation, and if there’s a feature missing you’d like to see, I’m happy to work with you on getting a PR submitted.
Managing Open Source organizations is hard. My hope is Epithet makes it a little bit easier.
Special thanks to Katie Cunningham and Kenneth Love for reviewing this post.
by phildini on June 6, 2017
Python is the best technical community I’ve seen, and close to the best community I’ve seen at this scale. If you’ve been programming for any length of time, you’ve seen technologies and frameworks and languages rise and fall. We often bemoan the loss of certain ideas from these fallen works, but rarely talk about the communities that fell with them. Python is in many ways the most deliberate community that I’ve ever seen around a technology, and my life will be worse if it ever falls.
I think the Python Community is either near an inflection point, or right on top of one. What do I mean by that? I mean that, over the next five to ten years, I see two paths for the Python community and ecosystem. (Because “Python community and ecosystem” is long to type and read, I’m going to use “Python” to mean “the Python community and ecosystem” for the rest of this post.)
Path one, the one I hope we take, is the one where we take active steps to grow Python. It means that we are continuing to welcome new people into the community, from areas we never considered. It means we have a surplus of good, well-paying jobs for Pythonistas at every experience level. It means the companies and organizations creating those jobs recognize what Python gives them, and sponsor the ecosystem and community events to be better than ever.
Path two is the path I’m worried about. It’s the path where we expect Python to take care of itself, where we collectively take a more passive approach to the community that so many of us enjoy, and which has given much to many of us. I think this path results not in Python dying overnight, but in a slow decrease in Python, in Python becoming more and more irrelevant over time. It results in fewer Python jobs, more Go or Node or “insert language here” jobs. It results in Python being pigeonholed into certain industries, and new Pythonistas being forced to learn some other language to start their career. It results in our major events slowly shrinking over time, and a time where we start counting down attendees instead of counting up.
I’m not going to try too hard to convince you that this is where we are, that we are at or close to a fork in the road. It’s what I believe, and I think some of you might agree already, but here are some of the things I’ve noticed that make me think we’re close to such a point.
- PyCon 2017 was fantastic, and had more attendees than ever, but had noticeably fewer booths in the expo hall than last year, and I believe fewer sponsors overall.
- Other Python and Django conferences, especially the smaller regional conferences, are finding it harder and harder to get sponsors. Some of this is the market tightening, some of this is companies moving out of Python, or not feeling like they get a return on their investment.
- More programs and code schools are using Python as their teaching language, but for many the entry-level positions just aren’t there. Some of this is, again, the market not hiring entry-level, some of this is the companies we work for being unwilling to take risks and train.
Based on the above, and some other feelings and anecdotes, I think we’re right on top of the fork in the road. So what do we do about it? We take deliberate actions to help grow Python. Here’s what I’m planning to do over the next year:
- Running for the PSF Board of Directors. Why do I think being on the Board is important in the context of this post? Because I can push for growth at the Python organization level, and I can get things done as a Board member that I can’t get done as a non-Board member of the PSF. Anyone reading this can, and should, run for the Board if they feel so inclined. But I’d also love to see more participation in the PSF committees, especially along the lines of fundraising and outreach. No matter the outcome of the election, I’m going to continue my work on the Sponsorships committee, and keep doing the other things on this list.
- Reaching out to University Computer Science departments about using Python. I’m already in the process of arranging a guest lecture with classes in my old CS department about life as a professional Software Engineer. I’m planning to add specifics about how I use Python (which is more and more the introductory teaching language) in my professional life. My hope is I can help connect classroom lessons to professional Python just by showing up and giving a small talk.
- Reaching out to University Science departments about Python. If the keynotes at PyCon 2017 taught us anything, they taught us that Python is an incredible resource in research science departments, statistics departments, anywhere deep thinkers need to do computation and visualization. I’m hoping to put together a “Python in Science” roadshow to help with this, but the reality is Software Carpentry is years ahead of me in making this happen, and anything we can do to help them is almost certainly worthwhile.
- Being a Core Contributor to the BeeWare project. Python has great stories around developing web applications, working in the sciences, and doing systems tasks. Our stories around developing consumer apps are lacking, and I don’t think they need to be. BeeWare, and many others, are taking a stab at filling this gap, but for you reading this the action item could be “find a Python project in an area you care about, and work at making it the best it can be.”
- Volunteering time to get more companies and projects started in Python. This one is more nebulous, and I haven’t done it yet but plan to soon. I’m planning to reach out to VCs and incubators and especially hackathons and say “Here’s my background, I’m happy to show up to any event and donate my time to help, but I’m only going to help with Python.” I don’t know how this is going to go over, but this idea has some exciting potential. If we want more jobs in Python, we need to be pushing for more companies and projects to use Python, right from the beginning.
If any of these ideas seem interesting to you, feel free to copy them! If they seem interesting but daunting, feel free to reach out to me ([email protected]) to chat about them. If these ideas inspired your own ideas in a different direction, great! Tell me about what you’re doing and I’ll share it far and wide. My goal in listing these ideas isn’t to toot my own horn, but start a conversation about methods for Python outreach, in the hope of growing Python.
Of course, I could be wrong in my beliefs. (I’d actually love to be corrected with stories or data that show I’m wrong, and would happily share them here.) What if Python is healthy, and is going to grow consistently over the next decade?
Then I’d still do everything I’m planning to do, and encourage others to do the same. I think everything we pour into the Python community is valuable, and any new Pythonista we bring in enriches us all in ways we can’t possibly anticipate.
If I’m wrong, and we make Python better for no reason, we’ll still have a better Python.
by phildini on June 7, 2016
I think people have an impression that I make lots of contributions to Open Source (only recently true), and that therefore I am a master of navigating the steps contributing to Open Source requires (not at all true).
Contributing to Open Source can be hard. Yes, even if you’ve done it for a while. Yes, even if you have people willing to help and support you. If someone tries to tell you that contributing is easy, they’re forgetting the experience they’ve gained that now makes it easy for them.
After much trial and error, I have arrived at a workflow that works for me, which I’m documenting here in the hopes that it’s useful for others and in case I ever forget it.
Let’s say you want to contribute to BeeWare’s Batavia project, and you already have a change in mind. First you need to get a copy of the code.
I usually start by forking the repository (or “repo”) to my own account. “Forking” makes a new repo which is a copy of the original repo. Once you fork a repo, you won’t get any more changes from the original repo, unless you ask for them specifically (more on that later).
Now I have my own copy of the batavia repo (note the phildini/batavia instead of pybee/batavia)
To get the code onto my local machine so I can start working with it, I open a terminal, and go to the directory where I want the code to live. As an example, I have a “Repos” directory where I’ve checked out all the repos I care about.
git clone git@github.com:phildini/batavia.git
This will clone the batavia repo into a folder named batavia in my Repos directory. How did I know what the URL to clone was? Unfortunately, GitHub just changed their layout, so it’s a bit more hidden than it used to be.
Now we have the code checked out to our local machine. To start work, I first make a branch to hold my changes, something like:
git checkout -b fix-class-types
I make some changes, then make a commit with my changes.
git commit -av
The -a flag stages every modified or deleted file that Git is already tracking (brand-new files still need a git add first), and the -v flag will show a diff in my editor, which will open to let me create the commit message. It’s a great way to review all your changes before you’ve committed them.
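To see what -a does and does not pick up, here is a minimal sketch that runs in a throwaway repository. The file names and the placeholder identity are made up for the demo, and it uses -m instead of -v so that no editor opens.

```shell
set -e
# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example Contributor"        # placeholder identity
git config user.email "contributor@example.com"   # placeholder identity

echo "first draft" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "second draft" > tracked.txt    # modify a file Git already tracks
echo "brand new" > untracked.txt     # create a file Git has never seen

# -a stages the change to tracked.txt; untracked.txt is left alone.
git commit -q -a -m "Update tracked.txt"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
leftover=$(git status --porcelain)
echo "files in the last commit: $committed"
echo "still uncommitted: $leftover"
```

The new file only joins a commit after an explicit git add, which is easy to miss when a patch adds a module or a test file.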
With a commit ready, I will first pull anything that has changed from the original repo into my fork, to make sure there are no merge conflicts.
But wait! When we forked the repo, we made a copy completely separate from the original, and cloned from that. How do we get changes from the official repo?
The answer is through setting up an additional remote server entry.
If I run:
git remote -v
I see:
origin git@github.com:phildini/batavia.git (fetch)
origin git@github.com:phildini/batavia.git (push)
That is what I would expect: I am pulling from my fork and pushing to my fork. But I can set up another remote that lets me get the upstream changes and pull them into my local repo.
git remote add upstream git@github.com:pybee/batavia.git
Now when I run:
git remote -v
origin git@github.com:phildini/batavia.git (fetch)
origin git@github.com:phildini/batavia.git (push)
upstream git@github.com:pybee/batavia.git (fetch)
upstream git@github.com:pybee/batavia.git (push)
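If you want to try the two-remote setup without touching GitHub, the sketch below uses local bare repositories as stand-ins. The paths under the temporary directory are assumptions for the demo, not real project URLs.

```shell
set -e
sandbox=$(mktemp -d)

# Two local bare repos stand in for the original project and my fork (hypothetical paths).
git -c init.defaultBranch=master init -q --bare "$sandbox/original.git"
git -c init.defaultBranch=master init -q --bare "$sandbox/fork.git"

# Cloning the fork creates the "origin" remote automatically.
# (Git warns that the repository is empty; that is fine for this demo.)
git clone -q "$sandbox/fork.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"

# Adding a second remote only records a name and a URL; nothing is fetched yet.
git remote add upstream "$sandbox/original.git"

remotes=$(git remote)
upstream_url=$(git config --get remote.upstream.url)
git remote -v
```

The name "upstream" is only a convention. Git would accept any name, but most contributing guides assume this one.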
So I can do the following:
git checkout master
git pull upstream master --rebase
git push origin master --force
git checkout fix-class-types
git rebase master
These commands will:
- Check out the master branch
- Pull changes from the original repository into my master branch
- Update the master branch of my fork of the repo on GitHub.
- Check out the branch I’m working on
- Pull any new changes from master into the branch I’m working on, through rebasing.
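The whole sequence can be rehearsed locally. This is a sketch under stated assumptions: local bare repositories play the original project and my fork, a "maintainer" checkout plays the project owner, and the file contents and identity are placeholders.

```shell
set -e
sandbox=$(mktemp -d)
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Contributor" GIT_AUTHOR_EMAIL="contributor@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# "maintainer" plays the project owner; original.git plays the original repo.
git -c init.defaultBranch=master init -q "$sandbox/maintainer"
cd "$sandbox/maintainer"
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git clone -q --bare . "$sandbox/original.git"

# Forking is a server-side copy; a bare clone is a rough local equivalent.
git clone -q --bare "$sandbox/original.git" "$sandbox/fork.git"

# My working copy: cloned from the fork, with upstream pointing at the original.
git clone -q "$sandbox/fork.git" "$sandbox/work"
cd "$sandbox/work"
git remote add upstream "$sandbox/original.git"
git checkout -q -b fix-class-types
echo "my fix" > fix.txt
git add fix.txt
git commit -q -m "Fix class types"

# Meanwhile, the original project moves ahead.
cd "$sandbox/maintainer"
echo "v2" > README
git commit -q -a -m "Upstream change"
git push -q "$sandbox/original.git" master

# The sync sequence from the post.
cd "$sandbox/work"
git checkout -q master
git pull -q upstream master --rebase
git push -q origin master --force
git checkout -q fix-class-types
git rebase -q master

git log --oneline fix-class-types
```

After the last command, the log for fix-class-types shows my commit sitting on top of the upstream change, which is the state I want before opening a pull request.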
Now that I’m sure my local branch has the most recent changes from the original, I push the branch to my fork on GitHub:
git push origin fix-class-types
With my branch all ready to go, I navigate to https://github.com/pybee/batavia, and GitHub helpfully prompts me to create a pull request. Which I do, remembering to create a helpful message and follow the contributing guidelines for the repo.
That’s the basic flow. Now let’s answer some questions.
Why do you make a branch in your fork, rather than make the patch on your master branch?
- GitHub pull requests are a little funny. From the moment you make a PR against a repo, any subsequent commits you make to that branch in your fork will get added to the PR. If I did my work on my master, submitted a PR, then started work on something else, any commits I pushed to my fork would end up in the PR. Creating a branch in my fork for every patch I’m working on keeps things clean.
Why did you force push to your master? Isn’t force pushing bad?
- Force pushing can be very bad, but mainly because it messes up other collaborators’ histories, and can cause weird side effects, like losing commits. On my fork of a repo, there should be no collaborators but me, so I feel safe force pushing. You’ll often need to force push upstream changes to your repo, because the commit pointers will be out of sync.
What if you need to update your PR?
- I follow a similar process, pulling changes from upstream to make sure I didn’t miss anything, and then pushing to the same branch again. GitHub takes care of the rest.
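One wrinkle worth knowing: if the update involves rebasing a branch that was already pushed, a plain push is refused and a force push is needed. The sketch below rehearses that locally. The bare repo standing in for my fork, the simulated upstream commit, and the choice of --force-with-lease over --force are assumptions of this example, not something the workflow above prescribes.

```shell
set -e
sandbox=$(mktemp -d)
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Contributor" GIT_AUTHOR_EMAIL="contributor@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git -c init.defaultBranch=master init -q "$sandbox/seed"
cd "$sandbox/seed"
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git clone -q --bare . "$sandbox/fork.git"    # stands in for my fork on GitHub

git clone -q "$sandbox/fork.git" "$sandbox/work"
cd "$sandbox/work"
git checkout -q -b fix-class-types
echo "my fix" > fix.txt
git add fix.txt
git commit -q -m "Fix class types"
git push -q origin fix-class-types           # the branch the PR was opened from

# Simulate new upstream work landing on master, then rebase the branch onto it.
git checkout -q master
echo "v2" > README
git commit -q -a -m "Upstream change"
git checkout -q fix-class-types
git rebase -q master

# The rebased branch no longer descends from what was pushed, so a plain push is refused.
if git push -q origin fix-class-types 2>/dev/null; then
    plain_push="accepted"
else
    plain_push="rejected"
fi

# --force-with-lease overwrites the branch only if the fork still holds what I last pushed.
git push -q --force-with-lease origin fix-class-types
echo "plain push: $plain_push"
```

When the update is only a new commit on top of the branch, no force is needed and an ordinary git push origin fix-class-types does the job.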
What about repos where you are a Core Contributor or have the commit bit?
- Even when I’m a Core Contributor to a repo, I still keep my fork around and make changes through PRs, for a few reasons. One, it forces me to stay in touch with the contributor workflow, and feel the pain of any breaking changes. Two, another Core Contributor should still be reviewing my PRs, and those are a bit cleaner if they’re coming from my repo (as compared to a branch on the main repo). Three, it reduces my fear of having a finger slip and committing something to the original repo that I didn’t intend.
That’s a good overview of my workflow for Open Source projects. I’m happy to explain anything that seemed unclear in the comments, and I hope this gives you ideas on how to make your own contribution workflow easier!
by phildini on June 5, 2016
It’s true that, for many projects, how you become a Core Contributor can seem mysterious. It often seems unclear what a Core Contributor even does, and it doesn’t help that each Open Source project has a slightly different definition of the responsibilities of a Core Contributor.
So this deliberately isn’t a “How to Become a Core Contributor” guide. It would be impossible to write such a guide and be definitive. This is me trying to reverse engineer how I became a Core Contributor on BeeWare and then extracting out things I think are good behaviors for getting to that stage.
How I Became a Core Contributor to BeeWare:
- Met Russell Keith-Magee at DjangoCon EU 2016, where he spoke about BeeWare and Batavia.
- Chatted with Russell about BeeWare, sprinted some on Batavia at DjangoCon EU 2016.
- Saw Russell and Katie McLaughlin at PyCon 2016, chatted more about BeeWare with both of them, joined the BeeWare sprint.
- Recognized that BeeWare had some needs I could fill, namely helping onboard new people and reviewing Pull Requests.
- Asked Russell for, and received, the ‘commit bit’ on the Batavia project so I could help review and merge PRs.
Tips I Can Give Based on My Experience:
Be excited about the project and the project’s future. I think the whole BeeWare suite has amazing potential for pushing Python to limits it hasn’t really reached before, and I want to see it succeed. A Core Contributor is a caretaker of a project’s future, and should be excited about what the future holds for the project.
Be active in the community. Go to conferences and meetups when you can, join the mailing lists and IRC channels, follow the project and the project maintainers on Twitter. I met Russell and Katie at a conference, then kept in touch via various IRC and Twitter channels, then hung out with them again at another conference. Along the way, I was tracking BeeWare and helping where I could.
Be friendly with the existing project maintainers and Core Contributors. It’s less likely I would be a Core Contributor if I wasn’t friends with Russell and Katie, but the way we all became friends was by being active in the community around Python, Django, and BeeWare. One way to figure out if you want to be a Core Contributor on a project is to see which projects and project maintainers you gravitate towards at meetups and conferences. If there’s a personality match, you’re more likely to have a good time. If you find yourself getting frustrated with the existing Core Contributors, that’s probably a sign you’ll be more frustrated than happy as a Core Contributor to that project. It’s totally fine to walk away, or find other ways to contribute.
Focus on unblocking others. I still make individual code contributions to BeeWare projects, but I prioritize reviewing and merging pull requests, and helping out others in the community. From what I’ve seen, a Core Contributor’s time mainly goes to three things: triaging issues in the issue tracker, reviewing patches or pull requests, and helping others. It’s only when everyone else is unblocked that I start looking at my own code contributions.
Have fun. I asked to become a Core Contributor to BeeWare because I enjoy the community, enjoy Russell’s philosophy on bringing on newcomers, and think the project itself is really neat. If you’re having fun, it’s obvious, and most Core Contributors want to promote the people who are on fire for a project.
My hope is that I have made becoming a Core Contributor to an Open Source project seem achievable. It is completely achievable, no matter your current skill level. There’s a lot more detail I didn’t cover here, and I can’t promise that if you do all these things you’ll become a Core Contributor, even on the BeeWare project. When you ask to become a Core Contributor to a project, the existing project maintainers are evaluating all kinds of things, like how active you are, how well you might mesh with the existing team, and what existing contributions you’ve made to the project and the community. It might not be a great fit, but it doesn’t mean you’re not a great person.
What I can say is that being a Core Contributor is work, hard work, but incredibly rewarding. Seeing someone make their first contribution, and helping shepherd that contribution to acceptance, is more rewarding for me than making individual contributions. Seeing a project grow, seeing the community grow around a project, makes the work worth it.
If you have questions about my experience, or about contributing to Open Source in general, I'm happy to answer them in the comments, on Twitter @phildini, or by email at [email protected].