GitHub Copilot and Open Source: A Love Story That Won’t End Well? – thenewstack.io

Sasha Medvedovsky

Sasha is a software engineer with over 20 years of experience, and a co-founder of Diversion, which offers open source source control management software. He has been around long enough to have seen quite an evolution of programming languages and developer tools. He's passionate about building the next generation of tools for developer productivity and collaboration, leveraging current technologies to create a world in which software is developed faster and with ease.

GitHub has been an important part of the software development world, and of open source software in particular. It has provided free hosting for open source projects (the Apache Software Foundation moved its entire operation to GitHub a few years ago), and played a large part in turning the open source git into the popular source control management (SCM) system it is now.

However, it seems that the cooperation is now coming to an abrupt and ugly ending, with the Software Freedom Conservancy (SFC) joining the Free Software Foundation in a recommendation to cut ties with GitHub over the creation of GitHub Copilot.

GitHub's recently commercialized offering of Copilot (which was free until very recently), which delivers AI-powered code composition/auto-completion, was built upon the sourcing of code from the millions of open source projects hosted on GitHub. Needless to say, not all open source projects were created equal, with many different licenses (learn more about OSS licenses), some of which DO NOT enable the reuse or copyleft of code, despite being publicly available on GitHub.

It's true that using the code for training an AI model is somewhat different from simply using the code as it is. But shouldn't the code's creators at least be consulted on whether they agree to this use of their creation?

To many open source developers, this constitutes unauthorized use of their work, and a breach of their trust. Obviously, Copilot wouldn't work without ingesting millions of code samples from GitHub, so it's safe to say that the open source code is an integral part of it. Moreover, any code created by Copilot could be considered a derivative of this open source code (in some cases whole snippets of open source code could find their way into a closed-source codebase).

This recent divorce between GitHub and open source organizations may seem surprising, but it shouldn't be. It really stems from a misalignment of goals and ideals.

From the beginning, GitHub has been a commercial organization that has turned the open source software git into a business. While there's nothing wrong with doing so (plenty of companies have built thriving businesses through commercial offerings of open source technology), it's imperative we don't get confused and consider GitHub an open source company or project. It's neither. This confusion lies in its business model, where production-grade, hosted git was provided for a fee to commercial organizations, and free for open source projects.

As someone once said, if the product is free, YOU are the product. Never has this sentence been more correct than in the case of GitHub. In 2018, Microsoft acquired GitHub for $7.5 billion. The common understanding was that the high price (for 2018) was paid not for GitHub's technology (again, it didn't develop git, and there were many competitors, e.g. BitBucket and GitLab), but rather for its developer community, which at that time was 28 million strong.

If Microsoft paid for the OSS community, Microsoft was ultimately going to use the community to make a profit. Microsoft is a commercial entity with shareholders and has an obligation to make as much profit as possible. Copilot is just the perfect example of that. Microsoft owns both GitHub and a large stake in OpenAI, the AI company that trained the Copilot AI model. The cooperation makes so much corporate sense that it can be summarized as: they host all of the most popular OSS projects in the world, alongside amazing AI capabilities. It just makes sense to use the synergies to make a commercially successful product.

There's just one problem with this line of thought: hosting the code doesn't mean that Microsoft owns the code. And this is not the first time this company has made this mistaken assumption.

One illustrative exchange that took place recently points at the potential dangers.

A developer, who goes by the handle of Marak, intentionally broke the code of his open source Faker mock data generator, because he allegedly felt his work was thankless. He complained about the lack of funding for his popular projects, including Faker, which are used by hundreds of companies.

This opened the whole Pandora's Box of who really owns open source code. What if companies are using the code in production? The developer can just break the code, and that's it?

GitHub got involved, and reverted the changes, and denied Marak access to his own projects (around 100).

NPM (incidentally, owned by Microsoft as well) also reverted his repo to a previous version, effectively taking control of his code.

Imagine the situation: a programmer has created a very useful open source project. They have maintained it and provided it for free to hundreds of companies. Then they decide to make a change that the companies do not like. Then Microsoft (through GitHub and NPM) takes over their code repositories and reverts their changes.

Does this look like Microsoft understands that the developer owns the code, or do they think that Microsoft owns the code?

I don't think the open source movement should cut all ties with commercial organizations, or stop using commercial products. Cooperation is a good thing. It's not a zero-sum game, and it helps to benefit humanity as a whole.

But the boundaries should be clearly set. If a developer doesn't want their code to be used in commercial applications, they should be given a right to refuse. If they are OK with it, then there's no problem. But companies (be it Microsoft, Google or Amazon Web Services) shouldn't just assume that if they give something for free they can take something else in return.

At the company I co-founded, Diversion, we have developed our own SCM. We plan to release it as open source (on our own platform, not on GitHub), and we hope it will become useful to millions of developers.

We will also offer free hosting for open source and indie developers, as our thanks and giveback to the amazing people who've given their time and effort for the betterment of all humankind, without asking for anything in return.

In light of these recent developments, I feel that there's a need to make a promise: we pledge, right here, to honor the software creators' license agreements, and to not use their code in ways they do not agree with.

To me it's something that should go without saying; but apparently, it needs to be said explicitly.

Note: Sharone Zitzman contributed to this post.
