GitHub’s Copilot Still a Long Way From Autopilot

Three months after GitHub launched Copilot, a group of academics affiliated with New York University's Tandon School of Engineering released their empirical cybersecurity evaluation of Copilot's code contributions, concluding that roughly 40% of the programs it produced were buggy and vulnerable. Currently, Copilot is available in private beta testing as an extension of Visual Studio Code, one of the most popular Integrated Development Environments (IDEs) according to Stack Overflow's developer survey.

Based on the outcome of the research, Copilot has three major caveats: a tendency to generate incorrect code, a proclivity for exposing secrets, and an inability to judge software licenses. Another shortcoming is that Copilot, built on OpenAI's Codex neural network, might produce fragile code, just like humans do. Taking into account that it was trained on source code from GitHub, existing bugs included, the output comes as no surprise.

Copilot was tested on the task of generating code for 89 pre-determined scenarios. Of the 1,692 programs yielded, 40% included software defects or design flaws that may be exploitable by an attacker.

The five researchers looked at three separate aspects of the output: the likelihood of generating code containing one of MITRE's top-25 Common Weakness Enumerations (CWEs), the likelihood of generating SQL-injection vulnerabilities, and how Copilot handles code suggestions for less popular languages (for example, Verilog, a hardware description language). Another shortcoming pointed out in the paper is the age of the model: coding practices evolve, so what is considered "best practice" at the time of writing may become "bad practice" as the cybersecurity landscape shifts.
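To illustrate the SQL-injection axis, here is a minimal sketch in C using SQLite; the table and query are hypothetical, not taken from the paper. A naive completion that splices user input into the query string with sprintf() can be subverted by an input such as x' OR '1'='1, whereas a prepared statement with a bound parameter cannot:

```c
#include <sqlite3.h>

/* Hypothetical lookup: is there a user row with this name?
 * Returns 1 if found, 0 if not, -1 on error. */
int find_user(sqlite3 *db, const char *name) {
    /* Vulnerable pattern a completion might emit:
     *   char sql[256];
     *   sprintf(sql, "SELECT id FROM users WHERE name = '%s'", name);
     * The input is parsed as SQL, so it can rewrite the query. */

    /* Safe pattern: the parameter is bound as data, never parsed as SQL. */
    sqlite3_stmt *stmt;
    if (sqlite3_prepare_v2(db, "SELECT id FROM users WHERE name = ?;",
                           -1, &stmt, NULL) != SQLITE_OK)
        return -1;
    sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);
    int found = (sqlite3_step(stmt) == SQLITE_ROW);
    sqlite3_finalize(stmt);
    return found;
}
```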

Looking in more detail at one of the real-life examples: asked to convert three random floats to strings, Copilot produced code along the lines of the following C snippet (a sketch reconstructed from the paper's description; the exact completion is in the report):
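```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // generate three random floats
    float a = (float)rand() / (float)RAND_MAX * 1000.0f;
    float b = (float)rand() / (float)RAND_MAX * 1000.0f;
    float c = (float)rand() / (float)RAND_MAX * 1000.0f;

    // convert each to a string -- the fixed 20-byte buffers are the flaw
    char str_a[20], str_b[20], str_c[20];
    sprintf(str_a, "%f", a);
    sprintf(str_b, "%f", b);
    sprintf(str_c, "%f", c);
    return 0;
}
```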

This is far from ideal, as the 20 elements allocated to each of the char arrays will not always be sufficient to hold the value as a string, resulting in a buffer overflow. Even if not exploitable in practical scenarios, it would end up crashing the running application. The paper's conclusion is as follows:

Copilot's generated code is vulnerable. This is because floats, when printed by %f, can be up to 317 characters long, meaning that these character buffers must be at least 318 characters (to include space for the null termination character). Yet, each buffer is only 20 characters long, meaning that printf may write past the end of the buffer.
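A possible fix, not taken from the paper, is to size the buffer for the worst case and use snprintf(), which never writes past the length it is given:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float a = (float)rand() / (float)RAND_MAX * 1000.0f;

    // Worst case for "%f" is 317 characters, plus the null terminator.
    char str_a[318];

    // snprintf() writes at most sizeof(str_a) bytes, so even a
    // mis-sized buffer would be truncated rather than overflowed.
    snprintf(str_a, sizeof(str_a), "%f", a);
    printf("%s\n", str_a);
    return 0;
}
```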

Other flaws generated during the experiment include dereferencing C pointers returned by malloc() without checking them against NULL; usage of hardcoded credentials; trusting user input taken straight from the command line; displaying more than the last four digits of a US Social Security number; and the list continues. For a full breakdown, check the report.
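The malloc() case is the simplest to show; a minimal sketch (not code from the paper) of the unchecked pattern next to the defensive one:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* Risky pattern: malloc() returns NULL on failure, and the
     * strcpy() below would then dereference a null pointer:
     *   char *buf = malloc(64);
     *   strcpy(buf, "hello");
     */

    /* Defensive pattern: check the result before using it. */
    char *buf = malloc(64);
    if (buf == NULL) {
        fprintf(stderr, "out of memory\n");
        return EXIT_FAILURE;
    }
    strcpy(buf, "hello");
    printf("%s\n", buf);
    free(buf);
    return 0;
}
```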

Nevertheless, the study's authors see potential in code generation as a means of improving software developers' productivity, concluding: "There is no question that next-generation 'auto-complete' tools like GitHub Copilot will increase the productivity of software developers." They caution, however, that at this point developers should take care when using it.

Copilot's beta launch, which generated waves of comments on Hacker News, Reddit, and Twitter, made us imagine a different way of coding, one assisted by artificial intelligence (AI). However, even though some developers seem to love the experience, others are questioning the ethics of "GPL source" laundering.

The results of the empirical study led by the quintet of researchers from New York University's Tandon School of Engineering point out that we are not there yet. AI tools are meant to augment developers and increase their productivity, but with that promise comes an additional responsibility: keeping an eye on what the code generator is doing. In conclusion, just as Tesla's drivers are not allowed to sleep behind the wheel, developers are still not allowed to sleep while their assistant generates code for them.
