In the Search of Code Quality

I recently came across research on the correlation between the programming language used in a project and code quality. I was intrigued because the results were contrary to what I would expect. On the one hand, the study could be flawed; on the other hand, many established practices and beliefs in software development are of obscure origin. We adopt them because "everybody" is doing them, or because they are considered best practices, or because they are preached by evangelists (the very name should be a warning sign). Do they really work, or are they urban legends? What if we look at the hard data? I checked a couple of other papers, and in all cases the results held surprises.

Considering how important software systems are to our economy, it is surprising how scarce scientific research on the development process is. One reason could be that software development is very expensive and usually owned by companies that are not eager to let researchers in, which makes experiments on real projects impractical. Recently, public code repositories such as GitHub and GitLab have changed this situation by providing easily accessible data, and more and more researchers are digging into it.

One of the first studies based on data from public repositories, titled "A large ecosystem study to understand the effect of programming languages on code quality", was published in 2016. It tried to validate the belief, almost universally taken for granted, that some programming languages produce higher-quality code than others. The researchers looked for a correlation between programming language and the number and type of defects. An analysis of bug-related commits in 729 GitHub projects developed in 17 languages did indeed show the expected correlation. Notably, languages such as TypeScript, Clojure, Haskell, Ruby, and Scala were less error-prone than C, C++, Objective-C, JavaScript, PHP, and Python.

In general, functional and statically typed languages were less error-prone than dynamically typed, scripting, or procedural languages. Interestingly, defect types correlated more strongly with language than the number of defects did. On the whole, the results were not surprising and confirmed what the majority of the community believed to be true. The study became popular and was extensively cited. There is one caveat: the results were statistical, and statistical results must be interpreted with care. Statistical significance does not always entail practical significance and, as the authors rightly warn, correlation is not causation. The results of the study do not imply (although many readers have interpreted them that way) that if you switch from C to Haskell you will have fewer bugs in your code. Still, the paper at least provided data-backed arguments.

But that's not the end of the story. Because replication is one of the cornerstones of the scientific method, a team of researchers tried to replicate the 2016 study. The result, after correcting some methodological shortcomings found in the original paper, was published in 2019 in the paper "On the Impact of Programming Languages on Code Quality: A Reproduction Study".

The replication was far from successful: most of the claims from the original paper were not reproduced. Although some correlations were still statistically significant, they were not significant from a practical point of view. In other words, if we look at the data, it seems that the choice of programming language is of marginal importance, at least as far as the number of bugs is concerned. Not convinced? Let's look at another paper.

A 2019 paper, "Understanding Real-World Concurrency Bugs in Go", focused on concurrency bugs in projects written in Go, a modern programming language developed by Google and designed specifically to make concurrent programming easier and less error-prone. Although Go advocates message-passing concurrency as the less error-prone approach, it provides mechanisms for both message passing and shared-memory synchronization, so it is a natural choice for comparing the two. The researchers analyzed concurrency bugs found in six popular open source Go projects, including Docker, Kubernetes, and gRPC. The results bewildered even the authors:

"Surprisingly, our study shows that it is as easy to make concurrency bugs with message passing as with shared memory, sometimes even more."
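To make this more concrete, here is a minimal sketch of one way a message-passing bug can arise in Go. It is a hypothetical example written for this article, not code from the studied projects, and the names resultWithTimeout and workerFinished are my own. A worker goroutine sends its result over a channel while the caller waits with a timeout. With an unbuffered channel, a worker that finishes after the timeout blocks on the send forever, because nobody is left to receive. Giving the channel a buffer of one lets the worker deliver its result and exit.

```go
package main

import (
	"fmt"
	"time"
)

// resultWithTimeout runs work in a goroutine and waits for its result over a
// channel, giving up after timeout. The returned finished channel is closed
// once the worker goroutine has sent its result and is about to exit.
func resultWithTimeout(work func() int, timeout time.Duration, bufSize int) (value int, ok bool, finished <-chan struct{}) {
	results := make(chan int, bufSize)
	done := make(chan struct{})
	go func() {
		// With bufSize == 0 this send blocks until someone receives. If the
		// caller has already timed out, nobody ever will.
		results <- work()
		close(done)
	}()
	select {
	case v := <-results:
		return v, true, done
	case <-time.After(timeout):
		return 0, false, done
	}
}

// workerFinished reports whether the worker goroutine exits after the caller
// has timed out. The durations are arbitrary values chosen for the demo.
func workerFinished(bufSize int) bool {
	slowWork := func() int {
		time.Sleep(50 * time.Millisecond) // slower than the caller's timeout
		return 42
	}
	_, _, finished := resultWithTimeout(slowWork, 10*time.Millisecond, bufSize)
	select {
	case <-finished:
		return true
	case <-time.After(500 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered channel, worker finished:", workerFinished(0))
	fmt.Println("buffered channel, worker finished:", workerFinished(1))
}
```

In the unbuffered case the worker never finishes: the goroutine is leaked without any crash or error message, which is part of what makes this kind of bug easy to write and hard to notice.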

Although the studies we have seen so far suggest that advances in programming languages have little bearing on code defects, there may be another explanation.

Let's take a look at yet another piece of research: the classic Munich Taxicab Experiment conducted in the early 1980s. Although the research concerns road safety rather than IT, the researchers encountered similarly unintuitive results. In the 1980s German car manufacturers began to install the first ABS (anti-lock braking systems) in cars. Since ABS makes a car more stable during braking, it is natural to expect that it improves road safety. The researchers wanted to find out by how much. They cooperated with a taxi company that planned to install ABS in part of its fleet. 3000 taxis were selected, and ABS was installed in a randomly chosen half of them. The researchers observed the cars for 3 years and then compared accident rates in the groups with and without ABS. The result was surprising, to say the least: there was practically no difference, and the cars with ABS were even slightly more likely to be involved in an accident.

As with the research on bug rates and on concurrency bugs in Go, in theory there should be a difference, but the data shows otherwise. In the ABS experiment, the investigators collected additional data. First, the cars were equipped with a kind of black box that recorded information such as speed and acceleration. Second, observers were assigned to the drivers to take notes on their behavior on the road. The picture that emerged from the data was clear: with ABS installed, the drivers changed their behavior. Noticing that they now had better control of the car and a shorter stopping distance, they started to drive faster and more dangerously, taking sharper turns and tailgating.

The explanation of this phenomenon is based on the psychological concept of target risk: people behave so that the overall risk they accept, called the target risk, stays at a constant level. When circumstances change, people adapt their behavior to keep the level of risk constant. Installing ABS lowers the risk of driving, so the drivers compensate by driving more aggressively. Similar risk compensation has been found in other areas as well: children take more physical risks when playing sports with protective gear, childproof lids on medicine bottles make parents more careless with medicines, and better ripcords on parachutes are pulled later.

Let's come back to the studies on code quality. What were the researchers analyzing? Commits to the code repository. When does a developer commit code? When they are sure enough that its quality is acceptable; in other words, when the risk of committing buggy code is at a reasonable level. What happens when the developer switches to a language that is less error-prone? They will quickly notice that they can now write fewer tests, spend less time reviewing the code, and skip some quality checks while maintaining the same risk of committing low-quality code. Like the drivers with ABS, the developer adapts to the new situation so that the target risk stays the same as before. Every developer has an inner standard of code quality and a target risk of committing code below this standard. Note that the target risk and the standard vary among developers, but the studies suggest that on average they are the same among developers using different languages.

A natural question is: what about other established techniques for improving code quality? I looked for papers on two of them, pair programming and code review. Do they work as commonly preached? Well, yes and no; it turns out that the situation is a bit more complicated. In both cases there are several studies examining the effectiveness of the approach.

Let's look at a meta-analysis of experiments on pair programming, "The effectiveness of pair programming: A meta-analysis". Does pair programming improve code quality? "The analysis shows a small significant positive overall effect of pair programming on quality". A small positive effect sounds a bit disappointing, but that's not the end of the story.

"A more detailed examination of the evidence suggests that pair programming is faster than solo programming when programming task complexity is low and yields code solutions of higher quality when task complexity is high. The higher quality for complex tasks comes at a price of considerably greater effort, while the reduced completion time for the simpler tasks comes at a price of noticeably lower quality."

In the case of code review, the research results were usually more consistent, but the main benefits are not where I would expect them, in early defect detection. As the authors of a study on code review practices at Microsoft, "Expectations, Outcomes, and Challenges of Modern Code Review", conclude:

"Our study reveals that while finding defects remains the main motivation for review, reviews are less about defects than expected and instead provide additional benefits such as knowledge transfer, increased team awareness, and creation of alternative solutions to problems."

A natural question is why there is such a discrepancy between the results of scientific research and the common beliefs in our community. One reason may be the divide between academia and practitioners, which makes it hard for study results to reach developers, but that's only half of the story.

In the mid-1980s Fred Brooks published the famous paper "No Silver Bullet - Essence and Accident in Software Engineering". In the introduction he compares the software project to a werewolf:

"The familiar software project has something of this character (at least as seen by the non-technical manager), usually innocent and straightforward, but capable of becoming a monster of missed schedules, blown budgets, and flawed products. So we hear desperate cries for a silver bullet, something to make software costs drop as rapidly as computer hardware costs do."

He argues that there are no silver bullets in software development because of its very nature: it is an inherently complex endeavour. In the 1980s most software ran on a single machine with a single one-core processor, the Internet was in its infancy, smartphones were in the distant future, and nobody had heard of virtualization or clouds. Brooks was writing mainly about technical complexity; now we are more aware of the complexity of the social, psychological, and business processes involved in software development.

This complexity has also increased substantially since Brooks's publication. Development teams are larger, often distributed and multicultural, and software systems are much more closely entangled with the business and social fabric. Despite all the progress, software development is still extremely complex, sometimes on the verge of chaos. We must face constantly changing requirements, rising technical complexity, and confusing nonlinear feedback loops created by entangled technical, business, and social forces. The natural wiring of our brains is quite poor at figuring out what is going on in such an environment. It is not surprising that the IT community is plagued by hype, myths, and religious wars. We desperately want to make sense of all this stuff, so our brains do what they are really good at: finding patterns.

Sometimes they are too good at it, and we see canals on the surface of Mars, faces in random dots, and rules in a roulette wheel. Once we start to believe in something, we become literally addicted to it: each confirmation of our belief gives us a shot of dopamine. We start to protect our beliefs, and as a result we close ourselves in echo chambers, choosing conferences, books, and media that confirm our cherished beliefs. With time the beliefs solidify into a dogma that hardly anyone dares to challenge.

Even with the scientific method, which allows us to tackle complexity and our biases in a more rational way, it can be very hard to predict the result of an action in a process as complex as software development. We switch to a better programming language and code quality does not change; we introduce pair programming or code review to improve code quality and we experience lower quality, or we get benefits in unexpected areas. But there is also a bright side to this complexity: we can find unexpected leverage points. If we want to improve code quality, instead of looking for technical solutions such as a new programming language or better tools, we can focus on improving the development culture, raising quality standards, or making it riskier to commit bugs.

Looking from this perspective can shed light on some non-obvious opportunities. For example, when a team introduces code reviews, the code produced by each developer becomes more visible to other team members, which raises the risk of committing poor-quality code. Code review should therefore raise the quality of committed code not only because reviewers find bugs or standard violations (which is what the studies quoted above were looking for), but also because it deters developers from committing bugs in the first place. In other words, to raise the quality of the code it should be enough to convince the developers that their code is being reviewed, even if nobody is doing it.

The moral of these studies is also that technological factors cannot be separated from psychological and cultural ones. As in many other areas, data-based research shows that the world does not function the way we believe it does. To check how well our beliefs correspond with reality, we don't have to wait for researchers to conduct long-term studies. Some time ago we had an emotional dispute on some topic, with many arguments on both sides. After about half an hour someone said: let's check it on the Internet. We sorted out the disagreement in 30 seconds. Scientific thinking and a dose of scepticism are not reserved for scientists. Sometimes a quick check on the Internet is enough, and sometimes we need to collect and analyze data, but in many cases it is not rocket science. How to introduce more rationality into software development practices is a broad topic, maybe worth another article.

Jacek Sokulski has 20+ years of experience in software development. He currently works at DOT Systems as a software architect. His interests span distributed systems and software architecture, complex systems, AI, psychology, and philosophy. He has a PhD in Mathematics and a postgraduate diploma in Psychology.

Continued here:
In the Search of Code Quality - InfoQ.com
