Key Takeaways
There can be no agile software delivery without the right DevOps infrastructure. In this article, we share our experience from our DevOps and agile transformation journey. We have a large, distributed team structure, and we deliver on-premise software, which makes delivery different from cloud practices. We have been using many tools that are almost standard in the agile world. The challenge was bringing all the teams together in a pipeline for faster delivery. Our first release took 3 years! After establishing SAFe, we were able to release at semi-regular intervals, 3-4 times per year. Currently, we are laying the groundwork for even faster delivery: the "release-on-demand" defined by SAFe, delivering a feature as soon as it is ready to be delivered.
We have managed to create 2 release trains so far, and are currently working on dividing them into more pieces to enable faster delivery. This isn't as easy as it sounds, because it's not just about technically creating the trains and thinking about their domains, dependencies, etc. It is also about people and teams. Team members are used to working with each other and can sometimes resist joining other teams and working with different individuals. There is no silver bullet for this situation; only clear, transparent, and bi-directional communication can make things move.
Like other software development teams, we have been using many different tools for our DevOps. The fragmented DevOps landscape resulted in a lack of visibility and difficulty in dealing with problems. These problems were blocking us from releasing because we were observing the problems too late, and resolving them took time which delayed the delivery further.
The main issues we dealt with in speeding up our delivery from a DevOps perspective were: testing (unit and integration), pipeline security checks, licensing (open source and other), builds, static code analysis, and deployment of the current release version. For some of these problems we had the tools; for others we didn't, and we had to integrate new ones.
Another issue was the lack of general visibility into the pipeline. We were unable to see our DevOps status at any given moment. This was because we were using many tools for different purposes, and there was no consolidated place where someone could see the complete status of a particular component or the broader project. With distributed teams, it is always challenging to reach a shared understanding of, and visibility into, the development status. We implemented a tool to provide standard visibility into how each team, and the SAFe train(s) in general, were doing. This tool gave us a good overview of the pipeline health.
The QA department has been working as the key-holder of the releases. Its responsibility is to check the releases for bugs and not allow a version to be released if there are critical bugs. As standard as it sounds, this doesn't follow agile principles. We have been trying to "inspect quality in" instead of building it in. This is why we followed DevOps principles to enable the teams to deliver quality in the first place, while getting help from QA on expectations and automation to speed up many processes. This is taking time, but the direction is clear and teams are constantly working toward it. When we analyze our release cycles, we can see where we spend the most time: staging. This is what we are working on reducing.
Finally, we are enabling the release-on-demand concept in SAFe, because we want to release any feature, bug fix, or tweak as soon as it is ready, if the Product Owner and the team say it can be released. This is a big paradigm change compared to the very long staging times for releasing a fixed scope, which was usually huge and required a long testing time just to ensure everything worked.
The current definition of DevOps in Wikipedia is:
DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary with Agile software development; several DevOps aspects came from Agile methodology.
This definition and the practical reality tell us that no software development project can be really agile without proper DevOps. Without healthy DevOps in place, it would be very difficult to have a fast, reliable delivery pipeline. Let's map some of the key features of DevOps to some key agile principles to show the relationship more clearly.
Although there is no manifesto for DevOps, let's list the most commonly used DevOps practices:
Besides these more technical best practices, some culture-related practices are very commonly mentioned with DevOps:
The principles which are linked to technical best practices of DevOps are emphasized in bold above. These principles are focused on delivery, and so are the DevOps practices. Different DevOps practices focus on different parts of the delivery cycle. But they complement a cycle of faster delivery by working together.
For example, the team can only deliver continuously if they stick to the DevOps practices: continuous build/integration/deployment/release, test automation, secure delivery, and continuous monitoring of metrics. The rest of the agile principles also depend heavily on DevOps practices, but the dependency is less obvious, which is why we preferred not to link them directly. To give just one example, let's look at "continuous attention to technical excellence and good design enhances agility." To pay close attention to technical excellence, the code itself must be open to change when the team decides to do some refactoring, and this can be much faster and more robust only if the DevOps infrastructure is ready for it, providing automated tests, continuous integration, deployment, etc. Otherwise, refactoring work can be a big pain, and very costly due to the many bugs discovered in later stages.
If we think about the cultural aspects of DevOps, we see a lot carried over from agile principles. We won't delve deeper into those, as we believe agile and DevOps complement each other from a cultural perspective anyway, and our focus for this article is more on DevOps technicalities. In the rest of the article, we will address relevant cultural elements only from an agile perspective.
DevOps is much more commonly used for cloud-based software nowadays, but DevOps was there before there was cloud. DevOps principles can be applied to any kind of software development project.
Workplace Hub (WPH) is also one of those projects that deliver software to the endpoints, not to the cloud. Although the project has some cloud elements, the majority of the software being developed runs on-premise. Actually, from a DevOps perspective, this makes no big difference. DevOps is about automation and enabling fast delivery. As long as the team can deliver fast releases, we can say that DevOps is successfully utilized.
Let's try to explain what we mean. Going back to some of the most common DevOps best practices we've been following:
Each of these items will be explained in detail in the "How to use DevOps to enable faster delivery cycles" section. But here, we want to show that all these practices are applicable to on-premise software too. The key difference lies in the delivery method. For cloud software, the distinction between the release and the deployment is usually blurred. However, in our case, the endpoints are at client sites, so the development team is responsible for delivering the release to the deployment team, which plans the deployment and notifies the customers when necessary. Some of the updates might cause downtime due to their nature (e.g. when the firmware for the server is updated, there has to be a restart). This requires careful planning on the deployment team's part. Consequently, the development team has to deliver the releases faster, so that the deployment teams have enough time to plan the deployment.
The development team has to ensure the release is stable, can be readily deployed without issues, doesn't have known security issues, and so on. Solving a problem at a customer site is always more difficult than in cloud scenarios, where the delivery team is in charge of the infrastructure anyway. In cloud scenarios, when there is a problem with the deployment, it is much easier to roll back, or to diagnose and troubleshoot the problem. Cloud environments provide the necessary tools, and they can be used comparatively easily, especially by experts. In most cases, the development team is in charge of deployment too, and even if they aren't, they can work with the deployment team much more easily because no planning with customers is required. Cloud scenarios also offer high availability, blue-green deployments, or similar techniques that can be used to avoid downtime.
But in our scenario, the deployment is done to the WPHs at the customer site, and the network infrastructure and the other servers in the network aren't under our control. This means carefully planning, deploying, and monitoring the upgrade/deployment of the new release to the endpoints. Solving any problems that occur at the customer site is costly and time-consuming, and usually not as easy as solving problems in a cloud environment. This is why DevOps becomes even more critical to ensure that the release is stable, secure, tested, and delivered faster.
A sample schema of how DevOps can work in a cloud environment vs an on-premise delivery environment can be seen below.
Following through on the definition of a release train, each train should be able to interact with its stakeholders and deliver more or less independently from the other trains.
Deciding how to construct a release train isn't as easy as it sounds. Some key questions to keep in mind are:
With these and other similar considerations in mind, organizations try to find and create the optimal release trains, which can act fairly independently from one another (from a technical as well as a customer point of view). I have to emphasize here that being completely independent is impossible for the majority of projects. The goal is to be as independent as possible. Otherwise, this might turn into a game of a cat chasing its tail.
The goal of establishing release trains is ensuring there is consistent and fast delivery of customer expectations. Each customer (group) can work with different release trains to ensure they get what they want.
Another thing to keep in mind is that software projects are live systems. This means that the software will evolve with the advance of technology and new customer requirements. Changes to technology and requirements will mean that the release trains, as well as the teams, need to adapt to new situations and reorganize to cover the new status.
With this short introduction, let's take a look at our example, and how we reshaped our release trains. First of all, let me emphasize that we have 2 different sets of customers in this example. The first is of course our end users, the clients. WPH is delivered to the customer site with the software on it, and is updated and supported remotely or, when necessary, on-site. The second customer group is our support teams, who are the users of the support functionalities we deliver in our data centers and public cloud environments. We initially created 2 release trains which had different deployment environments and different customers: the Platform and Support trains. The Platform train is responsible for implementing and delivering the core functionalities we expect from the on-premise WPH, whereas the Support train covers support team requirements. Due to their different deployment environments and customers, these 2 trains have different deployment methodologies. Even though we deploy with different frequencies for these 2 trains, we use one single Program Increment (PI, as defined by SAFe) event to cover the planning for all of the teams. As mentioned before, the teams are NOT completely independent from one another and the planning has to be done together.
Establishing release trains or (re)forming teams isn't an easy task. The goal is clear: enabling faster and better delivery cycles (which in turn bring faster customer benefit and feedback). But it is also about human beings, who need to be informed and reminded of the goals of the reorganization and listened to for their suggestions and thoughts. Some team members might be used to working with each other and can sometimes resist joining other teams and working with different individuals. There is no silver bullet for this situation; only clear, transparent, and bi-directional communication can make things move. After all, these individuals are still part of the same broader team, and everyone is working for the single purpose of delivering a solution that works for the customer. The members and the trains will still have to work together.
Many agile principles imply self-organizing teams. We haven't been successful with this principle so far. We think this has to be a cultural principle as well. Some companies are more successful with this approach because they start teaching and reinforcing this principle of reorganizing from hiring on. They continue to encourage their employees to reorganize or to come up with their own ideas for reorganizing. The goal is always two-fold: delivering a better solution for the customers, and making developers' lives easier through clearer targets and fewer dependencies. All team members need to keep in mind at all times that the broader team has to be in the best shape to deliver the best solution to the customer, and be willing to reorganize when necessary. Suggestions should come from the team members, who can see situations clearly by looking at the backlog, dependencies, etc. If the company culture isn't designed as such, it is all too easy to fall into the trap of "staying in the comfort zone" and implementing no changes to the team structure, thus delivering sub-optimally.
In our situation, we had cases where team members suggested creating new trains themselves (like the applications train, which will be mentioned in the next paragraph), and other cases where teams continued to deliver sub-optimally due to the lack of a clear backlog or of enough capacity to deliver. In some of these cases, some team members saw the situation and spoke up, but not all were willing to change. It was up to management to take action: reorganizing some teams to speed up delivery, splitting up teams with more capacity, etc. What we did was make the intention clear in each case (for example, "we need more team members for team A" or "team B isn't able to deliver a clear customer benefit, so it must be disbanded"). Once the intention was clear, the team members came up with suggestions about which teams they could move to or what kind of split they could apply to a growing team, from a technical standpoint. These suggestions helped us reshape the teams in a more meaningful way, with much less frustration for the team members (although there were emotions, which is normal of course).
Now, we are on our way to creating a 3rd release train for applications that can run on different environments, be it WPH, cloud, or our data centers. This train will be responsible for delivering applications that run on or connect to WPH and can be used by our end users.
Here is a schema that shows how our release trains are shaping up. Team names and the number of teams aren't shared, but the concept should be quite clear.
Having different release trains enables us to package different solutions in their own separate way and deliver them separately. This allows us to deliver faster in general, because each train can deliver separately and doesn't have to wait for another. To make this work, the infrastructure of the release mechanism has to be designed so that different trains can deliver without causing each other significant disruptions, ideally no disruptions at all. We are avoiding absolute terms (like saying absolutely no disruptions) because, especially with software running on-premise, some challenges are quite difficult to overcome, such as version dependencies between the software running on-premise and the version running in the data center. The goal here is to minimize the disruption and to design the system as close to the ideal state as possible.
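The version-dependency challenge mentioned above can be illustrated with a small pre-release check. This is a minimal sketch under stated assumptions, not the actual WPH mechanism: the `COMPATIBLE` table, the version strings, and the `can_deploy` helper are all hypothetical, and it assumes compatibility is decided by major version only.

```python
# Hypothetical compatibility table (an assumption for illustration):
# data-center major version -> on-premise major versions it can work with.
COMPATIBLE = {
    3: {2, 3},
    4: {3, 4},
}


def parse_major(version):
    """Extract the major version number from a dotted version string."""
    return int(version.split(".")[0])


def can_deploy(on_premise_version, data_center_version):
    """Return True if the on-premise release may run against the data-center release.

    Unknown data-center versions are treated as incompatible with everything.
    """
    supported = COMPATIBLE.get(parse_major(data_center_version), set())
    return parse_major(on_premise_version) in supported
```

A train could run such a check before handing a release to the deployment team, so that an incompatible pairing is caught in the pipeline rather than at a customer site.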
In this section, we'll highlight what we have done to enable faster delivery cycles from a DevOps perspective. It should be noted that the topics mentioned in all other sections of this article complement this picture. Without establishing proper release trains, or without organizing QA to contribute to faster delivery with quality, this couldn't have been possible.
We have to underline that the biggest problem we had with our releases was the huge scope. A huge scope means a very long staging time during which many bugs are found and resolved, all of which adds to the release time. We are now changing our release schema to deliver a smaller scope, which requires very little extra testing and produces fewer bugs. To enable this, we have been changing the architecture of our software and utilizing container updating capabilities, an approach that is closer to industry standard and causes little to no downtime when upgrading.
Our teams have used various DevOps tools. However, governance and consolidation were missing. Lack of governance resulted in the teams using the tools as they "see fit." There is nothing necessarily wrong with this if there aren't too many teams and there are no big integration challenges. However, in our case and in many other cases, the teams' output needed to be integrated to deliver a complete product, which means that each piece should fit into the puzzle properly. Without some governance around DevOps tools, this was proving to be impossible for us.
We decided to standardize our processes. Without going into tool details, we'll explain our guiding principles per DevOps practice.
We were using many different programming languages and technologies, which made pipeline standardization difficult. To avoid build configuration errors and manually set up build processes, we started to treat build definitions the same way as production code. Having pipelines declared in code and versioned allowed us to make adjustments at scale (across more than 100 pipelines) safely. Adding a new step to all of our builds, like a vulnerability check, is now a matter of opening a merge request and completing a review process.
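To make the idea of pipelines declared in code more concrete, here is a minimal Python sketch. It is not the actual WPH tooling: it assumes a hypothetical representation in which each pipeline is an ordered list of step names, and the component names, step names, and the `add_step_after` helper are invented for illustration.

```python
from copy import deepcopy

# Hypothetical pipeline definitions: component name -> ordered list of steps.
PIPELINES = {
    "component-a": ["checkout", "build", "unit-test", "package"],
    "component-b": ["checkout", "build", "package"],
}


def add_step_after(pipelines, new_step, after_step):
    """Return a copy of all pipelines with new_step inserted after after_step.

    Pipelines that already contain new_step, or that lack after_step,
    are left unchanged, so the operation is safe to re-run.
    """
    updated = deepcopy(pipelines)
    for steps in updated.values():
        if new_step in steps or after_step not in steps:
            continue
        steps.insert(steps.index(after_step) + 1, new_step)
    return updated


if __name__ == "__main__":
    result = add_step_after(PIPELINES, "vulnerability-check", "build")
    for name, steps in result.items():
        print(name, "->", " > ".join(steps))
```

Because the change is an ordinary code change, it can go through the same merge request and review process as production code before it reaches every pipeline.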
The WPH development environment is quite diverse. Different programming languages are in use, which was one of the challenges in creating standardization in the first place. We created a set of rules for each programming language and encouraged the teams to use this toolset (with additional rules of their own if they so desire) to check the code quality.
Static code analysis has a two-fold benefit for us. The first, as mentioned above, is that following the coding rules results in successful integration. The second is that code reviews and handovers become easier. It is very common for teams to change their domains, or for people to change teams. In such scenarios, the recipients are more comfortable with the code they receive because it follows the defined set of rules.
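The combination of a shared rule set per language with optional team additions can be sketched as follows. This is an illustrative example only: the rule names, severities, and the `effective_rules` helper are hypothetical, and rejecting attempts to redefine a baseline rule is an assumption of this sketch rather than something the article describes.

```python
# Hypothetical shared baseline per language: rule id -> severity.
BASE_RULES = {
    "python": {"unused-import": "error", "line-too-long": "warning"},
    "java": {"empty-catch-block": "error"},
}


def effective_rules(language, team_rules=None):
    """Combine the shared baseline for a language with a team's extra rules.

    Teams may add rules. Redefining a baseline rule is rejected here
    (an assumption of this sketch) so the shared standard stays intact.
    """
    if language not in BASE_RULES:
        raise KeyError(f"no baseline rule set for {language!r}")
    rules = dict(BASE_RULES[language])  # copy, so the baseline is never mutated
    for rule_id, severity in (team_rules or {}).items():
        if rule_id in rules:
            raise ValueError(f"{rule_id!r} is a baseline rule and cannot be overridden")
        rules[rule_id] = severity
    return rules
```

For example, a team could call `effective_rules("python", {"missing-docstring": "warning"})` to obtain the baseline plus its own extra rule.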
We integrated a tool into our pipeline to check which open-source libraries (OSS) we were using and what their license types were. This tool is used to list the libraries in our release notes as well as take care of license-specific issues.
Depending on the license type, there are specific actions the team has to take. For instance, using an LGPL-licensed library might mean the company has to expose its code too. By using this integrated tool, we have more visibility into our OSS landscape and can cooperate with the legal teams on what we need to do in different cases.
We have been running penetration tests for our releases to check for any security issues we might have and address them before the release. This is a waterfall approach and has been costing us time. This is why we have been trying to shift this approach left and find security flaws as early as possible in our development lifecycle.
There are also risks stemming from dependencies such as libraries or other components being used. Libraries can bring their own risks with them.
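A license check of this kind can be pictured as a lookup from license type to required action. The sketch below is hypothetical: the `LICENSE_ACTIONS` policy, the library names, and the `review_dependencies` helper are invented for illustration, and the actions shown are examples, not legal guidance or the behavior of any specific tool.

```python
# Hypothetical policy: license identifier -> action the team should take.
# The mapping is illustrative only and is not legal advice.
LICENSE_ACTIONS = {
    "MIT": "list in release notes",
    "Apache-2.0": "list in release notes",
    "LGPL-3.0": "review with legal team",
    "GPL-3.0": "review with legal team",
}


def review_dependencies(dependencies):
    """Group library names by the action their license requires.

    dependencies: iterable of (library_name, license_id) pairs.
    Unknown licenses are routed to legal review rather than silently accepted.
    """
    grouped = {}
    for name, license_id in dependencies:
        action = LICENSE_ACTIONS.get(license_id, "review with legal team")
        grouped.setdefault(action, []).append(name)
    return grouped
```

Routing unknown licenses to review by default is a deliberately conservative choice: a missing policy entry then shows up as work for the legal team instead of slipping into a release.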
For this reason, we have integrated a tool into our pipeline, which runs a vulnerability check and lists its findings. Using these results, we can address some critical gaps much faster, leading to faster delivery.
Single Component View (Developer names are hidden)
Report View (Component names are hidden)
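The vulnerability check described above can act as a gate in the pipeline. The following is a minimal sketch, assuming findings arrive as dictionaries with a `severity` field; the field names, the severity scale, the finding identifiers, and the `gate` helper are assumptions for illustration and do not describe the tool we integrated.

```python
# Assumed severity scale, ordered from least to most severe.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def blocking_findings(findings, threshold="high"):
    """Return the findings whose severity is at or above the threshold."""
    limit = SEVERITY_ORDER[threshold]
    return [f for f in findings if SEVERITY_ORDER[f["severity"]] >= limit]


def gate(findings, threshold="high"):
    """Return (passed, blockers): passed is True when nothing blocks the release."""
    blockers = blocking_findings(findings, threshold)
    return len(blockers) == 0, blockers


if __name__ == "__main__":
    sample = [
        {"id": "EXAMPLE-1", "library": "libfoo", "severity": "low"},
        {"id": "EXAMPLE-2", "library": "libbar", "severity": "critical"},
    ]
    passed, blockers = gate(sample)
    print("release allowed:", passed)
    for finding in blockers:
        print("blocking:", finding["id"], finding["library"])
```

Keeping the threshold configurable lets a team start with a lenient gate and tighten it as the backlog of findings shrinks.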
Last, but not least, we would like to emphasize how critical it is to monitor the whole delivery pipeline. Using some custom tools that we have developed and are constantly improving, we can now look at the status of each release train and see if any red flags need to be addressed by teams. Of course, each team looks at its own page and takes the necessary actions, but for governance (standardizing the approach), and so that all stakeholders can monitor the general health of the delivery, we have an overview monitoring page that shows the general status of the delivery.
The Release Portal
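The roll-up from per-component status to a per-train overview can be sketched in a few lines. This is an illustration only, not the Release Portal's implementation: the status values, the component names, and the rule that a train is as healthy as its worst component are assumptions of the sketch.

```python
# Assumed status ranking: a train is only as healthy as its worst component.
RANK = {"green": 0, "yellow": 1, "red": 2}


def train_health(component_statuses):
    """Roll component statuses up to one status per release train.

    component_statuses: iterable of (train, component, status) tuples.
    Returns a dict mapping each train to its worst component status.
    """
    worst = {}
    for train, _component, status in component_statuses:
        if train not in worst or RANK[status] > RANK[worst[train]]:
            worst[train] = status
    return worst


if __name__ == "__main__":
    statuses = [
        ("Platform", "component-1", "green"),
        ("Platform", "component-2", "red"),
        ("Support", "component-3", "yellow"),
    ]
    for train, status in train_health(statuses).items():
        print(train, status)
```

A "worst status wins" rule keeps the overview honest: a single red component is enough to raise a flag for the whole train.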
We used the SAFe definition of Release on Demand as another guiding principle for our delivery pipelines and have the goal of delivering in increments to customers.
This is not a new concept in agile, but just a rephrasing of it. As we've mentioned before, one of the pillars of agile is releasing faster to get customer feedback faster, learn from the feedback, and constantly evolve the software. Long waiting times and huge scope aren't desirable in this sense because they make getting direct feedback very complicated.
SAFe, like any other agile methodology, aims to release whenever a feature is ready and to get feedback as soon as possible. To realize this, we have been shaping our release trains accordingly and preparing our release process and technical architecture to support releasing in small increments. Instead of releasing a fixed scope (which is generally big and subject to scope creep), we are now moving towards releasing on demand.
Here, let's define what can be released at each increment. There is no static rule for this. We can only say that the Product Owner(s) (or the customers, if you happen to have them as part of your team directly) are in charge of defining what should be released, because they are the ones who know the effect of releasing a finished implementation. This completed piece can be a completely new feature, an update to an existing feature, the removal of a feature, a bug fix, or even some refactoring or a security update to the code that isn't visible to the end user at all but is critical from a technical standpoint. The Product Owner is in charge of judging when something is worth releasing, and potentially getting some feedback from it.
Other reasons for releasing faster have already been covered in the sections above. Here's the SAFe perspective on DevOps and Release on Demand.
Agile simply doesn't work without DevOps. Whatever kind of software you might be producing, try to take the DevOps principles to heart and apply them as they fit your deliverables. But even this isn't enough, as it implies that you use whatever you can use. Let's be more specific here. Agile teams need to go out of their way to change their organizational structures, deliverables, and customer interactions to follow DevOps principles as much as possible. This will result in faster, more secure delivery cycles, which will enable the team to get faster feedback from the customer that can be fed back into the next cycle's improvements.
Burak Ilter worked as the Head of Engineering at Konica Minolta. He is an IT professional with a long and diverse career. He's worked for major companies in different roles, ranging from software engineering to system engineering, and from architecture to engineering management. He has practical experience with different programming languages, methodologies, and business domains, including the public sector, defense sector, finance, healthcare, and productization. He is married with two children. He enjoys reading science fiction and history, and is interested in cycling and running. He is also an avid Japanese anime fan.
Source: How to Make DevOps Work with SAFe and On-Premise Software - InfoQ.com