Challenges. Before we explore why this talk might be High Explosive and Dangerous, as the headline shows, I would like to say thank you to Alexander, who will share the stage with me today. He is a developer, so we are not just talking about DevOps, whatever that means; we really work this way. A few words before we start: last year, when I was on this stage, I talked about “Docker: Ops unleashed”, where I showed you why new technology may be an enabler for creativity and how you can win back the slack time that creativity needs. Today our talk will not cover technologies like Docker, Kubernetes, service meshes, proxies, routing, DNS or the like, although there will be some side references, and if we have enough time we can show some details. If not, we will be backstage after the talk, so please come and talk to us if you are interested. Our talk has four parts: Part 1, called Start-Up, shows why we started our cultural change. Part 2, The Way, shows how we got up and running with our new way of working, from the bottom up. Part 3 shows how we work every day, which is why it is called The Life. Finally, Part 4, called Future Challenges, shows which challenges we expect to face soon, from our point of view. We’ve picked our top ten challenges.
Team Leader Cloud Solution Architects / CI: @m4r10k, m4r10k.gitlab.io
Team Leader Software Development Web Department: @ao2_io, ao2_io.gitlab.io
Mario: My name is Mario Kleinsasser. I am the team leader of a small team of four people, including myself. We are the cloud solution architects for hybrid-cloud setups, and therefore, together with our colleagues from the development department, we also provide the infrastructure for our on-premises data center. This means we run, extend and fix a full-stack Docker environment that currently consists of more than 150 stacks with nearly 400 services and around 1,800 containers. Of course, the whole way from source code to deployment/delivery is fully automated if the developer needs it.
Alex: My name is Alexander Ortner. I am the team leader of a team of seven people, including myself. We are software engineers at STRABAG and build custom software for operational and commercial units. We have a highly diverse range of applications. Thanks to the new flexibility of our hybrid-cloud setup and on-premises Docker clusters, we can use the newest technologies to fulfill our customers’ needs, although we still have to maintain our old treasures. Without a stable automation process this would not be possible.
A picture says more than a thousand words, and the mental model behind this picture was the source of many problems; to be honest, it still is. In the past, we worked a lot like this. Imagine the following: John is an operator who does the best he can. The voice John hears in the first picture comes from our colleagues in the development department, telling him to do something. Because John never told his colleagues from the other department that he is colorblind, they don’t know about it. So both sides have the same problem: each thinks the other knows everything about the situation. That’s not true, and even if you work closely together, as our departments do, it is rare that we fully understand each other. We worked completely blindfolded. Basically it’s a communication problem, and communication is key. Tools and culture, like CI/CD automation, can support you. There is a quote from Konrad Lorenz, which I have roughly translated into English, and it goes like this:
that summarize our collaboration in the past.
- There were new project requirements, e.g. fast full-text search.
- We had to think about new technologies to fulfill these requirements.
- Our strict development and deployment corset did not allow us to use new technologies such as Elasticsearch.
- Operations said: nice idea, but sorry … NO, we do not have the people to operate or maintain such systems.
- They were frustrated because they had to maintain so many virtual servers; we were frustrated because we couldn’t try anything new.
- It’s a little exaggerated, but all in all, it was the truth.
- In the past we also had a very high manual effort to install new software; we wasted lifetime.
- We will talk about the way out of this plight.
tell you something about the path we took to reach the point where we are today. First, as said before, we had to rethink the way we wanted to work in the future. The goals were a) to save time and b) to gain flexibility. To achieve these goals we had to rethink our infrastructure; that’s the technical part. But we also had to tear down the metaphorical wall between us, the walls in our brains. As we said at the beginning, a lot of problems are the result of communication problems, and it’s definitely not enough to just relabel a department as “DevOps”.
#3 Tearing down the great wall. There are a few ways to tear down the great wall between developers and operations. In our case, and we think this holds for a lot of teams, it’s much easier to tear down the wall when you have a project in which both Devs and Ops are forced to change the existing system. Depending on your environment, there are several reasons for such a change:
- Internal reasons
  - Maybe the need for a new software stack, like npm or golang
  - A change of the server software you are using, for example a shift from Apache Tomcat to Vert.x or whatever
  - You need supporting software for your product, for example Elasticsearch for fast search
- External reasons, which are obviously harder to achieve, for example
  - Reduce time to market
  - Reduce (not remove) the dependency on operations (tool flexibility, deployment schedules); this includes faster fixing and so on
  - Automation to roll out/deploy testable software for the customers
  - Repeatable, documented, auditable steps
- People
  - You REALLY need them!
  - Therefore you have to invest in the game first
project injected the change, and how?
- A new operational project came up with new requirements, like
  - Fast full-text search and fast live reports on commercial data
  - Agile development: more deployments in shorter iterations
- We did not have an automated development process.
- We did not have the infrastructure to run available technologies, e.g. Elasticsearch.
- The first step was to automate our development process.
  - The challenge: automate our process from building a WAR package to installing it on an OpenVZ test server.
  - In the beginning we used Jenkins, but we quickly switched to GitLab CI/CD because we liked the idea of having everything in one place: the code and the CI/CD process.
  - In short: compile the code, build a WAR package, open an SSH channel, upload the file, restart the server. Our first automated process for delivering our package to a test server was born.
- The second step was to provide a flexible infrastructure.
  - The challenge: run the current software stack on the new infrastructure with minimal effort.
  - We introduced a self-hosted Docker Swarm cluster; Kubernetes was too complex at this stage.
  - We run independent Docker Swarm stacks (> 150 stacks in > 1800 Docker instances) on a cluster of Docker nodes and expose different services to the outside.
  - Load balancer problem (out of the scope of this talk)
  - Full automation from commit to deployment
- During the iterations, new technologies evolved (e.g. Bosnd, small binaries that glue the automation process together). Each iteration made usage easier and more automated and the system more stable.
- No microservices at this time: only after the system was stable and the flexibility was available could new development concepts, e.g. a microservice architecture, be introduced.
- The key was short iterations between development and operations; the credo was fail fast:
  - Problem: cache cluster -> multicast -> the cluster split itself apart due to the default load balancing in the overlay network …
  - Instances with no restrictions broke the cluster apart …
  - Health check disaster ;-)
- The mistakes and setbacks, and finding the solutions to them, made the system more stable and welded the team together.
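The first automation step described above (compile, build the WAR package, open an SSH channel, upload, restart) can be sketched in Python as a list of commands. This is a hypothetical sketch, not the team’s actual GitLab CI job: it assumes a Maven build, `scp`/`ssh` available on the runner, and a made-up host, directory and restart command. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class DeployTarget:
    """Hypothetical test-server settings; not taken from the talk."""
    host: str         # SSH destination, e.g. "deploy@test-server.example"
    remote_dir: str   # directory the servlet container deploys WAR files from
    restart_cmd: str  # shell command that restarts the server on the remote host


def build_deploy_plan(war_path: str, target: DeployTarget) -> list[list[str]]:
    """Return the talk's steps as argv lists, in execution order."""
    return [
        # Compile the code and build the WAR package (assumes a Maven project).
        ["mvn", "--batch-mode", "package"],
        # Open an SSH channel and upload the file (scp copies over SSH).
        ["scp", war_path, f"{target.host}:{target.remote_dir}/"],
        # Restart the server on the test host via SSH.
        ["ssh", target.host, target.restart_cmd],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for argv in plan:
        if dry_run:
            print("would run:", shlex.join(argv))
        else:
            # check=True raises CalledProcessError, so a failed step fails the job.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    target = DeployTarget(
        host="deploy@test-server.example",
        remote_dir="/opt/tomcat/webapps",
        restart_cmd="sudo systemctl restart tomcat",
    )
    run_plan(build_deploy_plan("target/app.war", target))
```

In a CI job the same three commands would simply be the job’s script lines; keeping them as data makes the order explicit and lets a failing step stop the deployment.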
Multi pipeline
- Monthly iterations to talk about the whole GitOps environment and improvements: keep on rollin’
- We run internal workshops on new technologies and culture changes to counteract fear of contact
- Improvement of our CI/CD process
  - Introduction of a central CI/CD process with individual adaptation options
  - We have a centralized CI/CD process file that contains the whole automated process
- Increased employee motivation due to the automation of tedious manual processes
  - We have automated the initial setup of new software projects.
- Increased employee motivation due to more flexibility (developers can build their own world to try out something new)
- Continuous exchange to standardize new ideas so that others can benefit from them
  - There are so many tools available: we have to standardize things, but everyone can contribute his/her ideas
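A central CI/CD process with individual adaptation options can be pictured as layering: one shared definition plus small per-project overrides. The Python sketch below shows one way to do that merge; the keys (`stages`, `build`, `deploy`, `target`, `replicas`) and values are hypothetical, not the team’s actual process file. In GitLab CI, comparable layering is usually done with the `include` keyword, where keys defined in the project’s own `.gitlab-ci.yml` override the included ones.

```python
from copy import deepcopy
from typing import Any


def merge_pipeline(central: dict[str, Any], project: dict[str, Any]) -> dict[str, Any]:
    """Deep-merge a project's overrides into the central pipeline definition.

    Nested dicts are merged key by key; any other value (string, list, number)
    from the project replaces the central one. The central definition is not mutated.
    """
    result = deepcopy(central)
    for key, value in project.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_pipeline(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Hypothetical central definition shared by all projects.
CENTRAL = {
    "stages": ["build", "test", "deploy"],
    "build": {"image": "maven:3", "script": ["mvn package"]},
    "deploy": {"target": "swarm-test", "replicas": 1},
}

if __name__ == "__main__":
    # A project only states what differs from the central process.
    project = {"deploy": {"replicas": 3}}
    print(merge_pipeline(CENTRAL, project))
```

A project that only needs more replicas passes `{"deploy": {"replicas": 3}}` and inherits everything else. Lists are replaced rather than concatenated, so a project that changes `stages` has to list all of its stages.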
last year, we faced a massive change in the way we work due to the use of cloud technologies. It basically doesn’t matter which cloud you use, but if you have to use on-premises services in combination with cloud infrastructure, or the other way around, you will find yourself in a hybrid-cloud environment. And that’s not easy to handle. In our case, people and colleagues have only just started to commit to the container environment. Therefore we have all kinds of projects: some are still legacy, a larger number are somewhere between just deploying via the CI/CD pipelines and full-stack pipelines from source to deployment, and some projects are somewhere in between these scenarios. The cloud challenge for us is to decide how we can use our GitOps way in a hybrid-cloud environment so that we can make use of multiple clouds without cloud-vendor lock-in. There are three questions inside this challenge:
- How can we benefit from clouds?
- Is it wise to move to the cloud, or am I just afraid of it?
- How can we GitOps the hybrid cloud to reach the next level?
This is a continuous change challenge!
of my favourites. It’s called failure culture. In the past, people were personally punished if they made a mistake. Why? Because in our culture we are trained to focus on failures, and only on failures or mistakes. Nothing else counts! That’s why it’s so hard to regain creativity, because being creative often means making mistakes. Failure culture is about seeing the positive and honoring it more than the failures. Here is an example:
- A famous German poem
- The first two stanzas of it
- 58 words in total, 4 mistakes or failures
- Marked with a 5, which is equivalent to a 6 in the German grading system
- The failure rate is roughly 7%
- Or, put better: 93% of the words are completely correct!
So failing fast and failing often is key to successfully changing your work culture.
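The percentages on the slide follow directly from the word counts (58 words, 4 mistakes):

```latex
\frac{4}{58} \approx 0.069 \approx 7\,\%, \qquad \frac{58 - 4}{58} = \frac{54}{58} \approx 0.931 \approx 93\,\%
```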
in Challenge #6, failures are a must-have. But they are only useful if we investigate why it was a mistake. Therefore we committed ourselves to writing post mortems as often as we can. To be honest, we should write many more post mortems than we currently do, but hey, nobody is perfect. The strategy of the five whys: so why do we write post mortems?
- First why: to get documentation of what went wrong. OK.
- Second why: to find the root cause of the problem. OK.
- Third why: to learn from it. OK.
- Fourth why: so that other colleagues can learn from it. OK.
- Fifth why: why do we really want to use post mortems? To also use the potential of failures!!! They have value; they are assets!
Mistakes are positive. Only people who dare to make mistakes will build something new.
- Gain more people for the cultural change
- Care for the culture; that should not be underestimated
- We still have some technological catching up to do in parts of our software stack
- Find an optimal way out of the tool jungle, and quickly take another path if we were wrong
- Do NOT rebuild the wall just by using other words for wall!
of the future challenges are shown on this slide. The first one, on the left, shows a tweet from Uwe Friedrichsen, and this tweet sums up the problem with DevOps very well. The word DevOps is used for everything nowadays, maybe even DevOps Magic…. As the tweet says, just naming something DevOps, or putting together a team with the same people as before and labeling this team DevOps, will not change anything. If the work culture does not change, if the mental barriers are not pulled down, if the failure culture does not change, the only thing you will achieve is that the word DevOps gets messed up. The second tweet brings up one of the greatest problems of all: “We replaced one wall of confusion with another”. That’s definitely true. We talked with a lot of colleagues, and one of them told me the following. In the past, one problem was that you had to decide between an Oracle database and Microsoft SQL Server. Most of the time, the database with the best support from the software vendor was chosen. Today, a lot of products are reaching the point where they support one particular cloud vendor best. After some time you have multiple clouds to manage, and this situation is much more difficult to handle than in the past, because people need much more knowledge to get things connected and up and running. Furthermore, today there are multiple products for many areas, as you can see on the CNCF landscape, for example. That’s a good thing in principle, but to know which software project best fits your problem, you need far more knowledge.
Therefore, you have to be patient. This slide shows a famous painting of Galileo Galilei defending his thesis about the movement of the earth before the Holy Office. It’s a great metaphor for not giving up when you are facing the hard way. There are some important thoughts if you go the culture change way:
- Do not be scared if it does not work the first time!
- Do not be scared if something goes really wrong!
- You need to have a lot of patience!
That’s the last slide, and it’s a difficult one, because some people you meet will not want to change the way they work, since they have worked the same way for years. Therefore you sometimes have to force people to leave their comfort zone. Once people have tried the new system and the comfort it offers, they do not miss the old life. The key in our company was the cultural change combined with the automation of recurring, tedious manual processes.
Mario: Technology is moving forward really fast. The longer you take to accept this change, the greater the distance between you and the actual problems. It will be much harder to catch the technology train, and there’s no shortcut.