DevOps started as a way to help developers and operations engineers work better together, but in the years since its inception it has become much more than that. The principles of DevOps can benefit everyone involved in creating software solutions, not just dev and ops teams. This talk will discuss practical ideas for how the pillars of effective DevOps can be used throughout entire organizations, the benefits of sharing operational effectiveness beyond the ops team, and how to use DevOps principles to develop and maintain a functional culture within your organization.
D&D is a concept discussed briefly in the SRE book. A number of engineering teams use this technique to prepare team members for going on-call. The idea is to re-investigate a recent on-call incident as a team. The team tells the dungeon master what they would do or query to understand or solve the problem, and the dungeon master tells the team what happens with each action or observation.
In this highly interactive session, I will act as dungeon master as we role-play some real-life issues. We will have a few scenarios in which something is not working properly, and volunteers from the audience will go through a series of questions and steps to isolate the problems. We will see how the D&D exercise can give the volunteers more context about the infrastructure. The issues will be fairly common, but debugging them as a team is fun and a great learning exercise. The key takeaway for the audience is how to hold a similar session for their own team and gain more confidence in going on-call.
Growing up, I wore a lot of hand-me-downs. I thought I looked great–all three of my older siblings were cool. It didn’t matter how many times I tripped on my brother’s torn JNCO jeans or how long my sister laughed when I got my head stuck in the armhole of her t-shirts. I wanted what they had because they were so influential in my life–even if I wasn’t comfortable and nothing ever fit.
What happens then when we apply DevOps strategies innovated at companies like Amazon and Starbucks to our mid-size applications? Do they fit?
In this talk, we’ll discuss how to approach DevOps at an average company–how much should we automate? If we can’t afford to stress test everything, how do we choose which pieces to test? Should our devs work on development tools to expedite their process in the future or spend time on the features we need to ship? When do we turn to DevOps tools like Chef and Puppet? How can we avoid getting our heads stuck in the armhole of a shirt that doesn’t fit? We’ll investigate how to answer these questions, and how to make the most out of others’ success while we learn how to be happy being average.
Many DevOps devotees have heard about security and maybe even read a few blog posts about DevSecOps. But you get busy building code, eliminating WIP, and eliminating tech debt. That doesn’t leave a lot of time to make sure the code you are actually deploying is secure. But the power of DevOps is that you can build security in once, and it’s there. And even better, you don’t have to be a security expert to be able to start deploying more secure code.
Yet, where do you start? Especially if you consider yourself a security novice. Don’t worry, we all start as novices, but it’s usually pretty helpful to get a head start by learning what’s important and having some samples to model.
In this workshop, you will learn what attackers look for first and the best practices for quickly making your technology stack much more secure. Then you’ll get code to figure out if you are at risk, as well as scripts to make sure the issues are quickly fixed and don’t happen again.
Why reinvent the wheel when you can get a quick and dirty tutorial on making your DevOps environment more secure?
As an operator of a system, you will often find yourself with data in CSV or unstructured log files that contain the answers to questions you have. Which IP address is hitting my server the most? How many requests came during this time frame? What kind of queries are getting written to the slow log? You want the answers to those questions as fast as possible, and either you don’t have log aggregation set up or inevitably there will be some data that isn’t flowing into it yet. Your data is probably on a remote server that is difficult to export from, or you might even have a few gigs of data on your local machine.
Command line tools can save the day! I’ll present a number of methods for doing data analysis using standard command line tools like sort, cut, uniq, and more! These tools are great because they:
* are most likely already installed where you need them
* are very fast and can handle GBs of data
* have a lot of built-in options that do much of the work for you
Command line utilities can be intimidating! But they aren’t going anywhere, and the best time to learn them is now. I’ve found practical utilities and ways to iterate quickly on the command line that allowed me to expand my skills, forge my data into shape, and get the answers I need, and I want to show you what I’ve picked up.
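As a rough illustration of the kind of analysis described here, the sketch below answers two of the questions above ("Which IP address is hitting my server the most?" and "How many requests came during this time frame?") with cut, sort, uniq, and grep. The log format, the sample entries, and the variable names are assumptions made up for this example, not material from the talk; real logs will need different field numbers and delimiters.

```shell
#!/bin/sh
# Hypothetical sample data: a space-separated access log with
# client IP, timestamp, and request path. This format is an assumption.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 2024-01-01T10:00:01 /index.html
10.0.0.2 2024-01-01T10:00:02 /about.html
10.0.0.1 2024-01-01T10:05:03 /index.html
10.0.0.3 2024-01-01T11:00:04 /contact.html
10.0.0.1 2024-01-01T11:10:05 /index.html
10.0.0.2 2024-01-01T11:15:06 /index.html
EOF

# Which IP address is hitting my server the most?
# cut keeps field 1 (the IP), sort groups identical lines together,
# uniq -c counts each group, and sort -rn puts the largest count first.
top_ips=$(cut -d' ' -f1 "$log" | sort | uniq -c | sort -rn | head -n 3)
echo "$top_ips"

# How many requests came in during the 10:00 hour?
# cut keeps field 2 (the timestamp) and grep -c counts matching lines.
hour_count=$(cut -d' ' -f2 "$log" | grep -c 'T10:')
echo "$hour_count"

# Clean up the temporary sample file.
rm -f "$log"
```

The first sort matters because uniq only collapses adjacent duplicate lines. Swapping the field number passed to cut is usually enough to ask the same question of a different column.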
In this presentation we’ll learn which metrics matter most in our systems (upper and lower bounds, SLAs/SLOs), what dashboards are for, why different consumers need different dashboards, and why dashboards are for gathering more information about an outage rather than for discovering that one is happening. And, sadly, we’ll cover alerting: what to think about before adding a new alert (Can we automate the response? Is it really actionable? Do we have expectations for when it will trigger?) and how to avoid alert burnout. The main goal is to help teams and managers make sense of their data by collecting meaningful information, showing it in a way that is useful for all parties involved, and not drowning teams in noise.
Richard Dawkins described memes as being a form of cultural propagation, which is a way for people to transmit social memories and cultural ideas to each other. Not unlike the way that DNA and life will spread from location to location, a meme idea will also travel from mind to mind.
Getting your organization to take a step back and look at how ops affects people (awareness of alert fatigue, burnout risk, proactive/reactive approaches) can be a tough challenge.
In this talk, I will discuss how the very DNA of an organization can evolve through the use of actionable communications from all levels - management, strategy, and practitioners. The “virus” of humane ops will infect your organization, providing a more sustainable approach to on-call, incident resolution, post-mortems, and more. There also will be copious references to the Neal Stephenson classic novel, Snow Crash.
After this talk, you will have ideas of practical approaches to effect change in your organization, regardless of your level of influence. While not every group will use the same “viruses”, you will take away a good understanding of where to get started as Patient Zero.
It’s the middle of the night and you’re awakened by a phone call. What could it be? You’re not on-call this week. You glance at the caller-ID and see that it’s a teammate who has been struggling at work. Maybe it’s a serious outage and they’re escalating it to you? You answer the call to discover your teammate is having a mental health crisis. What do you do?
Late one night in 2016, this happened to me. It left me terrified and motivated me to get certified in mental health first aid.
With the constant demand to build systems rapidly with more resiliency and less budget, compounded by on-call support duties and a seemingly endless stream of alerts, it’s no wonder that burnout has become a serious problem within the tech industry.
In this session, I’ll share ways that you can identify burnout and action steps to prevent yourself from burning out. But more importantly, because we are a DevOps community, I’ll share a framework from mental health first aid that will help you identify, help and support your teammates and friends experiencing burnout and mental health issues.
Serverless has become a much-discussed topic recently, and for good reason! But many serverless offerings have strict limitations on package size, processing power, installed packages, and general freedom, which limit what you can do with them. So what if you want to write a web scraper using Headless Chrome? Well, it’s probably a bad idea, but it’s possible! We’ll discuss how to compile binaries that work around Lambda restrictions, tools you can use to explore features of a serverless environment, and various ways to take advantage of resource sharing between different invocations of the function.
DevOps trends are clear on measuring a system’s Mean Time To Recovery (MTTR) rather than its Mean Time Between Failures (MTBF). I argue that worrying about time between failures actually causes more harm than worrying about recovery. But do we think of our human systems the same way as our digital ones? I’ll apply lessons learned in SysOps to HumanOps.
I’ll talk about how our complex social systems act like complex computer systems and how focusing on MTTR rather than MTBF is a good thing between people, not just machines. I’ll cover the environmental requirements for focusing on MTTR and discuss potential conflict resolution steps as a jumping-off point for your organization or community.