In other fields, such as aviation, medicine, and engineering, safety science has had a remarkable impact on systems reliability. All of these fields understand the importance of designing human-machine interfaces to reduce error.
Many lessons from decades-old safety science apply equally well to computer operations. As systems grow, the overhead of maintenance grows too, straining our automation to the point of failure, or at least requiring humans to supervise that automation. That supervision can itself grow until it costs more than the original work!
In this talk I'll present some findings from the 1990s and discuss their applicability to systems administration and site reliability.
Jamie is a Site Reliability Engineer at Google in Sydney, leading a team that runs one of Google's oldest planet-scale, eventually-consistent replicated key-value stores. He has been interested in monitoring since before he started at Google many years ago, and wants to share everything he has learned about making monitoring systems useful for people and business.
Geelong is Victoria's second-largest city, located on Corio Bay. It is a short drive from popular beachfront communities on the Bellarine Peninsula and is the gateway to the famous Great Ocean Road.
linux.conf.au is widely regarded by delegates as one of the best community-run Linux conferences worldwide, and is the largest Linux and Open Source Software conference in the Asia-Pacific.