Notes from DevOps Days AU/NZ

Posted by Katie McLaughlin on October 27, 2019

The following are my raw notes from DevOpsDays AU and NZ 2019.

Notes from open spaces are non-attributed views of the group of participants.

DevOpsDays Sydney

October 10 - 11, 2019

Hilton, Sydney

Open Spaces

Learning from incidents

TLDR: It’s not just writing it down.

beyond retro

incentivising

PIR/post mortem != learning from incidents

“learnings register”

Needs to turn into an action (this is not fun) (easy to say the words than to do the work)

knowledge publishing != sharing (PIR like your taxes is busywork)

Testing in Prod

TLDR: “everyone is doing it, some admit to it.”

synthetic testing - writing, not just reading

  • if you do this, let your pen-testers know

only test what you can fix

you shouldn’t go to your dentist for a colonoscopy

staging vs prod isn’t what it used to be

Docs for DevOps

Dispelling the myth

Systems evolve - disabling edits is based on trauma

complexity of docs (use cases)

grep code for apache/java

docs in code

books - docs like code (Ann Gentle

Confluence isn’t bad, it’s just has a bad search engine (try elastic/algolia)

Make it a habit to throw things away

deprecate then delete

tribal knowledge slack -> docs but also BOFH who doesn’t want to share(in a way you can adopt)

mike blank - slack/github interface via emoji

create templates (new project, new server) (KM: Oh hey like anchor did!)

proper tickers - templates! an enforce better defaults - don’t reward quantity of reports

everyone who sees an error is already pissed off

reasoning in commit messages (why, not what)

crispin.io - how to write a good commit message

errors/pages should link directly to the doc (not require searching/extra steps)

docs should be as helpful as possible

old digital ocean - create image/instance, self-contained (API command, curl)

elastic - what not to do

tools : vscode, markdown w/linting, asciidoc, RST, Sphinx, github/lab wiki, notion.so

less than 20 people: why not grep a text file?

(from devopsdays sydney 2019)

Pet Care

TLDR: Backups and test them!

It’s not about the backup, it’s about the restore.

Automate care and attention

multiple DNS/AZ failover: state is persistence vs instance is persistent

postgres followers

elasticsearch backup points (30d, eg)

Restore times! (not useful having a full day backup that takes more than a day to restore

backup production, restore to staging for daily testing✝

✝ BIG caveat here

backup rotations (days, weeks), plus x number of images

SaaS:

  • don’t trust
  • automate the recreation
  • take a copy of what you said, don’t capture the state capture the history
  • providers that allow an export A++

backup github repo + offsite

calendar reminders for things

offsite != off your website

every change updates the PR plan

Black Swans.

Smashing the Glass Ceiling

Constant bonus vs Multiplier bonus

Peter Principle

You can’t be what you can’t see

“Network of ‘no’” - there’s always folk who will be adverse to change

Management identifying those who want to go up.

Tech in general has a glass ceiling - it’s men too having issues (no progression except into management)

DevOpsDays NZ

October 21 - 22 2019

Aotea Centre, Auckland

Talk notes

Going back to basics

Brooke Treadgold, ANZ

Organisations processes aren’t devops-compatible

buzzwords are an exclusive club

  • half the time, people don’t now what you’re talking about (DORA plug)

What is “process”?

  • steps to achieve outcome
  • … that evolves over time, has a life of it’s own

⃤ People, Priority, Funding - the trifecta

Risk management, empathy, influence

What is a hecking “pipeline”?

  • fundamental to us
  • and yet, it’s just another buzzword.

⭐️ CI/CD pipeline replaced one doc

  • for non-technical folk, what’s the difference?
  • worked with change management team, with empathy

“perceived no”

  • security will say…
  • change management will say…

Change reports are just that, reports. Records of change. Why not automate that?

Dockerised local build

Charles Korn, Thoughtworks.

TLDR: https://github.com/charleskorn/batect

You can’t buy devops

Julie Gunderson, Pagerduty

Come to DevOpsDays Boise!

“I’ve moved to the cloud, I’m devops!” “DevOps Is a checklist!” – NOPE

Devops is people over processes

Westrum Model - three cultures: pathological, bureaucratic, generative (see Accelerate)

Psychological safety

  • empowering people to make decisions without being yelled at

Honesty. Accountability. Trust

The culture code - Daniel Coyle

Change management doesn’t stop failure, it just takes longer to get there.

Configuration management! Woo!

DevOps periodic table - don’t buy all of them

You don’t need to bring the whole system down to test your hypothesis is correct

Infect your org with devops.

The myth of the senior engineer

Cath Jones, Buildkite

TLDR: ASSUMPTIONS

Securing the systems of the future

Laura Bell, SafeStack

fear is profitable

“we caught cyber”

people are jerks

Historical situation > attack

electronic sticks and stones are lying to you

flight, flight, or freeze

hacking back is a TERRIBLE IDEA

“Year, I can take a bear”

Seven threes

This is a monolith. It’s pretty, it’s got pink in it. The monolith is why you have a job.

It wasn’t all about the castle’s wall

SWORDS! Sottish stairs built different because more swords-people were left handed

We have to practice our response

AVOID SCAR TISSUE.

Ignite Talks

Don’t reinvent the wheel

Josh King @ Windosnz

toast script, windows 8 top right, windows 10 bottom left

New PowerShell has classes! Let’s re-write!

UWP community toolkit

  • instead of creating new modules, see what exists

Instead of web scraping, ask for an API

use shared tools? give back with docs, etc

(But on your own time, reinvent wheels all you like :)

Use existing tools, contribute back!

Implicit Trust

Srdan Dukic, srkiNZ

linux sysadmin, LAMP stack

then ansible, then… get replaced by automation? are we paid to follow instructions, or achieve results?

No-one was born knowing this.

GitOps Buzzwords

Everett Toews

TLDR: Yes

FizzOps (lol)

PR, merge, action > gitops

GitOps == doing deployments

Git as a single point of failure

role-based access control.

Do you have a data quality problem?

Steven Ensslen

People aren’t measuring data quality

data warehouses are expensive

three easy steps:

  • doc data characteristics and train
  • monitor data as infrastructure
  • professionalise your support for data professionals

DataOps exists and has literature to explore!

Why bare metal still matters

Joel Wirāmu Pauling

All the buzz needs hardware (microcontrollers, AI/ML, hardware)

(bare/bear metal)

virtualisation is great, but we abstracted away the real

the cloud clique has made us complacent

cloud diversity and lock in

but also: climate change, data privacy/sovereignty, lock in

You need to consider bare metal, both technically and environmentally.

Kubernetes Operators and why do I care

Mandi Buswell

App store - “click and go” feeling

  • that’s why you should care

Sharded databases

  • so much work
  • so why not use a robot?
  • install -> autopilot the pods

operatorhub.io (requires k8s v1.7+)

Open Spaces

Devops in the database

Tools: evolve.net, flywayDB (multiDB), t-sqlt.org (sql server), pgtest

Consider: quality assurance

Do not used stored procedures in greenfields

App vs DB mismatching - softly handle

  • don’t depend on a situation where you bundle db and app change in the same deploy
  • roll back code is easier than rolling back the change
  • zero downtime is just a journey

Update per section of DB (table, index, etc)

Creating an index? that’s a table lock

blue green? one person in the room (who uses secondary data source)

Women in Tech/STEM

HR applications - not many women

positions posted out of grabs

(Russia: traditional society. “These jobs are for men, sorry about that”, but women in leadership are supported

Stems from the family values

  • “Girls can do anything”, but in reality… cycles Visible role models are elements of survivor bias.

Don’t seek a role model, find a reference.

T O K E N I S M

you are amazing because you are amazing

You need to validate and back yourself up

Culture and generation

  • typewriting courses
  • cycles and the system itself
  • self-perpetuating?
    • e.g. Math is a “science” (?)

Women have to prove. Men don’t.

Francis Valentine - study

  • biggest influence for 17/18 year olds? Mother.
  • not the right prereqs for course, issues following pathways laters
  • (much discussion ensuring methods don’t do the same today)

Why do so many women leave engineering?

  • sexism, harassment, stereotyping.

Real decisions happen in social situations (e.g. gold, beer)

How do you address?

  • Ally skills workshop
  • even as a minority it helped
  • Again, ASSUMPTIONS
    • Tip: don’t engage and reward bad behaviour
    • How would you address if they weren’t a woman?
      • ask if they want an intercept

It may not be “wrong”, just different

Be mindful of your space.

Loneliness in devops

Written text is one dimensona

How do you share team culure remoetly?

  • check in with peers twice a day

Loneliness can still happen in the same office

Work slack? Have a chatter/misc channel

  • normalize socialisation
  • “donut” bot for Slack - random intro for coworkers

No-agenda calls

Physical face-to-face (onboarding, team building, etc)

  • irreplaceable human interaction
  • volunteer as a group, share an experience

Random “getting to know you” questions

  • neptunes pride
  • random question websites
  • raid the room
  • Cards again humanity may not be the best choice

Pissups aren’t the best

Monday morning coffee/muffin

  • anyone is welcome

Wanting lonely time

  • keep balance
  • not always coffee time.

TLDR:

  • believe in yourself
  • believe in your work
  • empathy and kindness

Documentation for on-call

TLDR: START.

Copying solutions to responses

Mildbland slack/github interface via emoji

response.pagerduty.com

copy the slack thread! save the state

Be condescening when you write documentation

link in context

searchability

seed the start of the doc

  • how to log in
  • where is monitoring
  • how to get logs
  • dive through old outage artefacts

Team of one? STILL document it!

  • think about if you are the only one, are the systems really that important?

Migrate DMs to a shared channel

Docs should be alive

Raise tickets at 3am for 9am coworkers (or you)

  • “triage roster” in business hours

On call - you need to response/triage

  • BUT you may not be the one to fix it

Test docs with a non-expert following them

Blameless post mortem

  • was the docs at fault?
  • doc as you fix
  • doc root cause

Code of Condiuct

Fix confluence search - elastic, algoia)

“Do nothing” script

On call implementations

  • ensure you hand over with changes to the next person
  • if a deployment happens
    • pager override to them

Outage fixed? email your team -> evolve into a formal handoff.