Observability is not just about being able to ask questions of your systems. It's also about getting the answers in minutes, not hours.
#sre
#last9
Good folks in Bangalore attending @DevOpsDaysIN - please leave a 👋 below.
We have merch, games, and some really fun experimental things (😉) at the conference tomorrow.
Drop by our booth 😍
#DevOpsDaysIN
.
📣 We're giving away two conference tickets for @GopherConIndia.
(28th Aug, Pune)
Narrate a story on how Go has helped you, or a problem it has unlocked.
Comment below 👇🏻
⚽️ Half the world watched the recent FIFA World Cup
🏈 113 million people watched the Super Bowl
🏏And then, there’s cricket…
450+ million people watched the last
#IPL
What does this mean for an #SRE? 👇
#CricketScale
"The management of engineering health is stuck in the stone age." -
@ponnappa
How does business get better at defining a metric and holding engineering accountable?
MTTRs
Answers 👇
Dashboards are horrendous puzzle assembling exercises.
So, here’s a tale of dashboards:
Left: A sample dashboard an Engineer put together
Right: @last9io's ChangeBoard
Gone are the days of yore when we named our servers Etsy, Betsy, and Momo, and fed them fish. Back then, servers were our pets.
Is it beneficial anymore to continue observing an individual server?
What should engineering teams focus on in the midst of a tech arctic winter?
@AjeyGore
writes for us on ‘tech debt’ and 4 broad areas to tackle in the coming year 👇
Here. We. Go. 🟢
We just completed our SOC 2 Type II certification - a third-party audit of our data security and privacy protection processes.
Onwards & Upwards 🪜
Our High Cardinality slayer @preeti_dewani takes the stage at @rootconf.
If you're around, ask her everything about how we tame High Cardinality, manage 'Cricket scale', and our differentiated approach to #SRE #DevOps.
Oh, and,
@sphirani
is there for puns, but do quiz him about…
What does the future of reliability tooling look like?
Catch Azret from
@Yieldstreet
&
@realmeson10
from
@last9io
on Monday, Sep 12, 6PM ET.
(Lots of war stories to be spilt, over drinks 😜)
Please RSVP here 👇
Today,
@realmeson10
talks about doing SRE the right way with
@hasgeek
&
@sphirani
on Mindset, Processes & Tools at
What does your org's SRE recipe look like?
Prometheus remote write is the de facto way to ship metrics to a hosted Prometheus setup.
Its default settings work just fine, but we ran into a scenario that needed some tuning to make it work at scale.
Without strong policy rules for nomenclature and verbiage around observable 'entities', your war rooms are perpetually going to be chaotic.
@aniket_rao
on the importance of 'boring' in the world of SRE 👇
I had a plan. To tell you a Terraform joke. I wanted to apply my best efforts. I hoped it would be so good that it would destroy any other Terraform jokes that came before it. If you didn't get it yet, it's not you. It's just the lack of init-iative from my side.
#SRE
#DevHumor
What does the Shannon Limit tell you about Site Reliability Engineering? 🤔
@satyajeetjadhav
takes the India vs Pakistan cricket match 🏏 as an example to narrate a short story you must read 👇
What in the world is 'Reliability engineering'?
From #DevOps and #SRE to tooling for building reliable systems, @mohandutt134 does an 'Explain Like I'm 5' on our complex, fascinating universe.
Still don't get it? Send your protestations to Mohan 😜
Folks coming to #SREcon, please do visit our booth. We have goodies to give away, and it's your chance to quiz @nishantmodak & @prathamesh2_ on how we think of Reliability engineering.
Pro tip: Ask them about their funniest war room stories; they have plenty to share 😜
@SREcon
We all have that story of being paged about something going wrong at 5 AM!
The @PagerDuty call is a nightmare to wake up to. It's always been a pain to understand:
- Is there a pattern to this failure?
- Should this have been a Page, or a Ticket to look at with morning coffee? ☕️
If the answer is Kubernetes, then what is the question? 🤔
Reply with wrong questions only.
The best answer will be picked on 14th November and will win Last9 swag + early access to the platform! 🎁
SLOs that lie
- Is uptime really the right measure of your reliability?
- What happens when that which monitors downtime has downtime?
- If upstream/downstream is down - how does it impact your numbers?
Wit, snark and insight guaranteed.
@gojektech
🏏 What do #DevOps / #SREs do to prepare for massive events such as the IPL?
💪 How do engineering teams collaborate?
😱 What kind of edge cases do teams witness at scale?
Catch @nishantmodak quiz @theprogrammerin.
Register here -
The neglected 'Arctic' tooling 🥶: the way money is spent on internal tools to support tech infrastructure is broken, contends @nishantmodak.
Read this stellar post on
@moneycontrolcom
on what you can do as a CTO to reduce costs during this tech downturn 👇
🔢 Metrics
🎫 Events
🪵 Logs
🗺️ Traces
The nuts and bolts of
#Observability
start with 4 key data types. But… How much do we know about M.E.L.T?
Register below for a talk by
@prathamesh2_
👇
Is it a bird? Is it a plane? No, it's Clever Hans, the wicked smart horse who is here to guide you on the mindset to cultivate when choosing SRE tools. Check out - follow us and subscribe to our newsletter for many more such interesting stories.
#SRE
#DevOps
SLAs at 99.9x% are challenging.
How is it being measured?
Who monitors the uptime of the uptime monitor?
If something is checked for x secs, was it always up, or only up during the check? 🤪
Answers in our next
#include
😍
🗓9th July, 6:30PM IST
👉
An Explain It Like I'm 5 on OpenTelemetry 👇
Our resident ELI5 expert
@mohandutt134
delves into all things OTel, and what they mean for an engineering organization.
Service Level Objectives: Where do we start?
Most of us have heard about SLOs and know what they mean, but have always found it hard to start adopting them across our teams. Come learn how to get started on SLOs!
Sign up here for our meet-up this Sat 19th Feb:
In cricket, a hat-trick occurs when a bowler takes three wickets with consecutive deliveries.
The reason for this seemingly obvious fact: @realmeson10 pulled off a hat-trick as Last9's tech posts got featured back to back in SRE Weekly issues #247 and #248, and DevOps Weekly #520!
1/4
No longer!
This *neat* 👌🏼 Health insight about a Service has saved teams hours of troubleshooting and given them the much-needed 2 hours of morning sleep!
It also created tickets to track these failure scenarios to be debugged with morning coffee!
#sre
#page
A simple guide by
@sphirani
to crunch numbers for understanding overall HTTP content length metrics.
Tip: Having the right tooling handy to get back-of-the-napkin calculations gives insights before instrumenting logs and metrics.
What is High Cardinality in Reliability engineering?
@mohandutt134
with another
#ELI5
(Explain Like I'm 5) on a problem we're hearing a lot more about: how to deal with High Cardinality.
But, before that, a simple explanation of High Cardinality with this real-world example👇…
It’s December 31st.
Food delivery orders are going to go through the roof.
Then...
The Customer Support team tells you that some orders are not coming through & there are complaints on...
Twitter.
@prathamesh2_
on an all too familiar story 👇
DevOps is dead.
Monitoring is dead.
Observability is dead.
Platform engineering is dead.
Site Reliability is dead.
What’s working then, asks @aniket_rao?
“At which point in the maturity of an engineering org should you focus on reliability tooling?”
It’s a question we get often, so, some answers below + a whitepaper by
@realmeson10
to delve deeper 🟢
Levitate intends to change this, and more.
We want to ‘uplift’ our customers from metrics woes, because self-management sucks like gravity. 😜
Hence, Levitate. 🕴
Last year’s Indian Premier League witnessed nearly 450 million viewers on
@JioCinema
🏏 This year promises to be bigger, better. How does a Site Reliability Engineer prepare to monitor their distributed infrastructure at this massive scale?
@theprogrammerin
himself in…
Who wants
@realmeson10
to do an AMA on this?
Coerce him into sharing some secrets around monitoring infra at Cricket Scale, taming High Cardinality data, and war room stories from our stables. 😝
🏏 Every peak season of 'Cricket Scale', @last9io gateways accept >250 million requests a minute.
This insanity at API gateways alone can set us back by a million-dollar cloud bill 💰.
Guess how we cheaply unmarshal, validate, sanitise, and enqueue while guaranteeing 99.99%?
Every week, there’s one piece dedicated to a tombstone claiming the death of DevOps, or SRE, or o11y, or monitoring, or ‘Platform’ Engineering, or whatever new thing crops up. -
@aniket_rao
Here’s how the
#monitoring
landscape has taken shape ⬇️😂
"A system should be correlated to its monetary efficacy. If my EMEA server goes down, how much money do I stand to lose in a day? How about 90 minutes a month? What are my monetary dependencies on a service?"
Sanjay Singh from Games24x7 writes 👇
Fun line-up with ChatGPT at the center of conversations.
With @_swanand, @chinmay185 and our very own Aditya Godbole, this promises to be as educational as it is entertaining!
Register here:
Understanding the world of Site Reliability Engineering through self-driving cars.
@mohandutt134
maps out the state of
#Observability
, and the missing pieces in this Rube Goldberg of an industry 😉
Cardinality. 🔢
If you're an
#SRE
, one word that would've likely cropped up more times than you can count - Cardinality.
At Last9, we've worked really hard to rein in High Cardinality with Levitate, our Time Series Data Warehouse.
Our first episode features the one and only
@goinggodotnet
💚 at
@ardanlabs
Lots of interesting learnings from Bill:
✅The 2-week QA rule
✅The mistake boot
✅Building ACs
✅Compute as a black box
✅How to use AI in coding
...and so much more:
Get your hot, fresh Friday learnings here -
@realmeson10
peels back the layers of percentiles and explains how downsampling and aggregation colour the numbers that you see.
1/9 In the SRE world, we often talk about P99 latency as a way to measure user experience.
Often, using percentiles as a yardstick yields misleading results.
I've tried capturing the common misunderstandings of percentiles and their limitations in this thread and a blog.
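One common way percentiles mislead is aggregation: averaging per-host P99s is not the same as taking the P99 of all requests. The sketch below is only an illustration under stated assumptions, not the analysis from the thread or the blog. The host names and latency values are made up, and `percentile` is a simple nearest-rank helper written for this example.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample such that at least p% of samples
    are less than or equal to it.
    """
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (assumed data, for illustration only).
quiet_host = [10] * 100               # 100 requests, all fast
busy_host = [10] * 880 + [500] * 20   # 900 requests, 20 of them slow

p99_quiet = percentile(quiet_host, 99)
p99_busy = percentile(busy_host, 99)

# Averaging the per-host P99s ignores how much traffic each host served.
averaged_p99 = (p99_quiet + p99_busy) / 2

# The P99 over every request is what users actually experienced.
true_p99 = percentile(quiet_host + busy_host, 99)

print("per-host P99s:", p99_quiet, p99_busy)
print("average of P99s:", averaged_p99)
print("P99 of all requests:", true_p99)
```

In this made-up data set, the averaged figure lands well below the P99 of the combined traffic, because the quiet host gets the same weight as the busy one despite serving a fraction of the requests.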