1
00:00:07,450 --> 00:00:09,354
After watching this video, you will be able
to:

2
00:00:09,354 --> 00:00:12,658
Recognize how Site Reliability Engineering
differs from DevOps

3
00:00:12,658 --> 00:00:16,490
Recognize the commonality between Site Reliability
Engineering and DevOps

4
00:00:16,490 --> 00:00:21,570
Explain how Site Reliability Engineering and
DevOps can be used together

5
00:00:21,570 --> 00:00:25,829
You might be wondering how DevOps compares
with Site Reliability Engineering (SRE).

6
00:00:25,829 --> 00:00:30,650
Before we can explore this topic, we should
probably describe what SRE is, how it differs

7
00:00:30,650 --> 00:00:36,020
from DevOps, and how you can leverage SRE
in a DevOps environment.

8
00:00:36,020 --> 00:00:40,770
According to Benjamin Treynor Sloss, SRE is
“…what happens when a software engineer

9
00:00:40,770 --> 00:00:45,010
is tasked with what used to be called operations.”

10
00:00:45,010 --> 00:00:49,760
Many system administrators are content doing
the same manual tasks day after day, perhaps

11
00:00:49,760 --> 00:00:52,760
because they feel that it's their job to do
these manual tasks.

12
00:00:52,760 --> 00:00:57,000
But if you ask a software engineer to build
a server, they will probably do it manually

13
00:00:57,000 --> 00:00:58,000
the first time.

14
00:00:58,000 --> 00:01:02,830
If a few days later you ask them to build
another server just like the first one, they

15
00:01:02,830 --> 00:01:04,260
might do that one manually as well.

16
00:01:04,260 --> 00:01:08,880
But by the time you ask for the third server,
a software engineer is going to start writing

17
00:01:08,880 --> 00:01:12,979
a program that builds the server for them
automatically.

18
00:01:12,979 --> 00:01:14,819
That's just how software engineers think.

19
00:01:14,819 --> 00:01:18,490
They're programmers; they write programs.

20
00:01:18,490 --> 00:01:23,340
The goal of site reliability engineers is
to automate themselves out of a job.

21
00:01:23,340 --> 00:01:28,200
Of course, that will never happen because
there are always more things to automate.

22
00:01:29,106 --> 00:01:33,539
One of the tenets of SRE is to only hire software
engineers.

23
00:01:33,539 --> 00:01:37,609
You want people who know how to write code
so that they can automate repetitive tasks

24
00:01:37,609 --> 00:01:40,270
using Infrastructure as Code.

25
00:01:40,270 --> 00:01:45,259
Site reliability engineers focus on reducing
toil, that is, repetitive, manual tasks.

26
00:01:45,259 --> 00:01:49,869
It is recommended that they spend about 50%
of their time reducing toil through automation.

27
00:01:49,869 --> 00:01:53,929
The idea is that anything you do repeatedly
should be automated.

28
00:01:53,929 --> 00:01:56,729
You shouldn't be doing the same manual task
day after day.

29
00:01:57,418 --> 00:02:00,250
SRE teams are separate from development teams.

30
00:02:00,250 --> 00:02:04,039
This is a big difference between DevOps and
SRE.

31
00:02:04,039 --> 00:02:08,179
DevOps is the recognition that working in
separate siloed teams is inefficient.

32
00:02:08,179 --> 00:02:12,540
But SRE, on the other hand, keeps those silos
in place.

33
00:02:12,540 --> 00:02:17,379
The development team is a separate and distinct
team from the operations team.

34
00:02:17,379 --> 00:02:21,507
In SRE, stability is controlled through something
known as error budgets.

35
00:02:21,507 --> 00:02:25,850
Developers are allowed to deploy their applications
into production as long as they don't cause

36
00:02:25,850 --> 00:02:28,349
too many production outages.

37
00:02:28,349 --> 00:02:32,629
The upper limit of allowed outages caused
by errors is the error budget.

38
00:02:32,629 --> 00:02:37,189
So, let's say you’ve got a service level
agreement of 99.9% uptime.

39
00:02:37,189 --> 00:02:40,069
That equates to about 44 minutes per month
of downtime.

40
00:02:40,069 --> 00:02:44,100
As long as the outages are below 44 minutes
per month, the developers are free to keep

41
00:02:44,100 --> 00:02:46,709
deploying their releases to production.

42
00:02:46,709 --> 00:02:50,530
Once the developers have caused enough outages
to exceed their error budget, they're no longer

43
00:02:50,530 --> 00:02:52,549
allowed to deploy to production.

44
00:02:52,549 --> 00:02:53,680
This actually works pretty well.

45
00:02:53,680 --> 00:02:57,579
It solves the problem of developers waiting
for operations, yet it still gives operations

46
00:02:57,579 --> 00:03:02,519
control over the stability of the production
environment.

47
00:03:02,519 --> 00:03:07,230
One last thing about SRE is that developers
spend about 5% of their time rotating through

48
00:03:07,230 --> 00:03:12,220
the operations team so that they understand
what the SRE team is doing on a daily basis.

49
00:03:12,220 --> 00:03:16,530
Also, if developers cause too many outages, or the
toil exceeds 50% of the site reliability engineer’s

50
00:03:16,530 --> 00:03:21,980
time, more developers are shifted to operations
to help bring things back into balance.

51
00:03:21,980 --> 00:03:25,969
There is a big difference in teaming between
SRE and DevOps.

52
00:03:25,969 --> 00:03:31,060
As we've learned, SRE maintains separate development
and operations teams, but it does have one

53
00:03:31,060 --> 00:03:32,879
staffing pool.

54
00:03:32,879 --> 00:03:37,379
That means if you need another site reliability
engineer, you take away one of the developers.

55
00:03:37,379 --> 00:03:41,599
If you want another developer, you take away
one of the site reliability engineers.

56
00:03:41,599 --> 00:03:43,760
This is an effort to balance things out.

57
00:03:43,760 --> 00:03:49,790
DevOps, on the other hand, breaks down the silos
into one team with one common business objective:

58
00:03:49,790 --> 00:03:53,870
to deploy software to production quickly and
safely.

59
00:03:53,870 --> 00:03:58,219
The other big difference between DevOps and
SRE is how they maintain production stability.

60
00:03:58,219 --> 00:04:04,299
As we said, SRE uses error budgets that development
has to comply with, and those are based on service-level

61
00:04:04,299 --> 00:04:05,540
objectives.

62
00:04:05,540 --> 00:04:09,670
When a developer exceeds the error budget,
making production unstable, they can no longer

63
00:04:09,670 --> 00:04:11,639
deploy to production.

64
00:04:11,639 --> 00:04:18,000
In contrast, DevOps maintains stability by
using automation through Continuous Delivery

65
00:04:18,000 --> 00:04:23,720
pipelines, and by making sure that everyone
is responsible for the code that runs in production.

66
00:04:23,720 --> 00:04:27,640
DevOps has this “you build it, you run it”
mantra.

67
00:04:27,640 --> 00:04:32,480
Unlike in SRE, developers are responsible for
their applications in production.

68
00:04:32,480 --> 00:04:36,860
There is commonality between the two practices.

69
00:04:36,860 --> 00:04:40,560
Both seek to make development and operations
visible to each other.

70
00:04:40,560 --> 00:04:46,660
Whether you have developers rotating through
operations as in SRE, or you have development

71
00:04:46,660 --> 00:04:51,880
and operations on the same team as in DevOps,
everyone understands what it takes to keep

72
00:04:51,880 --> 00:04:54,320
production stable.

73
00:04:54,320 --> 00:04:56,470
Both require a blameless culture.

74
00:04:56,470 --> 00:04:58,980
No one comes to work wanting to take down
production.

75
00:04:58,980 --> 00:05:03,280
It’s usually the system that fails the people,
not the other way around.

76
00:05:03,280 --> 00:05:07,560
So having a blameless culture is important
in both practices.

77
00:05:07,560 --> 00:05:13,510
People can speak openly and honestly about
how things are going and how to improve things.

78
00:05:13,510 --> 00:05:17,830
The objective of both is the same: to deploy
software faster with stability.

79
00:05:17,830 --> 00:05:24,060
So, DevOps and SRE do have common goals; they
just achieve them in completely different

80
00:05:24,060 --> 00:05:26,130
ways.

81
00:05:26,130 --> 00:05:29,910
When we look at how DevOps and SRE can
complement each other and be used together,

82
00:05:29,910 --> 00:05:35,470
I like to think of SRE as the team that maintains
the infrastructure and DevOps as the team

83
00:05:35,470 --> 00:05:38,430
that uses the infrastructure to maintain their
applications.

84
00:05:38,430 --> 00:05:44,580
If you are in a cloud environment, SRE includes
the people who operate the cloud and DevOps

85
00:05:44,580 --> 00:05:47,960
includes the people who are consuming the
cloud.

86
00:05:47,960 --> 00:05:52,460
This is why using things like platform as
a service is so important to DevOps.

87
00:05:52,460 --> 00:05:54,650
The SRE teams provide a platform.

88
00:05:54,650 --> 00:06:00,280
The DevOps teams utilize the platform to deploy
their applications.

89
00:06:00,280 --> 00:06:01,860
In this video, you learned that:

90
00:06:01,860 --> 00:06:04,100
SRE takes a different approach than DevOps.

91
00:06:04,687 --> 00:06:06,740
SRE and DevOps have some common goals.

92
00:06:07,620 --> 00:06:12,590
SRE and DevOps can be used together to both
maintain and use computer infrastructure.