Deploying a Scalable Application with SRE Best Practices

ADVANCED
180 minutes
5 tasks

In this lab, you will deploy a scalable web application that adheres to Site Reliability Engineering (SRE) best practices on Google Cloud Platform. The lab will guide you through defining SLIs and SLOs, setting error budgets, and configuring autoscaling policies for your application. Additionally, you will implement CI/CD pipelines using Cloud Build and Cloud Deploy, ensuring a streamlined deployment process with monitoring and alerting integrations for proactive incident management.

Scenario

You have been hired as a DevOps engineer at a tech company, where you are responsible for scaling its customer-facing web application. The application must maintain 99.9% uptime and keep per-request latency under 300 ms at peak times. Your team includes SREs who balance change velocity against system reliability during frequent updates.
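The two targets in the scenario translate directly into an error budget and a latency SLI. The following is a minimal sketch of that arithmetic, not part of the lab materials: the 30-day rolling window, the function names, and the sample numbers are all assumptions made for illustration.

```python
from typing import Sequence


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    window_days=30 is an assumed rolling window; the scenario does not state one.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float = 300.0) -> float:
    """Fraction of requests served strictly under the latency threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    good = sum(1 for value in latencies_ms if value < threshold_ms)
    return good / len(latencies_ms)


if __name__ == "__main__":
    # 99.9% availability over an assumed 30-day window
    print(f"budget: {error_budget_minutes(0.999):.1f} min")
    # hypothetical: 10.8 minutes of downtime already spent
    print(f"remaining: {budget_remaining(0.999, 10.8):.0%}")
    # hypothetical request latencies in milliseconds
    print(f"latency SLI: {latency_sli([100, 250, 400, 299]):.2f}")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days. When the remaining budget approaches zero, the team would typically slow releases in favor of reliability work; when plenty remains, it can ship changes faster.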

Learning Objectives

  • Balance change velocity and reliability using error budgets and SLOs
  • Implement and manage CI/CD pipelines for efficient deployments
  • Configure autoscaling for Managed Instance Groups to optimize scalability
  • Set up monitoring and alerting using Cloud Monitoring and Error Reporting

Tasks (5)

Task 1: Define SLIs, SLOs, and error budgets for the web application

30 min

Task 2: Create a CI/CD pipeline with Cloud Build and Cloud Deploy

60 min

Task 3: Configure autoscaling policies for Managed Instance Groups

45 min

Task 4: Implement monitoring and alerting using Cloud Monitoring

60 min

Task 5: Enhance incident response with automated failovers

60 min

Prerequisites

  • Basic understanding of SRE principles
  • Familiarity with CI/CD concepts
  • Experience with GCP console navigation

Skills Tested

  • Defining SLIs and SLOs
  • Implementing CI/CD pipelines
  • Configuring autoscaling for Managed Instance Groups
  • Setting up Cloud Monitoring and alerting
  • Automating failover processes