This course teaches you techniques for monitoring, troubleshooting, and improving infrastructure and application performance in Google Cloud. Guided by the principles of Site Reliability Engineering (SRE), and using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.


* Actual course outline may vary depending on offering center. Contact your sales representative for more information.

Learning Objectives

This course teaches participants the following skills:
Plan and implement a well-architected logging and monitoring infrastructure
Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Create effective monitoring dashboards and alerts
Monitor, troubleshoot, and improve Google Cloud infrastructure
Analyze and export Google Cloud audit logs
Find production code defects, identify bottlenecks, and improve performance
Optimize monitoring costs

  • Introduction to Google Cloud Monitoring Tools

    Understand the purpose and capabilities of Google Cloud operations-focused components: Logging, Monitoring, Error Reporting, and Service Monitoring
    Understand the purpose and capabilities of Google Cloud application performance management-focused components: Debugger, Trace, and Profiler

  • Avoiding Customer Pain

    Construct a monitoring foundation based on the four golden signals: latency, traffic, errors, and saturation
    Measure customer pain with SLIs
    Define critical performance measures
    Create and use SLOs and SLAs
    Achieve developer and operations harmony with error budgets

  • Alerting Policies

    Develop alerting strategies
    Define alerting policies
    Add notification channels
    Identify types of alerts and common uses for each
    Construct and alert on resource groups
    Manage alerting policies programmatically

  • Monitoring Critical Systems

    Choose best-practice monitoring project architectures
    Differentiate Cloud IAM roles for monitoring
    Use the default dashboards appropriately
    Build custom dashboards to show resource consumption and application load
    Define uptime checks to track aliveness and latency

  • Configuring Google Cloud Services for Observability

    Integrate logging and monitoring agents into Compute Engine VMs and images
    Enable and utilize Kubernetes Monitoring
    Extend and clarify Kubernetes monitoring with Prometheus
    Expose custom metrics through code and with the help of OpenCensus

  • Advanced Logging and Analysis

    Identify and choose among resource tagging approaches
    Define log sinks (inclusion filters) and exclusion filters
    Create metrics based on logs
    Define custom metrics
    Link application errors to Logging using Error Reporting
    Export logs to BigQuery

  • Monitoring Network Security and Audit Logs

    Collect and analyze VPC Flow Logs and Firewall Rules Logging logs
    Enable and monitor Packet Mirroring
    Explain the capabilities of Network Intelligence Center
    Use Admin Activity audit logs to track changes to the configuration or metadata of resources
    Use Data Access audit logs to track accesses or changes to user-provided resource data
    Use System Event audit logs to track administrative actions performed by Google Cloud systems rather than by users

  • Managing Incidents

    Define incident management roles and communication channels
    Mitigate incident impact
    Troubleshoot root causes
    Resolve incidents
    Document incidents in a post-mortem process

  • Investigating Application Performance Issues

    Debug production code to correct code defects
    Trace latency through layers of service interaction to eliminate performance bottlenecks
    Profile and identify resource-intensive functions in an application

  • Optimizing the Costs of Monitoring

    Analyze resource utilization costs for monitoring-related components within Google Cloud
    Implement best practices for controlling the cost of monitoring within Google Cloud


This class is intended for the following customer job roles: cloud architects, administrators, and SysOps personnel; cloud developers and DevOps personnel.




To get the most out of this course, participants should have:
Completion of “Google Cloud Platform Fundamentals: Core Infrastructure” or equivalent experience
Basic scripting or coding familiarity
Proficiency with command-line tools and Linux operating system environments


Length: 3.0 days (24 hours)


