
GCP Logging And Monitoring

[Image: GCP logging and monitoring overview (GCP-LOG-logging-and-monitoring.png)]

Specific Tools:

  • Debugger
    • Inspect a running service’s code state without stopping it or degrading its performance
  • Profiler
    • Examine CPU and memory usage to help spot bottlenecks and improve algorithmic performance
  • Trace
    • Analysing latency in an application
  • Logs API
    • Used by developers to write log entries directly to Google Cloud Logging
  • Uptime Check
    • Regularly check the public connectivity of your app
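To make the Logs API bullet more concrete, here is a minimal sketch that builds the JSON body for the Cloud Logging v2 `entries:write` REST method using only the Python standard library. It does not send anything: actually calling `POST https://logging.googleapis.com/v2/entries:write` also requires an OAuth 2.0 access token, which is left out here. The helper name `build_write_request`, the project ID `my-project`, and the log ID `app/events` are hypothetical placeholders.

```python
import json
from urllib.parse import quote


def build_write_request(project_id: str, log_id: str, messages: list[str],
                        severity: str = "INFO") -> dict:
    """Build a request body for the Cloud Logging v2 entries:write method.

    Hypothetical helper: project_id and log_id are placeholders you supply.
    """
    # Log names have the form projects/PROJECT_ID/logs/LOG_ID, with the
    # log ID URL-encoded (so "/" becomes "%2F").
    log_name = f"projects/{project_id}/logs/{quote(log_id, safe='')}"
    return {
        # Request-level logName and resource act as defaults for every
        # entry that does not set its own values.
        "logName": log_name,
        "resource": {"type": "global"},
        "entries": [
            {"severity": severity, "textPayload": message}
            for message in messages
        ],
    }


if __name__ == "__main__":
    body = build_write_request("my-project", "app/events", ["service started"])
    print(json.dumps(body, indent=2))
```

In practice the official client libraries wrap this call, but seeing the raw body shows what a "log entry" actually consists of: a log name, a monitored resource, a severity, and a payload.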

Principle

Monitoring should strive to address two questions:

  • What exactly is broken? (Symptom)
  • And why? (Cause)

Example: the site is serving HTTP 500 errors (symptom) because the database is down (cause). Latency has spiked (symptom) because the VM’s resources are exhausted (cause).

Monitoring Scope

By default, each GCP project’s metrics scope covers only that project, but a project’s metrics scope can be extended to include other projects, allowing it to observe their metrics as well.

Per Google’s recommended best practices, the project that hosts the metrics scope (the scoping project) should not itself be one of the projects housing the monitored resources.
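As a rough sketch of how a dedicated scoping project gets wired up, the helper below assembles the REST request that, as I understand the Cloud Monitoring Metrics Scopes API (v1), adds a monitored project to a scoping project’s metrics scope. It only builds the URL and JSON body; sending it needs an authenticated POST, and the resource names should be checked against the official reference. It also rejects adding the scoping project to its own scope, since a metrics scope always includes its scoping project already. The function name and the project IDs `monitoring-hub`, `prod-app`, and `staging-app` are hypothetical.

```python
import json

API_ROOT = "https://monitoring.googleapis.com/v1"


def add_monitored_project_request(scoping_project: str,
                                  monitored_project: str) -> tuple[str, dict]:
    """Return (url, body) for adding monitored_project to a metrics scope.

    Assumption: resource names follow the v1 Metrics Scopes API; verify
    against the official reference before relying on this.
    """
    if scoping_project == monitored_project:
        # A metrics scope always contains its own scoping project.
        raise ValueError("scoping project is already in its own metrics scope")
    # The metrics scope is identified by the scoping project's ID or number.
    parent = f"locations/global/metricsScopes/{scoping_project}"
    url = f"{API_ROOT}/{parent}/projects"
    body = {"name": f"{parent}/projects/{monitored_project}"}
    return url, body


if __name__ == "__main__":
    # Hypothetical layout: a resource-free scoping project observing
    # two projects that actually run workloads.
    for project in ["prod-app", "staging-app"]:
        url, body = add_monitored_project_request("monitoring-hub", project)
        print("POST", url)
        print(json.dumps(body))
```

Keeping the scoping project empty of workloads means its metrics scope acts purely as a "viewing window" onto the other projects.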

Cloud Monitoring

Supports two query languages. Both are text-based.

  • MQL (Monitoring Query Language). Retrieve, manipulate, and perform complex operations on time-series data. Very versatile; roughly like SQL, but for cloud metrics on GCP.
  • PromQL (Prometheus Query Language). Query system metrics from GKE and Compute Engine.

Example MQL query that computes, for each URL path rule, the fraction of HTTPS load balancer requests that returned a 5xx (response code class 500) response:

  fetch https_lb_rule::loadbalancing.googleapis.com/https/request_count
  | group_by [matched_url_path_rule],
      sum(if(response_code_class = 500, val(), 0)) / sum(val())
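To see what this query computes, here is a small pure-Python sketch that applies the same aggregation to a handful of made-up request-count points: for each `matched_url_path_rule`, it sums the counts whose `response_code_class` is 500 and divides by the total count. The sample rows and the `error_ratio_by_rule` name are invented for illustration; real MQL works on aligned time series, while this sketch collapses everything into a single window.

```python
from collections import defaultdict


def error_ratio_by_rule(points):
    """Mirror the MQL group_by: per URL path rule, 5xx count / total count.

    Each point is a dict with matched_url_path_rule, response_code_class,
    and value (a request count). All data here is hypothetical.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for p in points:
        rule = p["matched_url_path_rule"]
        # sum(if(response_code_class = 500, val(), 0))
        if p["response_code_class"] == 500:
            errors[rule] += p["value"]
        # sum(val())
        totals[rule] += p["value"]
    # Skip rules with no traffic to avoid dividing by zero.
    return {rule: errors[rule] / totals[rule] for rule in totals if totals[rule]}


sample = [
    {"matched_url_path_rule": "/api/*", "response_code_class": 200, "value": 90},
    {"matched_url_path_rule": "/api/*", "response_code_class": 500, "value": 10},
    {"matched_url_path_rule": "/static/*", "response_code_class": 200, "value": 50},
]

print(error_ratio_by_rule(sample))  # {'/api/*': 0.1, '/static/*': 0.0}
```

The result is a fraction between 0 and 1 per path rule (0.1 means 10% of requests failed), not a raw count of 500 responses.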

Group Monitoring

Note: If a Monitoring group is defined by labels, a server that is powered off remains a member of the group for 5 minutes. After those 5 minutes, Google Cloud determines that the server should no longer be counted as a member of the group.
This matters because an uptime check tied to the group only reports failures while the group still includes the missing server.
Once the group stops listing the powered-off server, the uptime check stops checking it, and the check suddenly starts passing again, even though the server is still down. This can be a real issue if you’re not careful.
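The pitfall can be reproduced with a toy model. The sketch below is hypothetical and not how Google Cloud implements groups: it assumes a server stays a group member for 300 seconds (the 5 minutes described above) after it was last seen running, and that a group-scoped uptime check fails only while some current member is down. Stepping a simulated clock forward shows the check flipping from FAIL back to PASS once the powered-off server ages out, even though nothing was fixed.

```python
from dataclasses import dataclass

GRACE_PERIOD_S = 300  # assumed 5-minute membership window from the note above


@dataclass
class Server:
    name: str
    running: bool
    last_seen_running: float  # seconds on a simulated clock


def group_members(servers, now):
    """Servers the (hypothetical) label-based group still counts as members."""
    return [s for s in servers
            if s.running or now - s.last_seen_running < GRACE_PERIOD_S]


def uptime_check_passes(servers, now):
    """Group-scoped check: fails only while a current member is down."""
    return all(s.running for s in group_members(servers, now))


servers = [
    Server("web-1", running=True, last_seen_running=0),
    Server("web-2", running=False, last_seen_running=0),  # powered off at t=0
]

for t in (0, 120, 299, 300, 600):
    print(t, "PASS" if uptime_check_passes(servers, t) else "FAIL")
```

From t=300 onward, web-2 is no longer a member, so the check reports PASS while web-2 is still powered off.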

© Filip Niklas 2024. All poetry rights reserved. Permission is hereby granted to freely copy and use notes about programming and any code.