PCA Sample Practice Questions
Practice Resources
Domain 1: Observability Concepts (18%)
Question 1
What are the three pillars of observability?
**Answer:** Metrics, Logs, and Traces

- **Metrics**: Numerical values that measure aspects of a system over time
- **Logs**: Immutable records of discrete events
- **Traces**: Records of request paths through distributed systems

Question 2
What is the difference between an SLA, SLO, and SLI?
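A minimal Python sketch of how an availability SLI might be computed and checked against an SLO and an SLA. The request counts, the 99.95% and 99.9% targets, and the function names `availability_sli` and `error_budget_remaining` are illustrative assumptions. They are not part of any Prometheus API.

```python
# Sketch only: hypothetical request counts. In practice an SLI is usually
# derived from Prometheus counters (successful vs. total requests).


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded over the measurement window."""
    if total_requests == 0:
        return 1.0  # assumption: a window with no traffic counts as available
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget (1 - SLO) not yet used; negative means overspent."""
    budget = 1.0 - slo
    spent = 1.0 - sli
    return (budget - spent) / budget


if __name__ == "__main__":
    sla, slo = 0.999, 0.9995  # customer-facing promise vs. stricter internal target
    sli = availability_sli(good_requests=999_600, total_requests=1_000_000)
    print(f"SLI {sli:.4%}: meets SLO={sli >= slo}, meets SLA={sli >= sla}")
    print(f"error budget remaining: {error_budget_remaining(sli, slo):.0%}")
```

Because the SLO is stricter than the SLA, the error budget runs out before the customer-facing promise is broken.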
- **SLA (Service Level Agreement)**: A formal agreement with customers defining expected service levels
- **SLO (Service Level Objective)**: Internal targets that teams aim to achieve
- **SLI (Service Level Indicator)**: The actual metrics used to measure service performance

Example: an SLA promises 99.9% uptime, the SLO targets 99.95%, and the SLI measures actual availability.

Question 3
When should you use the Push model (Pushgateway) instead of the Pull model?
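Use Pushgateway for:

- Short-lived batch jobs
- Cron jobs that finish before Prometheus can scrape them
- Jobs behind firewalls that can't be scraped
- Legacy systems that can't expose endpoints

**Important**: Pushgateway should NOT be used as a general metrics aggregator. It becomes a single point of failure, you lose the automatic per-target `up` health check, and pushed series persist until they are explicitly deleted.

Question 4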
What is a span in the context of distributed tracing?
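The Python sketch below makes the span structure concrete. Each span carries timestamps, a name, attributes, and a parent reference, and the request path is rebuilt by following those references. The `Span` class, its field names, and `render_trace` are hypothetical. Real tracing SDKs such as OpenTelemetry define their own data models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One operation within a trace. Field names are illustrative, not a specific SDK's API."""
    trace_id: str
    span_id: str
    name: str
    start_ms: int
    end_ms: int
    parent_id: Optional[str] = None  # None marks the root span
    attributes: dict[str, str] = field(default_factory=dict)  # tags/labels

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: list[Span]) -> list[str]:
    """Rebuild the request path from parent references (assumes all spans share one trace_id)."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit child spans in start order, indenting one level per generation.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    trace = [
        Span("t1", "a", "GET /checkout", 0, 120),
        Span("t1", "b", "auth.verify", 5, 20, parent_id="a"),
        Span("t1", "c", "db.query", 30, 100, parent_id="a", attributes={"db.system": "postgresql"}),
    ]
    print("\n".join(render_trace(trace)))
```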
A **span** represents a single operation within a trace. It provides:

- Start and end timestamps
- Operation name
- Tags/labels
- Logs/events
- Parent span reference

Multiple spans together form a complete trace showing the request flow through a system.

Domain 2: Prometheus Fundamentals (20%)
Question 5
What are the four metric types in Prometheus?
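A rough Python sketch of how three of these metric types behave on the client side. The classes are hypothetical stand-ins, not the official `prometheus_client` library. The Summary is left out because it needs a streaming quantile estimator. Note that histogram buckets are cumulative: each bucket counts every observation less than or equal to its `le` bound.

```python
from __future__ import annotations

import bisect
import math


class Counter:
    """Cumulative value that only goes up; a process restart resets it to zero."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Value that can go up or down, e.g. queue length or temperature."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Counts observations into cumulative buckets and tracks their sum and count."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds) + [math.inf]  # the +Inf bucket always exists
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        self.sum += value
        self.count += 1
        # Every bucket whose upper bound (le) is >= value includes this observation.
        for i in range(bisect.bisect_left(self.bounds, value), len(self.bounds)):
            self.bucket_counts[i] += 1
```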
1. **Counter**: Cumulative metric that only increases (resets to zero on restart)
2. **Gauge**: Metric that can go up or down
3. **Histogram**: Samples observations into configurable buckets
4. **Summary**: Similar to a histogram but calculates quantiles client-side

Question 6
What is the purpose of relabeling in Prometheus?
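The core of a relabel rule can be sketched in Python. This is a simplified model built on a few assumptions:

- Only the `replace`, `keep`, `drop`, and `labeldrop` actions are covered.
- The rule is a plain dict whose keys mirror the Prometheus config fields.
- Python's `\1` backreference stands in for Prometheus's `$1`.

As in Prometheus, the regex must match the whole joined value.

```python
from __future__ import annotations

import re


def relabel(labels: dict[str, str], rule: dict) -> dict[str, str] | None:
    """Apply one simplified relabel rule; return None if the target/series is dropped."""
    action = rule.get("action", "replace")
    regex = re.compile(rule.get("regex", "(.*)"))
    # Values of the source labels are joined with the separator before matching.
    value = rule.get("separator", ";").join(labels.get(n, "") for n in rule.get("source_labels", []))

    if action == "keep":
        return labels if regex.fullmatch(value) else None
    if action == "drop":
        return None if regex.fullmatch(value) else labels
    if action == "labeldrop":
        # labeldrop matches label *names*, not values.
        return {k: v for k, v in labels.items() if not regex.fullmatch(k)}
    if action == "replace":
        match = regex.fullmatch(value)
        if match is None:
            return labels  # no match: labels are left unchanged
        out = dict(labels)
        out[rule["target_label"]] = match.expand(rule.get("replacement", r"\1"))
        return out
    raise ValueError(f"unsupported action in this sketch: {action}")
```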
Relabeling allows you to:

- Modify target labels before scraping (`relabel_configs`)
- Filter which targets to scrape
- Modify series labels after scraping, before storage (`metric_relabel_configs`)
- Drop unwanted metrics
- Rename labels
- Extract values from labels using regex

Question 7
Why should you avoid high-cardinality labels?
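The cardinality problem is multiplicative, and a few lines of Python show it. The label names and value counts are made up. The result is an upper bound, since not every label combination necessarily occurs.

```python
from __future__ import annotations

import math


def max_series(distinct_values_per_label: dict[str, int]) -> int:
    """Upper bound on series for one metric name: each label-value combination is its own series."""
    return math.prod(distinct_values_per_label.values())


if __name__ == "__main__":
    bounded = {"method": 5, "status_code": 6, "handler": 20}
    print(max_series(bounded))                          # 600
    print(max_series({**bounded, "user_id": 10_000}))   # 6000000
```

A single unbounded label multiplies the series count by its number of distinct values.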
High-cardinality labels (like user IDs or request IDs) create problems because:

- Each unique label combination creates a new time series
- Increases memory usage significantly
- Slows down queries
- Can cause Prometheus to run out of memory

**Best practice**: Use labels with bounded, low-cardinality values.

Question 8
What is the difference between scrape_interval and evaluation_interval?
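- **scrape_interval**: How often Prometheus scrapes targets for metrics (default: 1m)
- **evaluation_interval**: How often Prometheus evaluates recording and alerting rules (default: 1m)

Both are set in the `global` configuration block. `scrape_interval` can be overridden per scrape job. The evaluation interval can be overridden per rule group with the group's `interval` field.

Domain 3: PromQL (28%)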
Question 9
What is the difference between rate() and irate()?
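The difference between the two functions can be sketched in Python over a list of `(timestamp, value)` counter samples. These are simplified, hypothetical helpers. Both treat a decrease as a counter reset, as PromQL does, but unlike the real `rate()` they do not extrapolate to the edges of the range window.

```python
from __future__ import annotations


def _increase(samples: list[tuple[float, float]]) -> float:
    """Sum of increases between consecutive samples; a drop is treated as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur  # after a reset the counter restarted at 0
    return total


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Like rate(): average per-second increase across every sample in the range."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return _increase(samples) / (samples[-1][0] - samples[0][0])


def simple_irate(samples: list[tuple[float, float]]) -> float:
    """Like irate(): per-second increase between the last two samples only."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    last_two = samples[-2:]
    return _increase(last_two) / (last_two[1][0] - last_two[0][0])
```

Consider samples scraped every 15 s, such as `[(0, 0), (15, 15), (30, 30), (45, 330), (60, 345)]`, with a jump between the 30 s and 45 s samples. `simple_rate` averages the jump into its result (5.75/s). `simple_irate` only sees the last interval (1.0/s), which is how `irate()` can miss a spike.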
- **rate()**: Calculates the per-second average rate over the entire range
  - More stable, better for alerting
  - Uses all data points in the range
- **irate()**: Calculates the instant rate using only the last two data points in the range
  - More responsive to changes
  - Better for volatile metrics in graphs
  - Ignores earlier samples in the range, so it can miss spikes when the graph step is longer than the scrape interval

Question 10
Write a PromQL query to calculate the 95th percentile latency from a histogram.
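To show what `histogram_quantile()` does with bucket counts, here is a Python sketch of the linear interpolation used for classic histograms. A few assumptions apply:

- The function takes cumulative `(le, count)` pairs directly. In PromQL these would be per-bucket rates.
- The `+Inf` bucket must be present.
- Edge cases are simplified, and only 0 < q <= 1 is handled.
- The bucket values in the demo are made up.

```python
from __future__ import annotations

import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (le, count) buckets by linear interpolation."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lies in the +Inf bucket: report the highest finite bound
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    lower, below = buckets[b - 1] if b > 0 else (0.0, 0.0)
    upper, count = buckets[b]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


if __name__ == "__main__":
    # Made-up cumulative counts for a request-latency histogram (seconds).
    buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
    print(histogram_quantile(0.95, buckets))
```

The result is an estimate whose precision depends on the bucket layout. This is why bucket boundaries should sit near the latencies you care about.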
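Apply `histogram_quantile()` to the per-second rate of the histogram's `_bucket` series. For example (the metric name `http_request_duration_seconds` is illustrative): `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))`

Or with aggregation by service, keeping the `le` label that `histogram_quantile()` needs: `histogram_quantile(0.95, sum by (service, le) (rate(http_request_duration_seconds_bucket[5m])))`

Question 11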
How do you calculate error rate as a percentage?
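A small Python sketch of the arithmetic behind the error-rate query. It uses two hypothetical counter snapshots taken `window_seconds` apart and ignores counter resets for brevity. Both rates are divided by the same window, so the per-second factor cancels and the result equals errors divided by requests in that window.

```python
import math


def error_percentage(errors_start: float, errors_end: float,
                     total_start: float, total_end: float,
                     window_seconds: float) -> float:
    """Error rate as a percentage of the request rate over one window (resets ignored)."""
    error_rate = (errors_end - errors_start) / window_seconds  # like rate(errors[window])
    total_rate = (total_end - total_start) / window_seconds    # like rate(requests[window])
    if total_rate == 0:
        return math.nan  # no traffic: the ratio is undefined (PromQL also yields NaN for 0/0)
    return error_rate / total_rate * 100
```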
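For example, assuming a counter `http_requests_total` with a `status` label (illustrative names): `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100`

This divides the rate of error requests by the rate of total requests and multiplies by 100 to get a percentage.

Question 12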
What does the absent() function do?
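The semantics of `absent()` fit in a few lines of Python. In this sketch an instant vector is a list of dicts. The `equality_matchers` parameter is hypothetical and stands in for the `label="value"` matchers that PromQL copies onto the output, as in `absent(up{job="batch"})`.

```python
from __future__ import annotations


def absent(vector: list[dict], equality_matchers: dict[str, str] | None = None) -> list[dict]:
    """One sample with value 1 if the input vector is empty, otherwise an empty vector."""
    if vector:
        return []
    # Labels from equality matchers in the original selector are carried over.
    return [{"labels": dict(equality_matchers or {}), "value": 1.0}]
```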
`absent()` returns a single-element vector with the value 1 if the input vector has no elements, and an empty vector otherwise. Use cases:

- Alert when a metric is missing
- Detect when a service stops reporting

Question 13
How do you compare current values to values from 1 hour ago?
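How `offset` shifts the evaluation time can be sketched in Python. `value_at` is a hypothetical helper that roughly mimics an instant selector. It returns the latest sample at or before a given time, looking back at most 5 minutes (PromQL's default lookback), with window boundaries simplified. `compare_with_offset` then does what `metric - metric offset 1h` does for a single series.

```python
from __future__ import annotations

LOOKBACK_SECONDS = 300.0  # PromQL's default lookback delta (5m)


def value_at(samples: list[tuple[float, float]], t: float) -> float | None:
    """Latest value at or before time t within the lookback window; samples sorted by time."""
    recent = [value for ts, value in samples if t - LOOKBACK_SECONDS < ts <= t]
    return recent[-1] if recent else None


def compare_with_offset(samples: list[tuple[float, float]], now: float,
                        offset_seconds: float = 3600.0):
    """Return (current, value offset_seconds earlier, difference), or None if either is missing."""
    current = value_at(samples, now)
    earlier = value_at(samples, now - offset_seconds)
    if current is None or earlier is None:
        return None  # PromQL would likewise return no result for this series
    return current, earlier, current - earlier
```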
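Use the `offset` modifier. For example (metric name illustrative): `rate(http_requests_total[5m]) - rate(http_requests_total[5m] offset 1h)`

Question 14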
What is the difference between sum by and sum without?
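A Python sketch of the two grouping modes over a list of `(labels, value)` pairs. The helper names are hypothetical. As in PromQL, aggregation drops the metric name (`__name__`) unless it is explicitly listed in `by`.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


def _sum_groups(series, keep_label: Callable[[str], bool]) -> dict[tuple, float]:
    """Group series by the labels `keep_label` accepts and add up each group's values."""
    groups: dict[tuple, float] = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if keep_label(k)))
        groups[key] += value
    return dict(groups)


def sum_by(series, by: list[str]) -> dict[tuple, float]:
    """Like `sum by (...)`: keep only the listed labels."""
    return _sum_groups(series, lambda name: name in by)


def sum_without(series, without: list[str]) -> dict[tuple, float]:
    """Like `sum without (...)`: drop the listed labels and the metric name, keep the rest."""
    return _sum_groups(series, lambda name: name not in without and name != "__name__")
```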
- **sum by (label)**: Aggregates and keeps only the specified labels
- **sum without (label)**: Aggregates and removes the specified labels, keeping all others

Domain 4: Instrumentation and Exporters (16%)
Question 15
What are the Four Golden Signals of monitoring?
From Google SRE:

1. **Latency**: Time to service a request
2. **Traffic**: Demand on your system (requests/second)
3. **Errors**: Rate of failed requests
4. **Saturation**: How "full" your service is

Question 16
What metrics does the Node Exporter provide?
Node Exporter provides hardware and OS metrics:

- CPU usage (`node_cpu_seconds_total`)
- Memory (`node_memory_*`)
- Disk (`node_filesystem_*`, `node_disk_*`)
- Network (`node_network_*`)
- Load average (`node_load1`, `node_load5`, `node_load15`)
- System info (e.g., `node_uname_info`)

Question 17
What is the correct naming convention for Prometheus metrics?
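The Prometheus documentation's naming guidance can be checked with a small linter. It covers valid characters, colons reserved for recording rules, snake_case, base units such as seconds and bytes, and a `_total` suffix for counters. This Python sketch is a hypothetical helper, and its list of non-base units is illustrative rather than exhaustive.

```python
from __future__ import annotations

import re

# Character set every metric name must match, per the Prometheus data model.
VALID_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Illustrative, not exhaustive: units that should be converted to base units.
NON_BASE_UNITS = ("_milliseconds", "_microseconds", "_minutes", "_hours", "_kilobytes", "_megabytes")


def lint_metric_name(name: str, metric_type: str) -> list[str]:
    """Return naming-convention problems for a metric name (empty list means none found)."""
    problems: list[str] = []
    if not VALID_NAME.fullmatch(name):
        problems.append("not a valid metric name")
    if ":" in name:
        problems.append("colons are reserved for recording rules")
    if name != name.lower():
        problems.append("use snake_case")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counters should end in _total")
    if metric_type != "counter" and name.endswith("_total"):
        problems.append("_total suffix implies a counter")
    if any(name.endswith(unit) or unit + "_" in name for unit in NON_BASE_UNITS):
        problems.append("use base units such as _seconds or _bytes")
    return problems
```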
Format: `

Question 18
When should you use the Blackbox Exporter?
Use Blackbox Exporter for:

- HTTP/HTTPS endpoint probing
- TCP port checks
- DNS lookups
- ICMP ping checks
- SSL certificate expiry monitoring

It's useful for monitoring external services or endpoints where you can't install an exporter.

Domain 5: Alerting & Dashboarding (18%)
Question 19
What are the three states of an alert in Prometheus?
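The state transitions, including how the `for` clause holds an alert in Pending, can be replayed with a short Python sketch. It assumes rule evaluations at a fixed interval and reduces each evaluation to a single true/false result for one alert. `alert_states` is a hypothetical helper, and the resolved-alert bookkeeping Prometheus does is left out.

```python
from __future__ import annotations


def alert_states(condition_results: list[bool], eval_interval: float, for_duration: float) -> list[str]:
    """Return the alert's state after each rule evaluation."""
    states: list[str] = []
    active_since: float | None = None
    for i, condition_met in enumerate(condition_results):
        now = i * eval_interval
        if not condition_met:
            active_since = None  # condition cleared: the pending/firing alert goes away
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now  # condition just became true: start the `for` timer
        held_for = now - active_since
        states.append("firing" if held_for >= for_duration else "pending")
    return states
```

With a 60 s evaluation interval and `for: 2m`, the results `[False, True, True, True, False, True]` go inactive, pending, pending, firing, inactive, pending. A brief blip never reaches Firing.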
1. **Inactive**: The alert condition is not met
2. **Pending**: Condition is met but `for` duration hasn't elapsed
3. **Firing**: Condition has been true for the `for` duration

Question 20
What is the purpose of the for clause in an alert rule?
The `for` clause specifies how long the condition must be continuously true before the alert fires. Benefits:

- Prevents flapping alerts
- Reduces false positives from brief spikes
- Ensures the issue is persistent

Question 21
What is the difference between silences and inhibition in Alertmanager?
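Inhibition can be sketched in Python. The parameters loosely mirror an Alertmanager inhibit rule (source matchers, target matchers, and `equal` labels). The function is hypothetical, supports exact-match matchers only, and expects only currently firing alerts as input.

```python
from __future__ import annotations


def _matches(alert: dict[str, str], matchers: dict[str, str]) -> bool:
    return all(alert.get(k) == v for k, v in matchers.items())


def inhibited_alerts(firing: list[dict[str, str]], source: dict[str, str],
                     target: dict[str, str], equal: list[str]) -> list[dict[str, str]]:
    """Return the firing alerts muted by one inhibit rule.

    A target alert is muted when another firing alert matches `source` and both
    have the same values for every label listed in `equal`.
    """
    sources = [a for a in firing if _matches(a, source)]
    return [
        t
        for t in firing
        if _matches(t, target)
        and any(s is not t and all(s.get(name) == t.get(name) for name in equal) for s in sources)
    ]
```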
**Silences**:

- Manually created to mute specific alerts
- Time-bounded (start and end time)
- Used for maintenance windows
- Created via UI or API

**Inhibition**:

- Automatic suppression based on rules
- Suppresses alerts when related alerts are firing
- Configured in alertmanager.yml
- Example: suppress warning alerts while a critical alert for the same target is firing

Question 22
What is a recording rule and when should you use one?
Recording rules pre-compute frequently used or expensive PromQL expressions and store the results as new time series. Use them when:

- The query is computationally expensive
- The query is used in multiple dashboards/alerts
- You need to aggregate across federation
- Query performance is critical

Question 23
How does Alertmanager group alerts?
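A Python sketch of the batching step. Alerts with identical values for the `group_by` labels end up in the same group and would be sent as one notification. The helper is hypothetical and leaves out timing (`group_wait`, `group_interval`) and routing.

```python
from __future__ import annotations

from collections import defaultdict


def group_alerts(alerts: list[dict[str, str]], group_by: list[str]) -> dict[tuple, list[dict[str, str]]]:
    """Batch alerts that share the same values for every group_by label."""
    groups: dict[tuple, list[dict[str, str]]] = defaultdict(list)
    for alert in alerts:
        # A label missing from an alert is treated as an empty value.
        key = tuple((label, alert.get(label, "")) for label in group_by)
        groups[key].append(alert)
    return dict(groups)
```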
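Alertmanager groups alerts based on the `group_by` labels in the route configuration. Alerts with matching values for those labels are batched together into one notification.

In the route configuration, `group_by` selects the grouping labels. `group_wait`, `group_interval`, and `repeat_interval` control when notifications for a group are sent and re-sent.

Question 24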
What notification channels does Alertmanager support?
Built-in receivers include:

- Email (SMTP)
- Slack
- PagerDuty
- OpsGenie
- VictorOps
- Webhook (for custom integrations)
- Pushover
- WeChat
- Telegram

Custom integrations can be built using the webhook receiver.

Bonus Questions
Question 25
What is meta-monitoring?
Meta-monitoring is monitoring the monitoring system itself (Prometheus monitoring Prometheus). Important metrics to monitor:

- `prometheus_tsdb_head_series` - Number of time series in the head block
- `prometheus_engine_query_duration_seconds` - Query performance
- `prometheus_target_scrape_pool_sync_total` - Scrape pool (target) sync activity
- `up{job="prometheus"}` - Prometheus availability

Question 26
How can you scale Prometheus for high availability?