AWS Cloud Operations Blog
Identifying resources driving Amazon CloudWatch GetMetricData charges using AWS CloudTrail
Organizations frequently use third-party monitoring tools to retrieve CloudWatch metric data for their dashboards and alerting systems. This practice often leads to significant GetMetricData API usage and results in high CloudWatch costs. A common challenge for cost optimization teams is identifying which specific resources or applications are driving these increased expenses, especially when they’re not directly involved with the operational workflows.
Previously, there were no data sources available to the customer to isolate the clients making excessive GetMetricData calls. Consequently, AWS Support had to be engaged to provide client details (e.g. IAM Role, Source IP Address). This increased the time to validate the business case and take action to reduce costs.
With the introduction of CloudTrail data events for CloudWatch GetMetricData, these API calls can be captured by a CloudTrail trail or CloudTrail Lake. Customers can now conduct their own investigations and implement corrective actions faster.
In this post, you’ll learn how to gain deeper insights into your CloudWatch GetMetricData API usage. CloudWatch data events can be enabled on a CloudTrail trail and logged to Amazon S3 as well as CloudWatch Logs. We will demonstrate how to use Amazon Athena and Amazon CloudWatch Logs Insights to identify the specific sources responsible for GetMetricData API calls.
Logs Insights recently gained OpenSearch SQL support. By using SQL, customers can easily pivot between Logs Insights and Athena with minor changes to their queries.
Solution Overview
This solution covers how to:
- Enable CloudWatch data events on an existing CloudTrail trail
- Use AWS API Usage metrics to identify timeframes with high or unusual API activity
- Analyze the corresponding GetMetricData API data events using Amazon Athena and CloudWatch Logs Insights queries
- Implement best practices to avoid excessive CloudWatch API charges
Prerequisites
Before we start, make sure you have an existing CloudTrail trail in the target region. You can temporarily enable CloudWatch metric data events on an existing trail to investigate usage. If you do not have an existing trail, please follow this guide to learn how to create a CloudTrail trail using the AWS Console.
Alternatively, a new trail can be created with only CloudWatch data events enabled. Once the analysis is completed, the trail can be deleted to avoid ongoing charges.
Solution Walkthrough
1. Enable CloudWatch Data events in CloudTrail
CloudWatch data events incur additional charges and are not enabled by default.
To enable data events:
- Navigate to the CloudTrail Console.
- Open Trails and choose the desired trail.
- Under Data events, choose Edit
Figure 1: CloudTrail Data Events not configured
- Under the Events section, select the Data events checkbox.
- Under the Data events section, choose Switch to advanced event selectors.
- Next under Resource type choose CloudWatch metric from the dropdown.
- Choose Save Changes.
Figure 2: Enabling CloudWatch metric Data events
Once the changes are saved, it may take a few minutes for API activity to be logged.
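The console steps above correspond to an advanced event selector on the trail. As a rough sketch, the snippet below builds such a selector as JSON. It assumes the `AWS::CloudWatch::Metric` resource type and adds an optional `eventName` filter that narrows logging to GetMetricData. The helper function and the selector name are illustrative and not part of the original walkthrough.

```python
import json


def cloudwatch_metric_selector(only_get_metric_data=True):
    """Build an advanced event selector for CloudWatch metric data events.

    The helper and the selector name are hypothetical; the field names
    follow the CloudTrail advanced event selector format.
    """
    field_selectors = [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::CloudWatch::Metric"]},
    ]
    if only_get_metric_data:
        # Optional: log only GetMetricData, not other metric data events.
        field_selectors.append({"Field": "eventName", "Equals": ["GetMetricData"]})
    return [
        {
            "Name": "CloudWatch GetMetricData data events",
            "FieldSelectors": field_selectors,
        }
    ]


if __name__ == "__main__":
    # Print JSON that could be saved to a file for later use.
    print(json.dumps(cloudwatch_metric_selector(), indent=2))
```

The printed JSON could be supplied to `aws cloudtrail put-event-selectors` through its `--advanced-event-selectors` option. Keep in mind that this call replaces the selectors already configured on the trail, so any existing selectors would need to be included as well.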
2. Identify a time range to investigate
AWS publishes CloudWatch API usage metrics to the AWS/Usage namespace. This includes the following metrics:
- CallCount: Successful API calls
- ErrorCount: Failed API calls
- ThrottleCount: Throttled API calls
By visualizing the CallCount metric over time, you can identify your baseline API usage, detect spikes or significant increases and pinpoint time ranges for further investigation. This can be useful when correlating new workflows utilizing the GetMetricData API calls against increases in call volume. The objective is to generate a CloudWatch graph illustrating the daily usage over the last 30 days.
To create a CloudWatch metrics graph illustrating GetMetricData API calls:
- Navigate to the CloudWatch Console
- In the navigation pane, under Metrics choose All metrics.
- In the search field on the All metrics tab, enter the search term GetMetricData.
- Choose Usage By AWS Resource
- Choose the CallCount metric
- From the Graphed Metrics tab, set Statistic to Sum.
- Above the graph is a time/date selector. By choosing Custom and then Absolute, a specific time range can be selected for analysis. Alternatively, choosing Relative will allow you to review the last 4 weeks of usage.
- To obtain daily usage values, choose a Period of 1 day.
Here is a resulting CloudWatch graph:
Figure 3: CloudWatch Graph illustrating daily GetMetricData API call volume
On 5 February, there was a marked increase in GetMetricData API calls. As a next step, we will analyze the API calls made on this day.
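If you export the daily CallCount sums instead of reading them off the graph, a spike like this one can also be flagged programmatically. The following is a minimal sketch that compares each day against the median of the period. The sample values and the 2x threshold are made up for illustration and would need tuning for a real workload.

```python
from statistics import median


def find_spike_days(daily_counts, factor=2.0):
    """Return (day, count) pairs whose count exceeds factor x the median.

    daily_counts maps an ISO date string to that day's CallCount sum.
    The median serves as a simple baseline that a single spike does not
    distort much.
    """
    baseline = median(daily_counts.values())
    return [
        (day, count)
        for day, count in sorted(daily_counts.items())
        if count > factor * baseline
    ]


# Hypothetical daily sums; these are not taken from the graph above.
sample = {
    "2025-02-02": 41000,
    "2025-02-03": 39500,
    "2025-02-04": 40200,
    "2025-02-05": 118000,
    "2025-02-06": 42100,
}

if __name__ == "__main__":
    print(find_spike_days(sample))
```

With this sample data, only 2025-02-05 is reported, because its count is more than twice the median of 41000.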
3. Create an Athena table (optional)
In order to use Athena to analyze CloudTrail events, you will first need to create a table before submitting queries. Here is a guide on creating such a table.
If you are using CloudWatch Logs Insights to analyze CloudTrail events, you can directly query the log group containing the CloudTrail events (there is no need to create a table).
4. Use Athena/CloudWatch Logs Insights to Analyze GetMetricData API calls
For ease of use, below are sample Athena and CloudWatch Logs Insights queries with their corresponding output. These queries can be extended with more complex conditions to match your use case.
a. Summarize GetMetricData calls by IAM Role/User
Using the IAM Role/User information, you can identify the relevant resources (e.g. third-party services, Lambda Functions, EC2 instances) and workflows issuing API calls at high volume.
Athena Query:
SELECT
userIdentity.arn AS IAM,
count(*) AS CallCount
FROM "default"."cloudtrail_logs_ops"
WHERE
eventname = 'GetMetricData'
AND eventtime >= '2025-02-05T00:00:00Z' AND eventtime < '2025-02-06T00:00:00Z'
GROUP BY userIdentity.arn
ORDER BY CallCount DESC
Figure 4: Using Athena to isolate IAM Roles issuing GetMetricData API calls
Logs Insights Query:
SELECT `userIdentity.arn` as IAM_Role_User, count(*) AS CallCount
FROM `cloudtrail_logs_ops`
WHERE eventName = 'GetMetricData'
GROUP BY `userIdentity.arn`
ORDER BY CallCount DESC;
Figure 5: Using Logs Insights to isolate IAM Roles issuing GetMetricData API calls
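For a quick check on a handful of log files downloaded from the trail's S3 bucket, the same grouping can be approximated locally. This is only a sketch: it assumes the standard CloudTrail log file layout (a top-level `Records` array), and the helper names and sample ARNs are hypothetical.

```python
import json
from collections import Counter


def get_path(record, dotted):
    """Follow a dotted path such as 'userIdentity.arn' through nested dicts."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def count_calls(records, field="userIdentity.arn"):
    """Count GetMetricData events per value of `field`, highest first."""
    counts = Counter()
    for rec in records:
        if rec.get("eventName") == "GetMetricData":
            counts[get_path(rec, field) or "unknown"] += 1
    return counts.most_common()


# Made-up records standing in for one decompressed CloudTrail log file.
sample_log = json.loads("""
{"Records": [
  {"eventName": "GetMetricData",
   "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/ExampleMonitoringRole/session"}},
  {"eventName": "GetMetricData",
   "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/ExampleMonitoringRole/session"}},
  {"eventName": "GetMetricData",
   "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"}},
  {"eventName": "GetMetricWidgetImage",
   "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"}}
]}
""")

if __name__ == "__main__":
    for arn, calls in count_calls(sample_log["Records"]):
        print(calls, arn)
```

Real log files are gzip-compressed, so they would be opened with `gzip.open(path, "rt")` before parsing. Passing a different `field`, such as `userAgent` or `sourceIPAddress`, gives a rough local equivalent of the other groupings in this section.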
b. Summarize GetMetricData calls by UserAgent, Client IP Address
Similarly, querying against UserAgent and Client IP Address can pinpoint workflows driving costs.
Athena Query:
SELECT
useragent,
sourceipaddress as ClientIP,
count(*) AS CallCount
FROM "default"."cloudtrail_logs_ops"
WHERE
eventname = 'GetMetricData'
AND eventtime >= '2025-02-05T00:00:00Z' AND eventtime < '2025-02-06T00:00:00Z'
GROUP BY useragent, sourceipaddress
ORDER BY CallCount DESC
Figure 6: Using Athena to identify UserAgents & Client IP Addresses issuing GetMetricData API calls
Logs Insights Query:
SELECT `userAgent`, `sourceIPAddress` as ClientIP, count(*) AS CallCount
FROM `cloudtrail_logs_ops`
WHERE eventName = 'GetMetricData'
GROUP BY `userAgent`, `sourceIPAddress`
ORDER BY CallCount DESC;
Figure 7: Using Logs Insights to identify UserAgents & Client IP Addresses issuing GetMetricData API calls
c. Summarize GetMetricData calls by AWS Account ID
For multi-account use cases, it can be useful to understand the AWS Accounts issuing the most API calls.
Athena Query:
SELECT
userIdentity.accountId,
count(*) AS CallCount
FROM "default"."cloudtrail_logs_ops"
WHERE
eventname = 'GetMetricData'
AND eventtime >= '2025-02-05T00:00:00Z' AND eventtime < '2025-02-06T00:00:00Z'
GROUP BY userIdentity.accountId
ORDER BY CallCount DESC
Figure 8: Using Athena to identify AWS Account ID issuing GetMetricData API Calls
Logs Insights Query:
SELECT `userIdentity.accountId` as Account_ID, count(*) AS CallCount
FROM `cloudtrail_logs_ops`
WHERE eventName = 'GetMetricData'
GROUP BY `userIdentity.accountId`
ORDER BY CallCount DESC;
Figure 9: Using Logs Insights to identify AWS Account ID issuing GetMetricData API Calls
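The three Athena queries in this section differ only in their grouping columns and share the same one-day time window. As an illustration, a small helper can generate them for any day under investigation. The function name and the grouping keys are hypothetical, and the table name is the one used in the sample queries.

```python
from datetime import date, timedelta

# Grouping columns taken from the three Athena queries in this section.
GROUPINGS = {
    "iam": ["userIdentity.arn"],
    "client": ["useragent", "sourceipaddress"],
    "account": ["userIdentity.accountId"],
}


def build_query(grouping, day, table='"default"."cloudtrail_logs_ops"'):
    """Return Athena SQL counting GetMetricData calls on `day` per grouping."""
    cols = ", ".join(GROUPINGS[grouping])
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    return (
        f"SELECT {cols}, count(*) AS CallCount\n"
        f"FROM {table}\n"
        f"WHERE eventname = 'GetMetricData'\n"
        f"  AND eventtime >= '{start}T00:00:00Z'"
        f" AND eventtime < '{end}T00:00:00Z'\n"
        f"GROUP BY {cols}\n"
        f"ORDER BY CallCount DESC"
    )


if __name__ == "__main__":
    print(build_query("client", date(2025, 2, 5)))
```

The helper only produces the query text. Running it still requires pasting the result into the Athena query editor or submitting it through your own tooling.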
Best practices to avoid excessive CloudWatch GetMetricData API charges
According to the CloudWatch Pricing guide, GetMetricData is billed on the number of metrics requested (e.g. $0.01 per 1,000 metrics requested). Consequently, the greater the number of metrics retrieved, the more expensive the workflow becomes. To reduce or eliminate costs generated by GetMetricData API calls, consider the following:
- Limit the namespaces and metrics retrieved. Only retrieve metrics that are actively being monitored on dashboards or used for alerting.
- Only retrieve metrics from regions with active workflows.
- Revoke permissions for GetMetricData API action for IAM Roles/Users that no longer need to retrieve metrics.
- Increase the interval between API calls to reduce call volume.
- Consider migrating to CloudWatch Metric Streams. This solution will push metric data in near real-time to third party monitoring tools at a lower cost. Metric Streams also eliminates the need to manage API usage and throttling.
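To see how the number of metrics and the polling interval drive the bill, the pricing arithmetic can be sketched as follows. The example assumes the $0.01 per 1,000 metrics figure quoted above and a hypothetical monitoring tool that requests 500 metrics per poll. It ignores any free tier and Region-specific pricing, so treat the results as rough estimates only.

```python
# USD per 1,000 metrics requested, from the pricing example quoted above.
PRICE_PER_1000_METRICS = 0.01


def monthly_cost(metrics_per_poll, interval_seconds, days=30):
    """Estimate the monthly GetMetricData charge for one polling workflow."""
    polls = days * 24 * 3600 // interval_seconds
    metrics_requested = metrics_per_poll * polls
    return round(metrics_requested * PRICE_PER_1000_METRICS / 1000, 2)


if __name__ == "__main__":
    # Hypothetical workload: 500 metrics, polled every 1 minute vs every 5 minutes.
    print("1-minute polling:", monthly_cost(500, 60))
    print("5-minute polling:", monthly_cost(500, 300))
```

Under these assumptions, polling every minute requests 21.6 million metrics in 30 days, or roughly $216, while polling every five minutes cuts this to about $43. Halving the number of metrics retrieved has the same proportional effect.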
Cleanup
Once the entities driving costs have been identified, you can disable CloudTrail Data Events to reduce ongoing charges.
- Remove CloudWatch Metrics data events
- Navigate to CloudTrail console.
- Open the Trails page of the CloudTrail console and choose your trail name.
- Under Data events, choose Edit.
- In the Data event: CloudWatch metric section choose Remove.
Figure 10: Disabling CloudWatch metric Data event in CloudTrail
- Choose Save Changes.
- Delete the CloudTrail trail (if it was created only for this analysis)
- Delete Athena Table
- Update the command below with your table name.
DROP TABLE IF EXISTS `default`.`cloudtrail_logs_ops`
Conclusion
In this post, we showed how to gain visibility into your Amazon CloudWatch GetMetricData API usage using AWS CloudTrail, Amazon Athena, and CloudWatch Logs Insights. You learned how to identify high-volume API consumers and implement best practices to reduce costs.
Want to learn more about CloudWatch cost optimization?
- Watch What is causing GetMetricData costs in Amazon CloudWatch
- Read Analyzing, optimizing, and reducing CloudWatch costs
- Explore CloudWatch Pricing for detailed pricing examples