AWS Storage Blog
Optimizing stateful storage lifecycle on AWS with Kubernetes and Salesforce
Managing storage resources efficiently in cloud environments is a challenge for organizations of all sizes. As businesses scale their operations, they often accumulate unused storage volumes that continue to generate costs without providing value. This ‘orphaned’ storage problem is particularly acute in containerized environments, where the complexity of storage lifecycle management can lead to oversight and inefficiency. Salesforce came up with a solution for this problem after identifying millions of dollars in potential annual savings from unused storage resources within the Hyperforce infrastructure.
In this joint post from AWS and Salesforce, we demonstrate how to implement effective storage lifecycle management using AWS and Kubernetes, showcasing real-world implementations that have the potential to save millions of dollars. We show how to configure StatefulSet Persistent Volume Claim (PVC) auto-delete features, understand storage class policies, and implement automated cleanup solutions that can transform your storage management practices. Whether you’re running a small deployment or operating at Salesforce’s scale, these practices help you maintain better control over your storage resources while making sure that critical data remains protected when needed.
The challenge of orphaned Amazon EBS volumes and cost
Prior to Kubernetes 1.27, there was no mechanism enabled by default to automatically delete Persistent Volume Claims (PVCs) and their associated Persistent Volumes (PVs) when deleting a StatefulSet. Kubernetes 1.27 made this capability available by default as a beta feature, but the default retention behavior remains unchanged, leaving PVCs, PVs, and underlying storage resources (such as EBS volumes) orphaned after StatefulSet deletion. These orphaned resources lead to unnecessary costs for businesses. Figure 1 details a Kubernetes StatefulSet’s storage model. Each Pod of the StatefulSet connects to a unique Persistent Volume Claim, which maps to a Persistent Volume and physical storage, illustrating the persistent one-to-one relationship.
Figure 1: How StatefulSets manage storage
Figure 2 shows the default behavior of deleting a StatefulSet. The Persistent Volume Claims and Persistent Volumes remain, potentially leading to orphaned EBS volumes.
Figure 2: Consequences of deleting StatefulSets without proper cleanup
At Salesforce, we noticed that teams were usually unaware of these orphaned volumes, which led to millions of dollars in costs from unused storage. We then identified ways to potentially save millions of dollars annually within the Hyperforce infrastructure.
Salesforce solution
By implementing a two-pronged strategy, Salesforce is projected to save millions of dollars in orphaned storage costs in Hyperforce.
1. Custom automation (pre-Kubernetes 1.27)
Salesforce deployed a custom platform-level solution: an automated job in our fleet that periodically deletes orphaned PVCs based on both Amazon CloudWatch and Kubernetes resource checks. This solution alone saved Salesforce millions in costs. We enabled this feature only in lower (non-production) environments to avoid deleting any critical data in production environments without StatefulSet owner input.
Before taking this to production clusters, we considered implementing a more complex StatefulSet owner control mechanism, but instead waited for the native Kubernetes auto-delete feature that offers this fine-grained control.
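The periodic check described above can be sketched in a few lines. This is a minimal illustration, not the Salesforce implementation: the `PvcRecord` type, its fields, and the 14-day idle threshold are all hypothetical, and the sketch assumes some other component has already gathered the Kubernetes state (is the PVC mounted by any Pod?) and the volume metrics (when was the last I/O observed?).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PvcRecord:
    """Hypothetical snapshot of one PVC, assembled from cluster and metric data."""
    namespace: str
    name: str
    mounted_by_pod: bool   # Kubernetes check: does any Pod reference this PVC?
    last_io_at: datetime   # metrics check: last observed read/write on the volume


def find_orphan_candidates(pvcs, now, idle_for=timedelta(days=14)):
    """Return PVCs that no Pod mounts and that have been idle for at least idle_for.

    Both conditions must hold, so a PVC that is briefly unmounted during a
    rollout is not flagged. The 14-day default is an assumption.
    """
    return [
        pvc for pvc in pvcs
        if not pvc.mounted_by_pod and now - pvc.last_io_at >= idle_for
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        PvcRecord("namespace1", "data-statefulset1-0", True, now - timedelta(days=30)),
        PvcRecord("namespace1", "data-statefulset1-1", False, now - timedelta(days=30)),
    ]
    for pvc in find_orphan_candidates(sample, now):
        print(f"candidate for cleanup: {pvc.namespace}/{pvc.name}")
```

Requiring both signals is a deliberately conservative choice: a candidate list like this would typically be reviewed or restricted to non-production clusters before anything is deleted.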
2. Native Kubernetes auto-delete feature
Through close collaboration between AWS and Salesforce engineering teams, the StatefulSet PVC auto-delete feature was successfully validated during its beta phase, and is now generally available in Kubernetes 1.32 on Amazon Elastic Kubernetes Service (Amazon EKS). This new feature enables StatefulSet owners to automatically delete PVCs when a StatefulSet is deleted or scaled down.
This feature has two limitations. It doesn’t clean up preexisting orphaned volumes, and it doesn’t help when StatefulSet owners initially set the PVC retention policy to Retain and then decide, after the StatefulSet is deleted or scaled down, that the PVC is no longer needed. This is where the two-pronged approach with custom-built automation has been so effective.
Understand the Kubernetes auto-delete feature
The StatefulSet PVC lifecycle levers provided within Kubernetes offer different possible behaviors that are useful to Salesforce Hyperforce customers. For more information about this feature and other behaviors, see the official deep dive from Kubernetes.
1. PVCs not deleted (default)
By default, the PVC retention policy of a StatefulSet is Retain for both fields, which matches the behavior before the auto-delete feature was introduced: no PVCs are deleted.
persistentVolumeClaimRetentionPolicy:
  whenDeleted: Retain
  whenScaled: Retain
This behavior means that the PVCs of the StatefulSet deployments aren’t deleted even when the StatefulSet is deleted, leading to orphaned PVCs. This behavior is recommended if your PVC associated with the StatefulSet has critical data that you need regardless of whether the StatefulSet pod that claims it is present or not.
2. PVCs deleted when StatefulSet is deleted, but not when scaled down
Your PVC retention policy can be changed to automatically delete the PVCs of the StatefulSet when the StatefulSet is deleted, while keeping the PVC if the StatefulSet is scaled down.
persistentVolumeClaimRetentionPolicy:
  whenDeleted: Delete
  whenScaled: Retain
This behavior is recommended for workloads that want full cleanup when the StatefulSet is deleted, but want to retain PVCs during scale-down events that may happen during patching, autoscaling, or other use cases.
3. PVCs deleted when StatefulSet is deleted and when scaled down
Your PVC retention policy can be set to Delete such that the PVCs associated with the StatefulSet get deleted both during scale down and deletion:
persistentVolumeClaimRetentionPolicy:
  whenDeleted: Delete
  whenScaled: Delete
At Salesforce, we expect that most stateful services running on the Hyperforce platform may benefit from this behavior when the data is no longer needed beyond the lifetime of the replica. This setting can also lead to the greatest savings if your Storage Class reclaimPolicy is set to Delete.
4. PVCs deleted when StatefulSet is scaled down, but not when deleted
Your PVC retention policy can also be changed to automatically delete the PVCs of the StatefulSet when the StatefulSet is scaled down, while keeping the PVC if the StatefulSet is deleted.
persistentVolumeClaimRetentionPolicy:
  whenDeleted: Retain
  whenScaled: Delete
Although Salesforce doesn’t foresee many internal teams using this, it might be useful for workloads that don’t want to re-use PVCs during scale up, but want the data to persist during deletion. Read this Kubernetes post for possible scenarios where this might be useful.
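To make the four combinations concrete, the following sketch models which PVCs each policy would remove. It is a simplified model of the behavior described above, not controller code: it only considers the PVCs of the replicas involved in the event, and the claim template name `data` in the usage example is a hypothetical placeholder. It relies on the Kubernetes naming convention for StatefulSet PVCs, `<claim template name>-<StatefulSet name>-<ordinal>`.

```python
def pvc_names(claim_template, statefulset, ordinals):
    """Build PVC names as <claim template>-<StatefulSet name>-<ordinal>."""
    return [f"{claim_template}-{statefulset}-{i}" for i in ordinals]


def pvcs_removed_on_scale_down(policy, claim_template, statefulset,
                               old_replicas, new_replicas):
    """PVCs removed when replicas drop from old_replicas to new_replicas."""
    # A missing field behaves like Retain, which is the default.
    if policy.get("whenScaled", "Retain") != "Delete":
        return []
    # Only the ordinals that are scaled away lose their PVCs.
    return pvc_names(claim_template, statefulset, range(new_replicas, old_replicas))


def pvcs_removed_on_delete(policy, claim_template, statefulset, replicas):
    """PVCs removed when the StatefulSet itself is deleted."""
    if policy.get("whenDeleted", "Retain") != "Delete":
        return []
    return pvc_names(claim_template, statefulset, range(replicas))


if __name__ == "__main__":
    policy = {"whenDeleted": "Delete", "whenScaled": "Retain"}  # option 2 above
    print(pvcs_removed_on_scale_down(policy, "data", "statefulset1", 3, 1))
    print(pvcs_removed_on_delete(policy, "data", "statefulset1", 3))
```

With option 2, the scale-down call returns an empty list while the delete call returns all three PVC names, which mirrors the "full cleanup on deletion, keep data on scale down" behavior.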
Implementation example
Add these settings to your StatefulSet deployment Helm chart under spec.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset1
  namespace: namespace1
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete
    whenScaled: Delete
Important note on Storage Classes
Although the Kubernetes auto-delete feature allows users to set PVC deletion policies during StatefulSet deletion or scale-down events, what happens to the associated PVs and EBS volumes depends on the reclaimPolicy of the Storage Class setting.
- Delete (default): Whenever the PVC is deleted, the associated PV and EBS volume are deleted as well, leading to cost savings.
- Retain: Whenever the PVC is deleted, the associated PV and EBS volume aren’t auto deleted, which still leads to orphaned EBS volumes, regardless of the persistentVolumeClaimRetentionPolicy.
Figure 3 shows the impact of Kubernetes StorageClass and StatefulSet PersistentVolumeClaim reclaim policies. Retain in either results in orphaned volumes, and Delete in both leads to cost savings.
Figure 3: Understand the difference between ‘Retain’ and ‘Delete’ Reclaim policies to manage costs effectively
To allow for cost savings and to facilitate automated deletion of EBS volumes, both the Storage Class reclaim policy and the persistentVolumeClaimRetentionPolicy need to be set to Delete.
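The interaction between the two settings amounts to a small truth table. The sketch below encodes it as described in this section; the function name and the returned labels are illustrative only.

```python
VALID = ("Delete", "Retain")


def volume_outcome(pvc_retention, reclaim_policy):
    """Describe what happens to the EBS volume once the StatefulSet goes away.

    pvc_retention  -- value from the StatefulSet persistentVolumeClaimRetentionPolicy
    reclaim_policy -- reclaimPolicy of the Storage Class that provisioned the volume
    """
    if pvc_retention not in VALID or reclaim_policy not in VALID:
        raise ValueError("policies must be 'Delete' or 'Retain'")
    # The volume is only cleaned up when the PVC is deleted AND the
    # Storage Class then deletes the PV and its backing volume.
    if pvc_retention == "Delete" and reclaim_policy == "Delete":
        return "deleted"
    return "orphaned"


if __name__ == "__main__":
    for pvc_retention in VALID:
        for reclaim_policy in VALID:
            result = volume_outcome(pvc_retention, reclaim_policy)
            print(f"PVC retention={pvc_retention}, reclaimPolicy={reclaim_policy}: {result}")
```

Only one of the four combinations removes the volume, which is why checking a single setting in isolation is not enough when auditing a cluster for storage cost.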
Alternative solutions
Beyond Salesforce’s two-pronged solution, you could consider a periodic AWS Lambda function or manual deletion.
- Lambda function: Automate orphaned EBS volume cleanup. Read more in the AWS post.
- Manual deletion: If you expect this to happen rarely, or only for a few clusters or workloads, then use kubectl directly in your cluster for manual PVC deletions.
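As a rough idea of what the Lambda-style option could look like, the sketch below separates the selection logic from the AWS call. It is an assumption-laden outline rather than the code from the referenced AWS post: the function names and the 7-day minimum age are hypothetical, and it only reports candidates instead of deleting them. It assumes volume records shaped like the entries returned by the EC2 `describe_volumes` API (`VolumeId`, `State`, `CreateTime`). Note that `CreateTime` is the age of the volume, not the time since it was detached.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes, now, min_age=timedelta(days=7)):
    """Return IDs of volumes in the 'available' (unattached) state older than min_age."""
    return [
        v["VolumeId"] for v in volumes
        if v["State"] == "available" and now - v["CreateTime"] >= min_age
    ]


def fetch_available_volumes(region_name):
    """Fetch unattached volumes with boto3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the filter above can be tested offline

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_volumes")
    pages = paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
    return [volume for page in pages for volume in page["Volumes"]]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"VolumeId": "vol-old", "State": "available", "CreateTime": now - timedelta(days=30)},
        {"VolumeId": "vol-busy", "State": "in-use", "CreateTime": now - timedelta(days=30)},
    ]
    print(unattached_volumes(sample, now))
```

A scheduled function built on this would typically add a review step, a snapshot, or a tag-based opt-out before deleting anything, because an unattached volume is not necessarily unwanted.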
FAQs
What happens to my data when a PVC is deleted?
When a PVC is deleted, whether the associated PV and data stored on the associated EBS volume are also deleted depends on your Storage Class reclaimPolicy. If reclaimPolicy: Delete, then the corresponding PV and EBS volume are deleted as well. Make sure that you have appropriate backups for any critical data before deleting a PVC.
What happens to my existing orphaned volumes after enabling this Auto-Delete feature?
Enabling this feature helps clean up any future PVCs, but doesn’t help clean up preexisting ones.
Can I use the StatefulSet Auto-Delete feature with any storage classes?
Yes, the StatefulSet auto-delete feature works with any storage class that supports dynamic provisioning.
Conclusion
The approach from Salesforce demonstrates proactive stateful storage management in Kubernetes on AWS. Combining custom automation with the native StatefulSet PVC auto-delete feature provides a way to automatically manage the lifecycle of PVCs, reducing the risk of orphaned volumes and unnecessary costs. Evaluate these strategies to optimize your deployments and save money. To learn more about Amazon Elastic Block Store (Amazon EBS), visit the Amazon EBS User Guide.