Best practice configuration of Amazon FSx for NetApp ONTAP for Microsoft SQL Server workloads

When it comes to running enterprise workloads, Microsoft SQL Server sits at the heart of many of these solutions providing the data needed to drive important business decisions. However, what is often not considered is the storage service underpinning the delivery of that critical data, and with the right tools and management applied, how that storage service can bring even more agility and flexibility to Microsoft SQL server deployments.

Amazon FSx for NetApp ONTAP (FSx for ONTAP) file system is a fully managed shared storage service, built on NetApp’s popular ONTAP file system. It provides users access to ONTAP’s enterprise storage features and services, such as thin provisioning, data deduplication, and the Snapshot functionality that drives NetApp’s agility in storage provisioning and data protection. For example, users can create space efficient replicas of databases regardless of size within just a few seconds, share these databases out to non-production development environments, and when the replicas are done, delete them with no impact to production.

For users running Microsoft SQL Server on Amazon Elastic Cloud (Amazon EC2), FSx for ONTAP is fully capable of delivering disk volumes for block based consumption over iSCSI and NVMe-over-TCP. In this post we take a deeper dive into the best practices and configuration elements of the FSx ONTAP file system when it comes to deploying Microsoft SQL Server (MS-SQL) on Amazon EC2. We look at the following:

FSx for ONTAP service deployment options.
High level architectural components and considerations within FSx for ONTAP.
Low level features within ONTAP and their impact on deployments.
Volume layout and MS-SQL database architecture.
Performance and cost.

FSx for ONTAP service deployment options

The following deployment options are available for FSx for ONTAP.

Single-Availability Zone (Single-AZ) and Multi-Availability Zone (Multi-AZ) file systems

When choosing a deployment option for FSx for ONTAP, just after you give your new file system a name, the first decision is the deployment type: Single or Multi-AZ (Availability Zone). This determines how the active and passive file server that makes up the FSx ONTAP file system is deployed.

Figure 1: FSx for ONTAP Single- and Multi-AZ deployment

Figure 1: FSx for ONTAP Single- and Multi-AZ deployment

As seen in figure 1, all FSx for ONTAP file systems are deployed with a primary and standby file server considered a high-availability (HA) pair presenting a shared storage for SQL Server deployment. There are a number of reasons why the deployment type is important:

Cost vs. Risk
- Multi-AZ has a higher cost than Single-AZ. However, Single-AZ has data stored in one AZ, while Multi-AZ preserves two copies of the data across two AZs which offers a higher degree of availability and durability.
Complexity
- FSx for ONTAP and Microsoft SQL Server in the same or adjacent AZ.
Performance
- Multi-AZ file systems have a higher write throughput limit compared to Single-AZ file systems.

What is common across both deployment types is that data replication, specifically calling out writes to the FSx for ONTAP file system, are synchronous. This means that writes to the file servers are only acknowledged back to the client (SQL Server) when the data is secure on both file servers. The way in which the FSx for ONTAP service is deployed in a Region and AZs, as compared to an on-premises single or multi data center deployment, gives the FSx for ONTAP Single-AZ and Multi-AZ deployments equal or better data availability and durability than a single or multi-data center implementation respectively.

Failover and failback between FSx for ONTAP file servers in the event of a service failure are fully managed by the FSx for ONTAP service, regardless of the deployment type. The data across the FSx for ONTAP nodes is consistent with no data loss.

SQL Server considerations: FSx for ONTAP offers the same service features and functionality in both the Single-AZ and Multi-AZ deployment types. You must determine what the requirements are for the SQL Server deployment and how FSx for ONTAP can support these. For testing, development, or disaster recovery data set, given the nature of the requirements, a Single-AZ deployment offers a more cost-efficient solution with lower levels of complexity, while accepting the risk to the workload wouldn’t impact production. Conversely, Multi-AZ deployment offers the highest levels of data availability for a production system, which are essential for mission critical workloads.

High level architectural components and considerations within FSx for ONTAP

If you are familiar with the NetApp ONTAP file system, then this section should come as no surprise. However, if you are new to NetApp and ONTAP, then we cover the high-level file system concepts and how they build up to offering data storage services to an MS-SQL server, as shown in Figure 2.

Figure 2: FSx for ONTAP resources

Figure 2: FSx for ONTAP resources

The ONTAP file system is what provides the features and functionality of the Amazon FSx for NetApp ONTAP service. It is delivered as a fully managed service by AWS, and it orchestrates the policies that govern performance, data protection, capacity management, storage tiering, and other capabilities of the service. The file system itself is a flexible and scalable deployment that provides a level of performance dependent on the selected throughput capacity, which can be adjusted both up and down as your workload dictates.

Running on the ONTAP file system are Storage Virtual Machines (SVMs) which perform the function of a virtual storage controller. The SVM is responsible for hosting the IP addresses that clients connect to that allow access to data stored on the file system. Multiple SVMs can be hosted on a single file system, providing a layer of logical data separation between discreet workloads in a multi-tenant organization. For example, an SVM used for CIFS / SMB access can only be joined to one Active Directory domain at a time, but multiple SVMs on a single file system may each join separate domains. Each SVM consumes IP addresses from the specified range, and the number of IP addresses can be different considering the deployment type.

Finally, volumes are hosted by SVMs, and are where block disks and file shares are provisioned and advertised to the SQL Server Servers, as shown in the following figure. Each volume may host a CIFS / SMB file share, NFS mount, iSCSI or NVMe LUN, or multiple combinations of each. For SQL workloads it is recommended to provision one Volume for each iSCSI LUN that you intend to present to the SQL Server.

Figure 3: Volumes for file and disk

Low level features within ONTAP and their impact on deployments

In the following section we cover IP addresses, SVMs, volumes, storage efficiency, volume security style, and Snapshot policy.

IP addresses

When an FSx for ONTAP file system is deployed it needs a number of IP addresses for ONTAP system functionality. The FSx for ONTAP file system selection and provisioning of IP addresses is managed by AWS as part of the fully managed service. These IP addresses include a management IP and two inter-cluster IPs that can be used at a later date for FSx for ONTAP data transfers between file systems as shown in Figure 4.

Figure 4: FSx for ONTAP endpoint IP addresses

Figure 4: FSx for ONTAP endpoint IP addresses

Each SVM has its own set or resources when it’s created and these must be considered when planning the creation of the FSx for ONTAP file system. For example, a set of IP addresses are required for management, file, and block access, which are automatically selected and deployed by AWS as part of the managed service.

These IP addresses (and DNS names) can be found under the Storage Virtual Machine > Endpoints tab, as shown in figure 5.

Figure 5: SVM endpoint IP addresses

Figure 5: SVM endpoint IP addresses

In total, an FSx for ONTAP file system with one SVM needs a minimum of six IP addresses: three per FSx for ONTAP system, and three per SVM. Each further SVM consumes three IP addresses from the pool. Therefore, you must have enough free IP addresses in the VPC in which the FSx for ONTAP file system resides to meet the demand. Having a secondary CIDR in the VPC dedicated for FSx for ONTAP is a good option.

SVMs

Microsoft SQL Server considerations: How many SVMs are needed is a frequently asked question, and often, the answer is that you only require a single SVM if you only have one SQL Server Server connecting to FSx for ONTAP. However, if you are running multiple SQL Server workloads with various degrees of importance, with multiple Servers and access restriction requirements, then you may choose to separate logical access to the file system by increasing the number of SVMs.

There is a limit to the number of SVMs that can be created on an FSx for ONTAP file system depending on the amount of throughput capacity allocated. Go to this FSx for ONTAP guide to observe the current maximums.

The following factors need to be considered when deploying an SVM:

Quantity
- A single, high throughput SQL Server Server may take up all of the FSx for ONTAP file system resources, in which case a single SVM is sufficient.
- Many low to medium throughput SQL Server Servers can share a single SVM, or if there is a tenancy requirement (see the following section) it can be allocated an SVM per SQL Server instance within the previous limits.
Tenancy
- An SVM is an isolated file server with its own administrative credentials using vsadmin user, endpoints, and unique IP addresses. For example, you can join an SVM to a single Active Directory for authentication purposes.
- If SQL Server workloads are sensitive and need strict segregation, then a separate SVM per SQL Server workload can be deployed.
Quality of Service
- Linked to performance, Quality of Service (QoS) enables FSx for ONTAP to consistently deliver predictable performance to multiple applications and eliminate noisy neighbor applications. QoS policies can be set against SVMs to make sure that noisy neighbors don’t hog resources, and that minimum performance requirements for other workloads are met.

QoS policies can also be applied to volumes, LUNs (covered in a later section), and files.

Volumes

Volumes are a logical container assigned to an SVM. They contain the data that consumes the physical storage of the FSx for ONTAP file system. When a volume is created, it’s typically a read-writable (RW) volume, but it can be a Data Protection (DP) volume that is a read only destination for ONTAP SnapMirror or SnapVault relationship. SnapMirror is an FSx for ONTAP embedded capability to replicate a volume using highly efficient block-level replication across ONTAP file systems. For now, we will focus on volumes of a read-writable nature where SQL Server data is stored, as shown in figures 6 and 7.

Figure 6: Choose your FSx for ONTAP SVM

Figure 6: Choose your FSx for ONTAP SVM

Figure 7: Volume configuration details

Figure 7: Volume configuration details

Starting with the Volume name, it is often useful to give the volume a meaningful name while equally balancing this out with the character length. The volume name has no impact on the form or function of the volume other than to make it easier to identify when it comes to management operations.

Volume style is defaulted to FlexVol with an option to choose FlexGroups. FlexVol is the most common option to choose for FSx ONTAP file systems, with a minimum volume size of 20 MiB scaling up to 300 TBs. FlexGroups enable greater scaling from a minimum of 100GiB up to 20 PiBs and they are tailored to NAS Protocols such as NFS or SMB.

SQL Server considerations: Unless there is an immediate requirement for storage requirements in excess of 300 TBs per volume, FlexVol is the recommended volume style. The 300 TB per volume limit is per volume, and multiple volumes can be created per SVM and aligned to SQL Server Servers.

Storage efficiency

Storage efficiency on a volume means that any physical capacity that can be saved by logically reducing duplicate data, compressing large data sets, and efficiently storing blocks of data together is done by the FSx for ONTAP file system. These settings are disabled by default when creating a volume.

SQL Server considerations: We recommend that FSx for ONTAP storage efficiencies are enabled for SQL Server deployments.

Volume security style

In the case of SQL Server, if database files are hosted on SMB shares, then a volume security style of NTFS is recommended. For volumes hosting iSCSI LUNs, the security style can be left as default.

SQL Server considerations: Running into the next section on snapshots and ultimately data protection, NetApp SnapCenter is the single pane of glass application from NetApp that consolidates all NetApp storage controllers, on-premises and FSx for ONTAP, to take application consistent snapshots of SQL Server volumes/data. A traditional backup represents the whole database at some point in time. To create this backup, either if it’s full or an incremental backup, there are data copy processes involved in taking the necessary time, space, and impact to be run. FSx for ONTAP allows for an application-consistent backup to be triggered using the Snapshot technology on the storage level, orchestrated by the SnapCenter with the plugins, thus creating point-in-time application consistent snapshots, which are faster to create with no impact on production.

NetApp SnapCenter streamlines backup, restore, and clone operations. It’s a single pane of glass with lightweight plugins capable of creating fast, space-efficient, and application-consistent snapshots.

To create application-aware Snapshots, NetApp SnapCenter should be used.

Snapshot policy

Snapshots (native) are space efficient, point-in-time copies of the data that allow for fast volume recovery and cloning. Snapshots have a dependency on the source data, thus they shouldn’t be considered a backup (off file system, fully independent, recoverable copy of the data). Instead they should be for recovery and database clones, because snapshots can enable rapid recovery and database copies in seconds, regardless of the database size. The default snapshot policy creates the following snapshots:

A maximum of six hourly snapshots taken five minutes past the hour.
A maximum of two daily snapshots taken Monday through Saturday at 10 minutes after midnight.
A maximum of two weekly snapshots taken every Sunday at 15 minutes after midnight.

These snapshots do not initially consume storage, but the delta changes between the previous and the next snapshot do. For example, if you have hourly snapshots and take a snapshot at 09:00 and write/modify 10 GB over the course of the next hour, then at 10:00 10 GB of storage capacity is consumed in addition to the capacity prior to the taking of the snapshot. If this data delta is consistent (10 GB), then across the default snapshot policy (six hourly snapshots taken five minutes past the hour) a total of 60 GB of storage capacity is consumed. This must be considered during FSx for ONTAP file system storage capacity allocation and volume creation, because it’s possible to run out of space, which can cause errors and snapshots to fail.

SQL Server considerations: From the preceding example, if the SQL Server database has a high churn rate, then the deltas between snapshots can be very high. Snapshots should be considered against the SQL Server workload with the snapshot policy applied. A high churn rate doesn’t negate having a snapshot policy enabled, but it does necessitate consideration of the storage capacity consumed and the cost to do so (snapshot autodelete is a feature of ONTAP that can delete snapshots when the volume is low on space).

If applicable, the recommendation is to disable the default snapshot policy and use NetApp SnapCenter to coordinate volume/database snapshots.

Performance and cost

FSx for ONTAP has two storage tiers: SSD and capacity pool. The SSD tier, as the name suggests, consists of SSD disks that form the performance tier of the FSx for ONTAP file system, delivering sub-millisecond latency and thousands of IOPs for the active portion of a workload. The SSD tier storage capacity is assigned at the creation of the FSx for ONTAP file system and can be increased at a later date as needed.

The capacity pool tier is an elastic storage tier that can scale to petabytes in size and is for less frequently accessed, colder data that doesn’t need the latency and IOPs of the active workload. The combination of the SSD tier and capacity pool tier allows for a cost optimized approach to data tiering while keeping the data location, availability, management, and feature set all under the control of the FSx for ONTAP file system.

For performance, FSx for ONTAP incorporates a unique in-memory cache and NVMe cache, further enhancing frequently accessed data performance with low latency.

The TieringPolicy for a volume can be decided on based on the native ONTAP tiering policies:

SNAPSHOT_ONLY: (Default value) moves cold snapshots to the capacity pool storage tier.
AUTO: moves cold user data and snapshots to the capacity pool storage tier based on your access patterns.
ALL: moves all user data blocks in both the active file system and Snapshot copies to the storage pool tier (not recommended for database workloads).
NONE: keeps a volume’s data in the primary storage tier, preventing it from being moved to the capacity pool tier.

The AUTO and SNAPSHOT_ONLY tiering policies can have the CoolingPeriod adjusted to reflect when the data can be considered cold. This ranges from between 2 and 183 days, with the defaults being 31 days for AUTO and 2 days for SNAPSHOT_ONLY.

When to consider tiering data

Active/hot data needs to be in the FSx for ONTAP SSD tier to achieve the performance that is needed for an enterprise database solution. However, over time, there is the potential that not all of the data remains active/hot and should be up for consideration of tiering down to a slower, more cost-efficient tier. The following are a number of examples of where tiering could be applicable.

Long running databases: These contain years of data, which is rarely accessed, but needs to be immediately available on request. An AUTO tiering policy on the volume of 180 days would see data that wasn’t accessed over six months moved to the capacity pool.

Log files: These are rarely accessed once written, but they must be available for any kind of database restore. Therefore, a tiering policy of 14 days would see log files moved from the SSD tier, where they were used for quick database commits, down to the capacity pool after two weeks.

Database backups: These full copies of the database are very intensive on space consumption and should be moved down to the capacity pool tier as soon as possible.

SQL Server considerations: As data ages and becomes less frequently accessed, consuming any space in the performance tiers of a storage system effectively becomes a cost point with no tangible benefit. Therefore, this aged data should be tiered down to cheaper storage tiers. Using the preceding tiering policies and understanding the SQL Server workload patterns allows both data .MDF files and the log .LDF files, as well as any backup files, to be in contention for capacity pool tiering. The default SNAPSHOT_ONLY policy is strongly recommended for SQL Server workloads, while the AUTO and ALL policies are recommended for unstructured and archival file data, respectively.

Clones

Most organizations have multiple database environments, such as production databases and more databases for development, testing, pre-production, schema validation.

Typically, each further copy needs the same amount of storage as the original. However, FSx for ONTAP offers a different approach.

FSx for ONTAP creates Snapshots, which are read-only, point-in-time images of data. These Snapshots are lightweight and created instantly.

Snapshots are the foundational technology for creating multiple environments in seconds with a near-zero storage footprint. A cloned volume is a lightweight point-in-time and fully functional copy of a volume. Changes are written within the cloned volume as the new volume is presented to the new environment. Changes made on the parent volume don’t affect the clone. The FlexClones allow the sharing of the same subset of the data (without duplicating it) across environments such as Sandbox, DEV, or Quality without impacting production.

Volume layout and MS-SQL database architecture

FSx for ONTAP provides file, block, and object storage options in the same service. Your SQL Server can be configured for SMB and iSCSI connectivity from the SQL Server Server to FSx for ONTAP. Read more about optimizing protocol selection when using FSx for ONTAP for Microsoft SQL Server.

The best practices for SQL Server data layout from NetApp and Microsoft recommend having the data and log files on separate drives (FSx for ONTAP iSCSI LUNs). The FSx for ONTAP guidance is no different, and having the data (.MDF files) and log (.LDF files) on separate drives allows the performance, manageability, and feature usage of FSx for ONTAP and SQL Server to be optimized and deployed correctly first time. In order to achieve this, multiple volumes must be created, each hosting iSCSI LUN or LUNs that are mounted to the clients.

iSCSI LUNs

Logical Unit Numbers (LUNs) are how ONTAP advertises block storage to compute hosts. Although LUN is the technically accurate term for how FSx for ONTAP presents block based storage to EC2 instances in AWS, the Windows Operating System interprets all FSx for ONTAP LUNs and EC2 Elastic Block Store (EBS) volumes in the same way: raw block-based disks that store data. Under the Windows Disk Management Utility, the disk (LUN) can be initialized, formatted, and assigned a drive letter.

Figure 8: FSx for ONTAP to MS-SQL drive mapping

Figure 8: FSx for ONTAP to MS-SQL drive mapping

LUN to volume to SQL server mapping

LUNs reside in a FlexVol volume, and there are a number of volume options we can choose from regarding volume style, type, efficiency, snapshot, and storage tiering settings. Considering all of this, the most common recommendations for FlexVol volumes and LUNs on FSx for ONTAP are as follows:

A one-to-one FlexVol to LUN mapping for SQL Server data and log files
- 1 x FlexVol containing 1 x LUN for database .MDF files
- 1 x FlexVol containing 1 x LUN for log .LDF files
A volume style of FlexVol
A volume type read-writable
Volume efficiency set to enabled
Snapshots disabled if SnapCenter is being used
- Have the workload understood to avoid free space errors if using a snapshot policy
Snapshot reservation set to 0%
LUN reservation set to enabled
Snapshot autodelete set to enabled – oldest_first
Volume autosize set to on – autogrow
Set volume tiering policies to SNAPSHOT ONLY

Figure 9: LUN to volume to MS-SQL server mapping

Figure 9: LUN to volume to MS-SQL server mapping

To connect the EC2 instance with the LUN in FSx for ONTAP, the host must have a unique iSCSI Initiator. This IQN (iSCSI Qualified Name) is used by FSx for ONTAP to Map/Announce the LUNs accordingly to the Amazon EC2 Host. The iSCSI Initiator establishes several iSCSI sessions with the FSx for ONTAP Targets. In a scale-up deployment, whether Single-AZ or Multi-AZ, there are two targets, with one per node.

EC2 instance selection

The key design element for choosing the right EC2 instance for your SQL Server workload is understanding the change from the performance considerations to network performance considerations. This means that when selecting an instance family and size, the “n” designation for high network throughput (for example, r6in vs r6i) should be a primary consideration along with the vCPU and memory, and not the Amazon EBS capabilities. Sending your FSx for ONTAP storage traffic across the network has seen significant sizing reductions for many users. Coupling Network Capabilities with our Memory Optimized instances is going to deliver the best overall performance for your SQL Server workloads when using FSx for ONTAP.

Conclusion

Amazon FSx for NetApp ONTAP is a storage service from AWS that offers enterprise class storage management features to SQL Server workloads. This brings decades of NetApp ONTAP experience and integration with SQL Server to the AWS database operations of users. A highly available file system that can span AZs with automatic failover and failback, customization, and segregation with SVMs, flexible capacity and efficiency features, and full integration with NetApp BlueXP and SnapCenter for enterprise scale management and data protection respectively.

FSx for ONTAP and NetApp BlueXP can also further optimize SQL Server workloads deployments, such as lowering its costs using SMB or iSCSI Protocol.

Thank you for reading this post. Leave any comments or questions in the comments section.

AWS Storage Blog