AWS Partner Network (APN) Blog
Streamline Unified Data Governance with AWS Lake Formation and Dremio
By Saurabh Shanbhag, Sr. Partner Solutions Architect – AWS
By Leon Stigter, Sr. Technical Product Manager – AWS
By Shrirang Kamat, Director of Product Management – Dremio
Customers are building large data lakes on Amazon Web Services (AWS) to democratize access to data. As a result, data governance becomes increasingly important. Customers need to know that data is accessed at the right time, by the right people, and in the right context. To implement fine-grained data access permissions, customers use AWS Lake Formation, which provides data access controls for AWS services like Amazon Redshift, Amazon Athena, and Amazon EMR. It also offers data access controls for AWS Partners like Dremio.
Dremio offers an Open Data Lakehouse platform that accelerates data analytics across diverse data sources. It provides a high-performance SQL query engine that efficiently queries data from cloud storage, databases, and various file formats. With its distributed execution and advanced caching techniques, Dremio delivers ultra-low latency query performance on large datasets. Dremio has recently added support for the AWS Lake Formation data governance framework for secure and controlled data access. This integration ensures Dremio honors the permissions on Data Catalog resources established in AWS Lake Formation, including tag-based access control, data filtering, and cell-level security.
This post details how a financial services customer leveraged Dremio and AWS Lake Formation to establish consistent governance, eliminate data silos, and achieve fast analytics.
Why is this integration important?
One of Dremio’s customers, a Fortune 100 financial services organization, needs to balance data access and control to meet stringent regulatory requirements while optimizing data value. By addressing data risks and implementing governance best practices, this organization manages data efficiently, ensures compliance, and unlocks the full potential of its data resources.
Challenges
A Fortune 100 financial services organization faced three main challenges. First, their data was severely underutilized due to data silos, despite its potential value in individual business processes. The organization needed the ability to effectively share and combine data assets to unlock additional potential across the organization.
Second, operating within the highly regulated financial services industry, the organization had to manage data risks and implement robust governance and access controls to prevent potential compliance violations.
Finally, the organization struggled with fragmented data analytics. They required a unified data analytics platform that would enable data consumers to better comprehend their data and gain deeper insights. This platform needed to support better data-driven decisions through low-latency reports and dashboards, regardless of whether the data resided in an AWS data lake or other relational/non-relational sources, whether in the cloud or on-premises.
Solution
To overcome these challenges, the financial institution used AWS Lake Formation and AWS Glue Data Catalog for centralized data administration and fine-grained access control. Dremio enabled low-latency analytics within a data mesh architecture.
Implementation
The organization adopted a data sharing architecture inspired by the concept of a data mesh. They defined data products curated by experts who understood the nuances, management requirements, permissible uses, and limitations of the data. This approach enabled better data governance and management. The organization used AWS Lake Formation to centralize and manage fine-grained access control for data sources in the AWS Glue Data Catalog, ensuring regulatory compliance and improving data governance.
To enable efficient, data-driven decision-making, the organization implemented Dremio’s high-performance SQL engine for unified self-service analytics across AWS and other data sources. Dremio provided sub-second performance for consistent analytics across all data sources. Dremio was also instrumental in maintaining a consistent governance model for the customer: it inherited the granular permissions defined in AWS Lake Formation and incorporated them into the governance and access policies of non-Glue-managed data sources.
How it works
Dremio adheres to the workflow shown in Figure 1 each time an end user attempts to access, edit, or query datasets with AWS Lake Formation managed privileges.
Figure 1 – Workflow of Dremio integration with AWS Lake Formation
As a prerequisite, connect an external identity provider (IdP) to AWS IAM Identity Center through the Security Assertion Markup Language (SAML) 2.0 protocol.
- User authenticates through AWS IAM Identity Center and runs a query in Dremio.
- Dremio checks each table in the query to determine whether it is configured to use Lake Formation for security. If one or more datasets use Lake Formation, Dremio determines the IAM identifiers, specifically the User or Group Amazon Resource Names (ARNs), associated with the IAM Identity Center user.
- Dremio makes ListPermissions and ListDataCellsFilter API calls to Lake Formation for the table.
- AWS Lake Formation returns the list of permissions for the table being queried.
- Permissions are cached in a permission cache to improve performance.
- Dremio validates that the user's ARN has the Lake Formation SELECT permission.
- If the user does not have permission, the query is rejected with a permission error.
- If authorized, Dremio reads the underlying data from Amazon Simple Storage Service (Amazon S3).
- Amazon S3 returns the data to Dremio.
- Dremio returns the query results to the end user.
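The permission-check and caching steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Dremio's implementation: the names `has_select`, `PermissionCache`, and `authorize_query` are hypothetical, the five-minute time-to-live is an arbitrary choice, and the `fetch` callable stands in for a wrapper around the Lake Formation ListPermissions call. The entry shape (`Principal.DataLakePrincipalIdentifier`, `Permissions`) follows the items that ListPermissions returns.

```python
import time


def has_select(entries, principal_arns):
    """Return True if any of the given ARNs holds SELECT in the entries.

    `entries` is a list shaped like the PrincipalResourcePermissions
    items of a ListPermissions response. Broader grants such as ALL are
    deliberately not handled in this sketch.
    """
    arns = set(principal_arns)
    for entry in entries:
        principal = entry.get("Principal", {}).get("DataLakePrincipalIdentifier")
        if principal in arns and "SELECT" in entry.get("Permissions", []):
            return True
    return False


class PermissionCache:
    """Hypothetical per-table cache with a time-to-live."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable: table name -> permission entries
        self._ttl = ttl_seconds  # assumed TTL, not a documented Dremio value
        self._clock = clock
        self._store = {}         # table name -> (fetched_at, entries)

    def get(self, table):
        now = self._clock()
        hit = self._store.get(table)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # still fresh: skip the remote call
        entries = list(self._fetch(table))
        self._store[table] = (now, entries)
        return entries


def authorize_query(tables, principal_arns, cache):
    """Reject the query if any table lacks SELECT for the user's ARNs."""
    for table in tables:
        if not has_select(cache.get(table), principal_arns):
            raise PermissionError(f"no SELECT permission on {table}")
```

In a real deployment, `fetch` would likely page through the `list_permissions` operation of the boto3 `lakeformation` client. Injecting it as a callable keeps the sketch runnable without AWS credentials.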
Benefits
The integration of Dremio with AWS Lake Formation benefited the organization on multiple fronts. By eliminating data silos and facilitating the exchange and integration of data across business processes, the organization unlocked the latent potential of its data. This led to improved strategic insights and decision-making.
The organization implemented stringent data governance and access controls using AWS Lake Formation. This helped reduce the organization’s exposure to regulatory risks and avoid potential penalties. Additionally, the organization used Dremio’s unified analytics software to consistently perform low-latency analytics across all its data sources, whether cloud-managed or on-premises.
The adoption of a data mesh architecture provided the necessary flexibility and scalability. This allowed for the integration of a variety of data sources while simultaneously adhering to governance and control standards.
Democratization with Control
The convergence of data governance and analytics shown in this solution reflects broader industry shifts. While data democratization drives innovation, it must be balanced with proper data governance controls. Leading institutions are moving toward “controlled democratization,” where access is broad but governed and audited.
This solution achieves this balance through several key mechanisms:
- Centralized Governance: AWS Lake Formation simplifies data lake governance by centralizing data security and governance.
- Granular Access Control: Fine-grained permissions ensure users have access to the right data down to the row and column level.
- Performance without Compromise: Dremio’s unified analytics platform delivers low-latency analytics while maintaining compliance with Lake Formation’s governance policies, showing that strong controls need not impede performance.
- Audit and Visibility: Lake Formation tracks data interactions by role and user, and it provides comprehensive data access auditing to verify the right data was accessed by the right users at the right time.
This blueprint shows how organizations can achieve the dual objectives of democratizing data access while maintaining robust governance controls.
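To make the column-level part of granular access control concrete, the sketch below applies the column rules of a Lake Formation data cells filter to a row. It is illustrative only. The helper names `visible_columns` and `project_row` are hypothetical, the sample table and filter are made up, and row filter expressions are left out. The `ColumnNames` and `ColumnWildcard.ExcludedColumnNames` fields follow the data cells filter structure returned by ListDataCellsFilter.

```python
def visible_columns(table_columns, cell_filter):
    """Return the columns a data cells filter exposes, in table order."""
    names = cell_filter.get("ColumnNames")
    if names:
        # Explicit include list: only the named columns are visible.
        allowed = set(names)
        return [c for c in table_columns if c in allowed]
    wildcard = cell_filter.get("ColumnWildcard")
    if wildcard is not None:
        # Wildcard: every column except the excluded ones is visible.
        excluded = set(wildcard.get("ExcludedColumnNames", []))
        return [c for c in table_columns if c not in excluded]
    return []  # no column rule in the filter: expose nothing in this sketch


def project_row(row, columns):
    """Keep only the permitted columns of a row (a dict)."""
    return {c: row[c] for c in columns if c in row}


# Made-up example: hide the account number from analysts.
columns = ["customer_id", "account_number", "balance"]
analyst_filter = {"ColumnWildcard": {"ExcludedColumnNames": ["account_number"]}}
row = {"customer_id": 7, "account_number": "X-001", "balance": 120.5}
masked = project_row(row, visible_columns(columns, analyst_filter))
```

Returning an empty column list when the filter carries no column rule is a conservative choice for the sketch. An engine enforcing these filters would also need to evaluate the row filter expression.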
Conclusion
We see how financial institutions can evolve from traditional data management approaches to modern data architectures without compromising security or compliance. The financial services organization not only solved its immediate challenges, but also established a foundation for future data initiatives that can adapt to evolving regulatory requirements and business needs.
For organizations facing similar challenges in regulated industries, this implementation provides a blueprint for balancing data democratization with governance, while still offering low-latency analytics. Dremio’s analytics and AWS Lake Formation’s governance create a reliable solution for organizations wanting to fully utilize their data while keeping it secure.
Dremio software releases 25.1 and later offer this capability. Dremio is the industry’s leading engine for the Data Lakehouse with Apache Iceberg table format. For additional information regarding the Dremio Unified Data Lakehouse engine, please click here.
Dremio – AWS Partner Spotlight
Dremio is an AWS Data and Analytics Competency Partner and AWS Marketplace Seller. Dremio offers an Open Data Lakehouse platform that accelerates data analytics across diverse data sources. It provides a high-performance SQL query engine that efficiently queries data from cloud storage, databases, and various file formats. With its distributed execution and advanced caching techniques, Dremio delivers ultra-low latency query performance on large datasets.