Data Lakehouse vs Data Warehouse vs My House
Microsoft Fabric offers two enterprise-scale, open-standard format workloads for data storage: Warehouse and Lakehouse. Which service should you choose? In this episode, we dive into the technical components of OneLake, along with some of the decisions you’ll be asked to make as you start to build out your data infrastructure.
We mention two articles in the episode that can help you decide which services to implement in OneLake:
- Microsoft Fabric Decision Guide: Choose between Warehouse and Lakehouse - Microsoft Fabric | Microsoft Learn
- Lakehouse vs Data Warehouse vs Real-Time Analytics/KQL Database: Deep Dive into Use Cases, Differences, and Architecture Designs | Microsoft Fabric Blog | Microsoft Fabric
We hope you enjoyed this conversation on the nuances of data storage within Microsoft OneLake! If you have questions or comments, please send them our way. We would love to answer your questions on a future episode. Leave us a comment and some love ❤️ on LinkedIn, X, Facebook, or Instagram. Thank you for listening!
No one at a whiteboard is going to be making decisions that make it clear which one you should do based off of this. No one starts designing an application, designing a data lake, or whatever they want to use and says, the number one feature we need is multi-table transactions. - Eugene
The whole point of warehouses was to have ETL jobs in a single sole source of truth of how we push that data through so some yokel can't go and mess up a dimension. - Kevin
If you want to live in T-SQL, you can't really do it in the Lakehouse. You can query in T-SQL, you can have a summer home in T-SQL, but you can't live there all year. - Kevin
Meet the Hosts
Carlos Chacon
With more than 10 years of working with SQL Server, Carlos helps businesses ensure their SQL Server environments meet their users’ expectations. He can provide insights on performance, migrations, and disaster recovery. He is also active in the SQL Server community and regularly speaks at user group meetings and conferences. He helps support the free database monitoring tool found at databasehealth.com and provides training through SQL Trail events.
Eugene Meidinger
Eugene works as an independent BI consultant and Pluralsight author, specializing in Power BI and the Azure Data Platform. He has been working with data for over 8 years and speaks regularly at user groups and conferences. He also helps run the GroupBy online conference.
Kevin Feasel
Kevin is a Microsoft Data Platform MVP and proprietor of Catallaxy Services, LLC, where he specializes in T-SQL development, machine learning, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of the books PolyBase Revealed (Apress, 2020) and Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Apress, 2022). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather's nice enough.
Want to Submit Some Feedback?
Did we miss something or not quite get it right? Want to be a guest or suggest a guest/topic for the podcast?