OneLake - A Deep Dive

In Episode 281, we introduced Microsoft OneLake with a high-level overview. Now we're going deeper with a discussion of the Parquet format, why Microsoft went with the Delta Lake variation, and what the Delta Lake format brings to the table (no pun intended). We'll also examine some "behind the scenes" aspects of file management, and why you'll still be using the GUI to create most of your objects.

OneLake is Microsoft's solution to the demand for centralizing all data in one location, eliminating the need to transfer it across multiple systems. We expect this to play out further, however, once we consider scenarios like data sovereignty, geographical data distribution, separation of subsidiary data, and even departmental budgets that may necessitate multiple instances of OneLake.

We round out our OneLake deep dive with a conversation on the Direct Lake mode option for getting data into Power BI, and Eugene shares his perspective on why not everyone may be rushing to jump on the bandwagon just yet.

We hope you enjoyed this deep dive into Microsoft OneLake! If you have questions or comments, please send them our way. We would love to answer your questions on a future episode. Leave us a comment and some love ❤️ on LinkedIn, X, Facebook, or Instagram. Thank you for listening!

And this has formed the basis of the Kimball model that we all know and love. If you don’t love it, you probably should love it; I recommend loving it. - Kevin

Now we can actually leverage everything over that same dataset. I can do SQL, no problem; [Kevin] can do Spark, no problem; Eugene wants to do Databricks, no problem. - Carlos

Meet the Hosts

Carlos Chacon

With more than 10 years of working with SQL Server, Carlos helps businesses ensure their SQL Server environments meet their users’ expectations. He can provide insights on performance, migrations, and disaster recovery. He is also active in the SQL Server community and regularly speaks at user group meetings and conferences. He helps support the free database monitoring tool found at databasehealth.com and provides training through SQL Trail events.

Eugene Meidinger

Eugene works as an independent BI consultant and Pluralsight author, specializing in Power BI and the Azure Data Platform. He has been working with data for over 8 years and speaks regularly at user groups and conferences. He also helps run the GroupBy online conference.

Kevin Feasel

Kevin is a Microsoft Data Platform MVP and proprietor of Catallaxy Services, LLC, where he specializes in T-SQL development, machine learning, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of the books PolyBase Revealed (Apress, 2020) and Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Apress, 2022). A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather's nice enough.

Want to Submit Some Feedback?

Did we miss something or not quite get it right? Want to be a guest or suggest a guest/topic for the podcast?
