A couple of years ago, Netflix offered a reward to anyone who could improve its algorithm for choosing a movie. The team that came in second took what they learned from it and created a company: Databricks. In this episode, we focus on Databricks: why it is popular, what it can be used for, and considerations for its use.
Listen to Learn
00:38 Intro to the team & topic
01:40 Compañero Shout-Outs
03:06 How Apache Spark came into existence
04:55 The architecture of Databricks
08:13 Spark is the open-source project, Databricks is an expansion and a company
09:53 What is the Apache Software Foundation?
13:05 There is no direct interface – your data has to be in-memory
16:05 Why use Databricks?
18:10 The three levels of pricing for Databricks
21:45 Carlos & Eugene’s takeaways
25:30 It can be scary to leave the Microsoft ecosystem
27:35 You have no monetary excuse to not give Databricks a try
28:51 Closing Thoughts
One of the most common uses of Databricks (machine learning is a pretty common use, too) is ELT: I’m going to take the data, land it in Databricks, reshape it, and possibly land it somewhere else.
Meet the Hosts
With more than 10 years of working with SQL Server, Carlos helps businesses ensure their SQL Server environments meet their users’ expectations. He can provide insights on performance, migrations, and disaster recovery. He is also active in the SQL Server community and regularly speaks at user group meetings and conferences. He helps support the free database monitoring tool found at databasehealth.com and provides training through SQL Trail events.
Eugene works as an independent BI consultant and Pluralsight author, specializing in Power BI and the Azure Data Platform. He has been working with data for over 8 years and speaks regularly at user groups and conferences. He also helps run the GroupBy online conference.
Kevin is a Microsoft Data Platform MVP and proprietor of Catallaxy Services, LLC, where he specializes in T-SQL development, machine learning, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of the books PolyBase Revealed (Apress, 2020) and Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Apress, 2022). A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather's nice enough.