PolyBase in SQL Server 2019
This week brought news of the SQL Server 2019 release, and a major focus of this release is data virtualization. In this episode, we discuss PolyBase, a topic we covered in Episode 95 but one that has really come of age with SQL Server 2019. Our very own Kevin Feasel talks about improvements to the feature, and we discuss use cases where it will be useful and some of the pitfalls you might run into as you implement it. We also discuss how PolyBase can help you reduce storage and other costs by avoiding ETL processes that duplicate data just to make it accessible from one source. Kevin is putting the final touches on his book, PolyBase Revealed, and if you are looking for a deep dive into the subject, we invite you to check it out.
PolyBase Revealed by Kevin Feasel
Kevin’s PolyBase in Action blog
Free supported Java in SQL Server 2019 is now available: Travis Wright’s blog post
“I’d rather have a central server that understands and talks to all of those and allows me to query it using a consistent language and get my results back, and that’s what PolyBase allows you to do.”
“One of the trade-offs of using PolyBase is, don’t expect it to be as fast as if you had designed everything to fit on your SQL Server instance and had appropriately architected all of the pieces.”
“Being able to share out the load of a problem, this, I think, is the ultimate ideal for PolyBase.”
Listen to Learn
00:38 Intro to the team & topic
01:23 Compañero Shout-Outs
02:52 What is PolyBase?
06:42 The world has changed so much, even in a couple of years
11:24 The further away your data is, the slower it will be to work with
16:10 It might be worth the extra seconds waiting to save a lot more time and effort
19:46 Let’s talk about Java – do we need it?
24:36 The differences between PolyBase and linked servers
26:30 There is a role for an administrator in security
29:24 Where Kevin sees this going
30:47 Using other technologies with PolyBase
32:06 Last thoughts on PolyBase
33:00 SQL Family Questions
36:18 Closing Thoughts
About Kevin Feasel
Kevin Feasel is a Data Platform MVP and Engineering Manager of the Predictive Analytics team at ChannelAdvisor, where he specializes in T-SQL and R development, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and the author of PolyBase Revealed (forthcoming). A resident of Durham, North Carolina, he can be found cycling the trails along the Triangle whenever the weather’s nice enough.
Meet the Hosts
With more than 10 years of working with SQL Server, Carlos helps businesses ensure their SQL Server environments meet their users’ expectations. He can provide insights on performance, migrations, and disaster recovery. He is also active in the SQL Server community and regularly speaks at user group meetings and conferences. He helps support the free database monitoring tool found at databasehealth.com and provides training through SQL Trail events.
Eugene works as an independent BI consultant and Pluralsight author, specializing in Power BI and the Azure Data Platform. He has been working with data for over 8 years and speaks regularly at user groups and conferences. He also helps run the GroupBy online conference.
Kevin is a Microsoft Data Platform MVP and proprietor of Catallaxy Services, LLC, where he specializes in T-SQL development, machine learning, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of the books PolyBase Revealed (Apress, 2020) and Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Apress, 2022). A resident of Durham, North Carolina, he can be found cycling the trails along the Triangle whenever the weather's nice enough.