Where should I put my data?

As data size expands and the way we interact with data changes, in many cases we will need more than one way to store and access that data. Numerous products have entered the data storage market to solve particular pain points, and in this episode we discuss several of the storage technologies currently available.

With the expansion of data and the increased expectations businesses have for analysis and modeling of that data, we may need more than one storage type to meet those expectations. As data professionals, it is incumbent upon us to understand how these tools work and put them to their best use before somebody else puts them to sub-optimal use and we are stuck supporting them. I am joined by Kevin Feasel, a previous guest on the show, who walks us through some of the technologies available and sorts out under what circumstances we want to consider using each one.

In this episode, Kevin gives us his definitions of big and small data and discusses how to choose the right technology for our needs. Let us know if you agree in the comments below!

We touch upon the following technologies in the episode:

Relational Database
Multidimensional Database
Hadoop Cluster
Columnstore Database
In-Memory Cache
Key-Value Database
Document Database

Kevin On Twitter
CuratedSQL.com
Episode 13 – The Apply Operator
SSMSBoost

Our Guest

Kevin Feasel

Kevin Feasel is a Data Platform MVP and Engineering Manager of the Predictive Analytics team at ChannelAdvisor, where he specializes in T-SQL and R development, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and the author of PolyBase Revealed (forthcoming). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather’s nice enough.

We have this great expansion of data requirements and data storage mechanisms . . . but there are some major difficulties with a relational database.

Kevin Feasel

Meet the Hosts

Carlos Chacon

With more than 10 years of working with SQL Server, Carlos helps businesses ensure their SQL Server environments meet their users’ expectations. He can provide insights on performance, migrations, and disaster recovery. He is also active in the SQL Server community and regularly speaks at user group meetings and conferences. He helps support the free database monitoring tool found at databasehealth.com and provides training through SQL Trail events.

Eugene Meidinger

Eugene works as an independent BI consultant and Pluralsight author, specializing in Power BI and the Azure Data Platform. He has been working with data for over 8 years and speaks regularly at user groups and conferences. He also helps run the GroupBy online conference.

Kevin Feasel

Kevin is a Microsoft Data Platform MVP and proprietor of Catallaxy Services, LLC, where he specializes in T-SQL development, machine learning, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of the books PolyBase Revealed (Apress, 2020) and Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Apress, 2022). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather's nice enough.

Want to Submit Some Feedback?

Did we miss something or not quite get it right? Want to be a guest or suggest a guest/topic for the podcast?

