Where should I put my data?
As data size expands and the way we interact with data changes, in many cases we will need more than one way to store and access that data. Numerous products have entered the data storage market to solve particular pain points, and in this episode we discuss several of the data storage technologies currently available.
With data expanding and businesses expecting more from the analysis and modeling of that data, we may need more than one storage type to meet those expectations. As data professionals, it is incumbent upon us to understand how these tools work and put them to their best use, before somebody else puts them to sub-optimal use and we are stuck supporting them. I am joined by Kevin Feasel, a previous guest on the show, who walks us through some of the technologies available and sorts out under what circumstances we want to consider using each one.
In this episode, Kevin gives us his definitions for big and small data and looks at how to choose the right technology for our needs. Let us know if you agree in the comments below!
We touch upon the following technologies in the episode:
- Relational Database
- Multidimensional Database
- Hadoop Cluster
- Columnstore Database
- In-Memory Cache
- Key-Value Database
- Document Database
Kevin Feasel is a Data Platform MVP and Engineering Manager of the Predictive Analytics team at ChannelAdvisor, where he specializes in T-SQL and R development, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com) and the author of PolyBase Revealed (forthcoming). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather’s nice enough.
We have this great expansion of data requirements and data storage mechanisms . . . but there are some major difficulties with a relational database.
Meet the Hosts
With more than 10 years of working with SQL Server, Carlos helps businesses ensure their SQL Server environments meet their users’ expectations. He can provide insights on performance, migrations, and disaster recovery. He is also active in the SQL Server community and regularly speaks at user group meetings and conferences. He helps support the free database monitoring tool found at databasehealth.com and provides training through SQL Trail events.
Eugene works as an independent BI consultant and Pluralsight author, specializing in Power BI and the Azure Data Platform. He has been working with data for over 8 years and speaks regularly at user groups and conferences. He also helps run the GroupBy online conference.
Kevin is a Microsoft Data Platform MVP and proprietor of Catallaxy Services, LLC, where he specializes in T-SQL development, machine learning, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of the books PolyBase Revealed (Apress, 2020) and Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Apress, 2022). A resident of Durham, North Carolina, he can be found cycling the trails along the triangle whenever the weather's nice enough.