EDI vs ETL

While I was working with a customer on an import process, she remarked that if we were doing true EDI, we'd [do X]. The comment got me thinking: aren't we working on an EDI process? As someone who frequently has multiple roles on projects, I had perhaps lumped all the various three-letter acronyms into the same spaghetti salad. In this episode of the podcast, Eugene and I chat about the differences between EDI and ETL, and how they might impact your team. Data projects can get complex, with lots of moving parts, and understanding how those parts fit together can be helpful as you work to solve your specific piece of the puzzle.

Microsoft MVP Kevin Feasel was not able to join the discussion but did have this to say about the two:

“EDI is an interface, a contract for how things are expected.  I expect data of this file format with this data structure.  It should contain these data elements with these attributes.  For example, I’m expecting Parquet files with 37 columns, the first of which is an integer ID, the second a string description, and so on.

ETL, meanwhile, is all about the process.  We pull data from a source system, transform it in-memory, and then load it into a destination system.  Alternatively, ELT:  pull data from a source system, land it as-is in the destination system, and then transform it using the destination system’s resources into a structure which fits the destination system.  ETL is the venerable technique with a long history of existence.  ELT, meanwhile, is a bit newer of an idea but works really well when we don’t want to tax our source systems very heavily and have a huge amount of data–more than we’d be able to transform in a reasonable amount of time on cheap data mover servers.

With ETL, we of course care about file formats, data elements, and data attributes, but I can talk you through an entire ETL implementation without getting into the details on any of those.  That’s because the techniques are independent of file format, data element, and data attribute.  There are tuning options available based on this–for example, packing data in the right way to maximize how much fits in a single buffer of memory, or minimizing network transit times for data movement.

With EDI, meanwhile, we can talk through the entire interchange without a single word about how data gets from you to me and what I do with it afterward.  This provides us with a nice separation of duties, as we can have business analysts discussing the makeup of these files and data engineers discussing the processing and analysis of the data and outside of a few touchpoints, that can be handled independently.”
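To make Kevin's distinction a bit more concrete, here is a minimal sketch, not from the episode, of what the two halves can look like side by side. It assumes hypothetical file paths, a two-column contract, and pyarrow for the Parquet handling: the schema object plays the role of the EDI contract, while the extract, transform, and load functions are the ETL side of the work.

```python
# A minimal sketch (hypothetical paths and column names) showing the split:
# the schema is the "EDI" side, a contract for what the file must contain,
# while extract/transform/load is the "ETL" side of the work.
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

# EDI-style contract: the file format and data elements we agreed to exchange.
EXPECTED_SCHEMA = pa.schema([
    ("id", pa.int64()),            # integer ID
    ("description", pa.string()),  # string description
    # ...the remaining agreed-upon columns would be listed here
])

def extract(path: str) -> pa.Table:
    """Extract: read the inbound file from the trading partner."""
    return pq.read_table(path)

def validate(table: pa.Table) -> None:
    """Enforce the interchange contract before doing any work with the data."""
    for field in EXPECTED_SCHEMA:
        if field.name not in table.schema.names:
            raise ValueError(f"Missing agreed-upon column: {field.name}")
        if table.schema.field(field.name).type != field.type:
            raise ValueError(f"Column {field.name} has the wrong type")

def transform(table: pa.Table) -> pa.Table:
    """Transform in memory: here, just keep rows with a positive ID."""
    return table.filter(pc.greater(table["id"], 0))

def load(table: pa.Table, destination: str) -> None:
    """Load the result into the destination (another Parquet file here)."""
    pq.write_table(table, destination)

if __name__ == "__main__":
    inbound = extract("inbound/partner_file.parquet")  # hypothetical path
    validate(inbound)
    load(transform(inbound), "warehouse/staged_file.parquet")
```

The useful part of the split is exactly what Kevin describes: business analysts can argue over EXPECTED_SCHEMA without caring how the pipeline runs, and data engineers can rework extract, transform, and load without renegotiating the contract.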
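For the ELT variant Kevin mentions, the same pipeline might look more like the sketch below, again with hypothetical names and with SQLite standing in for the destination system: the file is landed as-is in a raw staging table, and the transformation happens afterward inside the destination, using its own engine rather than the data mover's memory.

```python
# A minimal ELT sketch (hypothetical names, SQLite standing in for the
# destination): land the data as-is, then transform it with the
# destination system's own engine rather than in the pipeline's memory.
import sqlite3
import pyarrow.parquet as pq

def land_raw(path: str, conn: sqlite3.Connection) -> None:
    """Load first: copy the inbound file into a raw staging table untouched."""
    rows = pq.read_table(path).to_pylist()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (id INTEGER, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders (id, description) VALUES (:id, :description)",
        rows,
    )

def transform_in_destination(conn: sqlite3.Connection) -> None:
    """Transform last: reshape the data using the destination's resources (SQL)."""
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT id, TRIM(description) AS description
        FROM raw_orders
        WHERE id > 0
        """
    )

if __name__ == "__main__":
    with sqlite3.connect("warehouse.db") as conn:  # hypothetical destination
        land_raw("inbound/partner_file.parquet", conn)
        transform_in_destination(conn)
```

Either way, the interchange contract from the first sketch is untouched; the only thing that changes is where the transformation work happens.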

3 Takeaways

  1. EDI and ETL serve different audiences.
  2. Separating the interchange contract (EDI) from the process (ETL) provides a nice separation of duties.
  3. While it can feel like acronym spaghetti, it can be helpful to define how teams implement data integration.

If the video is not working, you can find it on YouTube at youtube.com/@SQLDataPartnersPodcast.
Please Like, Subscribe, and hit the bell for notifications. Thank you!

“You could become an SSIS expert, but a lot of these other pieces, you really do have to understand business; so much of [EDI] revolves around business objects and concepts, very very distinctly.”

Eugene Meidinger

Meet the Hosts


Carlos Chacon

With more than 10 years of working with SQL Server, Carlos helps businesses ensure their SQL Server environments meet their users’ expectations. He can provide insights on performance, migrations, and disaster recovery. He is also active in the SQL Server community and regularly speaks at user group meetings and conferences. He helps support the free database monitoring tool found at databasehealth.com and provides training through SQL Trail events.


Eugene Meidinger

Eugene works as an independent BI consultant and Pluralsight author, specializing in Power BI and the Azure Data Platform. He has been working with data for over 8 years and speaks regularly at user groups and conferences. He also helps run the GroupBy online conference.


Kevin Feasel

Kevin is a Microsoft Data Platform MVP and proprietor of Catallaxy Services, LLC, where he specializes in T-SQL development, machine learning, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of the books PolyBase Revealed (Apress, 2020) and Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Apress, 2022). A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather's nice enough.

Want to Submit Some Feedback?

Did we miss something or not quite get it right? Want to be a guest or suggest a guest/topic for the podcast?
