The Data Transformation Dilemma: Don’t Get Tangled in DBT Spaghetti

Picture this: A chaotic jumble of data models scattered across your analytics landscape, each one a unique interpretation of reality, and none of them speaking the same language. Welcome to the wild world of data analytics, where unraveling the mysteries of DBT spaghetti is like navigating a maze blindfolded.

It’s common for data analysts to talk about maintaining over 1,000 DBT models. Yikes. It seems like every data team gets excited about DBT and all the flexibility it provides, yet winds up in the same predicament every time.

So what’s the answer? Let’s break down the benefits and drawbacks of DBT modeling, and how to finally streamline the process of organizing your data (for good).

What is DBT data modeling, and what are the benefits of using it?

DBT data modeling revolves around transforming raw data into a structured, analytics-ready format through a series of modular, reusable components known as models. (At least, that’s what it’s supposed to be; more on that in a minute. 🌶️)

These models represent different entities or dimensions within the data, such as customers, products, or transactions, and encapsulate the logic for transforming and aggregating raw data into meaningful insights. DBT follows a SQL-based approach, allowing users to define these transformations using familiar SQL syntax, making it accessible to data engineers, analysts, and data scientists alike.
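To make that concrete, here’s a minimal sketch of a DBT model: a single SQL file whose filename becomes the model name. The crm source and the column names here are hypothetical; the point is the pattern of pulling from a raw source and standardizing fields in one place:

```sql
-- models/staging/stg_customers.sql
-- A staging model: select from a raw source, clean and rename once,
-- so every downstream model reuses the same definition of "customer".

with source as (

    -- {{ source(...) }} points at a raw table declared in a sources file
    select * from {{ source('crm', 'raw_customers') }}

),

renamed as (

    select
        id as customer_id,
        lower(trim(email)) as email,
        created_at::date as signup_date  -- cast syntax varies by warehouse
    from source

)

select * from renamed
```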

Some benefits of using DBT include:

Version Control: Version control is crucial for tracking changes in data pipelines and ensuring reproducibility. DBT integrates seamlessly with version control systems like Git, enabling teams to track changes, collaborate effectively, and roll back to previous versions if necessary.

Documentation: Documentation is often overlooked but vital for understanding data transformations. DBT facilitates the generation of documentation automatically, providing insights into the data model, its dependencies, and business logic. This enhances transparency and knowledge sharing within the organization.

Testing and Validation: DBT simplifies data validation and testing by letting users define tests directly alongside their models. Automated testing ensures data quality and integrity throughout the transformation pipeline, reducing the risk of errors and inconsistencies (the sketch after this list shows tests and documentation side by side).
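To show how the last two benefits fit together in practice, here’s a minimal sketch of a schema file, continuing the hypothetical stg_customers model from above. The descriptions feed the docs site that dbt docs generate builds, and the tests run with dbt test:

```yaml
# models/staging/schema.yml
version: 2

models:
  - name: stg_customers
    description: "One row per customer, standardized from the raw CRM export."
    columns:
      - name: customer_id
        description: "Primary key; unique identifier for a customer."
        tests:  # built-in generic tests; they fail the run on bad data
          - unique
          - not_null
      - name: email
        description: "Lowercased, trimmed email address."
        tests:
          - not_null
```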


The dark side of DBT data modeling

The downside of DBT is that managing dependencies between models can get complex, particularly in large-scale data pipelines. Changes to upstream models ripple into everything downstream, requiring careful coordination and testing to avoid breaking existing processes. So as analytics needs change, instead of tweaking and reusing existing models, teams tend to create new ones for fear of breaking their pipelines and downstream reports. This is how we end up with DBT spaghetti:


Pliable co-founders Jason and Kait discuss the frustrations of mapping data, aka “DBT spaghetti.”
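Part of that fear is avoidable, because every {{ ref() }} call registers an edge in DBT’s dependency graph. That means you can ask DBT exactly what sits downstream of a model before you change it, instead of cloning it just to be safe. A minimal sketch, reusing the hypothetical stg_customers model from earlier:

```shell
# List stg_customers plus everything downstream of it
# (a trailing "+" selects a node and all of its descendants)
dbt ls --select stg_customers+

# Build and test only the changed model and its dependents,
# catching breakage before it reaches production dashboards
dbt build --select stg_customers+
```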

How do data teams wind up with so much DBT spaghetti, and what can they do to avoid it?

In the world of software engineering, architects preach the gospel of reusable components and clean architecture; it’s ingrained in their DNA. Many analysts, on the other hand, start from scratch with each business case and manually work their way through the raw data. They get answers to their questions, but question #10 takes just as long as question #1. If they use DBT, they treat it as a tool for saving and automatically running their ad-hoc queries rather than as the robust data modeling framework it is.

The result? A hot mess of fragmented data models that make finding a needle in a haystack seem like child’s play. Imagine trying to pin down customer demographics across multiple data sources, only to find yourself drowning in conflicting definitions and formats. No thanks. You’d be better off starting from scratch every time than sifting through contradictory definitions and confusing business logic.

This ultimately proves catastrophic for your business, for a couple of reasons:

You’re potentially relying on incorrect data to make business decisions. If dashboards constantly need to be manually reconfigured to account for different data models, some will inevitably fall out of date. That leaves the operations, marketing, and sales teams that rely on those models at risk of using untrustworthy data to make critical decisions.

The messy process is slowing down engineers. How many times has an executive pinged the data team with what seems like a simple data question, only to be told the answer will take weeks? The delay comes from analysts having to sift through a tangle of DBT models, some with conflicting definitions.

DBT promises to revolutionize the game with its software engineering-inspired approach. But here’s the kicker: without a solid grasp of core concepts like separation of concerns and reusability, analysts risk falling into the dreaded trap of DBT spaghetti.
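Separation of concerns has a concrete shape in DBT. One widely adopted convention, sketched below following dbt Labs’ style guide, splits models into layers so each piece of logic is written once and reused rather than re-derived for every question:

```
models/
├── staging/        # one model per raw table: rename, cast, light cleanup
│   ├── stg_customers.sql
│   └── stg_orders.sql
├── intermediate/   # reusable joins and business logic, not exposed to BI tools
│   └── int_customer_orders.sql
└── marts/          # analytics-ready tables, one per business concept
    ├── dim_customers.sql
    └── fct_orders.sql
```

With a layout like this, question #10 starts from dim_customers instead of from the raw data.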

How to (actually) make sense of your data

So, what’s the solution? There are a few roads you can go down when it comes to clean, reliable data:

Option 1: You can hire expensive data architects and analytics engineers to structure your data according to data modeling best practices. Employing and scaling a team of data engineers would work, but it’s an enormous cost (think upwards of a million-dollar investment) and it’s time-consuming. Relying on a team of engineers to keep data reliably prepped and organized also stretches project timelines and piles tedious tasks onto an engineer’s backlog.

Option 2: You can leverage a no-code data prep tool like Pliable. Pliable is an opinionated, no-code data modeling layer that runs on top of DBT, supercharging analysts and drastically reducing time-to-insight, all without engineers having to maintain data pipelines.

Pliable’s new bidirectional integration with DBT lets you connect your existing DBT project via GitHub and use Pliable’s powerful no-code data modeling tools to rapidly develop new DBT models for your business stakeholders. Pliable outputs DBT-compatible SQL, so you can build your models directly in Pliable and integrate them seamlessly into your existing DBT stack.

By taking a no-code approach, Pliable empowers non-technical stakeholders to collaborate: they can easily see where data comes from and the logic behind their reports, while analysts focus on clearing their backlog instead of doing QA.

Ready for drag and drop data transformation?

Stop wasting time on tedious data cleansing tasks.

Meet Pliable

The Pliable platform is all about democratizing access to best-in-class data analytics capabilities, leveling the playing field for businesses big and small. With Pliable, you can finally unleash the full potential of your data and make data-driven decisions without requiring expensive engineers.

DBT spaghetti may be a thorn in every analyst’s side, but it’s time to take back control. By embracing data modeling best practices and arming yourself with the right tools, you can untangle the complexities of your data and move faster. It’s time to cut through DBT spaghetti and put clean, reliable, and customizable data you can trust on your plate.

It’s time to shake things up and embrace a new era of data governance. At Pliable, we’re on a mission to disrupt the status quo and empower businesses with the tools they need to conquer the chaos of data analytics.