A data model defines how an organization's various data sources are organized, related, and accessed. As data sources are added and complexity grows, the data model has a big impact on the productivity and efficiency of anyone in the company who works with data.
Between our own analytics work here at Mode and the countless data teams we've helped along the way, we've seen our fair share of data models. Keep your data model under control by following these four tips:
Create consistency
One of the biggest benefits of a centralized data model is that you create universally understood terms that enable more direct discussion.
For example, consider the idea of a user “session.” This term is commonly understood among product and marketing teams, but the exact definition can vary slightly depending on who you ask. Those differences can lead to confusion when analytical questions get really specific. If you define the start of a session as the moment a user clicks into your site, the revenue from their purchase could fall into a completely different attribution category than it would if you define the start of a session as the moment a user logs into their account. The exact definition of a session, and the way your data model relates sessions to ad spend and sales, will make a real difference in both your quarterly revenue and your marketing budget when compounded over thousands of sessions and hundreds of transactions. The data model should make the definitions of common terms like sessions and users explicit.
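To illustrate, here is one way to make a session definition explicit in code. This is a minimal sketch, and its specifics are assumptions, not the only valid definition: it treats a user's first click as the session start and ends a session after 30 minutes of inactivity.

```python
from datetime import datetime, timedelta

# Assumed definition (one of many possible): a session starts at a user's
# first click and ends after 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(click_times):
    """Group a list of click timestamps into sessions.

    Returns a list of sessions, each a list of timestamps.
    """
    sessions = []
    for ts in sorted(click_times):
        if sessions and ts - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(ts)  # within the timeout: same session
        else:
            sessions.append([ts])    # inactivity gap: start a new session
    return sessions

clicks = [
    datetime(2023, 1, 1, 9, 0),
    datetime(2023, 1, 1, 9, 10),
    datetime(2023, 1, 1, 11, 0),  # more than 30 minutes later: new session
]
# sessionize(clicks) yields two sessions: [[9:00, 9:10], [11:00]]
```

Whether the cutoff is 30 minutes or something else matters less than the fact that it is written down once, in the model, rather than re-decided by every analyst.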
Avoid proprietary abstraction
Abstraction is often included as a feature of the technologies in a data stack. Many popular tools include functionality that, at first glance, offers to remove complexity from a data model by abstracting common tasks. But these features typically require using proprietary terminology in your data model. Not only does this create inconsistency in the programming language used to work with the data, it also makes the data model harder to read and maintain, and less accessible to people new to the model.
For example, when we built our data model, we made heavy use of dbt. However, we limited our reliance on its macros feature, which obscures the SQL being executed. As useful a tool as dbt is, it's more important for us to maintain language consistency in our model so that we don't find ourselves beholden to any one tool.
Optimize for efficiency
A data model can make a significant impact on the efficiency of queries themselves. Query optimization begins with a well-structured data model: by designing it thoughtfully, you can ensure that heavy-duty computations are performed in advance, minimizing the need for costly calculations each time you run an ad-hoc query.
This approach means analysts spend less time waiting for queries to execute and more time gathering valuable insights. By optimizing SQL queries through effective data modeling, analysts gain extra hours to focus on more strategic tasks.
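A minimal sketch of the precomputation idea, using plain Python in place of SQL for illustration: raw events are rolled up into a daily revenue table once, and ad-hoc questions then read the small rollup instead of rescanning every raw event. The event rows and field names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events: one row per transaction.
raw_events = [
    {"day": "2023-01-01", "revenue": 20.0},
    {"day": "2023-01-01", "revenue": 35.0},
    {"day": "2023-01-02", "revenue": 50.0},
]

# The heavy-duty computation happens once, as part of the model build:
daily_revenue = defaultdict(float)
for event in raw_events:
    daily_revenue[event["day"]] += event["revenue"]

# Ad-hoc questions now hit the small rollup, not the raw events.
print(daily_revenue["2023-01-01"])  # 55.0
```

In a warehouse, the rollup would be a summary table rebuilt on a schedule; the principle is the same either way: pay the aggregation cost once instead of on every query.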
Make it useful
All three of the goals above are primarily achieved by creating a data model made up of useful buckets. The raw data from many sources often includes far more information than an analyst needs to do their work, and much of that data is messy in its default state.
Your data model should eliminate the columns that won't be useful for analysis, standardize things like timestamp data types and other values, and generally reduce the confusion and maintenance required to conduct analysis. Rename column headers to be legible to someone who hasn't worked with the table, and reorder columns so that the data points that are likely to be read together are grouped together, rather than leaving them scattered around the schema.
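One way to picture that cleanup step, sketched in Python with a hypothetical raw row: cryptic column names are renamed, a string timestamp is standardized to a proper datetime type, and a column that isn't useful for analysis is dropped.

```python
from datetime import datetime

# Hypothetical raw row with cryptic names and mixed formats.
raw_row = {
    "usr_id": 42,
    "ts_evt": "2023-01-01T09:00:00Z",
    "amt_cts": 1999,          # amount stored in cents
    "internal_flag_x": None,  # not useful for analysis: drop it
}

def clean_row(row):
    """Rename columns, standardize types, and drop unused fields."""
    return {
        "user_id": row["usr_id"],
        "event_at": datetime.fromisoformat(row["ts_evt"].replace("Z", "+00:00")),
        "amount_usd": row["amt_cts"] / 100,  # cents -> dollars
    }

clean = clean_row(raw_row)
# clean row: legible names, one timestamp type, no dead columns
```

The renamed row is self-explanatory to someone who has never seen the source table, which is exactly the property the modeled layer should have.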
What to do next
Understanding and building great data models is an entire realm of expertise. If you're not an expert yourself, this should be something you ask about when building your data team. We can help: read 4 Questions to Consider When Making Your Next Analytics Hire.