A data model defines how an organization's various data sources are organized, related, and accessed. As data sources are added and complexity grows, the data model makes a big impact on the productivity and efficiency of everyone in the company who works with data.
Between our own analytics work here at Mode and the countless data teams we've helped along the way, we've seen our fair share of data models. Keep your data model under control by following these four tips:
Standardize your terms
One of the biggest benefits of a centralized data model is that you create universally understood terms that enable more direct discussion.
For example, consider the idea of a user “session.” The term is commonly understood among product and marketing teams, but the exact definition varies slightly depending on who you ask. Those differences can cause confusion when analytical questions get specific. If you define the start of a session as the moment a user clicks into your site, the revenue from their purchase could fall into a completely different attribution category than it would if you define the start of a session as the moment they log into their account. The exact definition of a session, and the way your data model relates sessions to ad spend and sales, will make a real difference to both your quarterly revenue numbers and your marketing budget when compounded over thousands of sessions and hundreds of transactions. The data model should make the definitions of common terms like sessions and users explicit.
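To make the stakes concrete, here's a minimal sketch of how two session definitions diverge. The event stream, timestamps, and 30-minute attribution window are all illustrative assumptions, not anything specific to a real system:

```python
from datetime import datetime, timedelta

# Hypothetical event stream for one user (simplified event types).
events = [
    (datetime(2024, 5, 1, 9, 0),  "site_click"),  # clicked in from an ad
    (datetime(2024, 5, 1, 9, 45), "login"),
    (datetime(2024, 5, 1, 10, 0), "purchase"),
]

SESSION_WINDOW = timedelta(minutes=30)  # assumed attribution window

def session_start(events, start_event):
    """A session begins at the first occurrence of the chosen start event."""
    return min(t for t, kind in events if kind == start_event)

purchase_at = next(t for t, kind in events if kind == "purchase")

# Definition A: a session starts when the user clicks into the site.
click_start = session_start(events, "site_click")
# Definition B: a session starts when the user logs into their account.
login_start = session_start(events, "login")

# The same purchase lands inside the session window under one definition
# and outside it under the other -- so it's attributed differently.
in_click_session = purchase_at - click_start <= SESSION_WINDOW  # 60 min: no
in_login_session = purchase_at - login_start <= SESSION_WINDOW  # 15 min: yes
```

One user, one purchase, two different attribution outcomes, depending entirely on which definition the data model bakes in.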
Avoid proprietary abstraction
Abstraction is often included as a feature in the technologies in a data stack. Many popular tools include functionality that, at first glance, offers to remove complexity from a data model by abstracting common tasks. But these features typically require using proprietary terminology in your data model. Not only does this create inconsistency in the programming language used to work with the data, it also makes the data model harder to read and maintain, and less accessible to people who are new to it.
For example, when we built our data model, we made heavy use of dbt. However, we limited our reliance on its macros feature, which obscures the SQL being executed. As useful a tool as dbt is, it's more important for us to maintain language consistency in our model so that we don't find ourselves beholden to any one tool.
Optimize for efficiency
A data model can make a significant impact on the efficiency of queries themselves. With well-structured data, queries can run much faster. Most heavy-duty computation can be done in advance, while building the relationships in the data model, so that expensive work doesn't have to happen every time someone runs an ad-hoc query. Analysts then spend their time gathering insights rather than waiting for queries to execute, and the hours they get back can go toward more valuable work.
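As a small illustration of doing the heavy computation up front, here's a sketch using SQLite. The table and column names are invented for the example; the same pattern applies to any warehouse that supports materialized summary tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Illustrative raw fact table.
    CREATE TABLE orders (user_id INTEGER, amount REAL, created_at TEXT);
    INSERT INTO orders VALUES
        (1, 20.0, '2024-01-03'), (1, 35.0, '2024-02-10'),
        (2, 15.0, '2024-01-20'), (2, 40.0, '2024-03-05');

    -- Do the expensive aggregation once, while building the model...
    CREATE TABLE user_revenue AS
        SELECT user_id,
               SUM(amount) AS lifetime_revenue,
               COUNT(*)    AS n_orders
        FROM orders
        GROUP BY user_id;
""")

# ...so ad-hoc queries become cheap lookups instead of full scans.
row = conn.execute(
    "SELECT lifetime_revenue FROM user_revenue WHERE user_id = 1"
).fetchone()
print(row[0])  # 55.0
```

On a few rows the difference is invisible, but over billions of raw events, moving the `GROUP BY` into the model-building step is the difference between a query that returns in seconds and one that ties up an analyst's afternoon.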
Make it useful
All three of the goals above are primarily achieved by building a data model out of useful buckets. The raw data from many sources often includes far more information than an analyst needs to do their work, and much of it arrives messy by default.
Your data model should eliminate the columns that won't be useful for analysis, standardize data types and formats (timestamps are a common offender), and generally reduce the confusion and maintenance required to conduct analysis. Rename column headers so they're legible to someone who hasn't worked with the table, and reorder columns so that data points likely to be read together are grouped together rather than scattered around the schema.
What to do next
Understanding and building great data models is an entire realm of expertise. If you're not an expert yourself, this should be something you ask about when building your data team. We can help: read 4 Questions to Consider When Making Your Next Analytics Hire.