July 12, 2022•Updated on June 6, 2023
So in the last few years, everybody has been talking about “the modern data stack.” If you’re in the data community, or even just in the tech space, you know this. And if you don’t, well, here you go: What is the Modern Data Stack; What defines the modern data stack and why you should care; Inside the Modern Data stack; From Data to Insights; What is a modern data stack for growth (the list goes on; just google “modern data stack”).
Not only are folks still talking about it, but some are ready to add new lingo to the pot, like “post modern data stack” or “people-first data stacks.” We’re not here to give you more lingo to throw in this ever-growing pot of modern data stack gumbo...
...but we are here to ungate and consolidate some of our original perspectives on the modern data stack in a single place that’s findable via google search. In this guide, we'll review what the modern data stack is, how it works, and unpack its benefits for companies of different sizes.
Establishing a data stack is one of the most foundational decisions a data team can make. For analysts to deliver value iteratively, the right infrastructure needs to be in place first.
Your data stack determines how fast your data team can work and how easily your company can become truly data-driven. It is the core of how your team generates powerful and clarifying analysis across an organization.
Today’s most talked-about data infrastructure isn’t one of the monolithic (end-to-end) solutions of previous decades; it’s the modern data stack. That’s because this modular system has both immediate and long-term benefits, and we’re here to help you think through them.
A modern data stack is a flexible data infrastructure composed of modular tooling components (a cloud-based data warehouse, data pipelines, and a BI layer) that work together to help the company arrive at insights from clean or raw data quickly.
Each component of the stack should be flexible enough to be swapped out independently of the others and robust enough to excel at the function it’s built for. The right mix of tools will be unique to your business, but the underlying concept is the same: a collection of best-of-breed technologies that you can build upon and modify when needed.
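To make the "swappable components" idea concrete, here is a minimal sketch in Python. It assumes each layer of the stack can be described by a small interface, so one implementation can be exchanged for another without touching the rest. The names `Warehouse`, `Pipeline`, `InMemoryWarehouse`, `ListPipeline`, and `run_sync` are hypothetical and do not correspond to any vendor's API.

```python
from typing import Iterable, Protocol


class Warehouse(Protocol):
    """Anything that can store rows under a table name (hypothetical interface)."""

    def load(self, table: str, rows: Iterable[dict]) -> int: ...


class Pipeline(Protocol):
    """Anything that can produce rows from a source (hypothetical interface)."""

    def extract(self) -> Iterable[dict]: ...


class InMemoryWarehouse:
    """Stand-in for a cloud warehouse; keeps rows in a dict of lists."""

    def __init__(self) -> None:
        self.tables = {}

    def load(self, table: str, rows: Iterable[dict]) -> int:
        stored = self.tables.setdefault(table, [])
        before = len(stored)
        stored.extend(rows)
        return len(stored) - before


class ListPipeline:
    """Stand-in for a pipeline service; yields rows from a fixed list."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def extract(self) -> Iterable[dict]:
        return iter(self._rows)


def run_sync(pipeline: Pipeline, warehouse: Warehouse, table: str) -> int:
    """Move rows from any pipeline into any warehouse; returns rows loaded."""
    return warehouse.load(table, pipeline.extract())


warehouse = InMemoryWarehouse()
loaded = run_sync(ListPipeline([{"id": 1}, {"id": 2}]), warehouse, "events")
print(loaded)
```

Because `run_sync` depends only on the two interfaces, replacing `ListPipeline` or `InMemoryWarehouse` with a different implementation would not require changing it. That is the property the modular stack is meant to provide.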
The flexibility of this setup makes continuous improvements faster, more affordable, and more rewarding. For this reason, it can help your team enter the next phase of data maturity without overspending on costly implementations, dealing with vendor lock-in, or diverting engineering resources from core initiatives.
Let's break down more of these benefits through the lens of data team maturity.
Most companies follow a similar pattern as they grow. When starting out, you'll write simple SQL queries against your application database. This early setup is enough to get started, but it quickly breaks down as 1) data needs grow more complex and 2) the user base expands.
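As a rough illustration of that starting point, the sketch below runs one ad hoc query directly against an application database. It uses an in-memory SQLite database as a stand-in, and the `orders` table and its columns are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a production application database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount_usd) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
)

# The kind of simple, one-off question an early-stage team asks directly.
revenue_by_customer = conn.execute(
    "SELECT customer, SUM(amount_usd) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(revenue_by_customer)
conn.close()
```

This works while the application database is the only source. It offers nothing for joining in data that lives in other tools, which is where the pattern starts to strain.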
Trying to evaluate BI tools for how well they'll scale with your startup? Use our checklist.
As the business grows, the data collection includes more sources like Google Analytics, Amplitude, Heap Analytics, and in-application reporting tools like Salesforce. What started out as simple SQL queries can grow into a complicated data architecture as files such as spreadsheets, CSVs, and JSON data are brought into the mix.
As more third-party applications are used, a significant amount of engineering time can be spent integrating them with the existing data. In this ad hoc environment, there is no single source of truth and no consistent way to incorporate new data. Thus, a significant amount of time is spent figuring out how to get the data into a suitable format that can be merged with other data sources.
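The reformatting work described above often looks something like the following sketch: two exports describe the same kind of record with different field names and formats, and each needs its own hand-written mapping onto a shared shape. The sample data, field names, and the `from_csv` and `from_json` helpers are all hypothetical.

```python
import csv
import io
import json

# Hypothetical exports from two tools that describe the same kind of record
# with different field names and formats.
CSV_EXPORT = "Email,Signup Date\nana@example.com,2023-01-05\n"
JSON_EXPORT = '[{"user_email": "BO@example.com", "created": "2023-02-10T09:30:00"}]'


def from_csv(text: str) -> list:
    """Map the CSV export's columns onto the shared schema."""
    return [
        {"email": row["Email"].lower(), "signup_date": row["Signup Date"]}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text: str) -> list:
    """Map the JSON export's keys onto the shared schema, keeping only the date."""
    return [
        {"email": item["user_email"].lower(), "signup_date": item["created"][:10]}
        for item in json.loads(text)
    ]


users = from_csv(CSV_EXPORT) + from_json(JSON_EXPORT)
print(users)
```

Every new source adds another mapping function to write and maintain, which is the engineering cost that pre-built pipeline connectors are meant to reduce.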
Adjusting large amounts of data to fit other platforms is laborious and not an efficient use of your engineers’ time. With a modern data stack, you can swap data sources and tooling in and out more efficiently, which minimizes both the technical debt you incur and your setup costs.
Create a data program that scales.
At high-growth companies, you need to grow quickly to survive. Critical KPIs, like growth numbers and revenue reports that require data from multiple sources, can be difficult to consolidate because gathering the data is a manual process. A modern data stack automates much of this challenging work.
You can get started with just a few data sources and gigabytes of data and adapt as the team collects terabytes of data from disparate data sources. There is no need to continually engineer the solution during this high-growth period. Mode customers who have a modern data stack include Patreon, Shopify, Rippling, CashApp, and more.
At Shopify, analysts are constantly busy with requests from everyone from project managers to salespeople. To make reporting less cumbersome, Shopify analysts use Mode to quickly load reports for each department without replacing their BI tool or overhauling their data infrastructure. Read the case study
A modern data architecture can also be used by larger organizations that have existing data stacks and warehouses. In this case, they can benefit by introducing components of this structure as part of a longer-term migration strategy or by directly moving to this architecture as part of a wholesale migration.
Learn how Condé Nast uses Mode to house a data application that democratizes data for product and marketing functions.
1. You can choose the best tools for the job (for data teams and stakeholders)
Data teams need the flexibility to dissect data in whatever way best addresses the question at hand. The best tool for the job will vary based on a number of factors: what language an analyst is most comfortable in, what question they’re trying to answer, and what type of stakeholder is asking for insights.
All-in-one, monolithic data stacks create vendor lock-in, don't provide best-of-breed tooling, and are not guaranteed to accommodate new technologies quickly.
2. You can modify the stack to scale with you
As a company’s growth goals become more ambitious, the data stack should evolve to meet them.
In a modular stack, modifications are easier because many popular tools today come with dozens of pre-built connectors to common business apps and a REST API for custom integrations. This also helps prevent technical debt by allowing teams to swap in new tools when they’re limited by what they can do with their data.
Download our guide on The Modern Data Architecture to learn when it’s time to upgrade your stack.
3. You can protect against vendor lock-in
Switching vendors is painful when a single solution accounts for most of your data infrastructure. It requires migrating large volumes of data without knowing whether the new solution is going to meet your needs, discouraging change by making it risky. Monolithic solutions often require more of an upfront investment of time and budget, creating a sunk-cost mentality among teams that might otherwise choose a more fitting option.
4. You can enable global governance
Since data must be accessible to a range of solutions, governed data should be consumable by all of them and free from disruption when one tool is swapped for another. You can address this by transforming data before it loads into any particular solution, creating scripts with a tool like dbt to ensure that no matter what you add or remove from your stack, you’ll be able to maintain data quality.
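The idea of defining transformations once, upstream of any particular tool, can be sketched as follows. This is not dbt itself: a plain SQL view in an in-memory SQLite database stands in for a warehouse model, and the table, view, and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw table as a pipeline might land it: inconsistent casing, a test row, cents.
conn.execute(
    "CREATE TABLE raw_payments (email TEXT, amount_cents INTEGER, is_test INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_payments VALUES (?, ?, ?)",
    [
        ("Ana@Example.com", 1250, 0),
        ("qa@example.com", 999, 1),
        ("bo@example.com", 400, 0),
    ],
)

# Governed definition, written once in the warehouse rather than in each tool.
conn.execute(
    """
    CREATE VIEW payments AS
    SELECT LOWER(email) AS email, amount_cents / 100.0 AS amount_usd
    FROM raw_payments
    WHERE is_test = 0
    """
)

# Two hypothetical downstream consumers read the same view.
dashboard_total = conn.execute("SELECT SUM(amount_usd) FROM payments").fetchone()[0]
notebook_rows = conn.execute(
    "SELECT email, amount_usd FROM payments ORDER BY email"
).fetchall()
print(dashboard_total, notebook_rows)
conn.close()
```

Because the cleaning rules live in the view and not in either consumer, swapping one downstream tool for another leaves the definition of a payment unchanged.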
Are you assembling your company's first stack? Good news: A basic modern data stack can be built in 30 minutes. Our co-founder and Chief Analytics Officer, Benn Stancil, demonstrates this live in this YouTube video. We also made a step-by-step guide for you to follow along, with markers for each corresponding section in the video and some pros and cons for the basic tooling options.
Here’s what you’ll need:
A cloud-based data warehouse - The first component is a cloud-based, agile warehouse, like Snowflake, Redshift, or BigQuery, that serves as the central location to collect all of the organization's incoming data. Learn more about the cost breakdown in our guide (pg 12).
Data pipelines - The second component is one or more pipeline services, like Fivetran, Stitch, or Segment, to ensure that data is fed seamlessly into the warehouse with minimal engineering effort. Learn about the differences between these tools in our guide (pg 14).
An analytics/BI platform - The third component is a powerful data science platform, like Mode (try a 2-week trial or Mode Studio, our freemium version), that can take advantage of the consolidated data warehouse to analyze data. Learn more about how Mode integrates easily with the other components in our guide (pg 17).
Data prep and transformation layer - The last component is a transformation tool like dbt that allows you to transform and model your data directly in a cloud-based warehouse. Learn more about these types of tooling in our guide (pg 19). Learn how dbt's Semantic Layer works with Mode.
Get a simple step-by-step tutorial on how to assemble a modern data stack in 30 minutes (with a YouTube video for guidance).
So, what does Mode’s stack look like? Our data stack is currently made up of Snowflake, dbt, Stitch (they are also a Mode customer), and Fivetran. We still use Segment, not to sync data from third-party applications into our warehouse but only to log and record events. Read more about how we chose our ETL tooling for our stack.
Your data stack is the key to scaling your data strategy and making business decisions confidently. This translates into building better products, a more competitive go-to-market strategy, and a new level of data maturity.
If you need some inspiration, we recommend perusing our case studies to see the data stacks of different teams (listed at the top-of-fold in each case study). We hope you’ve found this guide helpful (more resources here), but don’t hesitate to reach out with more questions: editor [at] modeanalytics [.com].