Analytics architecture
An analytics architecture is any set of tools and technologies that enable people and teams to store and analyze an organization’s data. While an analytics architecture can support a wide range of capabilities, four are particularly common:
- Data collection - Analytics architectures typically start with tools that record data, such as user actions on a website, transactions in a store, support calls made to a help center, and fuel performance on an airplane engine.
- Transportation and transformation - After it’s recorded, data has to be moved from source systems into a centralized warehouse. It also has to be cleaned and prepped for analysis.
- Data storage - Data is warehoused in centralized systems, usually a data warehouse or data lake.
- Analytics - Data consumers need a means for analyzing and visualizing data, as well as a place to host dashboards and reports.
The tools that enable all of these capabilities—plus any others that make use of an organization’s data—make up an analytics architecture.
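As a rough illustration of how the four capabilities fit together, here is a minimal Python sketch. Everything in it is an assumption made for the example: the event records, the function names, and the use of an in-memory SQLite database as a stand-in for a real data warehouse. A production stack would use dedicated tools for each step.

```python
import sqlite3

# Hypothetical raw events, standing in for what a collection tool might record.
RAW_EVENTS = [
    {"user": "a", "action": "page_view", "amount": None},
    {"user": "b", "action": "purchase", "amount": "19.99"},
    {"user": "a", "action": "purchase", "amount": "5.00"},
    {"user": None, "action": "purchase", "amount": "3.50"},  # malformed: no user
]


def collect():
    """Data collection: return the recorded events."""
    return list(RAW_EVENTS)


def transform(events):
    """Transportation and transformation: drop malformed rows and cast types."""
    cleaned = []
    for event in events:
        if event["user"] is None:
            continue  # skip rows that cannot be attributed to a user
        amount = float(event["amount"]) if event["amount"] is not None else 0.0
        cleaned.append((event["user"], event["action"], amount))
    return cleaned


def store(rows):
    """Data storage: load rows into an in-memory SQLite table (warehouse stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, action TEXT, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def analyze(conn):
    """Analytics: total purchase amount per user."""
    cursor = conn.execute(
        "SELECT user_id, ROUND(SUM(amount), 2) FROM events "
        "WHERE action = 'purchase' GROUP BY user_id ORDER BY user_id"
    )
    return cursor.fetchall()


def run_pipeline():
    """Run the four steps in order and return the analysis result."""
    return analyze(store(transform(collect())))


if __name__ == "__main__":
    for user_id, total in run_pipeline():
        print(user_id, total)
```

Each function maps to one capability, so swapping a step for a real tool (for example, replacing `store` with a load into a hosted warehouse) would not change the overall shape of the flow.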
Opinion: Analytics architectures are data stack layers
Today, analytics architectures are more commonly referred to as data stacks. Data stacks are layers (hence the term “stack”) of tools that move data through the four steps outlined above. They’re typically represented by diagrams showing a lot of boxes connected by arrows: Data starts here, moves to this tool, then goes to this other tool and that other tool, and so on.
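The boxes-and-arrows picture can be modeled as a dependency graph. The sketch below is one hypothetical way to do that in Python using the standard library's `graphlib` module (Python 3.9+). The layer names are invented for illustration and do not refer to specific products.

```python
from graphlib import TopologicalSorter

# Hypothetical data stack: each key lists the layers it receives data from.
STACK = {
    "ingestion": {"source_apps"},
    "warehouse": {"ingestion"},
    "transformation": {"warehouse"},
    "bi_dashboards": {"transformation"},
    "reverse_etl": {"transformation"},
}


def run_order(stack):
    """Return one valid order in which data can flow through the layers."""
    return list(TopologicalSorter(stack).static_order())


if __name__ == "__main__":
    print(" -> ".join(run_order(STACK)))
```

The relative order of layers that do not depend on each other (here, `bi_dashboards` and `reverse_etl`) is not guaranteed; only the upstream-before-downstream constraint is.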
Data stacks can be made up of a number of different tools:
- Databases, like Redshift, BigQuery, Snowflake, MySQL, and Postgres.
- Web analytics tools, like Google Analytics, Mixpanel, and Amplitude.
- ETL tools for ingesting data into a database, like Stitch and Fivetran.
- Reverse ETL tools for exporting data back into operational applications, like Census and Hightouch.
- Analytics and BI tools, like Mode, Looker, and Tableau.
- Data science applications, like Jupyter, RStudio, and SAS.
- Data transformation tools, like dbt.
- Orchestration tools, like Airflow and Dagster.
- Machine learning libraries, like TensorFlow and PyTorch.
- Difficult-to-classify multi-purpose technologies, like Databricks and Alteryx.
- Custom-built tools that do any number of things.
- Monitoring tools, like Monte Carlo and Bigeye.
- AI-based anomaly detection tools, like Sisu and Outlier.
- Data discovery and documentation tools, like Amundsen, Atlan, and Select Star.
When people talk about their data stack, they could be describing any of these tools. In some cases, people could also consider tools that interact with these tools—like Salesforce, or their own internal applications—as part of their data stack as well.
Though the toolset is too expansive to describe all of it, it’s worth going into a bit more depth on the four major components: the data warehouse, ETL pipelines, transformation tools, and analytics tools. You can read a more in-depth breakdown in our guide to building a modern data stack in 30 minutes.
Related terms:
Data warehouse, ETL tools