Here at Mode, we put a lot of emphasis on writing good, clean SQL. As a collaborative team, it’s important that our queries can be easily read and understood by our teammates. One convention we’ve adopted is the usage of common table expressions (CTEs).
What are common table expressions?
Common table expressions are temporary named result sets that exist for just one query. They begin with a WITH
, followed by an expression name and a query which defines the result set. Using this convention, you can define multiple expressions within a query. Once defined, these expressions can be referenced repeatedly throughout the rest of your query (just like you’d refer to permanent database tables or views).
While common table expressions operate similarly to subqueries, they have several benefits including:
- The ability to reference the same temporary result set repeatedly across the query
- Improved readability for collaboration and debugging
- Better visibility into commonly used result sets that are good candidates to become permanent tables/views
A practical example
Let’s take an example using data in the demo schema of Mode’s Public Warehouse, which involves a hypothetical paper company. Suppose you work for this company and you want to reach out to large customers that have had poor shipping experiences. After talking with the sales team, you decide to define “large customers” as any account with $100k or more in lifetime revenue, and you define poor shipping experiences as any account that has had multiple orders with failed shipments.
A streamlined approach for identifying accounts that meet both of these criteria is to use a common table expression, as in this example.
WITH
account_failed_shipments as (
SELECT a.id as account_id,
COUNT(1) as failed_shipments
FROM demo.orders__fulfillments f
JOIN demo.orders o
ON o.id = f.order_id
JOIN demo.accounts a
ON a.id = o.account_id
WHERE f.shipment_status = 'failure'
GROUP BY 1
),
account_lifetime_revenue as (
SELECT o.account_id,
SUM(o.total_amt_usd) as amount
FROM demo.orders o
GROUP BY 1
)
SELECT r.name as region,
sr.name as sales_rep,
a.name as account_name,
alr.amount as lifetime_revenue,
afs.failed_shipments
FROM demo.accounts a
JOIN demo.sales_reps sr
ON sr.id = a.sales_rep_id
JOIN demo.region r
ON r.id = sr.region_id
JOIN account_failed_shipments afs
ON afs.account_id = a.id
AND afs.failed_shipments > 1
JOIN account_lifetime_revenue alr
ON alr.account_id = a.id
AND alr.amount >= 100000
ORDER BY 1,2
Breaking it down
- WITH: A
WITH
is necessary to kick off your common table expression query.
WITH
- Expression 1:
account_failed_shipments
. Our first expression returns the count of failed shipments for all accounts that have had at least one. Syntactically, we begin by naming the expression in the formatexpression_name as (
. We’ll now be able to reference this name as though it was a table in our database throughout the remainder of the query. What follows is a typical query, which, in this example, is grouping failed shipments by account. Finally, we conclude the expression with a closed parenthesis. Note that since this is not the last expression in our query, the closed parenthesis is followed by a comma.
account_failed_shipments as (
SELECT a.id as account_id,
COUNT(1) as failed_shipments
FROM demo.orders__fulfillments f
JOIN demo.orders o
ON o.id = f.order_id
JOIN demo.accounts a
ON a.id = o.account_id
WHERE f.shipment_status = 'failure'
GROUP BY 1
),
- Expression 2:
account_lifetime_revenue
. Our second expression returns the total sum of revenue by account. It is identical in structure to Expression 1, with one exception— since it is the final expression in our query, the closed parenthesis is not followed by a comma.
account_lifetime_revenue as (
SELECT o.account_id,
SUM(o.total_amt_usd) as amount
FROM demo.orders o
GROUP BY 1
)
- Final query: Lastly, we have the final query itself that will return our result set. This should look quite similar to any traditional query, with the exception of the final two JOINs that reference expressions we’ve previously defined, as opposed to tables in our database.
SELECT r.name as region,
sr.name as sales_rep,
a.name as account_name,
alr.amount as lifetime_revenue,
afs.failed_shipments
FROM demo.accounts a
JOIN demo.sales_reps sr
ON sr.id = a.sales_rep_id
JOIN demo.region r
ON r.id = sr.region_id
JOIN account_failed_shipments afs
ON afs.account_id = a.id
AND afs.failed_shipments > 1
JOIN account_lifetime_revenue alr
ON alr.account_id = a.id
AND alr.amount >= 100000
ORDER BY 1,2
Common table expressions help us work more efficiently
Common table expressions are highly valuable method for compartmentalizing complex queries, particularly when they involve combinations of multiple concepts. Since each expression is clearly named and defined, debugging is much easier (especially if you’re using highlight-to-run in Mode).