mode helix
NOW LIVEWork faster with Calculated Fields in Mode!Try it now
Analysis

June 16, 2020 4 minute read

Use Common Table Expressions (CTE) to Keep Your SQL Clean [with example]

Image of author

Joel Carron

Data Scientist at Mode

Here at Mode, we put a lot of emphasis on writing good, clean SQL. As a collaborative team, it’s important that our queries can be easily read and understood by our teammates. One convention we’ve adopted is the usage of common table expressions (CTEs).

What are common table expressions?

Common table expressions are temporary named result sets that exist for just one query. They begin with a WITH , followed by an expression name and a query which defines the result set. Using this convention, you can define multiple expressions within a query. Once defined, these expressions can be referenced repeatedly throughout the rest of your query (just like you’d refer to permanent database tables or views).

While common table expressions operate similarly to subqueries, they have several benefits including:

  • The ability to reference the same temporary result set repeatedly across the query
  • Improved readability for collaboration and debugging
  • Better visibility into commonly used result sets that are good candidates to become permanent tables/views

A practical example

Let’s take an example using data in the demo schema of Mode’s Public Warehouse, which involves a hypothetical paper company. Suppose you work for this company and you want to reach out to large customers that have had poor shipping experiences. After talking with the sales team, you decide to define “large customers” as any account with $100k or more in lifetime revenue, and you define poor shipping experiences as any account that has had multiple orders with failed shipments.

A streamlined approach for identifying accounts that meet both of these criteria is to use a common table expression, as in this example.

  WITH 
  
  account_failed_shipments as (
    SELECT a.id as account_id,
           COUNT(1) as failed_shipments
      FROM demo.orders__fulfillments f 
      JOIN demo.orders o 
        ON o.id = f.order_id 
      JOIN demo.accounts a 
        ON a.id = o.account_id 
     WHERE f.shipment_status = 'failure'
     GROUP BY 1
  ),
  
  account_lifetime_revenue as (
    SELECT o.account_id,
           SUM(o.total_amt_usd) as amount
      FROM demo.orders o 
     GROUP BY 1
  )

    SELECT r.name as region,
           sr.name as sales_rep,
           a.name as account_name,
           alr.amount as lifetime_revenue,
           afs.failed_shipments 
      FROM demo.accounts a 
      JOIN demo.sales_reps sr 
        ON sr.id = a.sales_rep_id 
      JOIN demo.region r 
        ON r.id = sr.region_id 
      JOIN account_failed_shipments afs 
        ON afs.account_id = a.id 
       AND afs.failed_shipments > 1 
      JOIN account_lifetime_revenue alr 
        ON alr.account_id = a.id 
       AND alr.amount >= 100000
     ORDER BY 1,2

Breaking it down

  • WITH: A WITH is necessary to kick off your common table expression query.
  WITH 
  • Expression 1: account_failed_shipments. Our first expression returns the count of failed shipments for all accounts that have had at least one. Syntactically, we begin by naming the expression in the format expression_name as (. We’ll now be able to reference this name as though it was a table in our database throughout the remainder of the query. What follows is a typical query, which, in this example, is grouping failed shipments by account. Finally, we conclude the expression with a closed parenthesis. Note that since this is not the last expression in our query, the closed parenthesis is followed by a comma.
  account_failed_shipments as (
    SELECT a.id as account_id,
           COUNT(1) as failed_shipments
      FROM demo.orders__fulfillments f 
      JOIN demo.orders o 
        ON o.id = f.order_id 
      JOIN demo.accounts a 
        ON a.id = o.account_id 
     WHERE f.shipment_status = 'failure'
     GROUP BY 1
  ),
  
  • Expression 2: account_lifetime_revenue. Our second expression returns the total sum of revenue by account. It is identical in structure to Expression 1, with one exception— since it is the final expression in our query, the closed parenthesis is not followed by a comma.
  account_lifetime_revenue as (
    SELECT o.account_id,
           SUM(o.total_amt_usd) as amount
      FROM demo.orders o 
     GROUP BY 1
  )
  • Final query: Lastly, we have the final query itself that will return our result set. This should look quite similar to any traditional query, with the exception of the final two JOINs that reference expressions we’ve previously defined, as opposed to tables in our database.
    SELECT r.name as region,
           sr.name as sales_rep,
           a.name as account_name,
           alr.amount as lifetime_revenue,
           afs.failed_shipments 
      FROM demo.accounts a 
      JOIN demo.sales_reps sr 
        ON sr.id = a.sales_rep_id 
      JOIN demo.region r 
        ON r.id = sr.region_id 
      JOIN account_failed_shipments afs 
        ON afs.account_id = a.id 
       AND afs.failed_shipments > 1 
      JOIN account_lifetime_revenue alr 
        ON alr.account_id = a.id 
       AND alr.amount >= 100000
     ORDER BY 1,2

Common table expressions help us work more efficiently

Common table expressions are highly valuable method for compartmentalizing complex queries, particularly when they involve combinations of multiple concepts. Since each expression is clearly named and defined, debugging is much easier (especially if you’re using highlight-to-run in Mode).

Looks like you've got a thing for cutting-edge data news.

Get the latest.