ThoughtSpot acquires Mode to define the next generation of collaborative BI >>Learn More

Additional Resources

template

Defining Metrics: A Template 
for Working With Stakeholders

Guide

10 Things Modern Business 
Intelligence Should Enable Today

video

Watch a Product Tour of Mode

Get started with Mode

How to Master Anti Joins and Apply Them to Business Problems

Image of author
Joel Carron, Data Scientist at Mode

January 4, 2022

NaN minute read

anti-join-4

Anyone who has tried to learn SQL knows it's tricky to get the hang of joins. You're breezing through SELECT statements and comparison operators and ORDER BY, and wham! you run smack into joins. But, get over this conceptual speed bump, and you're well on your way to becoming proficient at SQL.

Even after learning the principles of innerouterleft, and right joins, you might still have a nagging question: how do I find values from one table that are NOT present in another table?

That's where anti joins come in. They can be helpful in a variety of business situations when you're trying to find something that hasn't happened, such as:

We'll walk through each of these situations later on, but first, here's a primer on the types of anti joins and how to write them.

Types of anti joins

There are two types of anti joins:

  • A

     

    left anti join

     

    : This join returns rows in the left table that have no matching rows in the right table.

  • A

     

    right anti join

     

    : This join returns rows in the right table that have no matching rows in the left table.

We will walk through how to use an anti join, using a left anti join.

How to perform an anti join

Unlike most SQL joins, an anti join doesn't have its own syntax - meaning one actually performs an anti join using a combination of other SQL queries. To find all the values from Table_1 that are not in Table_2, you'll need to use a combination of LEFT JOIN and WHERE.

  1. Select every column from Table_1. Assign Table_1 an alias: t1.

    SELECT * FROM Table_1 t1

  2. LEFT JOIN Table_2 to Table_1 and assign Table_2 an alias: t2. By using a LEFT JOIN, your query will return all values from Table_1 (whether or not they are also present in Table_2).

    SELECT * FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.id = t2.id

  3. Use a WHERE clause to filter out values that are present in Table_2.

    SELECT * FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.id = t2.id WHERE t2.id IS NULL

The entire query will return only values that are in Table_1 but not in Table_2.

Anti join examples

Once you've got the basics down, you can solidify your knowledge by practicing on real-world data. Try working through these three business situations on your own in Mode. Just sign up for an account and click here to write a new query against the tables in the Mode Public Warehouse.

Which customers didn't place an order in August?

The first table (demo.accounts) has a record of all customer accounts. The second table (demo.orders) has a record of every order.

  1. Select the columns you want from the accounts table: id (aliased as customer_id) and name (aliased as customer_name).

    SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa

  2. LEFT JOIN the orders table to the accounts table on the account_id, for orders that occurred in August 2016.

    SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.orders o ON a.id = o.account_id AND o.occurred_at BETWEEN '2016-08-01' AND '2016-08-31 23:59:59'

  3. Exclude accounts that did place an order in August by adding a WHERE o.id is NULL clause.

    SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.orders o ON a.id = o.account_id AND o.occurred_at BETWEEN '2016-08-01' AND '2016-08-31 23:59:59' WHERE o.id is NULL

Which customers haven't visited your website this year?

The first table (demo.accounts) has a record of all customer accounts. The second table (demo.web_events) has a record of every web visit.

  1. Select the columns you want from the accounts table: id (aliased as customer_id) and name (aliased as customer_name).

    SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa

  2. LEFT JOIN the web events table to the accounts table on the account_id, for visits that occurred in 2016:

    SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.web_events we ON we.account_id = a.id AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59'

  3. Exclude accounts that did visit the website in 2016 by adding a WHERE we.id is NULL clause.

    SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.web_events we ON we.account_id = a.id AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59' WHERE we.id is NULL

Which sales reps didn't close a deal from September 5th - September 15th?

For this query, you'll use three tables: demo.accountsdemo.orders, and demo.sales_reps.

  1. While you already have your primary table containing records of all sales reps (demo.sales_reps), you'll need a second table containing a record of all sales reps who did close a deal during the time range in order to perform your anti join. Since this table doesn't exist already, you'll need to start by creating a subquery.

    SELECT o.id AS order_id, sr.id as sales_rep_id FROM demo.orders o JOIN demo.accountsa ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59'

    Now you've got the two things you need: a table containing a record of all sales reps (demo.sales_reps) and a subquery that pulls a record of the sales rep associated with all orders that occurred over the specified time range. You'll be able to join this subquery to a table just like you would join a table to a table.

  2. Select the columns you want from the sales reps table. In this case, you'll pull each sales rep's id, name, and region id.

    SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr

  3. LEFT JOIN the subquery (aliased as a) on sales_rep_id.

    SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr LEFT JOIN( SELECT o.id AS order_id, sr.id as sales_rep_id FROM demo.orders o JOIN demo.accountsa ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59' ) a ON sr.id = a.sales_rep_id

  4. Finally, exclude sales reps that are associated with an order in your subquery by adding a WHERE a.order_id is NULL clause.

    SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr LEFT JOIN( SELECT o.id AS order_id, sr.id as sales_rep_id FROM demo.orders o JOIN demo.accountsa ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59' ) a ON sr.id = a.sales_rep_id WHERE a.order_id is NULL

No coding experience? No problem. Learn SQL and Python using real-world data with our free tutorials.

Get our weekly data newsletter

Work-related distractions for data enthusiasts.

Additional Resources

template

Defining Metrics: A Template 
for Working With Stakeholders

Guide

10 Things Modern Business 
Intelligence Should Enable Today

video

Watch a Product Tour of Mode

Get started with Mode