January 4, 2022
NaN minute read
Anyone who has tried to learn SQL knows it's tricky to get the hang of joins. You're breezing through SELECT
statements and comparison operators and ORDER BY
, and wham! you run smack into joins. But, get over this conceptual speed bump, and you're well on your way to becoming proficient at SQL.
Even after learning the principles of inner, outer, left, and right joins, you might still have a nagging question: how do I find values from one table that are NOT present in another table?
That's where anti joins come in. They can be helpful in a variety of business situations when you're trying to find something that hasn't happened, such as:
We'll walk through each of these situations later on, but first, here's a primer on the types of anti joins and how to write them.
There are two types of anti joins:
A
left anti join
: This join returns rows in the left table that have no matching rows in the right table.
A
right anti join
: This join returns rows in the right table that have no matching rows in the left table.
We will walk through how to use an anti join, using a left anti join.
Unlike most SQL joins, an anti join doesn't have its own syntax - meaning one actually performs an anti join using a combination of other SQL queries. To find all the values from Table_1
that are not in Table_2
, you'll need to use a combination of LEFT JOIN
and WHERE
.
Select every column from Table_1
. Assign Table_1
an alias: t1
.
SELECT * FROM Table_1 t1
LEFT JOIN
Table_2
to Table_1
and assign Table_2
an alias: t2
. By using a LEFT JOIN
, your query will return all values from Table_1
(whether or not they are also present in Table_2
).
SELECT * FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.id = t2.id
Use a WHERE
clause to filter out values that are present in Table_2
.
SELECT * FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.id = t2.id WHERE t2.id IS NULL
The entire query will return only values that are in Table_1
but not in Table_2
.
Once you've got the basics down, you can solidify your knowledge by practicing on real-world data. Try working through these three business situations on your own in Mode. Just sign up for an account and click here to write a new query against the tables in the Mode Public Warehouse.
The first table (demo.accounts
) has a record of all customer accounts. The second table (demo.orders
) has a record of every order.
Select the columns you want from the accounts table: id
(aliased as customer_id
) and name
(aliased as customer_name
).
SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa
LEFT JOIN
the orders table to the accounts table on the account_id
, for orders that occurred in August 2016.
SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.orders o ON a.id = o.account_id AND o.occurred_at BETWEEN '2016-08-01' AND '2016-08-31 23:59:59'
Exclude accounts that did place an order in August by adding a WHERE o.id is NULL
clause.
SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.orders o ON a.id = o.account_id AND o.occurred_at BETWEEN '2016-08-01' AND '2016-08-31 23:59:59' WHERE o.id is NULL
The first table (demo.accounts
) has a record of all customer accounts. The second table (demo.web_events
) has a record of every web visit.
Select the columns you want from the accounts table: id
(aliased as customer_id
) and name
(aliased as customer_name
).
SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa
LEFT JOIN
the web events table to the accounts table on the account_id
, for visits that occurred in 2016:
SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.web_events we ON we.account_id = a.id AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59'
Exclude accounts that did visit the website in 2016 by adding a WHERE we.id is NULL
clause.
SELECT a.id as customer_id, a.name as customer_name FROM demo.accountsa LEFT JOIN demo.web_events we ON we.account_id = a.id AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59' WHERE we.id is NULL
For this query, you'll use three tables: demo.accounts
, demo.orders
, and demo.sales_reps
.
While you already have your primary table containing records of all sales reps (demo.sales_reps
), you'll need a second table containing a record of all sales reps who did close a deal during the time range in order to perform your anti join. Since this table doesn't exist already, you'll need to start by creating a subquery.
SELECT o.id AS order_id, sr.id as sales_rep_id FROM demo.orders o JOIN demo.accountsa ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59'
Now you've got the two things you need: a table containing a record of all sales reps (demo.sales_reps
) and a subquery that pulls a record of the sales rep associated with all orders that occurred over the specified time range. You'll be able to join this subquery to a table just like you would join a table to a table.
Select the columns you want from the sales reps table. In this case, you'll pull each sales rep's id, name, and region id.
SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr
LEFT JOIN
the subquery (aliased as a
) on sales_rep_id
.
SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr LEFT JOIN( SELECT o.id AS order_id, sr.id as sales_rep_id FROM demo.orders o JOIN demo.accountsa ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59' ) a ON sr.id = a.sales_rep_id
Finally, exclude sales reps that are associated with an order in your subquery by adding a WHERE a.order_id is NULL
clause.
SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr LEFT JOIN( SELECT o.id AS order_id, sr.id as sales_rep_id FROM demo.orders o JOIN demo.accountsa ON o.account_id = a.id JOIN demo.sales_reps sr ON sr.id = a.sales_rep_id WHERE o.occurred_at BETWEEN '2016-09-05' AND '2016-09-15 23:59:59' ) a ON sr.id = a.sales_rep_id WHERE a.order_id is NULL
No coding experience? No problem. Learn SQL and Python using real-world data with our free tutorials.
Work-related distractions for data enthusiasts.