Anyone who has tried to learn SQL knows it's tricky to get the hang of joins. You're breezing through SELECT
statements and comparison operators and ORDER BY
, and wham! you run smack into joins. But, get over this conceptual speed bump, and you're well on your way to becoming proficient at SQL.
Even after learning the principles of inner, outer, left, and right joins, you might still have a nagging question: how do I find values from one table that are NOT present in another table?
That's where anti joins come in. They can be helpful in a variety of business situations when you're trying to find something that hasn't happened, such as:
- Customers who did not place an order
- Customers who have not visited your website
- Salespeople who did not close a deal
We'll walk through each of these situations later on, but first, here's a primer on how to write an anti join.
How to perform an anti join
Unlike most SQL joins, an anti join doesn't have its own syntax. To find all the values from Table_1
that are not in Table_2
, you'll need to use a combination of LEFT JOIN
and WHERE
.
-
Select every column from
Table_1
. AssignTable_1
an alias:t1
.SELECT * FROM Table_1 t1
-
LEFT JOIN
Table_2
toTable_1
and assignTable_2
an alias:t2
. By using aLEFT JOIN
, your query will return all values fromTable_1
(whether or not they are also present inTable_2
).SELECT * FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.id = t2.id
-
Use a
WHERE
clause to filter out values that are present inTable_2
.SELECT * FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.id = t2.id WHERE t2.id IS NULL
The entire query will return only values that are in Table_1
but not in Table_2
.
Anti join examples
Once you've got the basics down, you can solidify your knowledge by practicing on real-world data. Try working through these three business situations on your own in Mode. Just sign up for an account and click here to write a new query against the tables in the Mode Public Warehouse.
Which customers didn't place an order in August?
The first table (demo.accounts
) has a record of all customer accounts. The second table (demo.orders
) has a record of every order.
-
Select the columns you want from the accounts table:
id
(aliased ascustomer_id
) andname
(aliased ascustomer_name
).SELECT a.id as customerid, a.name as customername FROM demo.accounts a
-
LEFT JOIN
the orders table to the accounts table on theaccount_id
, for orders that occurred in August 2016.SELECT a.id as customerid, a.name as customername FROM demo.accounts a LEFT JOIN demo.orders o ON a.id = o.accountid AND o.occurredat BETWEEN '2016-08-01' AND '2016-08-31 23:59:59'
-
Exclude accounts that did place an order in August by adding a
WHERE o.id is NULL
clause.
SELECT a.id as customer_id,
a.name as customer_name
FROM demo.accounts a
LEFT JOIN demo.orders o
ON a.id = o.account_id
AND o.occurred_at BETWEEN '2016-08-01' AND '2016-08-31 23:59:59'
WHERE o.id is NULL
Which customers haven't visited your website this year?
The first table (demo.accounts
) has a record of all customer accounts. The second table (demo.web_events
) has a record of every web visit.
-
Select the columns you want from the accounts table:
id
(aliased ascustomer_id
) andname
(aliased ascustomer_name
).SELECT a.id as customerid, a.name as customername FROM demo.accounts a
-
LEFT JOIN
the web events table to the accounts table on theaccount_id
, for visits that occurred in 2016:SELECT a.id as customer_id, a.name as customer_name FROM demo.accounts a
LEFT JOIN demo.webevents we ON we.accountid = a.id AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59'
-
Exclude accounts that did visit the website in 2016 by adding a
WHERE we.id is NULL
clause.SELECT a.id as customer_id, a.name as customer_name FROM demo.accounts a
LEFT JOIN demo.webevents we ON we.accountid = a.id AND we.occurred_at BETWEEN '2016-01-01' AND '2016-12-31 23:59:59' WHERE we.id is NULL
Which sales reps didn't close a deal from September 5th - September 15th?
For this query, you'll use three tables: demo.accounts
, demo.orders
, and demo.sales_reps
.
-
While you already have your primary table containing records of all sales reps (
demo.sales_reps
), you'll need a second table containing a record of all sales reps who did close a deal during the time range in order to perform your anti join. Since this table doesn't exist already, you'll need to start by creating a subquery.SELECT o.id AS orderid, sr.id as salesrepid FROM demo.orders o JOIN demo.accounts a ON o.accountid = a.id JOIN demo.salesreps sr ON sr.id = a.salesrepid WHERE o.occurredat BETWEEN '2016-09-05' AND '2016-09-15 23:59:59'
Now you've got the two things you need: a table containing a record of all sales reps (demo.sales_reps
) and a subquery that pulls a record of the sales rep associated with all orders that occurred over the specified time range. You'll be able to join this subquery to a table just like you would join a table to a table.
-
Select the columns you want from the sales reps table. In this case, you'll pull each sales rep's id, name, and region id.
SELECT sr.id AS salesrepid, sr.name AS salesrepname, sr.regionid AS salesrepregionid FROM demo.sales_reps sr
-
LEFT JOIN
the subquery (aliased asa
) onsales_rep_id
.SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr
LEFT JOIN( SELECT o.id AS orderid, sr.id as salesrepid FROM demo.orders o JOIN demo.accounts a ON o.accountid = a.id JOIN demo.salesreps sr ON sr.id = a.salesrepid WHERE o.occurredat BETWEEN '2016-09-05' AND '2016-09-15 23:59:59' ) a ON sr.id = a.salesrepid
-
Finally, exclude sales reps that are associated with an order in your subquery by adding a
WHERE a.order_id is NULL
clause.SELECT sr.id AS sales_rep_id, sr.name AS sales_rep_name, sr.region_id AS sales_rep_region_id FROM demo.sales_reps sr
LEFT JOIN( SELECT o.id AS orderid, sr.id as salesrepid FROM demo.orders o JOIN demo.accounts a ON o.accountid = a.id JOIN demo.salesreps sr ON sr.id = a.salesrepid WHERE o.occurredat BETWEEN '2016-09-05' AND '2016-09-15 23:59:59' ) a ON sr.id = a.salesrepid WHERE a.order_id is NULL