Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
The SQL FULL JOIN command
You’re not likely to use
FULL JOIN (which can also be written as
FULL OUTER JOIN) too often, but it’s worth covering anyway.
LEFT JOIN and
RIGHT JOIN each return unmatched rows from one of the tables—
FULL JOIN returns unmatched rows from both tables. It is commonly used in conjunction with aggregations to understand the amount of overlap between two tables.
Here’s an example using the Crunchbase companies and acquisitions tables:
SELECT COUNT(CASE WHEN companies.permalink IS NOT NULL AND acquisitions.company_permalink IS NULL
THEN companies.permalink ELSE NULL END) AS companies_only,
COUNT(CASE WHEN companies.permalink IS NOT NULL AND acquisitions.company_permalink IS NOT NULL
THEN companies.permalink ELSE NULL END) AS both_tables,
COUNT(CASE WHEN companies.permalink IS NULL AND acquisitions.company_permalink IS NOT NULL
THEN acquisitions.company_permalink ELSE NULL END) AS acquisitions_only
FROM tutorial.crunchbase_companies companies
FULL JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalink
One important thing to keep in mind is that you must count from the
crunchbase_acquisitions table in order to get unmatched rows in that table—if you were to count
companies.permalink as in the first two columns, you would get a result of 0 in the third column because it would be counting up a bunch of null values.
You might also notice that surprisingly few rows in the
crunchbase_acquisitions table were matched to the
crunchbase_companies table. If this were a real assignment, you’d probably want to look at some individual rows to get a sense of why some of them weren’t matched and whether or not you should consider finding more/better data.
Sharpen your SQL skills
This practice problem uses Crunchbase investment data described in a previous lesson. The Crunchbase Investments table has been split into two parts for the sake of this exercise.
Write a query that joins
tutorial.crunchbase_investments_part1 using a
FULL JOIN. Count up the number of rows that are matched/unmatched as in the example above.