SQL Tutorial
Basic SQL
Intermediate SQL
Putting it together
SQL Aggregate Functions
SQL COUNT
SQL SUM
SQL MIN/MAX
SQL AVG
SQL GROUP BY
SQL HAVING
SQL CASE
SQL DISTINCT
SQL Joins
SQL INNER JOIN
SQL Outer Joins
SQL LEFT JOIN
SQL RIGHT JOIN
SQL Joins Using WHERE or ON
SQL FULL OUTER JOIN
SQL UNION
SQL Joins with Comparison Operators
SQL Joins on Multiple Keys
SQL Self Joins
Advanced SQL
SQL Analytics Training
Python Tutorial
Learn Python for business analysis using real-world data. No coding experience necessary.
Start Now
Mode Studio
The Collaborative Data Science Platform
SQL Joins Using WHERE or ON
Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
In this lesson we'll cover:
Filtering in the ON clause
Normally, filtering is processed in the WHERE
clause once the two tables have already been joined. It's possible, though that you might want to filter one or both of the tables before joining them. For example, you only want to create matches between the tables under certain circumstances.
Using Crunchbase data, let's take another look at the LEFT JOIN
example from an earlier lesson (this time we'll add an ORDER BY
clause):
SELECT companies.permalink AS companies_permalink,
companies.name AS companies_name,
acquisitions.company_permalink AS acquisitions_permalink,
acquisitions.acquired_at AS acquired_date
FROM tutorial.crunchbase_companies companies
LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalink
ORDER BY 1
Compare the following query to the previous one and you will see that everything in the tutorial.crunchbase_acquisitions
table was joined on except for the row for which company_permalink
is '/company/1000memories'
:
SELECT companies.permalink AS companies_permalink,
companies.name AS companies_name,
acquisitions.company_permalink AS acquisitions_permalink,
acquisitions.acquired_at AS acquired_date
FROM tutorial.crunchbase_companies companies
LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalink
AND acquisitions.company_permalink != '/company/1000memories'
ORDER BY 1
What's happening above is that the conditional statement AND...
is evaluated before the join occurs. You can think of it as a WHERE
clause that only applies to one of the tables. You can tell that this is only happening in one of the tables because the 1000memories permalink is still displayed in the column that pulls from the other table:
Filtering in the WHERE clause
If you move the same filter to the WHERE
clause, you will notice that the filter happens after the tables are joined. The result is that the 1000memories row is joined onto the original table, but then it is filtered out entirely (in both tables) in the WHERE
clause before displaying results.
SELECT companies.permalink AS companies_permalink,
companies.name AS companies_name,
acquisitions.company_permalink AS acquisitions_permalink,
acquisitions.acquired_at AS acquired_date
FROM tutorial.crunchbase_companies companies
LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalink
WHERE acquisitions.company_permalink != '/company/1000memories'
OR acquisitions.company_permalink IS NULL
ORDER BY 1
You can see that the 1000memories line is not returned (it would have been between the two highlighted lines below). Also note that filtering in the WHERE
clause can also filter null values, so we added an extra line to make sure to include the nulls.
Sharpen your SQL skills
For this set of practice problems, we're going to introduce a new dataset: tutorial.crunchbase_investments
. This table is also sourced from Crunchbase and contains much of the same information as the tutorial.crunchbase_companies
data. It it structured differently, though: it contains one row per investment. There can be multiple investments per company—it's even possible that one investor could invest in the same company multiple times. The column names are pretty self-explanatory. What's important is that company_permalink
in the tutorial.crunchbase_investments
table maps to permalink
in the tutorial.crunchbase_companies
table. Keep in mind that some random data has been removed from this table for the sake of this lesson.
It is very likely that you will need to do some exploratory analysis on this table to understand how you might solve the following problems.
Practice Problem
Write a query that shows a company's name, "status" (found in the Companies table), and the number of unique investors in that company. Order by the number of investors from most to fewest. Limit to only companies in the state of New York.
Try it out See the answerPractice Problem
Write a query that lists investors based on the number of companies in which they are invested. Include a row for companies with no investor, and order from most companies to least.
Try it out See the answerNext Lesson
SQL FULL OUTER JOIN