SQL INNER JOIN
Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
In the previous lesson, you learned the basics of SQL joins using data about college football players. All of the players in the players table match to one school in the teams table. But what if the data isn't so clean? What if there are multiple schools in the teams table with the same name? Or if a player goes to a school that isn't in the teams table?
If there are multiple schools in the teams table with the same name, each one of those rows will get joined to matching rows in the players table. Returning to the previous example with Michael Campanaro, if there were three rows in the teams table where school_name = 'Wake Forest', the join query from that example would return three rows for Michael Campanaro.
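The row multiplication described above can be sketched with a few rows of toy data. This is a minimal illustration, not the tutorial's actual dataset: the inline players and teams tables stand in for the real ones, and the conference column and its values are placeholders made up for the example.

```sql
-- Toy data only: these inline tables stand in for
-- benn.college_football_players and benn.college_football_teams.
WITH players (player_name, school_name) AS (
  VALUES ('Michael Campanaro', 'Wake Forest')
),
teams (school_name, conference) AS (
  -- Three rows share one school name; conference values are placeholders.
  VALUES ('Wake Forest', 'A'),
         ('Wake Forest', 'B'),
         ('Wake Forest', 'C')
)
SELECT players.player_name,
       teams.conference
  FROM players
  JOIN teams
    ON teams.school_name = players.school_name;
-- Returns three rows for the single player, one per matching teams row.
```

The single row in players matches all three rows in teams, so the player appears three times in the output.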
It's often the case that one or both tables being joined contain rows that don't have matches in the other table. The way this is handled depends on whether you're making an inner join or an outer join.
We'll start with inner joins, which can be written as either JOIN benn.college_football_teams teams or INNER JOIN benn.college_football_teams teams. Inner joins eliminate rows from both tables that do not satisfy the join condition set forth in the ON clause. In mathematical terms, an inner join is like the intersection of the two tables: only rows with a match on both sides are kept.
Therefore, if a player goes to a school that isn't in the teams table, that player won't be included in the result from an inner join. Similarly, if there are schools in the teams table that don't match any schools in the players table, those rows won't be included in the results either.
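To make the two-sided filtering concrete, here is a small sketch using made-up rows. The player and school names are hypothetical and exist only for this illustration. One player has no matching school, and one school has no matching player.

```sql
-- Hypothetical rows, not taken from the real tables.
WITH players (player_name, school_name) AS (
  VALUES ('Player A', 'School X'),   -- has a matching team
         ('Player B', 'School Y')    -- no matching team
),
teams (school_name) AS (
  VALUES ('School X'),               -- has a matching player
         ('School Z')                -- no matching player
)
SELECT players.player_name,
       teams.school_name AS teams_school_name
  FROM players
 INNER JOIN teams
    ON teams.school_name = players.school_name;
-- Only the Player A / School X pair satisfies the ON condition,
-- so the result has one row; Player B and School Z are dropped.
```

Writing JOIN instead of INNER JOIN in this query would give the same result, since the two spellings are interchangeable.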
When you join two tables, it might be the case that both tables have columns with identical names. In the example below, both tables have columns called school_name:
SELECT players.*,
       teams.*
  FROM benn.college_football_players players
  JOIN benn.college_football_teams teams
    ON teams.school_name = players.school_name
The results can only support one column with a given name. When you include two columns with the same name, the results will show the same values for both columns, even if the two columns should contain different data. You can avoid this by giving each column its own alias. These two columns happen to contain the same data because they are used as the join key, but the following query keeps them as separate, independently named columns:
SELECT players.school_name AS players_school_name,
       teams.school_name AS teams_school_name
  FROM benn.college_football_players players
  JOIN benn.college_football_teams teams
    ON teams.school_name = players.school_name
SQL Outer Joins