SQL INNER JOIN
Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
INNER JOIN
In the previous lesson, you learned the basics of SQL joins using a data about college football players. All of the players in the players
table match to one school in the teams
table. But what if the data isn’t so clean? What if there are multiple schools in the teams
table with the same name? Or if a player goes to a school that isn’t in the teams
table?
If there are multiple schools in the teams
table with the same name, each one of those rows will get joined to matching rows in the players
table. Returning to the previous example with Michael Campanaro, if there were three rows in the teams
table where school_name = 'Wake Forest'
, the join query above would return three rows with Michael Campanaro.
It’s often the case that one or both tables being joined contain rows that don’t have matches in the other table. The way this is handled depends on whether you’re making an inner join or an outer join.
We’ll start with inner joins, which can be written as either JOIN benn.college_football_teams teams
or INNER JOIN benn.college_football_teams teams
. Inner joins eliminate rows from both tables that do not satisfy the join condition set forth in the ON
statement. In mathematical terms, an inner join is the intersection of the two tables.
Therefore, if a player goes to a school that isn’t in the teams
table, that player won’t be included in the result from an inner join. Similarly, if there are schools in the teams
table that don’t match to any schools in the players
table, those rows won’t be included in the results either.
Joining tables with identical column names
When you join two tables, it might be the case that both tables have columns with identical names. In the below example, both tables have columns called school_name
:
SELECT players.*,
teams.*
FROM benn.college_football_players players
JOIN benn.college_football_teams teams
ON teams.school_name = players.school_name
The results can only support one column with a given name—when you include 2 columns of the same name, the results will simply show the exact same result set for both columns even if the two columns should contain different data. You can avoid this by naming the columns individually. It happens that these two columns will actually contain the same data because they are used for the join key, but the following query technically allows these columns to be independent:
SELECT players.school_name AS players_school_name,
teams.school_name AS teams_school_name
FROM benn.college_football_players players
JOIN benn.college_football_teams teams
ON teams.school_name = players.school_name