SQL Outer Joins

Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.

Outer joins

When performing an inner join, rows from either table that are unmatched in the other table are not returned. In an outer join, unmatched rows in one or both tables can be returned. There are a few types of outer joins:

  • LEFT JOIN returns only unmatched rows from the left table.
  • RIGHT JOIN returns only unmatched rows from the right table.
  • FULL OUTER JOIN returns unmatched rows from both tables.

As you work through the following lessons about outer joins, it might be helpful to refer to this JOIN visualization by Patrik Spathon.

The Crunchbase dataset

The data for the following lessons was pulled from Crunchbase, a crowdsourced index of startups, founders, investors, and the activities of all three. It was collected Feb. 5, 2014, and large portions of both tables were randomly dropped for the sake of this lesson. The first table lists a large portion of companies in the database; one row per company. The permalink field is a unique identifier for each row, and also shows the web address. For each company in the table, you can view its online Crunchbase profile by copying/pasting its permalink after Crunchbase’s web domain. For example, the third company in the table, “.Club Domains,” has the permalink “/company/club-domains,” so its profile address would be http://www.crunchbase.com/company/club-domains. The fields with "funding" in the name have to do with how much outside investment (in USD) each company has taken on. The rest of the fields are self-explanatory.

SELECT *
  FROM tutorial.crunchbase_companies

The second table lists acquisitions—one row per acquisition. company_permalink in this table maps to the permalink field in tutorial.crunchbase_companies as described in the previous lesson. Joining these two fields will add information about the company being acquired.

You'll notice that there is a separate field called acquirer_permalink as well. This can also be mapped to the permalink field tutorial.crunchbase_companies to add additional information about the acquiring company.

SELECT *
  FROM tutorial.crunchbase_acquisitions

The foreign key you use to join these two tables will depend entirely on whether you're looking to add information about the acquiring company or the company that was acquired.

It's worth noting that this sort of structure is common. For example, a table showing a list of emails sent might include a sender_email_address and a recipient_email_address, both of which map to a table listing email addresses and the names of their owners.

Next Lesson

SQL LEFT JOIN

Looks like you've got a thing for cutting-edge data news.

Get the latest.

Get started