Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
When performing an inner join, rows from either table that are unmatched in the other table are not returned. In an outer join, unmatched rows in one or both tables can be returned. There are a few types of outer joins:
LEFT JOIN returns only unmatched rows from the left table.
RIGHT JOIN returns only unmatched rows from the right table.
FULL OUTER JOIN returns unmatched rows from both tables.
As you work through the following lessons about outer joins, it might be helpful to refer to this JOIN visualization by Patrik Spathon.
The Crunchbase dataset
The data for the following lessons was pulled from Crunchbase, a crowdsourced index of startups, founders, investors, and the activities of all three. It was collected Feb. 5, 2014, and large portions of both tables were randomly dropped for the sake of this lesson. The first table lists a large portion of companies in the database; one row per company. The
permalink field is a unique identifier for each row, and also shows the web address. For each company in the table, you can view its online Crunchbase profile by copying/pasting its permalink after Crunchbase’s web domain. For example, the third company in the table, “.Club Domains,” has the permalink “/company/club-domains,” so its profile address would be http://www.crunchbase.com/company/club-domains. The fields with “funding” in the name have to do with how much outside investment (in USD) each company has taken on. The rest of the fields are self-explanatory.
The second table lists acquisitions—one row per acquisition.
company_permalink in this table maps to the
permalink field in
tutorial.crunchbase_companies as described in the previous lesson. Joining these two fields will add information about the company being acquired.
You’ll notice that there is a separate field called
acquirer_permalink as well. This can also be mapped to the
tutorial.crunchbase_companies to add additional information about the acquiring company.
The foreign key you use to join these two tables will depend entirely on whether you’re looking to add information about the acquiring company or the company that was acquired.
It’s worth noting that this sort of structure is common. For example, a table showing a list of emails sent might include a
sender_email_address and a
recipient_email_address, both of which map to a table listing email addresses and the names of their owners.