SQL Tutorial
Basic SQL
Intermediate SQL
Putting it together
SQL Aggregate Functions
SQL COUNT
SQL SUM
SQL MIN/MAX
SQL AVG
SQL GROUP BY
SQL HAVING
SQL CASE
SQL DISTINCT
SQL Joins
SQL INNER JOIN
SQL Outer Joins
SQL LEFT JOIN
SQL RIGHT JOIN
SQL Joins Using WHERE or ON
SQL FULL OUTER JOIN
SQL UNION
SQL Joins with Comparison Operators
SQL Joins on Multiple Keys
SQL Self Joins
Advanced SQL
SQL Analytics Training
Python Tutorial
Learn Python for business analysis using real-world data. No coding experience necessary.
Start Now
Mode Studio
The Collaborative Data Science Platform
SQL UNION
Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
In this lesson we'll cover:
The SQL UNION operator
SQL joins allow you to combine two datasets side-by-side, but UNION
allows you to stack one dataset on top of the other. Put differently, UNION
allows you to write two separate SELECT
statements, and to have the results of one statement display in the same table as the results from the other statement.
Let's try it out with the Crunchbase investment data, which has been split into two tables for the purposes of this lesson. The following query will display all results from the first portion of the query, then all results from the second portion in the same table:
SELECT *
FROM tutorial.crunchbase_investments_part1
UNION
SELECT *
FROM tutorial.crunchbase_investments_part2
Note that UNION
only appends distinct values. More specifically, when you use UNION
, the dataset is appended, and any rows in the appended table that are exactly identical to rows in the first table are dropped. If you'd like to append all the values from the second table, use UNION ALL
. You'll likely use UNION ALL
far more often than UNION
. In this particular case, there are no duplicate rows, so UNION ALL
will produce the same results:
SELECT *
FROM tutorial.crunchbase_investments_part1
UNION ALL
SELECT *
FROM tutorial.crunchbase_investments_part2
SQL has strict rules for appending data:
- Both tables must have the same number of columns
- The columns must have the same data types in the same order as the first table
While the column names don't necessarily have to be the same, you will find that they typically are. This is because most of the instances in which you'd want to use UNION
involve stitching together different parts of the same dataset (as is the case here).
Since you are writing two separate SELECT
statements, you can treat them differently before appending. For example, you can filter them differently using different WHERE
clauses.
Sharpen your SQL skills
Practice Problem
Write a query that appends the two crunchbase_investments
datasets above (including duplicate values). Filter the first dataset to only companies with names that start with the letter "T", and filter the second to companies with names starting with "M" (both not case-sensitive). Only include the company_permalink
, company_name
, and investor_name
columns.
For a bit more of a challenge:
Practice Problem
Write a query that shows 3 columns. The first indicates which dataset (part 1 or 2) the data comes from, the second shows company status, and the third is a count of the number of investors.
Hint: you will have to use the tutorial.crunchbase_companies
table as well as the investments tables. And you'll want to group by status and dataset.
Next Lesson
SQL Joins with Comparison Operators