mode helix
NEWIntroducing Helix—the first instant, responsive data engine.Learn more

SQL DISTINCT

Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.

Using SQL DISTINCT for viewing unique values

You'll occasionally want to look at only the unique values in a particular column. You can do this using SELECT DISTINCT syntax. To select unique values from the month column in the Apple stock prices dataset, you'd use the following query:

SELECT DISTINCT month
  FROM tutorial.aapl_historical_stock_price

If you include two (or more) columns in a SELECT DISTINCT clause, your results will contain all of the unique pairs of those two columns:

SELECT DISTINCT year, month
  FROM tutorial.aapl_historical_stock_price

Note: You only need to include DISTINCT once in your SELECT clause—you do not need to add it for each column name.

Write a query that returns the unique values in the year column, in chronological order.

DISTINCT can be particularly helpful when exploring a new data set. In many real-world scenarios, you will generally end up writing several preliminary queries in order to figure out the best approach to answering your initial question. Looking at the unique values on each column can help identify how you might want to group or filter the data.

Using DISTINCT in aggregations

You can use DISTINCT when performing an aggregation. You'll probably use it most commonly with the COUNT function.

In this case, you should run the query below that counts the unique values in the month column.

SELECT COUNT(DISTINCT month) AS unique_months
  FROM tutorial.aapl_historical_stock_price

The results show that there are 12 unique values (other examples may be less obvious). That's a small enough number that you might be able to aggregate by month and interpret the results fairly early. For example, you might follow this up by taking average trade volumes by month to get a sense of when Apple stock really moves:

SELECT month,
       AVG(volume) AS avg_trade_volume
  FROM tutorial.aapl_historical_stock_price
 GROUP BY month
 ORDER BY 2 DESC

Okay, back to DISTINCT. You'll notice that DISTINCT goes inside the aggregate function rather than at the beginning of the SELECT clause. Of course, you can SUM or AVG the distinct values in a column, but there are fewer practical applications for them. For MAX and MIN, you probably shouldn't ever use DISTINCT because the results will be the same as without DISTINCT, and the DISTINCT function will make your query substantially slower to return results.

DISTINCT performance

It's worth noting that using DISTINCT, particularly in aggregations, can slow your queries down quite a bit. We'll cover this in greater depth in a later lesson.

Sharpen your SQL skills

Write a query that counts the number of unique values in the month column for each year.
Write a query that separately counts the number of unique values in the month column and the number of unique values in the `year` column.

Next Lesson

SQL CASE

Looks like you've got a thing for cutting-edge data news.

Get the latest.

Get started