SQL Tutorial
Basic SQL
Intermediate SQL
Putting it together
SQL Aggregate Functions
SQL COUNT
SQL SUM
SQL MIN/MAX
SQL AVG
SQL GROUP BY
SQL HAVING
SQL CASE
SQL DISTINCT
SQL Joins
SQL INNER JOIN
SQL Outer Joins
SQL LEFT JOIN
SQL RIGHT JOIN
SQL Joins Using WHERE or ON
SQL FULL OUTER JOIN
SQL UNION
SQL Joins with Comparison Operators
SQL Joins on Multiple Keys
SQL Self Joins
Advanced SQL
SQL Analytics Training
Python Tutorial
Learn Python for business analysis using real-world data. No coding experience necessary.
Start Now
Mode Studio
The Collaborative Data Science Platform
SQL GROUP BY
Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
In this lesson we'll cover:
- The SQL GROUP BY clause
- GROUP BY column numbers
- Using GROUP BY with ORDER BY
- Using GROUP BY with LIMIT
- Practice problems
The SQL GROUP BY clause
SQL aggregate function like COUNT
, AVG
, and SUM
have something in common: they all aggregate across the entire table. But what if you want to aggregate only part of a table? For example, you might want to count the number of entries for each year.
In situations like this, you'd need to use the GROUP BY
clause. GROUP BY
allows you to separate data into groups, which can be aggregated independently of one another. Here's an example using the Apple stock prices dataset:
SELECT year,
COUNT(*) AS count
FROM tutorial.aapl_historical_stock_price
GROUP BY year
You can group by multiple columns, but you have to separate column names with commas—just as with ORDER BY
):
SELECT year,
month,
COUNT(*) AS count
FROM tutorial.aapl_historical_stock_price
GROUP BY year, month
Practice Problem
Calculate the total number of shares traded each month. Order your results chronologically.
Try it out See the answerGROUP BY column numbers
As with ORDER BY
, you can substitute numbers for column names in the GROUP BY
clause. It's generally recommended to do this only when you're grouping many columns, or if something else is causing the text in the GROUP BY
clause to be excessively long:
SELECT year,
month,
COUNT(*) AS count
FROM tutorial.aapl_historical_stock_price
GROUP BY 1, 2
Note: this functionality (numbering columns instead of using names) is supported by Mode, but not by every flavor of SQL, so if you're using another system or connected to certain types of databases, it may not work.
Using GROUP BY with ORDER BY
The order of column names in your GROUP BY
clause doesn't matter—the results will be the same regardless. If you want to control how the aggregations are grouped together, use ORDER BY
. Try running the query below, then reverse the column names in the ORDER BY
statement and see how it looks:
SELECT year,
month,
COUNT(*) AS count
FROM tutorial.aapl_historical_stock_price
GROUP BY year, month
ORDER BY month, year
Using GROUP BY with LIMIT
There's one thing to be aware of as you group by multiple columns: SQL evaluates the aggregations before the LIMIT
clause. If you don't group by any columns, you'll get a 1-row result—no problem there. If you group by a column with enough unique values that it exceeds the LIMIT
number, the aggregates will be calculated, and then some rows will simply be omitted from the results.
This is actually a nice way to do things because you know you're going to get the correct aggregates. If SQL cuts the table down to 100 rows, then performed the aggregations, your results would be substantially different. The above query's results exceed 100 rows, so it's a perfect example. Try removing the limit and running it again to see what changes.
Sharpen your SQL skills
Practice Problem
Write a query to calculate the average daily price change in Apple stock, grouped by year.
Try it out See the answerPractice Problem
Write a query that calculates the lowest and highest prices that Apple stock achieved each month.
Try it out See the answerLearn about the difference between "Group By" in SQL and Python.
Next Lesson
SQL HAVING