Python Tutorial
Learn Python for business analysis using real-world data. No coding experience necessary.
Start Now
Mode Studio
The Collaborative Data Science Platform
SQL Date Format
Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning.
In this lesson we'll cover:
This lesson uses the same data from previous lessons, which was pulled from Crunchbase on Feb. 5, 2014. Learn more about this dataset.
Why dates are formatted year-first
If you live in the United States, you're probably used to seeing dates formatted as MM-DD-YYYY or a similar, month-first format. It's an odd convention compared to the rest of the world's standards, but it's not necessarily any worse than DD-MM-YYYY. The problem with both of these formats is that when they are stored as strings, they don't sort in chronological order. For example, here's a date field stored as a string. Because the month is listed first, the ORDER BY
statement doesn't produce a chronological list:
SELECT permalink,
founded_at
FROM tutorial.crunchbase_companies_clean_date
ORDER BY founded_at
You might think that converting these values from string
to date
might solve the problem, but it's actually not quite so simple. Mode (and most relational databases) format dates as YYYY-MM-DD, a format that makes a lot of sense because it will sort in the same order whether it's stored as a date or as a string. Excel is notorious for producing date formats that don't play nicely with other systems, so if you're exporting Excel files to CSV and uploading them to Mode, you may run into this a lot.
Here's an example from the same table, but with a field that has a cleaned date. Note that the cleaned date field is actually stored as a string, but still sorts in chronological order anyway:
SELECT permalink,
founded_at,
founded_at_clean
FROM tutorial.crunchbase_companies_clean_date
ORDER BY founded_at_clean
The lesson on data cleaning provides some examples for converting poorly formatted dates into proper date-formatted fields.
Crazy rules for dates and times
Assuming you've got some dates properly stored as a date
or time
data type, you can do some pretty powerful things. Maybe you'd like to calculate a field of dates a week after an existing field. Or maybe you'd like to create a field that indicates how many days apart the values in two other date fields are. These are trivially simple, but it's important to keep in mind that the data type of your results will depend on exactly what you are doing to the dates.
When you perform arithmetic on dates (such as subtracting one date from another), the results are often stored as the interval
data type—a series of integers that represent a period of time. The following query uses date subtraction to determine how long it took companies to be acquired (unacquired companies and those without dates entered were filtered out). Note that because the companies.founded_at_clean
column is stored as a string, it must be cast as a timestamp before it can be subtracted from another timestamp.
SELECT companies.permalink,
companies.founded_at_clean,
acquisitions.acquired_at_cleaned,
acquisitions.acquired_at_cleaned -
companies.founded_at_clean::timestamp AS time_to_acquisition
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL
In the example above, you can see that the time_to_acquisition
column is an interval, not another date.
You can introduce intervals using the INTERVAL
function as well:
SELECT companies.permalink,
companies.founded_at_clean,
companies.founded_at_clean::timestamp +
INTERVAL '1 week' AS plus_one_week
FROM tutorial.crunchbase_companies_clean_date companies
WHERE founded_at_clean IS NOT NULL
The interval is defined using plain-English terms like '10 seconds' or '5 months'. Also note that adding or subtracting a date
column and an interval
column results in another date
column as in the above query.
You can add the current time (at the time you run the query) into your code using the NOW()
function:
SELECT companies.permalink,
companies.founded_at_clean,
NOW() - companies.founded_at_clean::timestamp AS founded_time_ago
FROM tutorial.crunchbase_companies_clean_date companies
WHERE founded_at_clean IS NOT NULL
Sharpen your SQL skills
If you're interested in diving deeper and testing your skills, explore our 7 ways to use date functions for business analysis and take a practice problem below:
Practice Problem
Write a query that counts the number of companies acquired within 3 years, 5 years, and 10 years of being founded (in 3 separate columns). Include a column for total companies acquired as well. Group by category and limit to only rows with a founding date.
Try it out See the answerNext Lesson
Data Wrangling with SQL