
Interesting Data Sets

A robust data set is usually the first step toward answering a question. We've collected articles featuring wacky and useful data sets for training machine learning models, practicing an analytical language, or finding compelling insights.

Developing a Database of Structural Racism–Related State Laws for Health Equity Research and Practice in the United States

“Although U.S. state laws shape population health and health equity, few studies have examined how state laws affect the health of marginalized racial/ethnic groups (e.g., Black, Indigenous, and Latinx populations) and racial/ethnic health inequities.”-SAGE Journals

From gmailr to the Google Books API

This fun project catalogs the 214 children’s books a mother and her daughters read in 2020, using library hold arrival notifications.-Piping Hot Data

Simple Anomaly Detection Using Plain SQL

Detect critical bugs in your code, without all the ad-hoc tools and dependencies.-Haki Benita

Teaching Different R Syntaxes to Beginners

Is base, formula, or tidy(verse) syntax best for teaching Intro to Statistics?-Amelia McNamara

Data Science Practice 101: Always Leave an Analysis Paper Trail

Any analysis deliverable should travel with documentation that shows the full path the analysis took, from the raw data pull all the way to the deliverable, including queries and code, and links to previous analyses and raw data dumps.-Counting Stuff

Basic Data Science Q&A

This is a great thread to scroll through: “What is a data science or career question that you're afraid is way too beginner, but would love to ask someone in the field?”-Data Science Renee

Data is Plural Archive

Find an interesting dataset by trawling through the backlog of Data is Plural, a weekly newsletter of interesting datasets.-Amelia Wattenberger

Webinar Recap: Datasets That We Wanted to Take a Second Look at in 2020

World events in 2020 put a big spotlight on data. From COVID maps to election updates to economy tracking, many of us kept refreshing data charts to get answers on what was coming next.-Mode Blog

The Impact of COVID-19 on Black Communities

In many states, the Black percentage of total cases/deaths is greater than the Black percentage of the state population.-Data for Black Lives

We’re Sharing Coronavirus Case Data for Every U.S. County

With no detailed government database on where the thousands of coronavirus cases have been reported, a team of New York Times journalists is attempting to track every case. You can download the county-level data on GitHub.-The New York Times

Benchmark Dataset for Data-driven Weather Forecasting

Global weather forecasting is done with physical models, which are very good for most applications but not good at predicting specific events. AI weather forecasting may provide a solution, and this dataset gets us one step closer.-Pangeo Data

3DPeople Dataset

The first large scale dataset of dressed humans with 3D clothes. It contains approximately two million frames of 80 people performing 70 actions.-3DPeople Dataset

UR-FUNNY

Who says humor is subjective? This dataset (built using TED Talk transcripts with laughter cues) can be used for humor detection and other humor analyses.-Rochester Human-Computer Interaction

NYC Squirrel Census

Get your paws on this data set and go nuts!-Tidy Tuesday

Level 5 Dataset

Lyft open-sourced their autonomous driving dataset from their Level 5 self-driving fleet, including raw sensor camera and LiDAR inputs.-Lyft

Earth Engine Data Catalog

Google Earth's public data archive includes more than forty years of historical imagery and scientific datasets, spanning climate, weather, and night-time light.-Google Developers

Where Does the U.S. Government Keep Its Data?

The U.S. Federal government's statistical work doesn't end with the Census Bureau. In fact, there are 13 principal agencies that are key to data collection. Here's a list of all the data they publish (in API form where possible).-Sam Tyner

Election integrity data archive

Twitter published the full dataset of 9 million tweets from Russian troll farms and 1 million tweets from Iranian ones. Unbox your 8TB drive and get crackin'.-Twitter

More Cool Public Datasets and Lots of Ideas for Exploring Them

In the spirit of encouraging data discovery and exploration, here are 5 public datasets, along with some questions you might ask and interesting visualizations you could make for each.-Mode

The Strawberry Capital of the World is the early death capital of the U.S.: lessons from a landmark dataset

The U.S. National Center for Health Statistics has released the most detailed local health data ever. See how your neighborhood stacks up against the national average life expectancy.-Wonkblog

Why We’re Sharing 3 Million Russian Troll Tweets

In concert with two Clemson University professors, FiveThirtyEight has opened up the fullest empirical record to date of the “troll factory” Internet Research Agency's actions on social media.-FiveThirtyEight

Census Oddities

So many analyses are built on data from the U.S. Census and American Community Survey, but those datasets have their own quirks you need to watch out for.-Carto Blog

US House PSCI Social Media Ads

Last Thursday, Democratic members of the House Intelligence Committee released 8.8 gigabytes of information about Facebook ads paid for by Russians attempting to interfere in American politics. The data has since been converted to a CSV, so you can explore it for yourself.-data.world

Need a ratings boost? Make a Halloween episode.

This analysis of over 24,000 episode ratings from 184 television shows proves that Halloween TV episodes aren’t just filler.-Kaylin Walker

The Anatomy of a Thousand Typefaces

Say goodbye to endlessly scrolling through the font menu in your word processor. Instead, use this database of typefaces, classified by characteristics like width, spacing, and stroke contrast.-Florian Schulz

9 Elements of Deal-Closing Sales Demos, According to New Data

Forward this one to your sales team. This is yet another good example of a company using their proprietary dataset (in this case, recordings of sales calls) to tell stories and generate interest in their brand.-Gong.io

Quick, Draw! The Data

These doodles are a unique data set that can help developers train new neural networks, help researchers see patterns in how people around the world draw, and help artists create things we haven’t begun to think of.-Google

We’re Sharing A Vast Trove Of Federal Payroll Records

BuzzFeed, via the Freedom of Information Act, got their hands on a dataset comprising four decades of salaries, titles, and demographic details about millions of U.S. government employees, as well as how they moved through the federal bureaucracy.-BuzzFeed

3 Million Instacart Orders, Open Sourced

Instacart has released an anonymized dataset containing a sample of over 3 million grocery orders from more than 200,000 users. Download the data and dig in.-Engineering at Instacart

Executive Office of the President Open Data Archive Backup

Data downloaded from the White House website on January 20, 2017.-Maxwell Ogden

TrumpWorld Data

BuzzFeed put together a dataset to shed light on Trump’s giant network of businesses, investments, and corporate connections. Right now, it includes more than 1,700 people and organizations. Explore the data yourself via GitHub or Google Sheets.-BuzzFeed

CoolDatasets

Follow this brand new Twitter account for tons of open, online datasets.-Twitter

The DataRefuge Project

DataRescue events create trustworthy copies of federal climate and environmental data, while the Internet Archive, datarefuge.org, and a consortium of major research libraries hold these copies.-PPEH Lab

Academic Torrents

Getting your hands on interesting data can be a chore. Some clever folks at the University of Massachusetts put together a platform for distributing datasets and research papers with BitTorrent technology.-Academic Torrents

20 Weird & Wonderful Datasets for Machine Learning

Getting your hands on a robust dataset is the hardest part of machine learning. Finding interesting datasets is tougher still. From UFO sightings to beautiful Flickr photos, you’re sure to find something to train your model.-Oliver Cameron

San Francisco Housing Construction History

When someone mentions San Francisco’s housing shortage, they usually cite a limited dataset containing San Francisco Chronicle rental listings from 1979-2001. Eric Fischer took it upon himself to collect decades of new information by transcribing Chronicle rental ads from 1948-1979 and Craigslist rental listings from 2001 onward.-Eric Fischer

Zika Data Guide

It’s surprisingly hard to find data on the Zika virus outbreak. That’s why BuzzFeed’s Jeremy Singer-Vine put together a collection of links to Zika datasets for people to contribute to and use for reference.-BuzzFeed

Yahoo News Feed

A collection of 110 billion Yahoo News user actions, and the largest publicly-released machine learning dataset to date.-Yahoo Labs
