Learning Data Science
Many people have landed jobs as data scientists without any formal training because the internet is abundant in free resources for learning data science. This section includes tutorials for analytical languages such as SQL, Python, and R, career advice, and how-to posts about performing common tasks like A/B testing and
Art from Code
This workshop provides a hands-on introduction to generative art in R. You’ll learn artistic techniques that generative artists use regularly in their work including flow fields, iterative function systems, and more.-Danielle Navarro
The Data Config
A humble YAML file, with ambitions for more.-benn.substack
Python’s super() Considered Super!
If you aren’t wowed by Python’s super() builtin, chances are you don’t really know what it’s capable of doing or how to use it effectively.-Deep Thoughts by Raymond Hettinger
Success Metrics for Product Analytics
Product data scientists are not only there to inform decision making with data, but also to quantify risks.-The Corner
Critical Dataset Studies Reading List
Datasets powerfully construct model worldviews, so they're important to study.-Knowing Machines
5 Tips for Using pins with R
The pins package makes it easy to share data, models, and other R objects across projects and with your colleagues.-sellorm
A Beginner's Introduction to Mixed Effects Models
For when you need to venture beyond the safe and comfortable land of basic linear regression models.-Meghan Hall
PostgreSQL Lessons We Learned the Hard Way
Investigating, understanding, fixing, and preventing troublesome database locks.-Compass True North
The Guide to Data Versioning
Already familiar with versioning code with git? Here’s how it works to version data using the same abstractions.-Whispering Data
A First Look at PyScript: Python in the Web Browser
This tutorial will get you up to speed with the much-loved PyScript, while the official documentation is still in the making.-Real Python
The Future of Data Science Anthology
“We named this project to intentionally reflect our desire to design for a future in data science: a future with more of a focus on creativity, yes, but also a future with more transparency, inclusiveness, and personal responsibility.”-Data Science by Design
Data Quality and Testing Frameworks
A short introduction to open-source data quality & testing tools — dbt, Deequ, and Great Expectations.-Servian
Software Development Resources for Data Scientists
Version control, automated testing, and other dev skills help create reproducible, production-ready code and tools.-RStudio Blog
Data Is an Art, Not Just a Science—and Storytelling Is the Key
Data science is a balancing act. Math and science have their role to play, but so do art and communication.-Shopify Engineering
Preston’s Paradox
Suppose every woman has fewer children than her mother. Average fertility would decrease and population growth would slow, right? Actually, no. According to Preston's paradox, fertility could increase, decrease, or stay the same.-Probably Overthinking It
Data Types in Arrow and R
Apache Arrow is a powerful multi-language toolbox for data exchange and data analysis. But to use it effectively you do need to learn more low-level concepts that R users like to skim over.-Notes from a Data Witch
Data Tests and the Broken Windows Theory
Despite all their promise, data tests too often end up not living up to their full potential.-Inside Data by Mikkel Dengsøe
Honeypot
This end-to-end real-time event collection, pipelining, and aggregation system allows you to rapidly bootstrap streaming analytics.-Silverton.io
A Better Way to Lie With Statistics
It’s not the data that gets us, but the adjectives that describe it.-benn.substack
Faking It: How to Simulate Complex Data Generation Processes in R, Tidyverse Edition
Data simulation is central to much of a social scientist’s work. Unfortunately, it’s rarely taught in graduate schools. This post attempts to rectify that.-A. Jordan Nafa
Bring Back Scenario Analysis!
Scenario analysis, while often being quantitatively not-that-complex, is different from descriptive and prescriptive statistics in one key way that is culturally very hard for data people: it forces you to assume things you don’t know.-The Analytics Engineering Roundup
IVS 0.1.0
Check out this new R package for working with intervals (like date ranges)!-Davis Vaughan
Transitioning from Academia to Industry
Some thoughts on a strange transformation that a few data folks go through, from the perspective of someone who did it slightly later than most.-Notes from a Data Witch
Lessons from the COVID Data Wizards
Data dashboards have been an important part of pandemic response and planning. What have their developers learned about communicating science in a crisis?-Nature
The Most Important Thing to Understand About Queues
“It’s counterintuitive, but once you understand it, you’ll have deeper insight into the behavior not just of CPUs and database thread pools, but also grocery store checkout lines, ticket queues, highways–really just a mind-blowing collection of systems.”-Dan Slimmon
The Froyo Data Shop
Data democratization doesn’t start at data discovery—it starts at data testing and context building.-Sarah’s Newsletter
The Ghosts in the Data Stack
An OLAP cube exorcism.-benn.substack
Cohort Analysis: An Introductory Guide for Better Retention
When done right, cohort analysis can unearth valuable information about different user segments that can help go-to-market and product departments focus their efforts with nuanced insights.-Mode Blog
Best (Artistic) Practices in R
Parameter restriction, reproducibility, and documentation are key.-Nicola Rennie
Programming as a Vehicle for Math
Most programs that rely on mathematics have an algorithm or formula at their core that’s indecipherable without learning the math it’s based on. But somehow it’s simple once you understand the math itself.-Halfspace
How I Became a "Not-Beginner" in R
How do you refer to yourself when you’re not a beginner anymore, but not yet an expert?-The Tidy Trekker
Building a Team of Internal R Packages
As your organization builds R packages for internal use, bear in mind that they have unique challenges (such as a smaller developer community) and opportunities (such as an intimate understanding of the problem space and over-arching organizational goals).-Emily Riederer
Low Process Culture, High Process Culture
This is a useful read for literally anyone, not just data folks.-Accidentally in Code
What's in a Name? The Data Scientist Vs. Machine Learning Engineer Title Bore
After almost 10 years of frankly insane hype around artificial intelligence and machine learning it may feel a little strange to have the industry reorienting around a sentiment that 90% of machine learning development is just bog-standard software engineering, but here we are.-Jonathon Belotti
Who’s Behind the Numbers? A Conversation with Dr. Angela Baltes – Data Scientist, Informaticist & Senior Research Analyst
In this interview, Dr. Baltes shares how she made her way to data science as an African-American woman on the autism spectrum.-Mode Blog
startr: A Template for Data Journalism Projects in R
This project structures the data analysis process, reducing the amount of time you'll spend setting up and maintaining a project. Essentially, it's an "opinionated framework" like Django, Ruby on Rails or React, but for data journalism.-The Globe and Mail
The Easiest Way to Create an Interactive Dashboard in Python
It takes just a little bit of code to turn Pandas pipelines into a dashboard using hvPlot .interactive.-Towards Data Science
Why I Quit Data Science
It’s always interesting to hear why someone left their field. This author shares their reasons for ditching machine learning in favor of software engineering.-Nirant Kasliwal
PostgreSQL Date Functions: 7 Business Analysis Examples
To separate the useful from the obscure, we're sharing how-tos for the most frequently used Postgres date functions and business scenarios where they come in handy.-Mode Blog
Using Databases with Shiny
Many analysts may be familiar with querying relational databases to retrieve data, but managing a database for use with a web application is slightly more complex. You’ll find yourself needing to define tables, secure data, and manage connections.-The Data Leader’s Survival
Why is “Data Scientist” Such a Controversial Title?
The author argues that the role of a “scientist” is something industry has yet to capitalize on, and that if we imprudently get rid of it, we’ll be missing an opportunity to foster a valuable mindset within the product development process.-The Data Leader’s Survival
How to Switch from Excel to SQL
SQL databases can handle enormous amounts of data without suffering the performance issues of Excel and have an orderly structure that protects the integrity of your data.-Mode Blog
Introduction to R
This is for the folks out there who have heard about R or want to learn, but have no idea where to start.-Jenny Sloane
Thinking About Failure in Data Analysis
The traditional notions of success and failure would seem to suggest that we should favor success over failure. But in the data analysis context, what we need to consider is how an analysis can go from success to failure and vice versa.-Simply Statistics
How to Leverage a Data Science Background as a Product Manager
PMs with a data science background are uniquely positioned to push for early investment in a proactive data culture.-Anita Mehrotra
Analyzing JSON Data With SQL
Here’s a detailed, practical example on how to restructure JSON into a tabular format.-Mode Blog
Thinking Outside the Grid - A “Bare Bones” Intro to Rtistry Concepts in R Using ggplot
This is a great intro to creative coding and generative art in R. It’s not as intimidating as you think!-The Tidy Trekker
Data from Images
This R package can help you extract data that’s trapped in images or PDFs.-Lisa DeBruine
Free Resources to Learn Python
A father asked Twitter for recommendations, so he can learn Python alongside his 14-year-old son. And boy, he got a lot of responses!-Michael McGill
Python Packages
Python packages are how you create organized, reusable, and shareable code in Python. This open source book describes modern and efficient workflows for creating those packages.-Tomas Beuzen & Tiffany Timbers
My First API: Which NFL Games Should I Watch?
This is a clever idea for figuring out which games will be the most satisfying to watch, when there are just too many games going on in a given week. Importantly, the suggestions stay spoiler-free.-Jenny Sloane
Teaching the tidyverse in 2021
Teaching R and the tidyverse soon? Here's a blog post rounding up the updates to the tidyverse over the year, with tips on how you might incorporate them in your lessons.-Tidyverse
Nine Tools I Wish I Mastered Before My PhD in Machine Learning
Whether you’re building a start up or making scientific breakthroughs, these tools will bring your machine learning pipeline to the next level.-Towards Data Science
Twelve Software Design Tips for Data Scientists
“Since many big programs begin life as small scripts, knowing how to design in the large helps ensure that what's done in the small is pointed in the right direction.”-Greg Wilson
An Old Hacker's Tips On Staying Employed
This advice from an engineer (with a career spanning over three decades) is applicable to anyone. The “Two-and-Done Rule” is particularly insightful.-The Mad Ned Memo
Exploring R² and Regression Variance With Euler/Venn Diagrams
Speaking of R²... here’s how to create diagrams for explaining shared variation in regression models using R.-Andrew Heiss
How We Use Pseudo-R² to Automate Analysis Suggestions at Heap
“[This post] shares the math behind one of my favorite model evaluation metrics, McFadden's pseudo-R², and how we've used it to build a feature that proactively guides an analysis.”-Heap Blog
introverse
This package is a one-stop-shop for computing novices (read: no coding experience, period) heading into R and the tidyverse.-Stephanie J. Spielman
Keep Your R Scripts Locally Sourced
A really bad debugging session completely broke this person’s mental model for how one bit of R code should work.-Higher Order Functions
A Survey of Mathy Jobs
What sorts of “mathy” jobs are there beyond the standard paths of academia and teaching?-Halfspace
Making Shiny Apps Mobile Friendly
Making Shiny apps work on mobile is actually very easy with the Bootstrap framework. You just need to work a little bit with HTML and CSS.-Jacqueline Nolis
We the Purple People
The data world needs more generalists who can navigate both the business context and the modern data stack.-dbt
Statistical Significance, p-Values, and the Reporting of Uncertainty
Statistical significance has been over-emphasized in empirical research. In many cases where decision makers are faced with deciding whether to implement a new policy or not, confidence intervals are a more useful way of communicating uncertainty of point estimates.-American Economic Association
10 Communities for Underrepresented Data Scientists
Some of these groups, which provide resources and host technical and professional meetups, are long-established with large membership numbers. Others are still emerging. Each one is worth a look.-Built In
How to Become a Better R Code Detective?
How do you go from viewing the code as a big messy ball of wool to a logical diagram that you can bend to your will? These tips will help you get better at debugging and reading R code written by someone else (or, you know, you from two years ago).-Maëlle Salmon
Tracking Impact and Measuring Success in Data Education Events
There are so many educational events for training new data scientists these days. If you’re facilitating such an event, how do you know whether what you’re doing worked?-Code for Science & Society Event Fund
Types of Product Usage Segmentation
Product usage segmentation is a powerful method for developing rich, actionable insights from your product usage data. Done correctly, it can unlock personalized experiences, increase user retention, and inform your product roadmap.-Mode Blog
Regression, Fire, and Dangerous Things (1/3)
Okay, so correlation does not imply causation. What now?-Elements of Evolutionary Anthropology
A Practitioner's Guide for Measuring Movements
The Sunrise Movement’s Data Director shares five potent metrics for assessing people power that anyone with a data warehouse and basic SQL skills can replicate for their own organization.-Brittany Bennett
The Case Against SQL Formatting
Our job as query writers isn’t to be mechanical scribes; it’s to format our work so that it’s easy to interpret.-benn.substack
The Humble Hash Aggregate
“If I had to pick a single programming concept where understanding it is like a superpower, it would probably be the hash map (aka in Python, the humble dictionary) because I've seen the pattern come up in almost every kind of data/programming work I've ever done.”-Vicki Boykis
One Little Thing: Reusing Code Chunks and Chunk Options with knitr
These methods make it more flexible to author and program a knitr document.-Yihui Xie
Understanding the Data (Error) Generating Processes for Data Validation
“But understanding some of the key failure modes faced by data producers can support data validation by helping consumers develop more realistic theories and expectations for the ways data may ‘break’ and how to refine strategies for detecting them.”-Emily Riederer
Flat Data
If you’re already sharing and working with data in GitHub, Flat Data is for you. It makes it easy to get your datasets into GitHub repositories and version and share them.-OCTO
The Downfall of the Data Engineer
While there are many positive aspects of data engineering, there are a lot of downsides too: boredom, change management, and not having a seat at the table.-Maxime Beauchemin
A Guide to Marketplace Survival and Growth
Data leaders at Lyft, Domain, and Patreon share how they helped teams make better decisions with real-time data, leading and lagging growth indicators, and data heroes.-Mode
Lessons Learned from Building a Silly Twitter Bot
“Learning new tech skills is difficult. Therefore, you should tackle problems that bring as much joy as possible while also avoiding email entanglements or late-night Slack boondoggles.”-Brad Weiner
Hosting SQLite Databases on Github Pages
This post gets a hearty recommendation: “This is mind-blowing. If your static page needs to run some queries on a large dataset but without having to load the whole dataset with the page, you can statically host the dataset–and query it using SQLite. It uses HTTP range requests to fetch what it needs.”-phiresky
The Difference Between “Prevalence” and “Incidence” and Why We Care
You may have seen these terms in the news lately in relation to COVID-19. They’re important for helping you understand the impact of a disease on your community.-Data Literacy
What I Learned From Attending DataOps Unleashed 2021
Couldn’t make this conference? This excellent post covers talks about establishing data predictability, increasing reliability, and creating economic efficiencies in data pipelines.-James Le
Comprehensive Date-Time Handling for R
clock is a new package providing a comprehensive set of tools for working with date-times, including new date-time types built to reduce the agony of working with time zones.-Tidyverse
How to Avoid Inflated A/B Test Results
For most companies, the process for analyzing A/B tests is based on “null hypothesis significance testing.” But this process often comes up short.-Mode Blog
Time to Switch?
“A controlled randomized experiment is often seen as a gold standard in experimental research. But in many situations ethical considerations make it impossible to conduct such a ‘real’ experiment... The ‘switching-replication’ design can come to the rescue!”-Reproducible Stats in Education Sciences
What’s Missing? Reduce Bias by Addressing Data Gaps in Your Analysis Process
This series covers real world examples of how data can go missing at places like the Oregon Health Authority, Science Magazine, and Spotify—and what you can do about it.-This Is Important
Favorite Applied Articles Using Bayesian Statistics
A good Twitter thread to scroll through!-Solomon Kurz
The Data is in the Details
A great R tutorial for how to extract data from difficult Excel files, complete with a public repo for you to test the code on.-Gavin Masterson
Introductory Time Series Forecasting With Torch
“In this post, we build a network that uses a sequence of observations to predict a value for the very next point in time. What if we’d like to forecast a sequence of values, corresponding to, say, a week or a month of measurements?”-RStudio AI Blog
Data Science is Different
Told from the perspective of a naive junior data scientist, this story is way, way too real.-Kenny Ning
Getting More Comfortable with Git and GitHub in RStudio
A super useful tutorial (with video!) for those new to using GitHub with R.-Lisa Lendway
Too Big a Word
“The ethics of technology looks very different outside the tech industry than it does on the inside, and not simply because of conflicting principles or values between techies and their critics.”-Data & Society: Points
Just Where Is the Minimal Stats Bar for Data Science?
We’re not talking about production model-building roles here. If you’re looking for an entry-level position working on product development problems (which are pretty common!), you might not need to take that advanced stats class.-Counting Stuff
What the F*ck Python!
This fun project attempts to explain what exactly is happening under the hood with some counter-intuitive snippets and lesser-known features in Python.-Satwik Kansal
Sports Analytics 101: Descriptive vs. Predictive
Future performance doesn’t always resemble past performance, and that’s why we need both descriptive and predictive metrics.-Brendan Kent
Causal Design Patterns for Data Analysts
Everyone needs to be able to understand the difference between causality and correlation. But because the methods for doing so are scattered across disciplines like epidemiology and economics, there’s a high barrier to entry for those outside such fields. This post aims to break down those barriers.-Emily Riederer
How to Scope Down PRs
Smaller pull requests are easier to test, easier to iterate on, and easier to review.-Netlify
a gRadual intRoduction to Shiny
This two-hour workshop will get you up and running with the basics of Shiny. You’ll need an intermediate understanding of R.-Ted Laderas & Jessica Minnier
Foundations of Statistics with R
This free ebook uses a simulations-based approach. You’ll need to brush up on your Calculus II before you start.-Darrin Speegle & Bryan Clair
Data Science Portfolios 101
Some tips and tricks for putting together a data science portfolio to help you get hired.-R-Ladies Dallas
How to Be a Good Storyteller
One way to increase the likelihood that your analysis will positively change business actions and decisions is by improving how you communicate about your data insights.-Anson Whitmer
Where to Go Next to Level Up Your R Skills
This thread has a lot of great suggestions for R tutorials and courses to build your skillset.-Jenny Richmond
Giving More Tools to Software Engineers: the Reorganization of the Factory
There’s a lot in here that applies to data scientist productivity too.-Erik Bernhardsson
rjs: R in JavaScript
A package for inserting R code directly into websites!-Karandeep Singh
The Four Jobs of the Data Scientist
A good data scientist is required to be four different people: a scientist, statistician, systems engineer, and politician.-Simply Statistics
Save Your Hands and Save Your Time: Rethinking How to Use a Keyboard
This video tutorial gets high praise from the Lead Data Scientist at OnlineMedEd (https://twitter.com/beeonaposy/status/1329560582765342720): “If you want to learn more about setting up powerful keyboard shortcuts, this is absolutely worth 19 minutes of your time. Chock-full of practical examples.”-egghead.io
Communicating Data is About Handling Egos and Emotions
Having a bit of emotional intelligence and being prepared for scenarios where people may be openly rude and hostile goes a long way toward your analysis landing.-Evergreen Data
The Best Part of Data Science Isn't Even the Tech
If you’re feeling like a “data lackey” that never gets to work on “sexy technical problems,” remember this: true impact is measured by the mark it leaves on the world, not the tool used to make it.-Counting Stuff
A Review of Spatial Causal Inference Methods for Environmental and Epidemiological Applications
“Spatial causal inference poses analytic challenges due to complex correlation structures and interference between the treatment at one location and the outcomes at others.”-DeepAI
I Got 7 Job Offers During the Worst Job Market in History. Here’s the Data.
This data scientist hand-tracked all his interview and application data and came away with some interesting insights you might apply to your next job search.-Jeff Li
Imposter Syndrome in Data Science
Why is imposter syndrome so prevalent in data science? How can you deal with it personally and encourage others who also feel like imposters?-Caitlin Hudon
The 4 Hard Truths Data Science Blogs Don't Teach You About
Skill only gets you so far.-Franccesco Orozco
How Eugenics Shaped Statistics
“The various upheavals happening in statistics today—methodological and symbolic—should properly be understood as parts of a larger story, a reinvention of the discipline and a reckoning with its origins. The buildings and lectures are the monuments to eugenics we can see. The less visible ones are embedded in the language, logic, and philosophy of statistics itself.”-Nautilus
linne: Write CSS in R
Are you an R user wary of CSS? This package is a great way to get your feet wet.-John Coene
Teaching Python to Beginners
When you can teach a skill to others, you’ve mastered it. But how do you teach what you’ve forgotten?-Data for Breakfast
Why Have a Data Science Portfolio and What It Shows
Most articles on data science portfolios focus on how to build one and get a job. This post focuses on the “what” and the “why.”-Eugene Yan
Bayes Rules! An Introduction to Bayesian Modeling with R
The first five chapters of this book are available for free. Says Max Kuhn of RStudio (https://twitter.com/topepos/status/1315953559071084545): “This is one of the most clearly written books on Bayesian analysis that I’ve seen. Lots of practical advice and intuition.”-Bayes Rules!
S, R, and Data Science
R’s data analysis roots run deep. Did you know R was written to replicate a piece of software called S, written by data analysis researchers at Bell Labs?-The R Journal
Happy Git and GitHub for the useR
How to integrate Git and GitHub into your daily work with R and R Markdown.-Jenny Bryan
A Tale of Query Optimization
A query took 24 minutes to run. This post details the steps taken to optimize it to run in two seconds.-Plumbers of Data Science
Understanding Entanglement With SVD
“Entanglement” is full of meaning in physics, but the linear algebra behind it is quite simple. Ever used singular value decomposition? Then you're nearly there!-Math3ma
Data Organization in Spreadsheets
Get some practical tips for organizing spreadsheet data to reduce errors and make later analyses easier.-The American Statistician
Array Programming with NumPy
NumPy is one of the oldest Python packages around, and it still plays a central and leading role today in scientific computing across a ton of fields.-Nature
Debunking Narrative Fallacies with Empirically-Justified Explanations
Stitch Fix’s Chief Algorithms Officer Emeritus says it best (https://twitter.com/ericcolson/status/1306384818314178564?s=20): “When observing trends in metrics, beware of the narrative fallacy. We are wired to find explanation even when there is none. We make up stories to match the data.”-MultiThreaded
Pycon Africa Talks
Expand your Python knowledge! There are nearly 50 talks in this YouTube playlist to choose from.-Pycon Africa
What Can Data Scientists Learn From DevOps?
This article may be eight years old, but it still holds good advice for data scientists today: think about ways to make your analytical work easy to replicate, build upon, and scale.-RedMonk
D3 to R to D3
This is a great tutorial for R users looking to use D3. It breaks down the key differences between creating a plot in both languages.-Maya Gans
Big Book of R
This collection of over 100 R books might be the last bookmark you’ll ever need!-Oscar Baruffa
Preventing SQL Injection Attacks in Postgres
Last month’s “Meow” attack wiped out thousands of unsecured databases. These hackers used SQL injection, a technique for gaining access to business data and personal information and changing or deleting database content.-Crunchy Data
Python Typosquatting for Fun not Profit
Because packages in popular languages like Python are used as software dependencies, they can be susceptible to supply chain attacks. This analysis examines which packages are most vulnerable to typosquatting (an attack that relies on typos) to identify which packages have been compromised already and how to prevent these attacks.-William Bengtson
“Playing the Whole Game”: A Data Collection and Analysis Exercise With Google Calendar
There are many teaching resources focused on data analysis, but not data collection. This exercise tackles both and is suitable for an early introduction in an undergraduate statistics or data science course.-Taylor & Francis Online
The Top 5 Most Popular Window Functions and How to Use Them
We were curious to understand which window functions are most commonly used, so we built a window (pun intended) into our customers’ usage of these types of calculations.-Mode Blog
7 Questions You Should Ask Yourself Before Starting Any Data Science Project
Learning the technical skills is only part of becoming a data scientist. You need to think like a data scientist, which means always questioning… basically everything.-Towards Data Science
Adventures in R
All you need is a basic knowledge of stats and programming to jump into this online, 8-week, college-level course.-Adventures in R
[Whitepaper] How to do Linear Regression in SQL
One of our Senior Data Scientists shares a guide for getting mileage out of doing regression analyses directly in SQL. If your measurements are clean and meaningful, if your A/B tests are well designed and powerful, you might rarely need anything more than the “basics.”-Mode
Decision-Making in a Time of Crisis
A bad outcome doesn’t mean a bad decision. This essay provides conceptual and cognitive tools for the process of making decisions under uncertainty.-O’Reilly Radar
Use Common Table Expressions (CTE) to Keep Your SQL Clean [with example]
Using common table expressions makes it easier for your teammates to debug and collaborate with you.-Mode Blog
Getting in to a Causal Flow
Most, if not all, business analytics questions are inquiries of cause and effect, which is why you should dive into this excellent explainer series about causal inference.-Causal Flows
Data as Protest: Data for Black Lives with Yeshi Milner
How can we claim agency over data systems to fight for racial justice? Learn more about the movement of scientists and activists who are trying to make data a tool for social change instead of a weapon of political oppression.-The Radical AI Podcast
Exploring Missing Values in naniar
“Exploring and thinking critically about missing data is an important and often overlooked part of exploratory data analysis that can help us to understand what data are missing and why, so that we choose an appropriate method for handling them.”-Allison Horst
What Can We Learn From a Country's Diplomatic Gifts?
It’s a two-for-one! Learn how to do text analysis in R and who gives the best gifts.-Alex Cookson
Data Scientists in Academia
As he transitions from a tenure track to an industry role, this data scientist shares some parting thoughts about the situation data scientists face in academia right now.-Travis Gerke
Prediction is Hard
What exactly does “cubic fit” mean?-Stats Chat
Sentiment Analysis with tidymodels and TidyTuesday Animal Crossing Reviews
If you can pull yourself away from your Animal Crossing island, this fun and straightforward tutorial is a great way to try out some text analysis techniques in R.-Julia Silge
Boosting A/B Test Power With Panel Data Models
Depending on the characteristics of your data, some basic tools from econometrics can improve your A/B tests.-Kyle Carlson
10 Tips for Making Sense of COVID-19 Models for Decision-Making
Since our lives and livelihoods are impacted by the results of COVID-19 models, now’s a good time to learn what makes a model, and what models can and can’t do.-Johns Hopkins
Using Python to Cheat at Scrabble
What do you do when you get sick of losing Scrabble to your mother? Turn to Python, of course!-Ari Lamstein
Things I Wished More Developers Knew About Databases
“Even though it is impossible to ignore how databases work, the problems that application developers foresee and experience will often be just the tip of the iceberg.”-Jaana B. Dogan
When Plotting Epidemic Curves or Death Totals, Should We Divide by Population Size?
Should you use per-capita or absolute measures when plotting epidemic curves? This Twitter thread proposes that you can do either—so long as you do so properly.-Carl T. Bergstrom
Let's Talk Rice Measurements
No metric is perfect, not even something we take for granted like the meter, or a ‘cup’ of rice.-Counting Stuff
The Landscape of R Packages for Automated Exploratory Data Analysis
What R packages help automate analysis in large and noisy datasets? And what are the areas of opportunity for improving automated data exploration?-GroundAI
You Can’t Avoid Problems You Can’t See
This is a good illustration of the problems that come up when you don’t truly know the ins and outs of the data itself… but you think you do.-BI Polar
Understanding Data and Statistics in the Medical Literature
If you’ve been going deep on COVID-19 research, this four-hour class will be well worth your time.-Leanpub
Keeping Data Inclusivity Without Diluting your Results
How can we be inclusive without making minority categories so small that only the majority data has statistical relevance?-We All Count
Exponential Growth and Epidemics
Time for a math refresher. Exponential growth is a common phrase in our society but it's shocking how bad our intuitions can be at recognizing what it actually means.-3Blue1Brown
Data Science Learning Resources
Another day, another nicely curated list of books and articles you should read.-Bradley Boehmke
Ten Research Challenge Areas in Data Science
These are some good discussion starters for considering what a broad research agenda for data science might look like.-Columbia University
Twitter for R Programmers
Twitter is a real-time pulse of the R community. You can learn a lot about the R language, about new approaches to problems, make friends, and even land your next job.-Twitter for R Programmers
How to Collect User Data About Gender Identity — and When Not to
Gender data can be valuable for a variety of reasons. But before you ask, consider why you’re asking, and how you frame the question.-Built In
Comparing Apples, Oranges, and Bananas
How do you know if your recommender model is doing a good job? Check out these tips for defining metrics and evaluation criteria.-Ssense Tech
The Challenge of Identifying UX Success Metrics
Start by identifying the outcome, and work from there.-Jared Spool
Data Science is Different Now
“For the past couple years, I've been telling people who ask me for advice not to go into data science. Here's why: The data science job market is way oversaturated. Here's what they should do instead.”-Vicki Boykis
The Missing Semester of Your CS Education
This course digs into a rarely-covered (but essential!) topic: using tools like the command-line, text editors, version control systems, and more.-MIT Computer Science & Artificial Intelligence Lab
The Big List of Data Science Interview Resources
Interviews are hard, but the silver lining is: they serve as a forcing function for learning.-Conor Dewey
The Rise and Fall of the OLAP Cube
If you’ve had a data analytics career over the last 30 years, you might be skeptical of the shift to columnar data warehouses. Is it just a fad, or is it here to stay?-Holistics Blog
How I Became a Data Analyst by Optimizing the Right Place and Time
Here’s what you can do to make the most of opportunities that come your way.-Towards Data Science
Falsehoods Programmers Believe About Names
“I have never seen a computer system which handles names properly and doubt one exists, anywhere.”-Kalzumeus Software
Are Your Coding Skills Good Enough for a Data Science Job?
Here are five things to check to make sure your code is... up to code.-Towards Data Science
A Cool SQL Problem: Avoiding For-Loops
This is a perfect screening or whiteboarding question for many quant finance jobs, and it’s still a great problem for many roles that have nothing to do with finance.-r x y, r
R Cookbook
A great, free resource (for new R users especially) that's full of how-to recipes, each solving a specific problem.-R Cookbook
Getting Help In R: Do As I Say, Not As I've Done
The next time you get stuck in R, run through this list of tips. You’ll become a more self-sufficient R user because of it.-Sam Tyner
Using Docker to Deploy an R plumber API
As a data scientist, you sometimes want to have code running in places that are not your computer. The good news: setting up a virtual machine isn’t as scary as you think!-T-Mobile Product and Technology
The User-Agent — That Crazy String Underpinning a Bunch of Analytics
How did this one piece of data become so vital to pretty much all web analytics? How is it used? And what caveats does it come with?-Randy Au
Emails from R: Blastula 0.3
This new R package makes it easy for you to send coworkers emails showcasing beautiful plots (and emoji subject lines!).-R Studio Blog
The Mind at Work: Guido Van Rossum on How Python Makes Thinking in Code Easier
A wonderful read for any Pythonista.-Work in Progress
Calculating New and Returning Customers in R
This step-by-step tutorial sets you up with a simple and clean calculation for new and returning customers.-Towards Data Science
A New palette() for R
R got a glow up! Here’s how to take the new color palette for a spin.-R Developer Blog
How to Read and Write Data Files in Python
Bookmark this and save yourself a Google search.-End-to-End Machine Learning Library
The Problem with “Biased Data”
If you asked one hundred people what “biased data” means to them, you might just get back one hundred different answers. To make progress, we need an agreed-upon language and framework for understanding bias in machine learning.-Harini Suresh
Data Science Foundations: Know Your Data. Really, Really, Know It
Really knowing your data means more than just understanding the data layout or organization. You need to go all the way down to get a look at how the data is collected and generated, too.-Towards Data Science
Data Science Archetypes
Are you a Generalist? A Detective? An Oracle? A Maker?-End-to-End Machine Learning Library
Character Encodings — The Pain That Won’t Go Away
Go deep with this series on how character encoding quirks can thwart your analyses.-Better Programming
We’ll Do It Live: Updating Machine Learning Models on Flask/uWSGI with No Downtime
It’s trickier than you may think. This tutorial with code examples will walk you through the nitty gritty.-WW Tech Blog
Questions to Ask About Your Data
Print this comic out and keep it handy for anytime you’re doing exploratory analysis.-Julia Evans
What Data Patterns Can Lie Behind a Correlation Coefficient?
To interpret a correlation coefficient, you're gonna need the corresponding scatterplot.-Jan Vanhove
List of Time Series Databases
If you’re building a product to support many large-scale time-series users, you’ll need to shop around for the right database. This list will get you started, with open-source and proprietary options.-Misframe
Exploring Your Data With Just 1 Line of Python
Short, sweet, and to the point.-Towards Data Science
almanac
This new package allows R users to do things like construct a business calendar, which they can then use to shift dates forward and skip over weekends and holidays.-Davis Vaughan
How Much Have You Spent on Amazon? Analyzing Amazon Data
The Director of Data Science at HelioCampus gives this tutorial a hearty recommendation: “This is the kind of project I mean when I talk about project-driven learning: it has relevance to you, so there's motivation to find out the answers beyond just learning a technical skill. And plenty of variations of Qs to ask.”-Dataquest
How to Spot Red Flags in a Data Science Job Opportunity
What signs should you look for to detect work-life balance problems, a lack of data science understanding, or a wimpy manager?-Towards Data Science
Data Integrity in Survey Collection
This thread recounts how one research study was infiltrated by bots (it happens more often than you might think!) and suggests tips for ensuring better data quality in online surveys.-Melissa Simone
janitor
janitor has simple functions for examining and cleaning dirty data, so you can work faster and save your thinking for the fun stuff. Built with beginning and intermediate R users in mind!-Sam Firke
loadtest: an R Package for Load Testing
“As APIs become more accessible to the data science community, so should engineering best practices around those APIs. However, most load testing tools are crafted for engineers or testing specialists–so we fixed that.”-T-Mobile Tech
How Much Do Data Scientists Make?
Who knew Walmart and Ancestry.com paid so well?-Towards Data Science
Where to Learn Statistics
Pick a handful of these resources to try out, and get started!-End-to-End Machine Learning
Best Practices for Analyzing Large-scale Health Data From Wearables and Smartphone Apps
If you’re working with health data, you need to be mindful of the privacy, selection bias, and policy implications.-Nature
Teacups, Giraffes, and Statistics
Whether you’re starting from scratch or want to deepen your familiarity with statistics, this tutorial is worth checking out for the playful approach and delightful illustrations.-Teacups, Giraffes, and Statistics
Mastering Shiny
If you’ve been wanting to try out Shiny, a framework for creating web applications using R code, here’s your chance! From academia to big pharma to Silicon Valley, Shiny is now used in almost as many niches and industries as R itself.-Hadley Wickham
Reproducible Data Workflows With Drake
Learn how to use drake, an R package that provides a powerful, flexible workflow management tool for reproducible data analysis pipelines.-Garrick Aden-Buie
Pandas Tricks
New tips every weekday morning that will help you to work faster, write better code, and impress your friends!-Kevin Markham
Introducing the Funneljoin Package
Do you work with data consisting of events with their time and associated user? Often find yourself asking “first this then that” questions? You probably have a problem funneljoin can help with.-Hooked on Data
Tidylo: Tidy log odds ratio weighted by uninformative prior
Use this R package in your everyday workflow when you want to compare how the frequency of some feature differs across some set or group.-Julia Silge
Practical Psychology for Data Scientists
How to recognize and sidestep eight common cognitive biases.-Towards Data Science
Data Helpers
Check out this list of data professionals who have volunteered to answer questions, promote, or mentor newcomers in data science, engineering, and analysis.-Angela Bassa
Why You Swipe Right
Using two months of swipes from his Tinder profile, one man evaluated his dating preferences. You might feel a bit voyeuristic, but this full series is worth a read for its examination of race and online dating.-Ajay Sharma
Crushed It! Landing a Data Science Job
How to crack your first (or fourth) round of data science interviews.-Erin Shellman
Python is Weird (an Unabashedly Biased Intro to Python for R Users)
Most Pandas tutorials start with a solid assumption that you know Python and you’re completely devoted to the religious tenet of being Pythonic. This one doesn’t, and it’ll help you wrap your head around this new way of thinking.-Eric R. Scott
Instagram Data Analysis Using Panoply and Mode
A thorough walkthrough, from connecting a database through the final visualizations.-Towards Data Science
Type Stable Estimation
This paper argues that code objects in statistical software should match up with the actual mathematical objects involved in formal data modeling.-Aleatoric
Build Your Career in Data Science
While there are lots of good blog posts on individual topics, there really isn't one place people can go to get a better understanding of a data science career. Until this book!-Manning Publications
Matrices as Tensor Network Diagrams
This framework is a great way to wrap your head around matrices (and it makes proofs cleaner and simpler!).-Math3ma
Lyft Data Scientist Shares Five Pieces of Career Advice
A nice quick read that covers topics like starting a new role, stakeholder management, and building a consultancy.-Towards Data Science
Find Your Slow with profvis
profvis identifies problem patches in your R code that are slowing everything down, shaving satisfying seconds off the running time of your project.-Megan Stodel
Javascript Statistics Snippets
Ever wanted to do something simple—like generate random values from a distribution—without importing a whole new library? This repo's got your back.-Nick Strayer
gpt-2-simple
This Python package allows you to easily retrain OpenAI's GPT-2 text-generating model on new texts, like Buzzfeed article titles.-Max Woolf
Follow-up: I Found Two Identical Packs of Skittles, Among 468 Packs With a Total of 27,740 Skittles
Analyzing packs of Skittles (or sometimes M&Ms) seems to be a very common exercise in introductory statistics. But what's the likelihood of identifying two identical packs of Skittles? And how does that likelihood stack up against reality?-Possibly Wrong
How to Filter in R: A Detailed Introduction to the dplyr Filter Function
There are many ways to filter in R. Consider dplyr filter for its user-friendly syntax, how easy it is to work with, and how nicely it plays with the other dplyr functions.-Michael Toth
Why software projects take longer than you think – a statistical model
“How much time do you need?” has a whole new meaning now.-Erik Bernhardsson
A Simple Approach To Templated SQL Queries In Python
As Instacart's former VP of Data Science put it: “Almost every data product I’ve built has had parameterized SQL in it, and this is a great guide for how to do it well!”-Towards Data Science
Excel Error, but Could Happen in Any Tool
JD Long sums up this post perfectly: it “walks through an analysis where the results depend on how null values are handled. Good reminder to 1) understand if data has nulls 2) be thoughtful about handling nulls 3) compare groups with/without nulls.”-Junk Charts
Escaping Excel Hell with Python and Pandas
Tune into this fun discussion about introducing the Excel users in your life to Python.-TalkPython
10 things R can do that might surprise you
R is building on its solid data analysis foundations and is rapidly becoming an all-purpose connective language for data science.-Simply Statistics
Advice for New Data Scientists
While this post is intended primarily for data scientists embedded in product teams, many of the tips can be generalized to any new hire in a tech role.-Airbnb Engineering & Data Science
Using Deep Learning to “Read Your Thoughts” — With Keras and EEG
Saying a word in one’s mind, even if not spoken aloud, can result in the firing of the nerves controlling the muscles involved in speech. With some readily available equipment, you can train a model to classify these sub-vocalized words in less than a day.-Justin Alvey
Tidy Tuesday Screencast: Tidying and Analyzing US PhDs in R
If you spend a lot of time importing Excel spreadsheets, don't miss this episode: it focuses on the process of importing, cleaning, and tidying messy data.-David Robinson
Journey to Data Science
Need a dose of inspiration? This thread of folks who recently became data scientists will give you the warm fuzzies.-Twitter
SQL: One of the Most Valuable Skills
SQL is permanent. SQL is flexible. SQL can be your super power.-Craig Kerstiens
The Ultimate List of Data Science Podcasts
When they say “ultimate,” they really mean it. Fire one of these up the next time you're exercising at the gym, commuting to work, or doing chores.-Real Python
Minimally Sufficient Pandas
Limiting Pandas to a small subset can keep your focus on the actual data analysis and not on the syntax. This detailed guide offers a single approach to completing a variety of common data analysis tasks.-Dunder Data
Learning From Eight Years of Data Science Mistakes
This talk covers mistakes made during analyses (including communication when delivering results), team and infrastructure mistakes, plus some advice for incoming data scientists.-rstudio::conf 2019
Data science curriculum roadmap
This set of topic recommendations is a good starting point for data-centric academic programs looking to revise their curriculum or start a completely new one.-Brandon Rohrer
Statistics: P values are just the tip of the iceberg
Ridding science of shoddy statistics will require scrutiny of every step, not merely the last one.-Nature
Rstudio::conf 2019: lessons learned
Couldn't make Rstudio::conf? Get caught up on what you missed with this summary of five major themes from the talks there.-Brooke Watson
Solving the Model Representation Problem With broom
An introduction to the broom package, which aims to create a framework for representing statistical models, estimation methods, and fits with R objects.-Alex Hayes
Going Off the Map: Exploring purrr’s Other Functions
Learn how to use some of purrr’s lesser known functions to write cleaner and more concise code.-Hooked on Data
Preparing for a Tech Talk, Part 1: Motivation
This series covers the process of preparing for a tech talk, from conceiving the idea to the actual day of the presentation. Up first: why and how to pick a topic.-Overreacted
How Do I? …
The ultimate reference material for R folks: a searchable table of 190+ R-stats tasks with code snippets.-Sharon Machlis
Selection Effects
Why do you often feel like you’re in the slower of two lanes during rush hour? Or like the bus is taking forever to get here? These scenarios (and many others!) can be explained by selection bias.-Carl T. Bergstrom
The ‘knight on an infinite chessboard’ puzzle: efficient simulation in R
A nice walkthrough of solving a chess conundrum, with an eye on keeping the simulation fast and interpretable.-Variance Explained
How to Develop the Five Soft Skills That Will Make You a Great Analyst
Soft skills tend to be more difficult to learn than hard skills, which is exactly why we all need to work on them. Here's a framework for assessing yourself and improving those skills.-Mode
Real-time Process for Completing a Task in R
“I'm sitting down to start a task in R. I don't entirely know how to complete it. I'm going to try to document my process in this thread in real time.” This thread is really enlightening (and a relief for anyone who feels like half their job is Googling).-We are R-Ladies
Level up from `cron` to Airflow with R on Your Macbook
You can use Airflow in the same way you might use cron to schedule and execute jobs. Here's how to get it up and running.-Cerebral Mastication
The Lesser Known Stars of the Tidyverse
A walkthrough of how to use some more obscure R packages and functions in exploratory analysis.-Hooked on Data
Battling the Bots
An interesting profile of a musicologist who turned his background identifying fraud in music composition into a job examining how online propaganda campaigns work.-Foreign Policy
Tidy Tuesday
Through this weekly data project, members of the RStats community get a new dataset on which to practice their wrangling and data visualization skills. Catch up on what's been done so far (https://twitter.com/hashtag/tidytuesday?src=hash) or get ready for tomorrow's dataset.-R for Data Science
Tidyeval
Trying to wrap your head around non-standard evaluation? Check out this tutorial for tidy evaluation in R.-Ian Lyttle
Some Important Data Science Tools that aren’t Python, R, SQL or Math
Data scientists don’t exist in a vacuum. Here are some of the tools you'll need to be capable of building production-ready applications.-Towards Data Science
How Becoming Not a Data Scientist Made Me a Better Data Scientist
Working as a software engineer helped Joel Grus understand how to write better code... as a data scientist.-Joel Grus
The hacker's guide to uncertainty estimates
Estimating uncertainty is easier said than done. This post covers a whole arsenal of tricks, including confidence intervals, Monte Carlo methods, and inverse Hessians.-Erik Bernhardsson
Scipy Lecture Notes
Sebastian Raschka surfaced this well-maintained hub of knowledge with a ringing endorsement: “I think this is really an under-appreciated resource. Probably the most comprehensive guide out there, it's free, and constantly updated!”-Scipy Lecture Notes
Chromebook Data Science
These free MOOCs exist so anyone with the ability to read, write, and do basic math can get into data science using nothing but a web browser and an internet connection.-Simply Statistics
JOINs in SQL, Python, and R
Though SQL has long been the industry standard for accessing relational data, nowadays, it’s more and more common to do this same work in a scripting language like Python or R. Here's how.-Mode
Strata speaker slides & videos
Catch up on all the presentations from the Strata Data Conference in NYC last week.-O'Reilly Conferences
What Data Scientists Really Do, According to 35 Data Scientists
Here are the common themes that have emerged from speaking with data scientists both in and outside tech.-Harvard Business Review
4 Things You Should Stop Doing in SQL and Start Doing in Python
Each language has its strengths and we’ve often pondered the distinctions, but there are some actions in SQL that are simply more efficient in Python.-Mode
What You Need to Know Before Considering a PhD
This post suggests pondering the practical experience offered by industry jobs and the disproportionately high rate of depression amongst graduate students before you apply for a PhD program in machine learning or data science.-fast.ai
Who wrote the anti-Trump New York Times op-ed? Using tidytext to find document similarity
The New York Times anonymous op-ed has spurred a bevy of data scientists to try to uncover the author using natural language processing. Here’s one attempt that serves as a nice R tutorial to boot.-Variance Explained
Get More From Your Salesforce Data: 4 SQL Queries to Write First
This post walks through a sample report replicating common Salesforce CRM reporting in SQL, so you can more easily audit, adjust, and extend that analysis.-Mode
Guidelines For A/B Testing
There are many ways A/B Testing can go wrong, but most of them won’t be obvious. Here are 12 guidelines that will help you guard against some common mistakes and set you up for success.-Hooked on Data
The Podcast of Small Differences
The first episode of this new “data science flavored” podcast covers what the two hosts—of physics and economics backgrounds—wished they knew on day one of their first data science jobs.-Otis Anderson & Ian Blumenfeld
Knowing Your Blindspot as an Analyst
“I thought that being the one with access to data made me the arbiter of truth, and that I was right by default when talking with someone who wasn’t using quantitative information to back up their ideas. I was wrong.”-Mode
Partitioning the Variation in Data
“Why do things vary?” is one of the fundamental questions you can ask during any exploratory data analysis. Here's how to gauge if the variation you're witnessing is fixed or random.-Simply Statistics
Analyzing IMDb Data The Intended Way, with R and ggplot2
IMDb has made their official dataset more accessible to analyze just for fun. Check out the data with this step-by-step tutorial, chock-full of code examples.-Max Woolf
Speed up your R Work
A tutorial on how to speed up work in R by partitioning data and process-level parallelization with rqdatatable, data.table, and dplyr.-Win-Vector Blog
Red Flags In Data Science Interviews
Companies will never straight up tell you they are bad to work for. Here are 12 warning signs to look out for when you’re interviewing at a company with multiple data scientists or analysts.-Hooked on Data
Add Constrained Optimization To Your Toolbelt
Stitch Fix shows how they use constrained optimization to get work to stylists and warehouses in a manner that’s fair and efficient, without cutting corners on client experience. For those fluent in Python: you should be able to model your own business problem by the end of this post.-Multithreaded
A year as a Data Scientist right after college: An honest review
A fresh-faced data scientist shares his experience in the workforce—what lived up to his expectations, and what didn’t.-Towards Data Science
Advice For Applying To Data Science Jobs
This is the most thorough, well-organized post on the data science job application process we’ve ever seen. Bookmark it immediately.-Hooked on Data
Trustworthy Data Analysis
The manner in which you present the results of an analysis is part of the analysis and plays a large role in determining whether people trust your work or not.-Simply Statistics
Rethinking Academic Data Sharing
“There is a reasonable debate going on regarding whether companies should be able to share [personal] data and for what purposes. Academics have to realize that they are also part of this debate and that any decisions made in that domain will likely affect them.”-Simply Statistics
UTC is Enough for Everyone, Right?
Although this post is aimed at programmers, there's a lot in here for analysts as well, especially with regard to how to properly store time in databases.-Zach Holman
Seven Strategies for Optimizing Numerical Code
Some advice on how to use reporting as a means to create strong stakeholder relationships in your organization.-Locally Optimistic
7 R Data Science Influencers to Follow
Whether you’re new to the R community, or you’re already an active package-creator or analyst, listening in on R Twitter conversations is a great way to stay up to speed.-Mode
Exploring The Structure and Dependencies of An R Package
pkgnet is an R library designed for the analysis of… R libraries! With a graph representation of a package and its dependencies, you can prioritize functions to unit test and examine recursive dependencies you take on by using a given package.-UptakeOpenSource
A Shiny App to Visualize and Share My Dogs’ Medical History
What’s a digital nomad and R-user to do when she needs to share her dogs’ medical records with multiple vets? Build a Shiny app, of course! Here’s how.-Jenna Allen
One Analyst’s Guide for going from Good to Great
Junior analysts, hearken! This guide is perfect for breaking through your skill plateau.-Fishtown Analytics
Lumpers and Splitters: Tensions in Taxonomies
“As data scientists tasked with segmenting clients and products, we find ourselves in the same boat with species taxonomists, straddling the line between lumping individuals into broad groups and splitting into small segments.”-MultiThreaded
5 Data Scientists on Making the Leap from Academia to Industry
We asked five leaders who transitioned from research backgrounds to data science jobs to share their thoughts on the process, from how they landed their first job to things they wished they'd known.-Mode
Stats 337: Readings in Applied Data Science
An excellent reading list!-Hadley Wickham
Set Operations in SQL and Python: a Comparison
Set operations take center stage in the latest installment of our Bridge the Gap series. Learn how to compare and combine data sets in both SQL and Python, so you can choose the best tool for the job.-Mode
Data-driven unit testing for data scientists and quant developers alike
The key to good unit testing is paying attention to the data in your tests and focusing on testing the most important parts of your model or system. These guidelines will help you streamline your unit testing and avoid ambiguous results.-Cartesian Faith
SQL for Data Analysis
By the end of this free and beginner-friendly course, you’ll be able to write efficient SQL queries to successfully handle a variety of data analysis tasks.-Udacity
How to rewrite your SQL queries in Pandas, and more
A phrasebook that you'll come back to time and time again. Bookmark it!-codeburst.io
Conversations with Future Data Scientists
Ryan Swanstrom put together a YouTube playlist of his answers to questions from aspiring data scientists like “How do I transition to data science?” or “Why should I start a data science project?” Most of these videos are under the 2-minute mark.-Data Science 101
Semantics of timezone-aware datetime arithmetic
One reason why you can't "just use UTC" all the time is that you often need "wall time" semantics—the relationship between two times as displayed by the clock on the wall, regardless of the absolute elapsed duration between them. Here's how to deal with that in Python.-Paul Ganssle
Resources for Data Science Job Seekers
Here’s what you need to help you nail the job hunt and land a role you’ll love.-Mode
Bridge the Gap: Window Functions in Python and SQL
When we understand how Python and SQL overlap, we can make smarter decisions about which to use and when. Our new Bridge the Gap series explores just that, starting with a tool that most of us use every day.-Mode
My Journey Into Data Science and Bio-Informatics — Part 1: Programming
One year ago, the author of this post had never executed a single line of code. Today, he works on a team trying to understand the underlying genetic alterations of neuroblastoma, a devastating tumor that affects young children. These are the resources and courses he used to get there.-O’Reilly Media
Data Science at the Command Line
Clear your weekend. O’Reilly has put this hands-on-guide online, for free!-O’Reilly Media
Introducing DataFramed, a Data Science Podcast
Here’s something new for your ears. This podcast promises to explore what modern data science looks like in practice via in-depth conversations with practitioners.-DataCamp
Myths and mistakes of PyCon proposals
Here are some tips for getting your proposal accepted from a bona fide member of the PyCon Program Committee.-Irina Truong
Why old-school PostgreSQL is so hip again
How did a 21-year-old piece of technology become the world’s fourth most popular database?-InfoWorld
Don’t Ignore Bears: The Pitfalls of Summarizing Data with Medians
Some folks are big fans of the median as a summary statistic. But it has some big downsides—as all statistics do.-Towards Data Science
It Came from the Data Lake
Do you really need a data lake for that project? Or can you replace Hadoop with your laptop? Check out this presentation to learn how to use Python to process larger data sets (5-10 GB) on your local machine.-Vicki Boykis
An Interactive Tutorial on Numerical Optimization
People often implement numerical optimization algorithms in machine learning projects without much thought as to how they work. This post aims to change that with interactive visual representations of each algorithm.-Ben Frederickson
Changepoint Analysis of Time Series Data
Learn how you can use the changepoint R package to identify when a video switches from one scene to the next.-Uru
Causal Inference With pandas.DataFrames
There's now a causality package in Python to make causal inference more accessible so analysts and data scientists can incorporate it into their day-to-day. The intro is worth reading, even if you're not a Python user.-Adam Kelleher
How do you convince other people to use R?
Tired of being the lone R user in your organization? Try out these arguments on your colleagues.-Simply Statistics
Python's strftime directives
This reference for changing date/time formats in Python is so handy that one Twitter user (https://twitter.com/kscottz/status/922627756914962433) said of its creator: “This person has saved the world a thousand years of human effort. This person deserves a beer.”-strftime.org
Landing a Data Science Gig in New York City
Trying to break into the NYC data science job market? Sans a PhD? This guide was tailor-made for you.-Ground Truth
R for Journalists
This site is a great launch pad for anyone who's new to R, journalist or not. Each post provides step-by-step instructions and code for making a visualization with data about a current event.-R for Journalists
R Studio Community
R Studio recently opened up a forum. It's a great place to hang out with other R users, talk with R package developers during open office hours, or ask newbie questions if you're intimidated by Stack Overflow.-R Studio
Fast GeoSpatial Analysis in Python
If you get frustrated by the sluggishness of Python's GeoSpatial stack, check out this experiment. Combining Cython, Dask, and GeoPandas sped up the mapping of 120 million geospatial data points by 30x.-Matthew Rocklin
Becoming a 10x Data Scientist
Whether or not you believe 10x developers exist, data scientists can learn a ton from seasoned developers who are considered incredibly prolific and proficient.-Algorithmia
Practical Data Science for Stats
Many aspects of day-to-day analytics work are missing from the conventional statistics literature and curriculum. This bookmark-worthy collection aims to solve that problem, with tons of preprints on modern analytical workflows.-PeerJ
Python Cheat Sheet for Data Science: Intermediate
A handy reference for Pythonistas who have been around the block a few times.-Dataquest
Buggy Python Code: The 10 Most Common Mistakes That Python Developers Make
This list ain’t for rookies. Here are some of the subtle, harder-to-catch errors that have even advanced Python users tearing their hair out.-Toptal
Giving Your First Data Science Talk
Here’s why you should consider giving a talk and how to prep. Our favorite insight: your audience is the you from six months (or one year or five years) ago.-Hooked on Data
Using optaplanner to plan water supplies
Does your job involve a lot of resource planning? Learn how to use OptaPlanner—an open-source constraint satisfaction solver—in the most Silicon Valley way possible: planning out water logistics for Burning Man.-Richard Weiss
Craft Your Python Like Poetry
The Python style guide PEP 8 specifies line length at 79 characters, but that doesn't mean you should wrap lines when they hit an arbitrary length. If you need to sharpen your poetic sensibilities, these code examples will teach you how to write readable, beautiful Python.-Trey Hunner
I have data. I need insights. Where do I start?
What to do when your boss dumps a bunch of data in your lap and says “tell me something interesting.”-Towards Data Science
You Say Data, I Say System
Every spreadsheet or database view or visualization is the result of an entire system of decisions: how to collect, compute, and represent the data. This article provides an excellent framework for being mindful of the choices that shape the end product you see on your screen.-Hacker Noon
Py 2.0
Check this out if you’ve got an iPhone and want to learn to code Python, SQL, HTML—actually, pretty much any language—on the go.-Product Hunt
29 common beginner Python errors on one page
Beginner or not, you’ll want to print out this flowchart and keep it at your desk.-Python for Biologists
How to Call B.S. on Big Data: A Practical Guide
One of our favorite tips in here: 'If you’d ask [a question] at a car dealership, you should ask it online, too.'-The New Yorker
4 steps to conducting a proper root cause analysis
Whip out this guide the next time your boss asks you a question like “Why is revenue down?”-Outlier AI
The Hitchhiker’s Guide to d3.js
Intimidated by the long list of functions in d3’s API documentation? Paralyzed by choosing from dozens of d3 tutorials? Start here.-Ian Johnson
Methodologies as Vanity Metrics
“When you work on learning new methods (Now I know Random Forest! Now I know K-L Divergence! Now I know Deep Learning!) it feels good—you’re exercising your brain, you know something you didn’t before—and it’s easy to think you’re progressing. But methods don’t in and of themselves drive value.”-Ian Blumenfeld
Profiling a Dataset of Craft Beers
Learn how to summarize a dataset with descriptive statistics using this fun Python tutorial.-Jean-Nicholas Hould
Setting up SQL for beginners is hard
SQL’s human-language-like syntax and declarative nature make it the perfect language for people with no coding experience. But getting data available in the right structure presents a major barrier to entry. Here’s how to quickly build a stack for teaching SQL to others.-Vicki Boykis
Alternatives to a Degree to Prove Yourself in Deep Learning
Why blogging might be the best way to land a job offer.-fast.ai
The Etymology of Trig Functions
Way more engaging than your high school math class.-Matthew Conlen
How to ask questions data science can solve
Asking the right questions is half the battle. This post takes a different approach to formulating questions, by mapping them to the tools of the trade.-Towards Data Science
1,000+ Women in Data Science
Your Twitter feed just got so much better.-Renee Teate
Group-by From Scratch
What’s the best way to split-apply-combine in Python? Although pandas groupby() is the widely accepted default answer, there are situations where built-in Python operations or NumPy and SciPy operations are more effective.-Jake VanderPlas
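As a rough illustration of the built-in route, here is a minimal split-apply-combine sketch that sums values per key using only the standard library. The function name `group_sum` and the sample data are assumptions made for this example and are not taken from the post.

```python
from collections import defaultdict

def group_sum(keys, values):
    """Split-apply-combine: sum the values that share each distinct key."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value  # "apply" happens incrementally as rows stream by
    return dict(totals)

# Hypothetical sample data.
keys = ["a", "b", "a", "c", "b", "a"]
values = [1, 2, 3, 4, 5, 6]

result = group_sum(keys, values)
print(result)
```

This version needs no third-party imports and makes a single pass over the data, which is the kind of trade-off to weigh against the convenience of groupby().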
Taking Prophet for a Spin
Been meaning to try Prophet? Check out this walkthrough of Facebook’s Bayesian-influenced time series forecasting package (for both R and Python!).-Fast Forward Labs
How to Make the Leap from Excel to SQL
Learning SQL is easier when you have Excel in your toolbelt. And moving your analysis into SQL will seriously speed up your workflow.-Mode
Blinded by Statistical Significance
Putting too much stock in an arbitrary threshold may lead to bad decisions.-KelloggInsight Blog
Not Even Scientists Can Easily Explain P-values
We want to know if results are right, but a p-value doesn’t measure that. It can’t tell you the magnitude of an effect, the strength of the evidence or the probability that the finding was the result of chance.-FiveThirtyEight
What’s Wrong With My Time Series
When you want to test a model’s predictive power, cross validation is usually the way to go. However, since data points in a time series are dependent on each other, randomly selecting subsets for training and testing won’t do. Check out these other ways to determine error sources in time series.-MultiThreaded
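One common alternative to random splits is forward chaining, where every training window ends before its test window begins. The sketch below is a generic illustration of that idea and is not necessarily the approach the post recommends. The function name and fold-sizing rule are assumptions.

```python
def forward_chaining_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs in which training data always precedes test data."""
    # Equal-sized folds; any leftover samples at the end are simply unused.
    fold_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(forward_chaining_splits(n_samples=8, n_splits=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Because the model never sees observations from after the test window, the error estimate is closer to what you would get when forecasting for real.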
Mathematicians becoming data scientists: Should you? How to?
Tips for determining if you’ll actually like the work data scientists do and positioning your mathematics background as an asset when you’re interviewing.-Quomodocumque
How to change careers and become a data scientist - one quant’s experience
One quant shares her story of switching from energy trading to data science: the resources she used, the classes she took, her decision to move to the Bay Area, and her advice for handling tech culture shock.-fast.ai
The Zero Bug
Hidden errors can be worse than visible errors. This post presents a fallacy that plagues many data analysts: common data aggregation tools usually can’t “count to zero” from examples.-Win-Vector Blog
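To see the problem in miniature, consider the sketch below, which uses made-up sales records (the store and product names are hypothetical). Counting only the rows that exist silently drops every combination that never occurred. Filling in the full set of expected groups brings the zeros back.

```python
from collections import Counter

# Hypothetical sales records: (store, product).
sales = [("north", "apples"), ("north", "apples"), ("south", "pears")]
counts = Counter(sales)

# Aggregating only observed rows: combinations with no sales never appear.
observed = dict(counts)

# Rebuild the result against every expected combination so empty groups show up as 0.
stores = ["north", "south"]
products = ["apples", "pears"]
complete = {(s, p): counts.get((s, p), 0) for s in stores for p in products}

print(len(observed), "observed groups;", len(complete), "expected groups")
print(complete)
```

The fix depends on knowing the expected groups ahead of time, because the data alone cannot tell you which groups are missing.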
I ranked every Intro to Data Science course on the internet, based on thousands of data points
There are a ton of data science training options online, but which one is the best?-freeCodeCamp
Unlearning descriptive statistics
If you’ve ever used an arithmetic mean, a Pearson correlation, or a standard deviation to describe a dataset, this post is for you.-Stijn Debrouwere
Guide to Encoding Categorical Values in Python
There are a ton of ways to turn categorical variables from text attributes into numerical values. Here’s how to implement the many options offered by pandas and scikit-learn on your own datasets.-Practical Business Python
Data Science for Beginners
“These videos are basic but useful, whether you’re interested in doing data science or you work with data scientists.”-Microsoft Azure
Intro to Data Science for Academics
From Reed College to Revenue at Twitter, one data scientist shares his insights on how academics can be successful in industry—by finding ways to create value in every corner of the business.-Noah Pepper
The best R package for learning to “think about visualization”
Spoiler alert: it’s ggplot2.-Sharp Sight Labs
My Experience as a Freelance Data Scientist
Itching to strike out on your own? Read up on the pros and cons before you give your two weeks’ notice.-Greg Reda
Matching to estimate the causal effects of firing an NFL coach
To fire or not to fire? When a football team gives their coach the boot, are they better off for it? (Bonus: a nice primer on causal inference.)-StatsbyLopez
How These Three Women Made Mid-Career Pivots Into Data Science
How do we narrow the gender gap in data science? Early STEM education for girls isn’t the only solution. Here are the journeys of three women who switched from creative jobs to data roles mid-career.-Fast Company
What’s the state of the job market in data science and machine learning?
“Th[e] proliferation of courses, resources, books and startups would hint that machine learning is becoming more and more accessible to the average programmer and that the market is on track to getting saturated quickly. Is this the current trend?”-Hacker News
What library do you use for information theory in Python?
This thread is a goldmine if you’re looking to calculate entropy, mutual information, or any other information theory metric.-Randy Olson
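If you only need the basics, Shannon entropy is short enough to write by hand. The sketch below is a plain standard-library version, not one of the libraries recommended in the thread, and the function name `shannon_entropy` is an assumption for this example.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # H = -sum(p * log2(p)) over the observed symbol frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("HT"))    # two equally likely outcomes
print(shannon_entropy("abcd"))  # four equally likely outcomes
print(shannon_entropy("aaaa"))  # a single certain outcome
```

For mutual information or estimators that correct for small samples, a dedicated library is still the safer choice.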
Time Series Analysis in Python- Linear Models to GARCH
A well-written, comprehensive primer on the time series models available in Python.-BlackArbs
The Game Theory of the Yankee Swap
Want to get the best present at this year’s White Elephant gift exchange? Prep for total domination with these Python models.-Ben Casselman
How the Circle Line rogue train was caught with data
When a series of signal interferences led to massive disruptions on a Singapore subway line, a team of data scientists stepped in to solve the mystery… with Python!-Data.gov.sg
Building a Financial Model with Pandas
Expand your knowledge of Python and Pandas and analyze your mortgage payment options. Two birds, one stone.-Practical Business Python
Text Analysis and Visualization
Ever wanted to try text analysis in Python, but didn’t know where to start? Here’s your launch pad.-Irene Ros
8 Data Science Skills That Every Employee Needs
A nice primer to share with your colleagues.-Amplitude
Is Bayesian A/B Testing Immune to Peeking? Not Exactly
A common A/B testing mistake is to monitor the test and stop it when the p-value reaches a certain threshold. Many have suggested that using Bayesian methods eliminates this “peeking problem,” but all is not as it appears.-Variance Explained
Practical advice for analysis of large, complex data sets
“This document has been read more than anything else I’ve done at Google over the last eleven years. Even four years after the last major update, I find that there are multiple Googlers with the document open any time I check.”-The Unofficial Google Data Science Blog
PostgreSQL Date Functions (and 7 Ways to Use Them in Business Analysis)
PostgreSQL date functions (like DATE_TRUNC, EXTRACT, and AGE) make wrangling timestamps much easier. Here are 7 examples of applying these date functions to business scenarios.-Mode
How to Master Anti Joins and Apply Them to Business Problems
How to perform an anti join using LEFT JOIN and WHERE. Plus three examples of using anti joins in business scenarios.-Mode
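The LEFT JOIN plus WHERE pattern can be tried out without a database server by running it through Python's built-in `sqlite3` module. The table names, columns, and rows below are hypothetical and are not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Anti join: the LEFT JOIN keeps every user, and the WHERE clause then keeps
# only those whose join found no matching order (the order columns are NULL).
rows = conn.execute("""
    SELECT u.name
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE o.user_id IS NULL
""").fetchall()
conn.close()

print(rows)
```

The result lists the users who have never placed an order, the typical "who hasn't done X yet?" question that anti joins answer.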
What Would It Take To Turn Blue States Red?
Explore this interactive data visualization to see how small voting shifts among different demographics can impact the Presidential election.-FiveThirtyEight
Farmers Markets
Can you find real maple syrup outside of Vermont? Or seafood in the midwest? Or pet food anywhere? Check out these interactive visualizations to see what you’re most likely to find at a farmers market near you.-Susie Lu
On Average
Does the average person actually exist? Probably not, as it turns out. Learn how the concept of “average” influences product design, and why that’s not always a good thing.-99% Invisible
Goodbye, Ivory Tower. Hello, Silicon Valley Candy Store.
Some economists are trading in their professorships for tech jobs: 'Instead of thinking about national or global trends, they are studying the data trails of consumer behavior to help digital companies make smart decisions that strengthen their online marketplaces in areas like advertising, movies, music, travel and lodging.'-New York Times
Asking good questions is hard (but worth it)
Although this framework is written from a programmer’s perspective, it’s a great read for analysts and the folks who ask them questions day-in and day-out.-Julia Evans
Postgres Data Types to Redshift Data Types
Switching from one flavor of SQL to another can be a major pain. This table translates Postgres data types to their equivalent in Redshift. Definitely worth starring on GitHub.-Rob Story
The Three Faces of Bayes
The term “Bayesian” can refer to a variety of philosophies and ideas. Read this article before the next quant-heavy cocktail party you attend, so you’ll know what’s what.-Slackpropagation
R Psychologist
Puzzled by p-values? Confounded by confidence intervals? Stumped by significance testing? This site is a bevy of interactive visualizations illustrating tricky statistical concepts. Even if you’re a statistical genius, it’s worth a visit to play around.-Kristoffer Magnusson
70+ Resources for Transitioning to a Data Science Career
Considering a career in data science? Time to read up. Here's a list of tutorials, tips for interviewing, and stories from people who've made it.-Mode
Forget Python vs. R: how they can work together
Apparently we can all get along. The folks at Civis Analytics share the benefits of using both languages and give an example of how you can use C as a bridge to both Python and R. (Slides and a video from the original SciPy talk are also available.)-Civis Analytics
3 Reasons Counting is the Hardest Thing in Data Science
Counting isn’t technically difficult; the real challenge lies in managing relationships and office politics that surround the task.-Dayne Batten
Top 20 Pandas, NumPy, and SciPy Functions on Github
Some of the most popular Python functions, visualized in Python.-Alexander Galea
Build Algorithms Like You Give a Damn
Discussions at the 2016 WrangleConf focused on data science ethics and strategies for combatting harm by opening communication, recognizing bias, and fighting indifference.-Mode
Ethics for powerful algorithms
Contrary to a ProPublica investigation, COMPAS—a proprietary algorithm used to predict recidivism and inform parole—isn’t statistically biased against black people. However, that doesn’t mean COMPAS isn’t deeply unfair. This is the first of four posts digging into data science ethics.-Abe Gong
Understanding Bias: A Pre-requisite For Trustworthy Results
“What causes bias? How can we correct it, and how does our picture of how the world works factor in to that?”-Adam Kelleher
A visual guide to Bayesian thinking
The best single source we’ve found for demystifying how Bayes’ Rule works, the intuition behind it, and how you can use it to inform your thinking.-Julia Galef
Thinking in SQL vs Thinking in Python
Using a new language requires a new mindset. Our chief analyst shares his learnings from adding Python to his SQL workflow.-Mode
The Theorem Every Data Scientist Should Know
Quick! Define the Central Limit Theorem. Scratching your head? You’re not alone. And yet, this theorem is key to what data scientists do every day: make statistical inferences about data.-Jean-Nicholas Hould
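A quick simulation makes the theorem concrete. The sketch below repeatedly averages draws from a uniform distribution, which is decidedly not bell-shaped, and checks that the averages cluster around the true mean with the spread the theorem predicts. The sample size, number of repetitions, and seed are arbitrary choices for this example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30     # draws averaged together in each sample
REPETITIONS = 2000   # number of sample means collected

def sample_mean(n):
    """Mean of n draws from a uniform distribution on [0, 1)."""
    return statistics.mean(random.random() for _ in range(n))

means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

observed_center = statistics.mean(means)
observed_spread = statistics.stdev(means)

# Theory: uniform(0, 1) has mean 1/2 and variance 1/12, so the sample means
# should have standard deviation sqrt(1/12) / sqrt(n).
predicted_spread = math.sqrt(1 / 12) / math.sqrt(SAMPLE_SIZE)

print(observed_center, observed_spread, predicted_spread)
```

Plotting a histogram of `means` would show the familiar bell curve, even though no single draw comes from one.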
If Correlation Doesn’t Imply Causation, Then What Does?
This tweet sums up our feelings on this article exactly: 'Love that it gives a framework for thinking about correlations that isn’t just ¯\_(ツ)_/¯'-Adam Kelleher
Building a data science portfolio
Much like writers and designers, data scientists are now expected to provide portfolios when they apply for jobs. Here’s what you need to know to get started.-Dataquest
Escaping Excel Hell with Python & Pandas
A great presentation on the problems that arise from spreadsheet analysis and how you can ditch Excel by learning some Python.-Chris Moffitt
10 Useful Python Data Visualization Libraries for Any Discipline
While many Python data visualization libraries are narrowly focused on accomplishing a certain task, these libraries can be used regardless of your field.-Mode
Scientific Python Cheat Sheet
For those moments when you forget how to make a contour line plot in matplotlib or write a function in pure Python.-Institut de Physique du Globe de Paris
What SQL Analysts Need to Know About Python
Here's some info on the importance of Python and how to use it in day-to-day analysis.-Segment
Easier data analysis in Python with pandas
A series of video tutorials for pandas newbies who know some Python. Each video answers a student-posed question using real-world data.-Data School
PyData London Conference Presentations
A few weekends ago PyData hosted a conference in London, and they just released videos and slides of a bunch of the presentations.-PyData
Modern Pandas
This tutorial is great for experienced Python users looking to stay sharp on pandas. One Twitter user summed it up perfectly as “the abbreviated Strunk & White of data analysis.”-Tom Augspurger
Spreadsheet Thinking vs. Database Thinking
This is a great read for anyone who’s new to working with relational databases.-eagereyes
SQL Joins Visualizer
Many a learner has embarked on the quest to learn SQL, only to be thwarted by the task of mastering joins. Never again. Click the type of join you want to execute and this site will generate the right code.-SQL Joins Visualizer
10+2 Data Science Methods that Every Data Scientist Should Know in 2016
Forgive the click-baity title. This is actually a really well-done roundup of the statistical and machine learning methods data scientists use daily, with Python and R scripts for each.-Takashi J. Ozaki
An Introduction to Inference
A good first step for those who work with data frequently and want to learn more about Bayesian statistical methods. From the author: 'It will be a bit mathy, but nothing beyond kahn-level probability.'-Vincent D. Warmerdam
6 Lesser Known Python Data Analysis Libraries
You’ve heard of NumPy and Pandas and matplotlib. Now check out these other handy libraries for dealing with data.-Jyotiska Khasnabish
How to Find Correlative Metrics For Conversion Optimization
A thorough walk-through of how to find correlative metrics and leverage them for conversion. It’s jam-packed with examples and advice from experts, plus a handy list of tools.-ConversionXL
This is the difference between statistics and data science
Another blog post trying to define data science? We know. We know. BUT! This one presents an interesting angle: the difference between a data scientist and a statistician comes down to product knowledge.-Mixpanel
Not So Standard Deviations: Episode 11 - Start and Stop
If you haven’t listened to NSSD yet, you’re missing out on an inside look at how data scientists work in industry and academia. In this episode, statisticians Hilary Parker and Dr. Roger Peng discuss their methods for tackling the beginning and ending parts of analyses (discussion starts at 20:43).-Not So Standard Deviations
Lift analysis - A data scientist’s secret weapon
Learn how to spot flaws in machine learning models with lift analysis (and why you should add it to your list of evaluation metrics).-Andy Goldschmidt
A Practical Guide to Anonymizing Datasets with Python & Faker
Sometimes you just want to show off an analysis or chart you built for your company… without revealing your company’s data. Now you can.-District Data Labs
Writing Data—an introduction to choosing & using data formats
JSON, CSV, or HDF5? This guide outlines the perks and pitfalls of file formats for alphanumeric data.-Build Things Together
Friction Between Programming Professionals and Beginners
In many technical forums, there’s a pattern of beginners asking a vague question and forum veterans responding with snarky or curt replies. Here are some suggestions both parties can use to keep conversations productive.-Programming for Beginners
Practical skills that practical data scientists need
Last week, Noah Lorang of Basecamp wrote that, most of the time, data scientists don’t need AI to solve business problems. They just need simple arithmetic. In this post, he elaborates on the skills he uses and questions he asks every day.-Signal v. Noise
Data scientists mostly just do arithmetic and that’s a good thing
A vast majority of the time, businesses don’t need machine learning to solve their problems. They need accurate, actionable data and people who consider context, know basic math, write SQL, and understand what makes businesses tick.-Signal v. Noise
LowClass Python—Style Guide for Data Scientists
This style guide is meant for use by advanced beginner to advanced intermediate developers of scientific code in Python. In other words, non-professional programmers...for example, data scientists.-Columbia University Applied Data Science
The Elements of Python Style
This document goes beyond PEP8 to cover the core of what I think of as great Python style. It is opinionated, but not too opinionated. It goes beyond mere issues of syntax and module layout, and into areas of paradigm, organization, and architecture.-Andrew Montalenti
The Art of Naming Things
Nothing’s worse than when you open a new dataset only to find it’s full of indecipherable labels. This two-part article provides suggestions to keep your naming convention consistent, concise, and informative while preventing data loss and a whole lot of headaches.-Penn State
A menagerie of messed up data analyses and how to avoid them
Don’t let mistakes botch your analyses. This post outlines six examples and offers advice for taking proactive measures against them.-Simply Statistics
Guess the Correlation
How good are you at gauging the correlation between two variables in a scatter plot? Find out!-Omar Wagih
Writing More Legible SQL
It’s easy to get lazy when writing SQL. Here are a few tips for cleaning up your queries so others can actually read your work.-Craig Kerstiens
AMA Data Scientist—Jake Porway of DataKind
Highlights of the discussion include advice for budding data scientists, ethical challenges, and opportunities to do good with data.-Reddit
Getting to the “Plateau of Productivity” with Python
Using the Gartner Hype Cycle as a framework, this post provides a load of context and tips for anyone who wants to pursue Python. As an added benefit, you could apply this structure to learning any technical language or tool.-Practical Business Python
The Missing 11th of the Month
According to Google’s Ngrams database, the 11th is mentioned significantly less than other monthly ordinals. But why? We don’t want to spoil the conclusion, but this post is a good reminder of why you shouldn’t blindly trust data.-Dr. David Hagen
The Field Guide to Data Science
Booz Allen just released The Second Edition of The Field Guide to Data Science, which walks you through how to use data to generate value for your organization. The guide includes practical advice, tested processes, and insights that are helpful for anyone who touches data, whether you’re a senior exec, a practitioner, or a newbie.-Booz Allen Hamilton
Big Data Still Requires Humans To Make Meaningful Connections
It’s easy to get swept up in the exciting opportunities big data presents and forget that data alone isn’t a solution—it’s a tool to help solve problems. This article hits on a sentiment we’ve been hearing a lot lately—“we still need humans to help make sense of the data we are collecting.”-TechCrunch