Data News
From stories about how data is changing our world to thoughtfully presented pieces of data journalism, this section is devoted to all things data news.
Emerging Architectures for Modern Data Infrastructure
“The growth of the data infrastructure industry has continued unabated since we published a set of reference architectures in late 2020. Nearly all key industry metrics hit record highs during the past year, and new product categories appeared faster than most data teams could reasonably keep track.”-Future
A Very Big Deal
Snowflake goes shopping, and buys the store.-benn.substack
The FTC’s New Enforcement Weapon Spells Death for Algorithms
The Federal Trade Commission might have a new standard for penalizing tech companies that violate privacy and use deceptive data practices: make them destroy their algorithms.-Protocol
Disclose Your Angel Investments
The community has given us a lot. We should be transparent about it.-benn.substack
The Next Billion Programmers
The next product Mode’s Chief Analytics Officer would build? Excel, for everything.-benn.substack
Business in the Back, Party in the Front
Over the last decade, we’ve reached consensus on how the back of the data stack should look: get the data in with ELT, store and transform the data in cloud data warehouses, etc. How we handle the front of the data stack — the consumption layer — is still very much up for debate.-benn.substack
The Metadata Money Corporation
Selling the potential to be a standard lets companies spin stories about how their growth curves can go vertical. But standards that are only standards until a better idea comes along aren’t really standards at all.-benn.substack
Data's Trillion Dollar Question Mark
How a data warehouse could become a data platform—and an organizational brain.-benn.substack
Lies, Damned Lies, and Rankings: the Problem With Bloomberg's COVID Resilience Ranking
Despite embedded bias, scores aren’t going away. What’s important is recognizing the embedded bias and regularly reviewing the choice of factors and weights to ensure the bias is aligned with your goals and minimizes unintended consequences.-Zata Novo
File Not Found
There’s a generational divide in how we access information. Professors organize files with folders. Students search. But directory structure remains incredibly important in tech and STEM fields, leading new coders to constantly come up against “file not found” errors.-The Verge
What Really Happened When Google Ousted Timnit Gebru
This is the most in-depth piece we’ve seen about Google’s unceremonious dismantling of its Ethical AI team and the tensions inherent in an industry’s efforts to research the downsides of its favorite technology.-WIRED
Google Is Poisoning Its Reputation With AI Researchers
“Google has worked for years to position itself as a responsible steward of AI... But now its reputation has been badly, perhaps irreversibly damaged, just as the company is struggling to put a politically palatable face on its empire of data.”-The Verge
Your Local Police Department Might Have Used This Facial Recognition Tool To Surveil You. Find Out Here.
This database shows if the police department in your community is among the hundreds of taxpayer-funded entities that used Clearview AI’s facial recognition.-BuzzFeed News
Why the Pandemic Experts Failed
Data-driven thinking isn’t necessarily more accurate than other forms of reasoning, and if you do not understand how data are made, their seams and scars, they might even be more likely to mislead you.-The Atlantic
Python Developers Survey 2020 Results
Here’s one of many interesting tidbits: “Only 32% of the Python developers involved in Data analysis and Machine learning consider themselves to be Data Scientists.”-JetBrains
Developing a Database of Structural Racism–Related State Laws for Health Equity Research and Practice in the United States
“Although U.S. state laws shape population health and health equity, few studies have examined how state laws affect the health of marginalized racial/ethnic groups (e.g., Black, Indigenous, and Latinx populations) and racial/ethnic health inequities.”-SAGE Journals
Data Feminism
“Illustrating data feminism in action, D'Ignazio and Klein show how challenges to the male/female binary can help challenge other hierarchical (and empirically wrong) classification systems. They explain how, for example, an understanding of emotion can expand our ideas about effective data visualization, and how the concept of invisible labor can expose the significant human efforts required by our automated systems.”-Catherine D'Ignazio and Lauren F. Klein
This Is How We Lost Control of Our Faces
Over the last 43 years, facial-recognition researchers gradually abandoned asking for people’s consent. Now, more and more personal photos are used in datasets without their owners’ knowledge.-MIT Technology Review
What Is Data Justice? The Case for Connecting Digital Rights and Freedoms Globally
“This paper posits that just as an idea of justice is needed in order to establish the rule of law, an idea of data justice – fairness in the way people are made visible, represented and treated as a result of their production of digital data – is necessary to determine ethical paths through a datafying world.”-Big Data & Society
COVID-19 Vaccine Distribution Algorithms May Cement Health Care Inequalities
Many of the algorithms used by federal and state governments rely on data from the U.S. Census. The U.S. Census regularly undercounts vulnerable populations.-VentureBeat
How Our Data Encodes Systematic Racism
“I’ve often been told, ‘The data does not lie.’ However, that has never been my experience. For me, the data nearly always lies.”-MIT Technology Review
Google Employees Say Scientist's Ouster Was 'Unprecedented Research Censorship'
Until last Wednesday, Timnit Gebru was a co-lead of the Ethical AI team at Google. She is one of the few Black women working in this field. Her firing brings up an often-raised question: can a company be trusted to hold its technology accountable?-NPR
Emerging Architectures for Modern Data Infrastructure
“In the last two years, we talked to hundreds of founders, corporate data leaders, and other experts – including interviewing 20+ practitioners on their current data stacks – in an attempt to codify emerging best practices and draw up a common vocabulary around data infrastructure.”-Andreessen Horowitz
Towards Decolonising Computational Sciences
“We see this struggle as requiring two basic steps: a) realisation that the present-day system has inherited, and still enacts, hostile, conservative, and oppressive behaviours and principles towards women of colour (WoC); and b) rejection of the idea that centering individual people is a solution to system-level problems.”-arXiv.org
‘People of Colour Aren’t Empowered to Make Changes They’re Brought in to Make’
Inioluwa Deborah Raji of the AI Now Institute talks about how she got started in AI ethics and why tech companies aren’t doing enough to address systemic bias in their products.-Silicon Republic
IBM Walked Away from Facial Recognition. What About Amazon and Microsoft?
While this decision comes amidst the nationwide focus on police brutality, the folks at Algorithmic Justice League have been beating the drum about facial recognition bias for years.-VentureBeat
Don’t Be Fooled by America’s Flattening Curve
At first glance, national and statewide new COVID-19 cases appear to be leveling off. But removing major metropolitan areas (where cases are declining) from the calculation reveals that a series of regional “mini-epidemics” is on the rise.-The New York Times
Female Pioneers in Computer Science You May Not Know
These women paved the way for computer and data science as we know it today.-Re-work
New Research Suggests the US Unemployment Rate is About to Become Useless
For this crisis, the employment to population ratio may be a better measure to assess the job market.-Quartz
Data on COVID-19 Testing
Comparing confirmed cases across countries is a complicated task because there’s no unified definition of what a confirmed case is. In Germany, it’s samples tested. In the U.K., it’s people. And in some countries, the units are unclear or inconsistent.-Our World in Data
Data Centers Are the New Oil
What connects politics, Utah, and Matthew McConaughey? Rooms upon rooms of servers.-Normcore Tech
Data Science Careers for Baltimore’s Underserved Community Members
This inspiring initiative offers a viable model for providing data science training to those who might not be able to access it otherwise.-Hopkins Bloomberg Public Health Magazine
Racial Bias in a Medical Algorithm Favors White Patients Over Sicker Black Patients
“Correcting the bias would more than double the number of black patients flagged as at risk of complicated medical needs...”-The Washington Post
The Spy in Your Wallet: Credit Cards Have a Privacy Problem
What happens to your data after you swipe your card?-The Washington Post (https://www.washingtonpost.com/)
Estimating the success of re-identifications in incomplete datasets using generative models
“Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”-Nature
How R-Ladies made data science inclusive
Just 14% of R users are women, but that’s actually unusually high for a programming language. And we have R-Ladies to thank!-Quartz
A Turbulent Year: The 2019 Data & AI Landscape
“In a world where data-driven automation becomes the rule (automated products, automated cars, automated enterprises), what is the new nature of work? How do we handle the social impact? How do we think about privacy, security, freedom?”-Matt Turck
Why Hadoop Failed and Where We Go from Here
Hadoop was excellent at economically harnessing data types that were constantly evolving. Managing the core data of an enterprise? Not so much.-Teradata
Python's Caduceus syndrome
What happens when a programming language grows up?-Normcore Tech
Grocery Bills Can Predict Diabetes Rates by Neighborhood
Dietary habits are notoriously difficult to monitor. By analyzing sales figures from London’s biggest grocer, data scientists were able to link eating patterns with local rates of high blood pressure, high cholesterol, and high blood sugar.-MIT Technology Review
An Algorithm Wipes Clean the Criminal Pasts of Thousands
When you see “criminal” and “algorithm” in a headline together, it's usually a sign the article will be about unfair bias. But not in this case! Code for America used an algorithm to automatically remove cannabis convictions from Californians' records, reducing a process that would have taken months to mere minutes.-BBC
A Weather Tech Startup Wants to Do Forecasts Based on Cell Phone Signals
Speaking of making your own data... ClimaCell is developing a new mathematical model that turns cell phone signals into weather data that's way more accurate than your local weatherperson.-MIT Technology Review
Writing a Letter to DataCamp
In the wake of a case of sexual misconduct at DataCamp, one instructor reflects on her relationship with the company, and why she doesn't want you to take her courses.-Julia Silge
Coding Is for Everyone—as Long as You Speak English
If people can translate programming languages easily enough into esoteric versions like LOLCODE, why are there only four programming languages widely available in multilingual versions?-Wired
Scientists rise up against statistical significance
“We must learn to embrace uncertainty. One practical way to do so is to rename confidence intervals as ‘compatibility intervals’ and interpret them in a way that avoids overconfidence.”-Nature
Facial recognition's 'dirty little secret': Millions of online photos scraped without consent
Earlier this year IBM released a dataset of 1 million photos of people's faces designed to reduce bias in facial recognition software. These photos were obtained from Flickr, without users' knowledge or consent.-NBC News
How Your Health Information Is Sold and Turned Into ‘Risk Scores’
Companies such as LexisNexis have collected personal data to help doctors make informed decisions about prescribing opioids. And since no law prohibits collecting such data or using it in the exam room, it's happening without patient consent.-Politico
Why There Will Be No Data Science Job Titles By 2029
“The only thing that is certain is change, and there are changes coming to data science. One way to be on top of this trend is to not only invest in data science and machine learning skills but to also embrace soft skills.”-Forbes
Demand and Salaries for Data Scientists Continue to Climb
Data science job openings are expanding faster than the number of technologists looking for them.-IEEE
I Gave a Bounty Hunter $300. Then He Located Our Phone
T-Mobile, Sprint, and AT&T are selling access to their customers’ location data, and that data is ending up in the hands of bounty hunters and others not authorized to possess it, letting them track most phones in the country.-Motherboard
Amazon scraps secret AI recruiting tool that showed bias against women
Don't just read this article—read the discussions around it too. Peter Aldhous makes a great point that “This is being reported as a problem with machine learning, but there's another way of looking at it: The algorithm exposed bias in their existing hiring practices.”-Reuters
Who needs democracy when you have data?
“As far as we know, there is no single master blueprint linking technology and governance in China. But there are several initiatives that share a common strategy of harvesting data about people and companies to inform decision-making and create systems of incentives and punishments to influence behavior.”-MIT Technology Review
To work for society, data scientists need a hippocratic oath with teeth
Guess who was totally unsurprised by the unfolding data scandals surrounding Cambridge Analytica and Facebook? Cathy O’Neil, author of Weapons of Math Destruction. In this interview, O’Neil shares her vision for combatting the silent, society-wide bureaucracy governed by algorithms and big data.-Wired
A Code of Ethics for Data Science
Speaking of the responsible use of data… the former U.S. Chief Data Scientist has issued a rallying cry for the data science community to band together and take a leadership role in defining right from wrong. If you’re interested in contributing to the conversation, join the Data for Democracy Slack group.-DJ Patil
Fitness tracking app Strava gives away location of secret US army bases
In a case of content marketing gone wrong, fitness tracker Strava shared a heatmap of every single user activity ever uploaded to the app. Although pretty, the map is detailed enough for someone to clearly identify internal layouts of foreign US army bases in countries such as Afghanistan, Djibouti, and Syria.-The Guardian
What is the Future of Pandas
A must-watch talk for any pandas developer.-PyData
Five ways to fix statistics
The debate rumbles on and on: how much is bad statistics to blame for poor reproducibility? Nature asked influential statisticians to recommend one change to improve science and found the problem is not numbers, but ourselves.-Nature
Why is this company tracking where you are on Thanksgiving?
A data study of how political divisions affected 2016's Thanksgiving celebrations is raising some eyebrows within the data science community, including those of former U.S. Chief Data Scientist DJ Patil. SafeGraph provided the researchers with 17 trillion very specific location markers for 10 million smartphones, despite claiming that the data they collect is anonymized.-The Outline
When Data Science Destabilizes Democracy and Facilitates Genocide
Last week’s Senate Intelligence hearing with Facebook, Twitter, and Google shined a bright light on the ethical responsibility of tech companies—and their data scientists.-fast.ai
Not a revolution (yet): Data journalism hasn’t changed that much in 4 years, a new paper finds
Exploring the news through an interactive visualization can feel cutting-edge, but data journalism's labor intensity and reliance on officially collected data make it “more likely to complement traditional reporting than to replace it on a broad scale.”-NiemanLab
R for Journalists
This site is a great launch pad for anyone who's new to R, journalist or not. Each post provides step-by-step instructions and code for making a visualization with data about a current event.-R for Journalists
The ‘Nate Silver Effect’ Is Changing Journalism. Is That Good?
“Political journalism has become infatuated with opinion polls... and yet news organizations remain ill-equipped to make sense of the flood of data.”-Politico Magazine
The Media Has A Probability Problem
In the final installment of a series reviewing news coverage of the 2016 general election, Nate Silver explores the challenges of calculating, interpreting, and communicating probabilities to the public.-FiveThirtyEight
A Tale of Two Industries: How Programming Languages Differ Between Wealthy and Developing Countries
The latest analysis from Stack Overflow found correlations between certain technologies and GDP per capita. Particularly interesting: questions regarding two data science powerhouses, R and Python, are asked more frequently in high-income countries.-Stack Overflow
Data On Drug Use Is Disappearing Just When We Need It Most
“We’re simply flying blind when it comes to data collection, and it’s costing lives.”-FiveThirtyEight
Dissecting Trump’s Most Rabid Online Following
Come for the “subreddit math,” stay for the latent semantic analysis methodology.-FiveThirtyEight
Airbnb’s worst problems are confirmed by its own data
While roughly 71 percent of hosts rented out their home for three months or less, there were still thousands of 'whole units', meaning an entire house or apartment, which were rented for six months or more during the last year.-The Verge
Airbnb Says Data Dump Shows Misuse of Service Is Rare
With its release of a trove of data this week, the short-term rental company Airbnb sought to underscore how the majority of its hosts in New York City are playing by the rules.-New York Times
Hans Rosling: An Appreciation
Hans Rosling championed the idea of showing people what the world was really like – and how it was different from their preconceptions – using data and visualization.-eagereyes
Remembering Hans Rosling, the visualization pioneer who made data dance
Rosling's work was a driver of some of the explosion of interest in data visualization in the news and nonprofit sectors starting in the early 2000s. His BBC special and TED Talks sparked an interest in 'storytelling with data,' rather than just with words.-Wonkblog
What It Takes to Truly Delete Data
Can an entire dataset of important information really be deleted, just like that?-FiveThirtyEight
States Move to Protect Their Immigration Data from the Trump Administration
Washington’s governor has asked staff to figure out how to keep data from being used for mass deportations.-The Verge
How statistics lost their power – and why we should fear what comes next
“Not only are statistics viewed by many as untrustworthy, there appears to be something almost insulting or arrogant about them. Reducing social and economic issues to numerical aggregates and averages seems to violate some people’s sense of political decency.”-Guardian
Finally, Uber Releases Data to Help Cities With Transit Planning
But it’s not the highly coveted numbers cities need. How helpful is the company’s new data tool?-CityLab
Uber Extends an Olive Branch to Local Governments: Its Data
The ride-hailing company Uber and local governments often do not play well together. But now, with a new data-focused product, Uber is offering a tiny olive branch to its municipal critics.-New York Times
A non-comprehensive list of awesome things other people did in 2016
Here’s a good year-in-review for all you stats lovers out there.-Simply Statistics
Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump
Alarmed that decades of crucial climate measurements could vanish under a hostile Trump administration, scientists have begun a feverish attempt to copy reams of government data onto independent servers in hopes of safeguarding it from any political interference.-Washington Post
How Trump’s White House Could Mess With Government Data
Outright manipulation may be unlikely, but there are subtler things the administration could do.-FiveThirtyEight
White House Special with DJ Patil, US Chief Data Scientist
In this interview, DJ talks about the government’s relationship with Silicon Valley, the White House’s position on data ethics, and why George Washington was actually the first U.S. Chief Data Scientist.-Partially Derivative
2016: A Year of Data-Driven Confusion
“We need strong mechanisms for ethical and fair practices within teams and organisations, and a culture where pushing back on conclusions is well-received and seen as a sign of strength, not of defiance.”-Model View Culture
Yes, the election polls were wrong. Here's why
We treat polls like weather forecasts – but voters are inherently unpredictable. A hunger for certainty sets expectations that are impossible to meet.-Guardian
Meet a Polling Analyst Who Got the 2016 Election Totally Wrong
Sam Wang opens up about political forecasting, eating crickets on live television, and what we can all learn from Hillary Clinton’s shocking loss.-Pacific Standard
How Data Failed Us in Calling an Election
It was a rough night for number crunchers. And for the faith that people in every field — business, politics, sports and academia — have increasingly placed in the power of data.-New York Times
Why are we so surprised?
In theory, we should not be surprised by the outcome of the 2016 presidential election, but in practice we are.-Probably Overthinking It
Data Sets Are The New Server Rooms
As Foursquare has proven, collecting proprietary data from the get-go can lead to a major competitive advantage in the long run. But doing so requires cash, and lots of it.-John Nussbaum
Ethics for powerful algorithms
Contrary to a ProPublica investigation, COMPAS, a proprietary algorithm used to predict recidivism and inform parole, isn’t statistically biased against black people. However, that doesn’t mean COMPAS isn’t deeply unfair. This is the first of four posts digging into data science ethics.-Abe Gong
The Genomics Inflection Point: Implications for Healthcare
Genomics has the potential to massively improve our collective health. Although cost has dropped significantly and technology has improved, genomics hasn’t yet been widely adopted by the public. This survey of 1,000 consumers sheds light on the challenges genomics faces before becoming a normal part of everyday healthcare.-Rock Health
Data Journalism Awards 2016: what the winners tell us about the state of the data nation
The Data Journalism Award winners were announced last Thursday. The director of the awards reflects on what these winners reveal about the state of data journalism.-Simon Rogers
Uber Checks Into Foursquare’s Massive Location Database
Uber will now tap into Foursquare's location data, especially its "point of interest" data (restaurants, stores, landmarks, etc.) to enhance its database of locations.-Fortune
Uber taps Foursquare’s Places data so you never have to type an address again
Foursquare is providing points of interest data to Uber so that riders can type in venue names to specify their pick-up and drop-off locations.-TechCrunch
What’s driving Silicon Valley to become ‘radicalized’
The fallout from Apple vs. the FBI has the tech industry rattled. More and more companies are upping security—collecting less information, investing in tougher encryption, and giving customers the keys to their own data.-Washington Post
When newsrooms don’t own their data, other companies profit
Companies like Foursquare have proven that there’s power in building proprietary datasets. And that raises the question: how might news publishers aggregate information to create enterprise data models of their own?-Poynter
An unlikely source predicted Chipotle's disastrous quarter, and it says a lot about the future of investing
Not everyone was caught off guard by the scale of the drop in same-store sales at Chipotle. Using foot traffic data, Foursquare called it.-Business Insider
Microsoft’s Tay is an Example of Bad Design
Or Why Interaction Design Matters, and so does QA-ing.-Caroline Sinders
Here's How We Prevent The Next Racist Chatbot
Tay.ai is the consequence of poor training.-Popular Science
Why Microsoft Accidentally Unleashed a Neo-Nazi Sexbot
It’s not surprising that Microsoft’s chatbot spewed racist invective, but here’s how it could have been avoided.-MIT Technology Review
Moneyball for Book Publishers: A Detailed Look at How We Read
Publishers are now using reader behavior data collected from e-readers to inform decisions about advertising budgets and marketing tactics. Obviously, the impact of reading analytics presents concerns for authors and readers alike.-New York Times
We Now Have Algorithms To Predict Police Misconduct
You’ve probably heard of predictive policing, but what about predictive policing for the police? One police department teamed up with researchers to test an algorithm that detects troublesome behavior of officers early on.-FiveThirtyEight
Why data journalism tries, and fails, to go global
With the success of data blogs like The Upshot and data publications like FiveThirtyEight, it feels like data journalism is making a big impact. But in countries where data journalism could do the most good, there are obstacles that bootcamps and hackathons can’t overcome.-Sunlight Foundation
The Ethical Data Scientist
Even though the ethics of data science have been bubbling up in conversation lately, we don’t talk about them nearly as much as we should. Why is that? And how can we go about fixing it?-Slate
Let’s Move Beyond Open Data Portals
Open data portals have been integral to making government more transparent. So why is a man who spent much of his career opening data now arguing that we should abandon open data portals altogether?-Abhi Nemani
On research parasites and internet mobs - let’s try to solve the real problem.
The New England Journal of Medicine recently published an editorial about data sharing which referred to people who use data secondhand as “research parasites.”-Simply Statistics
The Experiment Experiment
When psychologist Brian Nosek tried to reproduce the results of 100 studies published in the top peer-reviewed scientific journals, only 39 could be replicated. Might the scientific community have an unconscious bias toward publishing positive results? Find out.-Planet Money
The Future of Big Data and Analytics in K-12 Education
At edtech startup AltSchool’s private campuses, student actions are recorded every day. AltSchool’s software and algorithms search this data for patterns and make suggestions for how to improve student performance. If you only read one article today, this is it.-Education Week
Georgia Tech Researchers Demonstrate How the Brain Can Handle So Much Data
Random projection is frequently used in machine learning to make sense of big, diverse data. It turns out this method could be one of the ways that humans learn, too.-Georgia Tech
Your Doctor Doesn’t Want to Hear About Your Fitness-Tracker Data
While your Fitbit or Apple Watch can be great for tracking your activity and weight loss, it might not help your doc too much. From these doctors’ perspectives, the most promising wearables are yet to come.-MIT Technology Review