On Wednesday, more than 500 members of the data science community gathered at the Village in San Francisco for CrowdFlower’s Rich Data Summit.
It was great to see familiar faces, meet new people, and hear data scientists share perspectives. Despite a wide array of vantage points, the talks coalesced around some exciting ideas for the future of data science.
Benn Stancil, Chief Analyst at Mode Analytics, presenting on the importance of data storytelling
1. Everyone's still trying to define data science
“Enjoy that data science is poorly defined. People don't know what to expect, so that gives us freedom to try things and experiment. The definition is vague, so we can contribute to processes in new ways.” - Bruce Smith, Senior Data Scientist at Intuit
Throughout the day, speakers confirmed that we’re still grappling with the definition of data science. In “The State of Data Science” panel, Bruce Smith (Intuit), Daniela Braga (VoiceBox), and Tim Converse (eBay) discussed how they’d been doing data science for years before the term itself emerged.
Smith said that data science is made up of a large group of people who are hard to categorize, but that will change eventually: “Data science will focus. What were computer scientists like 20 years ago? You would have gotten a lot of definitions. Now it’s a lot more consistent. The same thing will happen.”
2. Machine learning + humans = better together
“Human intelligence and machine intelligence aren't in competition; they're natural complements that reinforce each other.” - Lukas Biewald, Founder and CEO of CrowdFlower
The effective balance of algorithms and human experience was a common theme. From suggesting an Uber rider's destination to helping companies automate replies to support inquiries, machine learning is becoming more and more sophisticated. Will it eventually render us obsolete and put us out of work? Many of the speakers seemed to think not.
Nate Silver cited advanced chess as an example where humans and machines are stronger together: “Computers can do things more reliably and quickly than we can, but they're still subject to the programs we design.”
Lukas Biewald from CrowdFlower pointed out that we’re introducing new technology in pieces. Instead of introducing a car that’s completely self-driven, Tesla recently launched Autopilot to make driving in traffic safer. But drivers still keep their hands on the wheel.
3. Don’t just do well. Do something good.
“Data by itself isn’t worth anything unless there’s a problem to solve and a community to solve it.” - Beth Noveck, Founder and Professor at The GovLab
If the number of nonprofit organizations on the roster is any indication, we’ve come a long way since 2011, when Jeff Hammerbacher of Cloudera said, “The greatest minds of my generation are thinking about how to make people click ads. That sucks.”
Catherine Bracy (Code for America), Beth Noveck (The GovLab), and Wendy Kan (Kaggle) all shared examples of how data scientists are using quantitative skills to solve social problems. Perhaps the most passionate talk came from Eric Schles, who uses Python to stop human trafficking by scraping and analyzing text from sites frequented by perpetrators.
Crowdsourcing is the foundation of many of these organizations. The GovLab is crowdsourcing experts in evacuation planning and redevelopment to help city officials in Quito, Ecuador prepare for the likely eruption of a nearby volcano. Schles’s Hacking Against Slavery project depends on the contributions of many to eventually build software to combat slavery.
Here are a few ways you can use data science for good:
- Join Eric Schles in his mission to catch human traffickers. Check out the Hacking Against Slavery site for ways to help out.
- Get involved with Code for America by joining a brigade to solve local civic issues or choosing a project to contribute to on GitHub.
- Attend one of NYU’s GovLab sessions to find out how you can help Quito officials prepare for the Cotopaxi eruption.
- Participate in a Kaggle competition that promotes social good, such as this project to identify endangered whales in aerial photographs.
4. Communication—and open data—will lead to big opportunities
"Get others engaged with data storytelling. You'll create champions constantly on the lookout for new applications of your work." - Bruce Smith, Senior Data Scientist at Intuit
Right now, most job descriptions would lead you to believe that a data scientist’s responsibilities end after the analysis is done and the charts are made. But as Benn Stancil pointed out in his talk on data storytelling: “As data scientists, we aren't usually the decision makers. So we need to be able to explain data to someone else.”
So how do we better communicate our work? Stancil suggests data scientists hone their communication skills by following the example of skilled data journalists at FiveThirtyEight and The New York Times.
Another way to increase transparency is to open up our data and methods with more open source tools. When asked where she thought the field was going, Daniela Braga said she’s noticed a “huge move to open source.”
And the more we open data, the more people we can empower to find solutions faster. Beth Noveck zeroed in on why transparency, especially in government data, is so important: “Open data are helping to push us toward a world where we have greater access to and use of information to solve hard problems.”