Data are becoming the new raw material of business
The Economist


Data Sources for Cool Data Science Projects Part 6

Links to Part 1, Part 2, Part 3, Part 4, Part 5

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to a successful project has been getting access to interesting data. Here are a few cool public data sources you can use for your next project:

Government/Politics 

  1. Presidential Newspaper Endorsements: Noah Veltman has published a lot of cool data projects, including a record of the presidential endorsements made by over 100 newspapers from 1980 to the present. You can view it as a formatted table or a spreadsheet.
  2. Medicare Beneficiaries: More than 55 million Americans are covered by Medicare, and the Medicare Health Outcomes Survey measures the ‘physical and mental health and well-being’ of beneficiaries over a two-year period. The data set covers recipients from 1998 to 2014.
  3. American Manufacturing: The Census Bureau publishes the Annual Survey of Manufactures (ASM), a state- and industry-level data set for America’s manufacturing sector.

Travel

  1. Take Flight: OpenFlights.org has compiled data on over 60,000 flight routes and almost 1,000 itineraries from the world’s busiest airport, Hartsfield-Jackson Atlanta International Airport. Each route includes the airline, departing airport, arriving airport, stops, and the type of plane.
  2. TSA Confiscation: Max Gaika, a data and FOIA guru, built an interactive map of TSA confiscations based on data collected from the government. The set contains a total of 22,044 “dangerous items.”
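
If you want a starting point for the OpenFlights routes, here is a minimal Python sketch that loads route records and filters them by departing airport. It assumes the comma-separated layout of the OpenFlights routes file (airline, airline ID, source airport, source airport ID, destination airport, destination airport ID, codeshare, stops, equipment). The field names and the helper functions `load_routes` and `routes_from` are our own, and the sample rows are invented for illustration rather than copied from the data set.

```python
import csv
import io

# Column order assumed from the OpenFlights routes file layout;
# the names themselves are our own labels, not official headers.
ROUTE_FIELDS = [
    "airline", "airline_id", "source_airport", "source_airport_id",
    "dest_airport", "dest_airport_id", "codeshare", "stops", "equipment",
]

# Invented sample rows in that layout; a real project would read the
# downloaded file instead of this string.
SAMPLE = """\
DL,2009,ATL,3682,LAX,3484,,0,757
DL,2009,ATL,3682,JFK,3797,,0,738
AA,24,DFW,3670,ATL,3682,,0,M80
"""


def load_routes(text):
    """Parse route rows into dictionaries, converting stops to an integer."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=ROUTE_FIELDS)
    routes = []
    for row in reader:
        row["stops"] = int(row["stops"])
        routes.append(row)
    return routes


def routes_from(routes, airport_code):
    """Return the routes that depart from the given airport code."""
    return [r for r in routes if r["source_airport"] == airport_code]


routes = load_routes(SAMPLE)
atl = routes_from(routes, "ATL")
print(len(atl), sorted(r["dest_airport"] for r in atl))
```

To run this against the real file, you could replace `SAMPLE` with the contents of the downloaded routes file, for example by opening it and passing the text to `load_routes`.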

Police/Crime 

  1. State Prison Admissions: The New York Times has assembled data on the number of inmates sent to state prison, by county, in 2006, 2013, and 2014. The numbers were taken from the National Corrections Reporting Program, which is closed to the public but accessible to select reporters.
  2. NYC Police Complaints: New York City now publishes official complaints against city police from every closed case since 2006. There are over 200,000 complaints, all of which include the location and whether video evidence exists, but no information about the officer involved.

 

While building your own project cannot replicate the experience of a fellowship at The Data Incubator (our Fellows get amazing access to hiring managers and to nonpublic data sources), we hope this will get you excited about working in data science. And when you are ready, you can apply to be a Fellow!

Got any more data sources?  Let us know and we’ll add them to the list!
