Data are becoming the new raw material of business
The Economist

2018 Data Sources for Cool Data Science Projects, provided by Thinknum


Links to our previous “Data Sources for Cool Data Science Projects” posts:
Part 1, Part 2, Part 3, Part 4, Part 5

At The Data Incubator, we run a Data Science Fellowship program for Master’s and PhD graduates looking to transition to a career in industry. Our admissions team, as well as our hiring partners, love Fellows who don’t mind getting their hands dirty with data. That’s why our applicants submit ideas for capstone projects they’ll work on throughout the 8-week Fellowship to showcase their data science skills. One of the biggest obstacles to creating and completing successful projects has been getting access to interesting data.

Today, we’re excited to announce a partnership with Thinknum, a leading provider of alternative web-crawled data. Thinknum has been the principal provider of web-crawled data to the finance community for over 3 years. Its client list counts more than 150 elite hedge funds and a majority of investment banks, which use the data to experiment with ever more innovative and differentiated ways of producing investment ideas across all sectors and multiple asset classes. More recently, Thinknum’s data has been in high demand among some of the largest and most innovative corporate customers for internal strategic decision making. The data is also heavily used by journalists, especially those reporting on the financial sector, with media outlets like CNN, Business Insider and CNBC all using Thinknum resources in their stories. This partnership will give Fellows and Fellowship applicants access to some of the data used daily by experts in the finance industry and corporate leaders.

Business, economic and social activity is continually moving online. This increasing digital activity leaves behind data trails that, with proper organization, can reveal otherwise invisible trends, shifts and movements. Thinknum clients, and now The Data Incubator Fellows and applicants, can use this data to invest, gain a deeper understanding of businesses, or tell a story about an industry trend. Thinknum trawls the internet every day to collect data on over 400,000 public and private companies across the globe, generating huge amounts of data. Their intuitive web-based tool will allow Fellows to easily navigate huge volumes of data to gather insights, find correlations, and generate visualisations to share with other Fellows in seconds.
 

Thinknum Data

Thinknum tracks thousands of websites, capturing vast amounts of public data, indexing it, and mapping it back to individual companies. In the full Thinknum library there are over 20 datasets, each containing dozens of metrics updated daily.
 

3 Datasets

Thinknum is providing The Data Incubator with access to three real-world datasets for our Fellows to analyze and explore. There are virtually limitless potential projects for each dataset, and most of them haven’t been worked through. If you take a look at the number of columns in each, you will get a sense of just how many questions one can ask. A few initial suggestions are included below.




 

    Job Postings:

  • This database tracks individual job postings on corporate websites, allowing researchers and data scientists to view a company’s overall hiring plans over time. In addition to historical data, users can explore in great detail what types of positions a company is looking to fill, where it is looking to grow geographically, and in which specific product or business lines it is looking to expand the most.
  • Using this database, Thinknum Media journalists were able to show that the number of job listings at Apple’s new headquarters containing the word “Siri” had spiked in recent weeks. They also saw that of the 161 jobs related to Siri, 154 were in software engineering. From their findings, Thinknum journalists were able to predict Apple’s efforts to concentrate on Siri development an entire week before Apple officially announced the plans.
    • Project suggestions:

    • In which geographies are tech companies hiring the most engineers, blockchain developers, etc.?
    • Using job openings data, explore how banks are shifting their strategy to heavier reliance on technology/heavier regulatory burden.
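As a starting point for projects like these, here is a minimal sketch of the kind of keyword count described in the Siri example above. The rows and field layout (posting date, title, category) are made up for illustration; the actual Thinknum export will have its own columns, and this is not Thinknum Media's method.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (posting date, job title, job category).
# The real dataset will use its own column names and far more rows.
POSTINGS = [
    (date(2018, 3, 5), "Siri Software Engineer", "Software Engineering"),
    (date(2018, 3, 6), "Retail Specialist", "Retail"),
    (date(2018, 3, 13), "Siri Data Scientist", "Software Engineering"),
    (date(2018, 3, 14), "Siri Language Engineer", "Software Engineering"),
    (date(2018, 3, 15), "Siri Program Manager", "Operations"),
]


def weekly_keyword_counts(postings, keyword):
    """Count postings whose title mentions `keyword`, grouped by ISO (year, week)."""
    needle = keyword.lower()
    counts = Counter()
    for posted_on, title, _category in postings:
        if needle in title.lower():
            iso = posted_on.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


def category_share(postings, keyword, category):
    """Fraction of keyword-matching postings that fall in `category`."""
    needle = keyword.lower()
    matching = [cat for _d, title, cat in postings if needle in title.lower()]
    if not matching:
        return 0.0
    return sum(1 for cat in matching if cat == category) / len(matching)


if __name__ == "__main__":
    print(weekly_keyword_counts(POSTINGS, "Siri"))
    print(category_share(POSTINGS, "Siri", "Software Engineering"))
```

A jump in the weekly count for a keyword is only a signal to investigate; a real project would also need to handle duplicate and reposted listings.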

 

    LinkedIn Profiles:

  • This database tracks and records the number of employees across companies on a daily basis and provides real-time insight into how aggressively a company is growing relative to its own plans and within its industry.
  • Here, Thinknum Media looked at LinkedIn profile data for Vox and Buzzfeed employees, as well as job listings data. The journalists also looked at company survey data from Glassdoor and found that the number of Vox employees who had a positive outlook on the future of their company had fallen almost 20%. Combining these datasets, they found that the number of open job listings was falling and that the number of people reporting to be employees of the companies had fallen. Coupled with the findings from the Glassdoor surveys, this painted a picture of slowing company growth for both Vox and Buzzfeed.
    • Project suggestions:

    • Which companies have delivered on their strategic expansion plans (filled the most job openings that showed up on LinkedIn)?
    • Find companies where hiring is most predictive of stock prices.
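One simple way to begin exploring the second suggestion is a lagged correlation: compare hiring in one period with stock returns a few periods later. The sketch below assumes you have already built two equal-length series per company (for example, monthly net headcount change and monthly return); the function names and the toy numbers are illustrative only. A high correlation on past data is not the same as predictive power, so treat this as a first screen and not as a trading signal.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)


def lagged_correlation(hiring, returns, lag):
    """Correlate hiring in period t with returns in period t + lag."""
    if len(hiring) != len(returns):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(hiring) - 2:
        raise ValueError("lag leaves fewer than two overlapping periods")
    return pearson(hiring[: len(hiring) - lag], returns[lag:])


if __name__ == "__main__":
    # Toy data in which returns echo hiring one period later.
    hiring = [1, 2, 3, 4, 5, 6]
    returns = [9, 1, 2, 3, 4, 5]
    for lag in (0, 1, 2):
        print(lag, round(lagged_correlation(hiring, returns, lag), 3))
```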

 

    Facebook Followers:

  • Social media platforms like Facebook provide a myriad of data points about companies such as customer traction, foot traffic, and brand awareness among others.
  • Investment bank Cowen used Facebook ‘check-in’ data to track foot traffic to Chipotle starting in 2017, and thus to predict falls in Chipotle’s stock performance. Analyzing footfall this way became a staple for fast food restaurant research analysts, as discussed in a Yahoo Finance article.
    • Project suggestions:

    • Compare the companies with the highest volatility in “talking about count”: identify who they are, and use any information online to see whether this metric overlaps with highly publicized events and marketing campaigns.
    • Use Facebook check-ins as a metric for foot traffic for restaurant, hospitality and retail businesses. Who are the winners in attracting customers to physical locations?
    • Track Facebook followers: which companies are the most successful at growing social media traction?
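For the first suggestion, "volatility" needs a working definition. One reasonable choice, assumed here for illustration, is the standard deviation of day-over-day percentage changes in the metric. The company names and numbers below are invented, and the real dataset's column names will differ.

```python
from statistics import pstdev


def daily_pct_changes(counts):
    """Day-over-day fractional changes, skipping days that follow a zero count."""
    return [(b - a) / a for a, b in zip(counts, counts[1:]) if a != 0]


def volatility(counts):
    """Population standard deviation of the daily fractional changes."""
    changes = daily_pct_changes(counts)
    return pstdev(changes) if changes else 0.0


def rank_by_volatility(series_by_company):
    """Company names sorted from most to least volatile."""
    return sorted(
        series_by_company,
        key=lambda name: volatility(series_by_company[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical daily "talking about count" values per company.
    series = {
        "SteadyCo": [100, 101, 102, 103],
        "SpikyCo": [100, 150, 90, 160],
    }
    print(rank_by_volatility(series))
```

Once the most volatile companies are identified, the dates of their largest swings can be checked by hand against news and campaign launches.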

 
While building your own project cannot replicate the experience of a fellowship at The Data Incubator (our Fellows get amazing access to hiring managers and access to nonpublic data sources), we hope this will get you excited about working in data science. And when you are ready, you can apply to be a Fellow!

Got any more data sources? Let us know and we’ll add them to the list!

 


 


A Study Of Reddit Politics

This article was written for The Data Incubator by Jay Kaiser, a Fellow of our 2018 Winter cohort in Washington, DC who landed a job with our hiring partner, ZeniMax Online Studios, as a Big Data Engineer.

 

The Question

The 2016 Presidential Election was, in a single word, weird. So much happened during the months leading up to November that it became difficult to keep track of who said what, when, and why. However, the finale of the election, which culminated with Republican candidate Donald J. Trump winning the majority of the Electoral College and hence becoming the 45th President of the United States, was an outcome which at the time I had thought impossible, if solely due to the aforementioned eccentric series of events that had circulated around Trump for a majority of his candidacy.

Following the election, the prominent question that could not leave my mind was a simple one: how? How had the American people changed so much in only a couple of years to allow an outsider hit by a number of black marks during the election to be elected to the highest position in the United States government? How did so many pollsters and political scientists fail to predict this outcome? How can we best analyze the campaigns of each candidate, now given hindsight and knowledge of the eventual outcome? In an attempt to answer each of these, I have turned to a perhaps unlikely source.

Continue reading


4 Data Science Projects That We Can’t Get Enough Of

At The Data Incubator we run a free advanced 8-week fellowship for PhDs looking to enter the industry as data scientists.

As part of the application process, we ask potential fellows to propose and begin working on a data science project to highlight their skills to employers.  Regardless of whether you’re selected to be a fellow, this project will be instrumental in attracting employer interest and highlighting your skills.  Here are some projects that we would love to see, and that we hope to see you take on as well.

 

Multi-Axial Political Analysis  

We often think of American politics in terms of a single axis: left versus right, Democrat versus Republican.  In reality, the parties are composed of varying factions with different identities and political priorities, and American politics is actually broken along multiple axes: foreign policy, social issues, regulation, social spending, education, and the Second Amendment, just to name a few.

Continue reading


JUST Capital and The Data Incubator Challenge


 

Today, we’re excited to announce that we’re teaming up with JUST Capital to help crowd-source data science for social good.  The Data Incubator offers a free eight-week data science fellowship for those with a PhD or a master’s degree looking to transition into data science.  As a part of the application process, students are asked to submit a data science capstone project, and the best students are invited to work on them during the fellowship.  JUST Capital is helping by providing data and project prompts to harness the collective brainpower of The Data Incubator Fellows to solve these high-impact social problems.

  • These projects focus on applied data science techniques with tangible impacts on JUST Capital’s mission.
  • The projects are open-ended and creativity is encouraged. The documents provided below are suitable for analysis, but one should not shy away from seeking out additional sources of data.

JUST Capital is a nonprofit that provides information and rankings on how large corporations perform on issues that matter most to the public. We give individuals a voice on what really matters to them, and evaluate how companies perform on those issues. By providing the right knowledge and making it easy to access and understand, we believe capital will flow to corporations that are more JUST, ultimately leading to a balanced business world that takes into account human needs that are so often neglected today. The meaning of JUST is defined by the American public as fair, equitable and balanced. In 2016, JUST Capital surveyed nearly 4,000 Americans from all regions and walks of life, in its second annual Poll on Corporate America. The issues identified by the public form the basis of our benchmark — it is against these Drivers and Components that we measure corporate performance. The most important factors broadly relate to employees, customers, company leadership, the environment, communities and investors.

Continue reading


Data Sources for Cool Data Science Projects Part 6

Links to Part 1, Part 2, Part 3, Part 4, Part 5

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data.  That’s why our Fellows work on cool capstone projects that showcase those skills.  One of the biggest obstacles to successful projects has been getting access to interesting data.  Here are a few cool public data sources you can use for your next project:

Continue reading


Data Science Project Ideas

We love data science and cool data science projects.  If you’re applying for our free data science fellowship and looking to propose a data science project, here are four project ideas.

GitHub

GitHub is a great source of data on how engineers write code.  A recent post found discrimination against Pull Requests submitted by women on GitHub, although perhaps that study could have been better.  But there are lots of other ideas to pursue.  We can easily train an n-gram classifier to tell whether a line of code is a comment and use it to search for commented-out code.  Are repos by academics more likely to have commented-out code?  Are they more likely to violate lint rules?  Additionally, it would be interesting to analyze commits made in response to bug reports to predict in which lines of code bugs are more likely to occur.
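To make the n-gram classifier idea concrete, here is a toy sketch: a character-bigram Naive Bayes model with add-one smoothing, written in plain Python. The class name, the labels, and the four training lines are all invented for illustration. A real project would train on a large labeled corpus and would more likely use an established library such as scikit-learn.

```python
import math
from collections import Counter


def char_ngrams(line, n=2):
    """All overlapping character n-grams of a stripped line."""
    text = line.strip()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = {}          # label -> Counter of n-grams
        self.label_counts = Counter()   # label -> number of training lines
        self.vocab = set()

    def fit(self, lines, labels):
        for line, label in zip(lines, labels):
            grams = char_ngrams(line, self.n)
            self.ngram_counts.setdefault(label, Counter()).update(grams)
            self.label_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, line):
        grams = char_ngrams(line, self.n)
        total_lines = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.label_counts[label] / total_lines)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny, hand-made training set (hypothetical).
TRAIN_LINES = ["# add two numbers", "# TODO remove this", "x = a + b", "return x"]
TRAIN_LABELS = ["comment", "comment", "code", "code"]

model = NgramNaiveBayes(n=2).fit(TRAIN_LINES, TRAIN_LABELS)

if __name__ == "__main__":
    print(model.predict("# TODO"))
    print(model.predict("return a + b"))
```

To hunt for commented-out code, one approach would be to strip the comment marker from each comment line and ask the model whether the remainder looks more like code than like prose.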

Continue reading


Data Sources for Cool Data Science Projects: Part 5

Links to Part 1, Part 2, Part 3, Part 4

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data.  That’s why our Fellows work on cool capstone projects that showcase those skills.  One of the biggest obstacles to successful projects has been getting access to interesting data.  Here are some more cool public data sources you can use for your next project:

Continue reading


Data Sources for Cool Data Science Projects: Part 4

Links to Part 1, Part 2, Part 3

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data.  That’s why our Fellows work on cool capstone projects that showcase those skills.  One of the biggest obstacles to successful projects has been getting access to interesting data.  Here are some more cool public data sources you can use for your next project:

Continue reading


Data Sources for Cool Data Science Projects: Part 3

Links to Part 1, Part 2

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data.  That’s why our Fellows work on cool capstone projects that showcase those skills.  One of the biggest obstacles to successful projects has been getting access to interesting data.  Here are some more cool public data sources you can use for your next project:

Continue reading


Data Sources for Cool Data Science Projects: Part 2

Link to Part 1

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data.  That’s why our Fellows work on cool capstone projects that showcase those skills.  One of the biggest obstacles to successful projects has been getting access to interesting data.  Here are some more cool public data sources you can use for your next project:

Continue reading