At The Data Incubator we run a free eight-week Data Science Fellowship Program to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring Data Scientists. Brian was a Fellow in our spring cohort who landed a job with one of our hiring partners, Capital One, after completing his postdoc at Columbia and NYU.
Tell us about your background. How did it set you up to be a great Data Scientist?
What do you think you got out of The Data Incubator?
Second, I learned a lot about the landscape of the industry, which helped me figure out what I was looking for in a job. Over the course of the program, I was introduced to companies that weren’t on my radar and found myself very interested in them. Once you are in The Data Incubator program, there are too many partner companies on the list to be able to seriously pursue them all. You have to know what you are looking for in order to focus your efforts, and I definitely gained that focus over the course of the program.
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
What is your favorite thing you learned at The Data Incubator?
Could you tell us about your Data Incubator Capstone project?
I was able to identify clusters of tweets that occur near the same time, contain keywords related to the same team, and contain either the word “touchdown” or “field goal.” This allows one to accurately find the final score of the game using only the contents of the Twitter stream. These clusters can also be used to estimate the time when touchdowns or field goals occur.
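As a rough illustration of the idea above (not the project's actual code), the sketch below groups tweets that mention one team and one scoring play into bursts in time. The keyword lists, the `max_gap` and `min_size` thresholds, and the function names are all assumptions made for this example. Tweets are assumed to arrive as `(timestamp_in_seconds, text)` pairs.

```python
from statistics import median

# Hypothetical keyword lists; the project's real lists are not given in the post.
TEAM_KEYWORDS = {
    "Seahawks": ("seahawks", "seattle"),
    "Patriots": ("patriots", "new england"),
}
SCORE_KEYWORDS = ("touchdown", "field goal")


def label_tweet(text):
    """Return (team, play) if the tweet names exactly one team and one scoring play."""
    lowered = text.lower()
    teams = [team for team, words in TEAM_KEYWORDS.items()
             if any(word in lowered for word in words)]
    plays = [play for play in SCORE_KEYWORDS if play in lowered]
    if len(teams) == 1 and len(plays) == 1:
        return teams[0], plays[0]
    return None


def find_scoring_events(tweets, max_gap=60.0, min_size=3):
    """Find bursts of same-label tweets; a new burst starts after a gap > max_gap seconds.

    Returns a time-sorted list of (estimated_time, team, play), where the
    estimated time is the median timestamp of the burst. Bursts with fewer
    than min_size tweets are treated as noise and dropped.
    """
    times_by_label = {}
    for timestamp, text in tweets:
        label = label_tweet(text)
        if label is not None:
            times_by_label.setdefault(label, []).append(timestamp)

    events = []
    for (team, play), times in times_by_label.items():
        times.sort()
        burst = [times[0]]
        for t in times[1:]:
            if t - burst[-1] > max_gap:
                if len(burst) >= min_size:
                    events.append((median(burst), team, play))
                burst = [t]
            else:
                burst.append(t)
        if len(burst) >= min_size:
            events.append((median(burst), team, play))
    return sorted(events)
```

Turning the detected events into a final score would additionally require assigning points to each play, which this sketch leaves out.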
I also used the K-means clustering algorithm along with Natural Language Processing techniques to divide the tweets into three clusters based on the frequency of popular words in each tweet. The clusters are well delineated, and the dominant words in each cluster show that they represent tweets about the Seahawks or Patriots, tweets about the commercials, and tweets about Katy Perry and the halftime show. Moreover, time histograms for each cluster show that in the first quarter, viewer attention is roughly split between the game itself and the commercials, but attention shifts to the commercials in the second quarter. When the halftime show starts, attention shifts dramatically to Katy Perry and the halftime show, which continue to dominate for the rest of the game, with the exception of a spike at 145 minutes when the Seahawks score and another at the very end of the game. From this, we can conclude that commercials airing in the second quarter are most likely to hold viewers’ attention and should be worth more.
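A minimal sketch of this kind of clustering follows, assuming each tweet is represented by its counts of the most frequent words across the whole collection. It is a plain-Python K-means written for illustration only. The tokenization (lowercased whitespace split), the vocabulary size, the random initialization, and all function names are assumptions, and the original project may well have used a library implementation instead.

```python
import random
from collections import Counter


def build_vocabulary(tweets, size):
    """Return the `size` most frequent words across all tweets."""
    counts = Counter(word for text in tweets for word in text.lower().split())
    return [word for word, _ in counts.most_common(size)]


def vectorize(text, vocabulary):
    """Represent a tweet as counts of each vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen points (an assumption; other schemes exist).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def dominant_words(centroid, vocabulary, n=5):
    """List the n vocabulary words with the largest weight in a centroid."""
    ranked = sorted(zip(centroid, vocabulary), reverse=True)
    return [word for _, word in ranked[:n]]
```

Calling `kmeans` with `k=3` on the vectorized tweets and then `dominant_words` on each centroid is one way to inspect what each cluster is about. Binning the tweet timestamps per cluster would give the time histograms mentioned above.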
I also used a “bag of words” model to estimate how positive or negative tweets about a given brand were. I used this to compute a measure of the Twitter audience’s response to fifteen of the most-tweeted commercials. I then compared this measure with ratings computed by survey and published by USA Today, and found a significant correlation. Of course, the Twitter sentiment score has the advantage of being available in real time.
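The sketch below shows one simple way such a score could be computed, assuming a small hand-picked word list. The tiny lexicons, the per-brand averaging, and the use of a Pearson correlation for the comparison with survey ratings are all assumptions for illustration; the post does not say which word lists or statistic were actually used.

```python
from math import sqrt

# Hypothetical toy lexicons; a real analysis would use a much larger word list.
POSITIVE_WORDS = {"love", "great", "funny", "best", "awesome"}
NEGATIVE_WORDS = {"hate", "boring", "worst", "awful", "bad"}


def tweet_sentiment(text):
    """Bag-of-words score: positive word count minus negative word count."""
    words = [w.strip(".,!?#@\"'") for w in text.lower().split()]
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    return positives - negatives


def brand_sentiment(tweets, brand):
    """Average sentiment over the tweets that mention the brand (0.0 if none do)."""
    scores = [tweet_sentiment(t) for t in tweets if brand.lower() in t.lower()]
    return sum(scores) / len(scores) if scores else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Given one `brand_sentiment` value per commercial and the matching survey rating for each, `pearson` on the two lists gives a single number summarizing how closely they track each other.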
For a slightly more detailed writeup, please visit superbowl.brianfarris.me.
Visit our website to learn more about our offerings:
- Data Science Fellowship – a free, full-time, eight-week bootcamp program for PhD and master’s graduates looking to get hired as professional Data Scientists in New York City, Washington DC, San Francisco, and Boston.
- Hiring Data Scientists
- Corporate data science training
- Online data science courses: introductory part-time bootcamps – taught by our expert Data Scientists in residence, and based on our Fellowship curriculum – for busy professionals to boost their data science skills in their spare time.