At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows whose diverse academic backgrounds go beyond what companies traditionally think of when hiring data scientists. Jeiran was a Fellow in our winter cohort who landed a job with one of our hiring partners, Chartbeat. Here’s her story:
What are you doing now and how did you get there?
I am a research data scientist at Chartbeat, where I work on scalable machine learning algorithms that can be used to understand the media landscape and help publishing companies monetize their user engagement.
At NYU, I studied the brain and its blood circulatory network through MRI data. I worked on spatio-temporal noise reduction techniques and developed models to detect and quantify different patterns of blood flow in different regions of the brain. During my graduate years, I learned to convey results visually and build concise narratives. Every sophisticated statistical tool or machine learning algorithm stems from a simple intuitive understanding of the problem, which can be used to construct your narrative. I use this skill every day at Chartbeat.
A rough answer to the right question is preferable to an exact answer to the wrong question. In other words, statistics is only a tool that can be used to answer questions, but it cannot always tell us what the question is. The imperative is to ask the right questions, which are informed by your data. I learned that data comes first, and inference and modeling follow. I also learned that every estimate should come equipped with a measure of uncertainty. I learned this by working with real-world data. My research had the right balance of theory and application for my current role at Chartbeat.
What do you think you got out of The Data Incubator?
The Incubator provided me with a comfortable physical and mental space to focus on and realize my project. This project did not fit the academic timeline: it had to be done quickly, and I had to limit my objectives while still choosing interesting ones. At the Incubator, I received continual advice as the project progressed, which helped me choose the right objectives and equip my project with a well-constructed narrative. I showcased this project to multiple employers, which helped me bond with them on a more personal level. The Incubator also prepared us extensively for job interviews, with regular interview practice sessions.
Last but not least, I established a great network — composed not just of the Incubator Fellows in my cohort but also of previous Fellows who are now data scientists at top firms and startups. We also met many chief data scientists at different companies. They came to the Incubator for panel discussions and happy hours on a regular basis, and we met them up close and could ask them anything we wanted. This network facilitated the job hunt and turned it into a much more pleasant experience.
Could you tell us about your Incubator project?
The beginning of the twenty-first century is marked by a war that killed 151,000 to over one million people and cost over six trillion dollars: the Iraq war. This war is important not only for its staggering human and financial costs but also for its publicly available, well-documented daily records from 2004 to 2010.
These documents provide a very high spatial and temporal resolution view of the conflict. For example, I extracted from these government memos the number of violent events per day in each county. Then, using latent factor analysis techniques, e.g. non-negative matrix factorization, I was able to identify the three principal war zones. Interestingly, these principal conflict zones were areas populated by the three main ethnoreligious groups in Iraq. Moreover, adaptive Bayesian smoothing approaches revealed statistically significant jumps in the underlying temporal trends within each cluster — the so-called spike alert days. These spike alert days coincided with well-documented changes in the way the war was handled. Although the algorithms used to analyze the war memos were blind to the historical, geographical, and political context of the conflict, they were able to shed light on decisions that exacerbated it.
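The general shape of this pipeline can be sketched as follows. The war-memo dataset itself is not reproduced here, so the example runs on a synthetic county-by-day count matrix; the choice of three components mirrors the three conflict zones described above, and the crude spike flagging at the end is a simple stand-in, not the adaptive Bayesian smoothing used in the actual project.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic stand-in for the memo data: rows are counties, columns are
# days, entries are non-negative violent-event counts per county per day.
rng = np.random.default_rng(0)
n_counties, n_days = 20, 365
counts = rng.poisson(lam=2.0, size=(n_counties, n_days)).astype(float)

# Factor the count matrix into three non-negative components, mirroring
# the three principal conflict zones described in the text.
model = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(counts)  # county loadings,  shape (20, 3)
H = model.components_            # temporal trends,  shape (3, 365)

# Assign each county to the component on which it loads most heavily.
zone = W.argmax(axis=1)

# Crude spike flagging on one component's trend: mark days where the
# trend exceeds its median by three median absolute deviations.
trend = H[0]
mad = np.median(np.abs(trend - np.median(trend)))
spikes = trend > np.median(trend) + 3 * mad
```

Because NMF constrains both factors to be non-negative, the components behave as additive "parts" — each row of `H` is a temporal trend and each column of `W` says how strongly a county expresses it, which makes the zones directly interpretable.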
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
Get started with the tools of the trade before you apply. Take up a mini project, use an open dataset (the Incubator has two blog posts with useful links), and restrict yourself to Python. Familiarize yourself with databases and database query languages, e.g. PostgreSQL. Start your GitHub account by committing your mini projects. Definitely finish the Incubator’s 12-day preparatory program and complete the exercises. Before you start the fellowship, choose your project and your dataset. Listen to Michael and attend the lectures and all the meetings. [Editor’s Note: for suggestions on data sources, check out our posts here and here.]
Learn more about The Data Incubator here.