At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Hernan was a Fellow in our Fall 2015 cohort in New York City who landed a job there with our hiring partner, 1010data.
Tell us about your background. How did it set you up to be a great data scientist?
My background is in complex networks and statistical physics. My PhD studies focused mostly on theoretical modeling of networks and their topological properties. Later, during my postdoc, I worked primarily on using networks and graph theory techniques to analyze real-world data.
What do you think you got out of The Data Incubator?
I think the most important tool I learned is Machine Learning. Before coming to The Data Incubator I only knew conceptually what ML was. This fellowship gave me a much deeper understanding of the different ML techniques and, maybe more importantly, hands-on experience using different ML tools on real-world data.
I also learned a large number of tech tools, such as Hadoop and MapReduce, which are essential for analyzing very large amounts of data.
Last, but not least, the Incubator helped me think about problems in a more business-oriented way. In a business environment, conclusions must be concrete, translate into actionable items, and be easy to communicate. TDI helped me transition from an academic view of problems to a business-oriented, actionable approach.
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
Take an introductory course on Machine Learning (even a basic free online one) before starting the fellowship. Brush up on probability and statistics.
What is your favorite thing you learned at The Data Incubator?
Machine learning is my favorite thing I learned. Before TDI I knew ML only superficially, and the fellowship gave me hands-on experience. Beyond that, I really enjoyed learning MapReduce, Hadoop, and related topics like Spark.
Could you tell us about your Data Incubator Capstone project?
I used NYC taxi trip data to measure activity at different locations and times within New York City. To give a concrete example: say you want to start a new business. You would want to find the locations with the highest "activity" or "mobility," because a high flow of people might bring more customers to your business. Taxi pickups and dropoffs are a great way to identify these centers of high activity. Using statistical tools and complex network techniques, I found those centers of high activity and created an app to visualize the results.
And lastly, tell us about your new job!
As a Data Scientist at 1010data, I've developed machine learning algorithms to predict demographic information based on purchase patterns. I've also managed collaborative work involving multiple departments (external data providers, Data Acquisition, Sales, System Developers, upper management) to take new datasets from raw data through the R&D process all the way to the final client-facing product.