At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Ryan was a Fellow in our Fall 2016 cohort who landed a job at Austin-based startup OK Roger.
My PhD and postdoctoral work were in the field of statistical mechanics, with a strong emphasis on the design of new colloidal materials. This research required me to develop a hybrid set of strong analytical math and computational skills, both of which have been extremely useful for bridging into Data Science. With the deeper understanding afforded by this mixed skill set, I feel well positioned to leverage existing technologies as well as develop novel alternatives. As an example of the latter, my forays into the fundamentals of Machine Learning helped me develop a supercomputing application capable of inferring the inter-particle forces an experimentalist must engineer to elicit a desired material property. This required developing both an analytical framework and an underlying large-scale molecular simulation component. Combining these general technical skills with what I learned at The Data Incubator, I feel well poised to be successful in a Data Science position.
What do you think you got out of The Data Incubator?
First, I cannot overstate the breadth and depth of the material one learns in only eight weeks. I gained practical, hands-on experience with topics ranging across fundamental algorithms, machine learning, web scraping, time series, and wrangling large data sets using Spark and MapReduce. As a corollary, though, the program requires a mindset shift away from traditional academia, where the goal is to achieve the best publication-quality result or model and time constraints are less stringent. Specifically, success in the program requires a much more pragmatic perspective: judging when the product is good enough for the goal at hand. So while the technical course material is absolutely phenomenal, I must give special mention to the teaching of this soft skill (as well as the many others).
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
Develop a strong practical foundation in Data Science before getting into the “weeds”, so to speak. This can seem foreign coming from academia; however, it will be a much better use of your time in preparing for The Data Incubator. First, achieve proficiency with Python, as this is the current home base for Data Science. Second, start developing a portfolio of Data Science projects. Kaggle is a great place to get started while getting a practical overview of the types of problems that one might encounter in Data Science. Lastly, do not start with the “fancy” machine learning tools, no matter how tempting this might be. Few problems can only be solved with something like neural networks. Start by exploring, and actually using, simpler tools like Linear and Logistic Regression, Random Forests, and Support Vector Machines; these will most likely form the foundation of your toolbox.
What is your favorite thing you learned at The Data Incubator?
I truly enjoyed (and valued) learning general strategies for handling sizable data sets. Topics ranged from using Spark or MapReduce to extract key features from vast data sets to building machine learning models in a memory-efficient manner. With the size of a typical data set growing so rapidly, I believe such skills are truly invaluable to any Data Scientist.
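As a rough illustration of the map-and-reduce pattern mentioned above, the following sketch counts tokens over a stream of text in small chunks, so that only one chunk and the running totals are held in memory at a time. It uses plain Python rather than Spark or Hadoop, and the function names (`chunked`, `map_chunk`, `merge_counts`, `word_counts`) and the chunk size are assumptions made for this example, not material from the program.

```python
from collections import Counter
from functools import reduce


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def map_chunk(lines):
    """Map step: count lowercase whitespace-separated tokens in one chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(total, partial):
    """Reduce step: fold one partial count into the running total."""
    total.update(partial)
    return total


def word_counts(lines, chunk_size=2):
    """Stream over `lines`, mapping each chunk and reducing the results."""
    partials = (map_chunk(chunk) for chunk in chunked(lines, chunk_size))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical input; in practice `lines` could be an open file handle.
    sample = iter(["a b a", "b c", "a"])
    print(word_counts(sample).most_common())
```

Because `word_counts` accepts any iterable, the same code works on a file object read line by line. A distributed framework applies the same two steps across many machines.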
Could you tell us about your Data Incubator project?
For my capstone project I developed an app that mines Twitter for Tweets that discuss a disaster-level event. Given the popularity and accessibility of social media, particularly due to smartphone technology, this may be the fastest way to discover such critical information. Underlying the app is a predictive model that assigns a disaster relevance probability to each mined Tweet. Additional sorting and aggregate statistical analyses are also performed to help expedite the discovery of the most pertinent information. Potential users of such a tool include various government agencies and disaster relief organizations.
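The interview does not say which model the app uses, so the following is only a minimal sketch of the general idea: a small Naive Bayes text classifier, written with the Python standard library, that assigns a relevance probability to each message and then sorts messages by that score. The class name, helper functions, and training sentences are all hypothetical and invented for illustration; they are not taken from the capstone project.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?#@:;\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [tok for tok in tokens if tok]


class NaiveBayesRelevance:
    """Two-class multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = {0: 0, 1: 0}
        self.vocab = set()

    def fit(self, texts, labels):
        """Count documents and tokens per class (1 = relevant, 0 = not)."""
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict_proba(self, text):
        """Return P(relevant | text); tokens unseen in training are ignored."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (0, 1):
            log_p = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:
                    log_p += math.log((self.word_counts[label][tok] + 1) / denom)
            log_scores[label] = log_p
        # Normalize in log space to avoid underflow on long texts.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights[1] / (weights[0] + weights[1])


def rank_by_relevance(model, texts):
    """Return (probability, text) pairs, most relevant first."""
    return sorted(((model.predict_proba(t), t) for t in texts), reverse=True)


# Hypothetical toy training data, invented for this example only.
TRAIN_TEXTS = [
    "Earthquake hits the city, buildings collapsed",
    "Flood waters rising, residents evacuated",
    "Wildfire spreading near the highway",
    "Great coffee with friends this morning",
    "My new phone is great",
    "Watching the game with friends tonight",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

model = NaiveBayesRelevance().fit(TRAIN_TEXTS, TRAIN_LABELS)

if __name__ == "__main__":
    incoming = ["Coffee with friends", "Earthquake and flood, residents evacuated"]
    for prob, text in rank_by_relevance(model, incoming):
        print(f"{prob:.2f}  {text}")
```

A real system would need far more labeled data, and a stronger baseline such as logistic regression on TF-IDF features would be a natural next step. Sorting by the predicted probability is one simple way to surface the most pertinent messages first.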