
How to Catch ‘Em All: Alumni Spotlight on Yina Gu

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Yina was a Fellow in our Winter 2017 cohort who landed a job with one of our hiring partners, Opera Solutions.

Tell us about your background. How did it set you up to be a great data scientist?

I received my PhD from The Ohio State University, majoring in computational chemistry. For my PhD research, I developed multiple predictive models and published web servers to solve various biophysics problems using machine learning and statistical methods in Python, R and Matlab. The data science skills and experience I gained during my five years of PhD work not only allowed me to solve fundamental scientific problems effectively and efficiently, but also enabled my transition from academia to industry to solve real-world challenges.

What do you think you got out of The Data Incubator?

The eight-week intensive training at The Data Incubator really helped me go deeper into the data science field and fully prepared me with the essential skills for working in a big data industry with cutting-edge analytics techniques, including programming, machine learning, data visualization, and a business mindset. Last but not least, I believe networking with the other very talented Fellows was the most valuable thing I got out of TDI!

What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?

Learn and advance your programming and data mining skills in Python and SQL. Be familiar with statistics and basic machine learning methods. There are many useful online resources to start with. Once you feel comfortable with those tools, look for an interesting dataset to play with and prepare for a capstone project that solves a data-driven business problem.



Could you tell us about your Data Incubator project?

For my capstone project, I developed a multi-functional web app, PokéAttractor, to help millions of Pokémon GO players catch Pokémon and to give business owners unprecedented insights about their customers. In the first part, I trained a k-Nearest Neighbor classifier on a dataset of over 293,000 Pokémon sightings from Kaggle. My app predicts the probabilities for up to the 10 most likely Pokémon to appear at a given location and time. In the second part, I scraped over 275,000 tweets and analyzed the social media impact of Pokémon GO using machine learning, natural language processing, time series forecasting, and geographic data analytics techniques.
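For readers curious how a k-Nearest Neighbor classifier can turn sightings into per-species probabilities, here is a minimal sketch. It is not Yina's code: the interview does not give the app's features, distance metric, or value of k, so the toy data, the `hour_weight` scaling, and the function names below are all assumptions made for illustration. It also uses plain Python rather than the Scikit-Learn classifier mentioned in the interview, so the mechanics stay visible.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy sightings: (latitude, longitude, hour_of_day, species).
# The real project used a Kaggle dataset of over 293,000 sightings.
SIGHTINGS = [
    (40.00, -83.01, 9, "Pidgey"),
    (40.01, -83.00, 10, "Pidgey"),
    (40.00, -83.02, 8, "Rattata"),
    (40.02, -83.01, 21, "Zubat"),
    (40.01, -83.02, 22, "Zubat"),
    (40.03, -83.00, 23, "Gastly"),
]


def hour_gap(a, b):
    """Circular distance between two hours of the day (23 and 1 are 2 apart)."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def distance(query, sighting, hour_weight=0.01):
    """Euclidean distance over location and (scaled) time of day.

    hour_weight is an assumed scaling that puts hours on roughly the
    same footing as degrees of latitude/longitude in this toy example.
    """
    lat, lon, hour = query
    s_lat, s_lon, s_hour, _ = sighting
    dt = hour_weight * hour_gap(hour, s_hour)
    return sqrt((lat - s_lat) ** 2 + (lon - s_lon) ** 2 + dt ** 2)


def top_species(query, sightings, k=3, top_n=10):
    """Return up to top_n (species, probability) pairs for a query point.

    The probability of a species is the fraction of the k nearest
    sightings that belong to it.
    """
    nearest = sorted(sightings, key=lambda s: distance(query, s))[:k]
    votes = Counter(s[3] for s in nearest)
    return [(name, count / len(nearest)) for name, count in votes.most_common(top_n)]


if __name__ == "__main__":
    # Which Pokémon are most likely near (40.01, -83.01) at 10 pm?
    for name, prob in top_species((40.01, -83.01, 22), SIGHTINGS):
        print(f"{name}: {prob:.2f}")
```

In this sketch each of the k nearest sightings gets one equal vote; weighting votes by distance, or choosing k by cross-validation, are common refinements.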

How did you come up with the idea for the project?

Pokémon GO is officially the biggest mobile game in US history. I am one of the millions of Pokémon trainers who walk around daily and wonder how to catch 'em all more effectively and efficiently. Meanwhile, I realized that business owners also wonder how to capitalize on this massive opportunity and drive huge amounts of foot traffic and conversions. I came up with the idea to build a multi-functional web app that helps both players catch Pokémon and business owners attract fans.

What technologies did you use and what skills did you learn at TDI that you applied to the project?

I learned a lot of useful tools and techniques at TDI and applied them to my capstone project, including web scraping, SQL databases, Pandas, machine learning in Scikit-Learn (k-Nearest Neighbor classification, k-means clustering), NLP with the NLTK library, time series analysis, data visualization with JavaScript D3, Folium and Bokeh, and web development with Bootstrap, CSS, Flask and Heroku.

You can learn more about Yina’s project and see some of her findings here:
