Predicting Which Bills Will Become Laws, with Data Science: Alumni Spotlight on Michael Yen


At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Michael was a Fellow in our Winter 2017 cohort in San Francisco and landed a job with one of our hiring partners, Cerego.

Tell us about your background. How did it set you up to be a great data scientist?

My formal education is in physics, but I’ve also done much of my research in UC Berkeley’s Computer Science department. I value both backgrounds equally: physics taught me how to “do science,” and computer science taught me how to build really cool things. I think this is a winning combination for a data scientist, since many companies are looking for scientists who can write code.

What do you think you got out of The Data Incubator?

Two things, and both are equally important. First, TDI gave me exposure to its hiring partners that I just couldn’t get on my own. Before starting the fellowship, I had spent over 10 months applying to jobs on my own with a callback rate of 3%. At TDI, my callback rate shot up to 90%, and I even began fielding unsolicited interview requests. I think having the TDI mark of approval certainly moved me up in the stack of resumes. Second, I expanded my professional network by becoming close friends with twelve other Fellows who are all going to do fantastic things in the future.

What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?

If you are already comfortable with the fundamental data science tools and techniques, I would suggest focusing on building a polished capstone project. This will be the one thing you can point to and say, “Yes, I built that.” Having a polished capstone got me much further in interview discussions than anything else on my resume.

What’s your favorite thing you learned while at The Data Incubator?

Failure is key; it is built into the heart of all science, data science included. As a data scientist you will be pushing the limits of what is possible, and failure is an inevitable side effect. A good data scientist, however, learns from each failure and improves iteratively. At TDI, I learned through failure with the support of the amazing instructors there.

Describe your Data Incubator Capstone Project

I created a website that predicts the outcome of proposed bills in the US Senate; in other words, it predicts which bills will become law. To do this, I needed the historical voting record for each senator, which also allowed me to create graphical summaries of each senator’s voting behavior.

How did you come up with the idea for the project?

Through trial and error; this was not my first idea. I wanted a project that would make me a better NLP practitioner, with a motivation that is easy to explain to potential employers and relevant to a lot of people. From what I observed, choosing a relevant project that gets other people fired up is the most important part of picking a good capstone project.

What technologies did you use and what skills did you learn at TDI that you applied to the project?

I used a lot of different technologies. The beautiful thing about my capstone project is that I got to build a complete application, so I used the entire data science toolbox while pretending to be an engineer. Specifically, my project combines web scraping, databases, text mining, machine learning, and web development. In a large company, these tasks would be distributed across at least three different departments, but now that I have hands-on experience with all of them, I feel comfortable discussing them with experts.

What was your most surprising or interesting finding?

I most enjoyed looking at the word clouds I created to describe each senator’s voting behavior. Senators vote a lot, and on a wide variety of issues, so having a visual summary was really helpful for learning about each senator.

Describe the business application for this project. How could a company use your work or your data?

Is your business affected by US laws? Would it help to get a warning about impending laws, either to adapt your business practices or to lobby for or against them?

Do you have an interesting visualization to share?

Word cloud describing Democratic New York Senator Chuck Schumer’s voting behavior.

I had an “Aha, I did it!” moment when looking at this word cloud, because the words “same-sex marriage” and “abortion” appear in the against cloud. That seems surprising for a Democrat from New York, but it is consistent with Schumer’s platform.

Tell us a little about your new position at Cerego

I landed the perfect job at Cerego. Cerego’s core product is relevant to my academic background, and I get to keep working on interesting NLP projects. It is an ideal blend of relying on my past experience while also pushing me to learn new things. My hope is that in a few years I will be calling myself an NLP expert.
