Data are becoming the new raw material of business
The Economist

Machine Learning and Modeling the Stock Market: Alumni Spotlight on Michael Skarlinski

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Michael was a Fellow in our Winter 2016 cohort who landed a job with one of our hiring partners, Schireson Associates.

 

Tell us about your background. How did it set you up to be a great data scientist? 

My PhD work was in computational materials science, where I worked with reactive molecular dynamics simulations. The field is entirely simulation-based and typically requires high-performance computing resources. Running these simulations helped build my chops for working with parallel systems and command-line tools. The software required familiarity with some powerful languages and APIs, like C and CUDA, and learning those definitely helped my understanding of Python once I switched to it.

Around the third year of my PhD, I got really interested in machine learning and started using scikit-learn to predict different aspects of the simulations I worked on. These projects became a large part of my thesis and contributed to my choosing The Data Incubator as the next step in my career.


Calculating the Perfect Algorithm: Alumni Spotlight on Sumanth Swaminathan

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Sumanth was a Fellow in our Winter 2016 cohort who landed a job with one of our hiring partners, Revon.

Tell us about your background. How did it set you up to be a great data scientist?

I earned my bachelor’s degree in Chemical Engineering at the University of Delaware and my PhD in Applied Mathematics at Northwestern University. After some postdoctoral work between Northwestern and Oxford University, I went into industry as a quantitative consultant for W.L. Gore & Associates. For the past four years, I have spent most of my time delivering technology solutions at W.L. Gore, teaching mathematics at the University of Delaware, and performing and teaching Indian classical music.

When it comes to what makes a strong data scientist, I think the better practitioners in the field tend to be hypothesis-driven, strong critical thinkers with hard skills in statistics, programming, mathematics, and hardware. Hence, my background in engineering and mathematics, my consulting experience, and my years of teaching probably contributed the most to my success.


Making (LinkedIn) Connections: Alumni Spotlight on Xia Hong

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Xia was a Fellow in our Summer 2015 cohort who landed a job at LinkedIn.

Tell us about your background. How did it set you up to be a great data scientist? 

I trained as an experimental physicist in soft condensed matter during my PhD at Emory University. There are three things that I think have helped me a lot in becoming a good data scientist:
1) A solid background in physics and math from college. The knowledge itself isn’t necessarily reflected in my day-to-day work now, but the training in logical and critical thinking is really beneficial in the long run.
2) Persistence in finding root causes. The massive amount of data can easily leave you feeling swamped. I believe it is essential to keep asking why until you reach the true cause of a problem. Sometimes the insights are hidden, and it takes real motivation to dig them out. Whether it’s driven by natural stubbornness or innate curiosity, I find persistence is usually a great help in walking the last mile to the final discovery.
3) Passion for solving problems using data. Our department had a joint program through which I took computer science courses toward a master’s degree. In the course projects, I started to find my passion for solving practical problems with data science approaches. Now I am working on product analytics, and I cannot imagine how tough it would be without that passion and curiosity about what we can do to improve the product.


From Eco-Friendly Batteries to Random Forests: Alumni Spotlight on Matt Lawder

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Matt was a Fellow in our Winter 2016 cohort who landed a job with one of our hiring partners, 1010data.

Tell us about your background. How did it set you up to be a great data scientist?

I defended my PhD dissertation at Washington University in St. Louis a few weeks before coming to The Data Incubator. I was part of the MAPLE lab in Energy, Environmental, and Chemical Engineering (I know, it’s a mouthful). Our lab focused on physics-based electrochemical modeling, mostly geared toward Li-ion batteries.

For my main dissertation project, I studied how batteries age under different real-world cycling patterns. Most cycle-life estimates for a battery are based on simple constant-charge and constant-discharge patterns, but many applications (such as batteries in electric vehicles or those coupled to the electric grid) do not have simple cycling patterns, and this variation affects the life of the battery.

Through both model simulations and long-term experiments, I had to analyze battery characteristics over thousands of cycles and pick out the important features. This type of analysis, along with programming the computational models used to create these data sets, gave me a good background for tackling data science problems.

Additionally, I think working on my PhD projects gave me experience solving unstructured problems, where the solution (and sometimes even the problem or need) is not well defined. These types of problems are very common, especially once you get outside of academia.


Five Tips for Future-proofing Your Business with Data Science

We all want to be future-proof: not just prepared for unforeseen developments, but well positioned to take advantage of them. When it comes to leveraging data science effectively, a flexible, adaptable, and scalable technology stack is a great way to achieve that goal. Here are five ideas I think are crucial to keep in mind when building out your own functionality:

1. Your pipeline is only as good as its weakest link.

It’s great that your predictive modelers have come up with a thousand new features to incorporate, but have you asked your data engineers how that will affect the performance of backend queries? What about your data collection and ingestion flow? Maybe your team is frothing at the mouth for an upgrade to Spark Streaming to run their clustering algorithms in real time, but your frontend will lose responsiveness if you try to display the results as fast as they come in. The key here is not to get sucked into the hype of “scaling up” without fully recognizing the implications across your entire organization and what new demands will be placed on all those moving parts.
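To make the weakest-link point concrete, here is a minimal Python sketch, assuming a simple linear pipeline in which every record passes through each stage in turn. The stage names and throughput numbers are hypothetical, chosen only to echo the examples above. Under that assumption, sustained end-to-end throughput can never exceed that of the slowest stage.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    records_per_sec: float  # hypothetical sustained capacity of this stage


def pipeline_throughput(stages):
    """Return (rate, stage name) for the bottleneck of a sequential pipeline.

    Each record must pass through every stage, so in steady state the whole
    pipeline can move no more records per second than its slowest stage.
    """
    bottleneck = min(stages, key=lambda s: s.records_per_sec)
    return bottleneck.records_per_sec, bottleneck.name


# Hypothetical numbers, for illustration only.
stages = [
    Stage("ingestion", 50_000),
    Stage("backend feature queries", 8_000),
    Stage("streaming clustering", 20_000),
    Stage("frontend refresh", 500),
]
print(pipeline_throughput(stages))  # (500, 'frontend refresh')

# Make the clustering stage 5x faster: the end-to-end rate does not move.
upgraded = [
    Stage(s.name, s.records_per_sec * 5) if s.name == "streaming clustering" else s
    for s in stages
]
print(pipeline_throughput(upgraded))  # (500, 'frontend refresh')
```

Speeding up the clustering stage fivefold leaves the result unchanged, which is the point of the tip: measure the whole chain before investing in one link. Real pipelines with buffering, batching, or parallel branches behave less simply than this model.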


The 3 Things That Make Technical Training Worthwhile

Managers know that employees who understand the latest tools and technologies are vital to keeping a company competitive. But training employees on those tools and technologies can be a costly endeavor (corporations spent $130 billion on corporate training in general in 2014), and too often training simply doesn’t achieve the objective of giving employees the skills they need.

At The Data Incubator, we work with hundreds of clients who hire PhD data scientists from our fellowship program or enroll their employees in our big data corporate training. We’ve found in our work with these companies across industries that technical training often lacks three important things: hands-on practice, accountability, and breathing room.


Automatically Generating License Data from Python Dependencies

We all know how important it is for the average startup to keep track of its open-source licensing. While most people think of open-source licenses as all being the same, there are meaningful differences that can have serious legal implications for your code base. From permissive licenses like MIT or BSD to so-called “reciprocal” or “copyleft” licenses, keeping track of the alphabet soup of dependencies in your source code can be a pain.

Today, we’re releasing pylicense, a simple Python module that generates license data, as comments, directly from your requirements.txt or environment.yml files.
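The pylicense source isn’t reproduced here. As a rough illustration of the general idea only, the sketch below uses the standard library’s `importlib.metadata` to look up the license that an installed package declares, then appends it as a comment to each line of a pip-style requirements.txt. The function names are hypothetical, the sketch does not handle environment.yml, and it relies on packages reporting accurate metadata (the `License-Expression` or `License` field, or a `License ::` trove classifier), which not all of them do.

```python
import re
from importlib import metadata


def license_for(package):
    """Best-effort license string for an installed distribution, else 'UNKNOWN'."""
    try:
        meta = metadata.metadata(package)
    except metadata.PackageNotFoundError:
        return "UNKNOWN"
    # Prefer an SPDX expression (newer metadata), then the free-text License field.
    for field in ("License-Expression", "License"):
        value = (meta.get(field) or "").strip()
        if value and value.upper() != "UNKNOWN":
            return value.splitlines()[0]  # some packages paste the full license text
    # Fall back to trove classifiers such as "License :: OSI Approved :: MIT License".
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            return classifier.split("::")[-1].strip()
    return "UNKNOWN"


def annotate_requirements(lines):
    """Return requirement lines with a trailing '# license: ...' comment."""
    annotated = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "-")):
            # Keep blanks, comments, and pip options (e.g. -r, -e) untouched.
            annotated.append(line.rstrip("\n"))
            continue
        # The project name ends at the first version specifier, extra, marker, or space.
        name = re.split(r"[<>=!~\[;@\s]", stripped, maxsplit=1)[0]
        annotated.append(f"{stripped}  # license: {license_for(name)}")
    return annotated
```

To annotate a real file, open requirements.txt in a `with` block and pass the file object straight to `annotate_requirements`; trailing newlines are stripped. Because the lookup reads metadata from the current environment, packages that aren’t installed come back as `UNKNOWN`.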


Calculus Is So Last Century

Training in statistics, linear algebra and algorithmic thinking is more relevant for today’s educated workforce.

This article was written by Data Incubator founder Michael Li and Columbia University Professor Allison Bishop. It originally appeared in The Wall Street Journal.

Can you remember the last time you did calculus? Unless you are a researcher or engineer, chances are good it was in a high-school or college class you’d rather forget. For most Americans, solving a calculus problem is not a skill they need to perform well at work.

This is not to say that America’s workforce doesn’t need advanced mathematics. Quite the opposite: an extensive 2011 McKinsey Global Institute study found that by 2018 the U.S. will face a 1.5 million worker shortfall in analysts and managers who have the mathematical training necessary to deal with analysis of “large data sets,” the bread and butter of the big-data revolution.

The question is not whether advanced mathematics is needed but rather what kind of advanced mathematics. Calculus is the handmaiden of physics; it was invented by Newton to explain planetary and projectile motion. While its place at the core of math education may have made sense for Cold War adversaries engaged in a missile and space race, Minuteman and Apollo no longer occupy the same prominent role in national security and continued prosperity that they once did.


Leveraging a Physics Background to Achieve Data Science Success: Alumni Spotlight on Andrew Yue

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Andrew was a Fellow in our Fall 2015 cohort who landed a job with one of our hiring partners, IST Research.


Tell us about your background. How did it set you up to be a great data scientist?

I’m an experimental nuclear physicist by training. I had the great privilege to perform research at the National Institute of Standards and Technology (NIST) for nine years. NIST is a Department of Commerce laboratory that specializes in the science of measurement (metrology) and its application to industry. My research focused on precision measurement techniques with neutrons to advance our understanding of fundamental physics and to improve industry services offered by my group.

There are two things that I think have helped me get to where I am:

1) Like most physicists, I think I have a natural propensity to tinker with things well outside my expertise. Taken too far, this can be a bad thing. But, applied appropriately, it’s exactly the kind of attitude needed to learn and keep up with the ever-changing field of data science.

2) Having focused on precision measurements in my research, I’ve seen time and time again how much the environment in which I performed my experiments affected the data and informed my analysis. The parallel to data science is that my training taught me that a deep understanding of the problem, and of how the data were collected, is what allows you to ask the right questions and produce meaningful results.


Mindset Shift: Transitioning from Academia to Industry

Special thanks to Francesco Mosconi for contributing this post.

Transitioning from academia to industry can be difficult for several reasons.

  1. You are learning a new set of hard skills (data analysis, programming in Python, machine learning, MapReduce, etc.), and you are doing so in a very short time.
  2. You are also learning new soft skills, which also require practice to develop.
  3. A mindset shift needs to occur, and your success in industry will strongly depend on how quickly this happens.

Learn to prioritize

When your goal is knowledge, as in academia, it is okay to spend as much time as you want learning a new concept or completing a project. During this program and on the job, on the other hand, you will often find that there is not enough time for all the tasks and assignments required of you. When you have more on your plate than you can handle, it is essential to develop the ability to decide which tasks require execution, which can be postponed, and which can simply be ignored. There are many frameworks and approaches to prioritization, with famous examples including the Getting Things Done system and the Eisenhower Method. Most methods are good, and eventually you will find your favorite; however, they only work if consistently applied. In other words, which prioritization method you choose matters less than making sure you prioritize your day and your week according to the specific goals you need to accomplish.
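As one concrete example of these frameworks, the Eisenhower Method sorts tasks by two yes/no questions, urgent and important, into four quadrants: do now, schedule, delegate, or drop. The minimal Python sketch below assumes each task has already been labeled by hand; the task list and the `plan` helper are hypothetical. During a fellowship there may be no one to delegate to, in which case the urgent-but-unimportant quadrant effectively becomes "postpone" or "ignore."

```python
from enum import Enum
from typing import NamedTuple


class Action(Enum):
    DO_NOW = "do now"        # urgent and important
    SCHEDULE = "schedule"    # important, not urgent
    DELEGATE = "delegate"    # urgent, not important
    DROP = "drop"            # neither


class Task(NamedTuple):
    title: str
    urgent: bool
    important: bool


def eisenhower(task):
    """Map a task to its Eisenhower-matrix quadrant."""
    if task.important:
        return Action.DO_NOW if task.urgent else Action.SCHEDULE
    return Action.DELEGATE if task.urgent else Action.DROP


def plan(tasks):
    """Group task titles by action, keeping input order within each group."""
    groups = {action: [] for action in Action}
    for task in tasks:
        groups[eisenhower(task)].append(task.title)
    return groups


# Hypothetical tasks, labeled by hand.
week = [
    Task("submit project milestone due tomorrow", urgent=True, important=True),
    Task("work through a new ML library tutorial", urgent=False, important=True),
    Task("answer a scheduling poll", urgent=True, important=False),
    Task("polish slide fonts", urgent=False, important=False),
]
for action, titles in plan(week).items():
    print(f"{action.value}: {titles}")
```

The hard part, deciding honestly what is urgent and what is important relative to your goals, stays with you; the code only makes the consistent application of that judgment explicit.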