Data are becoming the new raw material of business
The Economist

Learning to Think Like a Data Scientist: Alumni Spotlight on Ceena Modarres

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Ceena was a Fellow in our Winter 2017 cohort who landed a job with our hiring partner, Capital One.

Tell us about your background. How did it set you up to be a great data scientist?

I received my M.S. in Reliability Engineering from the University of Maryland. In Reliability Engineering, a practitioner assesses and works to prevent the failure of a physical system (a car, a computer, etc.). Many of these approaches are statistics- and data-driven, and much of the modern research in the field (including my own) uses Machine Learning to improve the relevant analyses. However, when I finished my Master’s, I realized I was more passionate about the Data Science and Machine Learning side than the engineering side. So when I heard about The Data Incubator, it seemed like a great fit.

What do you think you got out of The Data Incubator?

As a recent MS graduate who had never worked before, the most important thing I learned at The Data Incubator was how to think like a Data Scientist. Since Data Science is still a new field, many positions require a unique and not necessarily homogeneous set of skills. The Data Incubator not only teaches its students all the necessary technology, but also teaches them how to think about Data Science problems in a systematic and effective way. TDI also provided a network of possible employers and alumni that proved valuable for my job search.

Continue reading


MATLAB vs. Python NumPy for Academics Transitioning into Data Science


At The Data Incubator, we pride ourselves on having the most up-to-date data science curriculum available. Much of our curriculum is based on feedback from corporate and government partners about the technologies they are using and learning. In addition to their feedback, we wanted to develop a data-driven approach for determining what we should be teaching in our data science corporate training and our free fellowship for master's and PhD graduates looking to enter data science careers in industry. Here are the results.

This technical article was written for The Data Incubator by Dan Taylor, a Fellow of our 2017 Spring cohort in Washington, DC. 

 

For many of us with roots in academic research, MATLAB was our first introduction to data analysis. However, due to its high cost, MATLAB is not very common outside academia. A license is simply too expensive for most companies. Luckily, for experienced MATLAB users, the transition to free and open source tools, such as Python’s NumPy, is fairly straightforward. This post compares the functionality of MATLAB with that of Python’s NumPy library, in order to assist those transitioning from academic research into a career in data science.

MATLAB has several benefits when it comes to data analysis. Perhaps most important is its low barrier to entry for users with little programming experience. MathWorks has put a great deal of effort into making MATLAB’s user interface both expansive and intuitive. This means new users can quickly get up and running with their data without knowing how to code. It is possible to import, model, and visualize structured data without typing a single line of code. Because of this, MATLAB is a great entry point for scientists into programmatic analysis. Of course, the true power of MATLAB can only be unleashed through more deliberate and verbose programming, but users can gradually move into this more complicated space as they become more comfortable with programming. MATLAB’s other strengths include its deep library of functions and extensive documentation, a virtual “instruction manual” full of detailed explanations and examples.
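To give a flavor of the transition, here is a small illustrative sketch of a few common MATLAB operations and rough NumPy counterparts. It is not taken from the original post: the array values are made up, and the MATLAB snippets in the comments are included only for orientation.

```python
import numpy as np

# MATLAB: A = [1 2; 3 4];
A = np.array([[1.0, 2.0], [3.0, 4.0]])

# MATLAB: A(1, 2)   (1-based indexing); NumPy indexing is 0-based
top_right = A[0, 1]

# MATLAB: A .* A    (element-wise product)
elementwise = A * A

# MATLAB: A * A     (matrix product)
matrix_product = A @ A

# MATLAB: A.'       (transpose)
transposed = A.T

# MATLAB: linspace(0, 1, 5)
grid = np.linspace(0.0, 1.0, 5)

# MATLAB: x = A \ b (solve the linear system A x = b)
b = np.array([5.0, 6.0])
x = np.linalg.solve(A, b)
```

The main habits to unlearn are 1-based indexing and the meaning of `*`, which is element-wise in NumPy and a matrix product in MATLAB.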

Continue reading


Taking on Data Science with Mathematics: Alumni Spotlight on Brian Munson

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Brian was a Fellow in our Winter 2017 cohort who landed a job with Quantworks.

Tell us about your background. How did it set you up to be a great data scientist?

I was a research mathematician and professor before deciding on a career change. Having a deep knowledge of math really helps me understand how things work, whether it is the theoretical ideas behind fancy algorithms or reading a piece of code and deciphering what it does.

What do you think you got out of The Data Incubator?

Confidence in my code-writing ability. A polished resume and an important talking point with employers in my capstone project.

Continue reading


Ranking Popular Deep Learning Libraries for Data Science

At The Data Incubator, we pride ourselves on having the most up-to-date data science curriculum available. Much of our curriculum is based on feedback from corporate and government partners about the technologies they are using and learning. In addition to their feedback, we wanted to develop a data-driven approach for determining what we should be teaching in our data science corporate training and our free fellowship for master's and PhD graduates looking to enter data science careers in industry. Here are the results.

The Rankings

Below is a ranking of 23 open-source deep learning libraries that are useful for Data Science, based on GitHub and Stack Overflow activity, as well as Google search results. The table shows standardized scores, where a value of 1 means one standard deviation above average (average = score of 0). For example, Caffe is one standard deviation above average in GitHub activity, while deeplearning4j is close to average. See below for methods.

Results and Discussion

The ranking is based on equally weighting its three components: GitHub (stars and forks), Stack Overflow (tags and questions), and Google Results (total and quarterly growth rate). These were obtained using available APIs. Coming up with a comprehensive list of deep learning toolkits was tricky – in the end, we scraped five different lists that we thought were representative (see methods below for details). Computing standardized scores for each metric allows us to see which packages stand out in each category. The full ranking is here, while the raw data is here.
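The scoring idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the ranking: the library names and raw numbers are made up, each component is reduced to a single metric, and the population standard deviation is assumed because the text does not say which variant was used.

```python
from statistics import mean, pstdev

# Hypothetical raw metrics for three made-up libraries (not the article's data).
raw = {
    "lib_a": {"github": 900.0, "stack_overflow": 300.0, "google": 50.0},
    "lib_b": {"github": 500.0, "stack_overflow": 200.0, "google": 30.0},
    "lib_c": {"github": 100.0, "stack_overflow": 100.0, "google": 10.0},
}

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]

names = list(raw)
metrics = ["github", "stack_overflow", "google"]

# Standardize each metric across libraries, so 0 is average and 1 is one
# standard deviation above average.
z = {m: dict(zip(names, standardize([raw[n][m] for n in names]))) for m in metrics}

# Combine the three components with equal weights.
overall = {n: mean(z[m][n] for m in metrics) for n in names}

ranking = sorted(names, key=overall.get, reverse=True)
```

Standardizing first puts metrics with very different scales, such as star counts and search-result totals, on a common footing before they are averaged.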
Continue reading


The APIs for Neural Networks in TensorFlow

By Dana Mastropole, Robert Schroll, and Michael Li

TensorFlow has gathered quite a bit of attention as the new hot toolkit for building neural networks. To the beginner, it may seem that the only thing that rivals this interest is the number of different APIs which you can use. In this article we will go over a few of them, building the same neural network each time. We will start with low-level TensorFlow math, and then show how to simplify that code with TensorFlow’s layer API. We will also discuss two libraries built on top of TensorFlow, TFLearn and Keras.

The MNIST database is a collection of handwritten digits. Each is recorded in a $28\times28$ pixel grayscale image. We will build a two-layer perceptron network to classify each image as a digit from zero to nine. The first layer will fully connect the 784 inputs to 64 hidden neurons, using a sigmoid activation. The second layer will connect those hidden neurons to 10 outputs, scaled with the softmax function. The network will be trained with stochastic gradient descent, on minibatches of 64, for 20 epochs. (These values are chosen not because they are the best, but because they produce reasonable results in a reasonable time.)
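Before turning to any TensorFlow API, the architecture described above can be sketched in plain NumPy. This is a stand-in for illustration, not the article's TensorFlow code: it uses a random minibatch instead of MNIST, performs a single gradient step rather than 20 epochs, and the learning rate and weight initialization are assumptions, since the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 784, 64, 10  # layer sizes from the text
batch_size = 64                               # minibatch size from the text
learning_rate = 0.01                          # assumption: not given in the text

# Random stand-in for one minibatch of flattened 28x28 images and their labels.
x = rng.random((batch_size, n_inputs))
labels = rng.integers(0, n_outputs, size=batch_size)
y = np.eye(n_outputs)[labels]  # one-hot targets

# Assumed initialization: small random weights, zero biases.
W1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)      # layer 1: 784 -> 64, sigmoid
    probs = softmax(hidden @ W2 + b2)  # layer 2: 64 -> 10, softmax
    return hidden, probs

def cross_entropy(probs, y):
    return -np.mean(np.sum(y * np.log(probs + 1e-12), axis=1))

# One stochastic gradient descent step on this minibatch.
hidden, probs = forward(x)
loss_before = cross_entropy(probs, y)

d_logits = (probs - y) / batch_size  # gradient of the loss w.r.t. output logits
dW2 = hidden.T @ d_logits
db2 = d_logits.sum(axis=0)
d_hidden = (d_logits @ W2.T) * hidden * (1.0 - hidden)  # backprop through sigmoid
dW1 = x.T @ d_hidden
db1 = d_hidden.sum(axis=0)

W1 -= learning_rate * dW1
b1 -= learning_rate * db1
W2 -= learning_rate * dW2
b2 -= learning_rate * db2

_, probs_after = forward(x)
loss_after = cross_entropy(probs_after, y)
```

Training for real would repeat this step over every minibatch of the training set for each epoch. Frameworks such as TensorFlow compute the gradients automatically, which is what makes the hand-written backward pass here unnecessary in practice.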


Bringing Astronomy Down to Earth: Alumni Spotlight on Tim Weinzirl

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Tim was a Fellow in our Spring 2017 cohort who landed a job with one of our hiring partners, First Republic Bank.

Tell us about your background. How did it set you up to be a great data scientist?

My education includes a B.S. in Physics from Drake University and a Ph.D. in Astronomy from the University of Texas at Austin. After grad school, I went overseas for a Research Fellowship at the University of Nottingham. Astronomers do a lot of coding relative to other fields, and having been coding in Python since 2006 for work, I was very familiar with the Python SciPy stack. Since 2014, I have also been volunteering time to data science and software engineering projects for a people analytics startup. This was extremely useful because it provided references in industry who could vouch for my data science skills.

What do you think you got out of The Data Incubator?

I got several useful things out of The Data Incubator: strategies for resume writing, experience building and deploying a live web application, and a comprehensive set of IPython notebooks that encapsulate the advanced features of scikit-learn, SQL, and big data tools (Hadoop, Spark).

Continue reading


Using a convolutional neural network to identify anatomically distinct cervical types: Alumni Spotlight on Rachel Allen

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists.  Rachel was a Fellow in our Spring 2017 cohort and an instructor for our Summer 2017 cohort. 

My background is in neuroscience; specifically, I studied how images are processed in the visual system of biological brains. For my capstone project I knew I wanted to use an artificial neural network to create an image classifier. Intel and Mobile ODT released a large dataset for a medical image classifying competition around the same time I was brainstorming possible projects. Their dataset included thousands of medical images of cervixes that were labeled by medical professionals as one of three types based on anatomy. Healthcare providers often have difficulty determining the anatomical classification of a cervix during an examination. Some types of cervixes require additional screening to determine if pathology is present. Thus, an algorithm-aided decision of cervical type could improve the quality of cervical cancer screening for patients and efficiency for practitioners.

I began my project with some exploration of the images. I used t-SNE (t-distributed stochastic neighbor embedding) in scikit-learn, which is a tool to visualize high-dimensional data. Visualizing each image as a point in a 3-D plot showed that none of the three classes of cervixes clustered together. I also used a hierarchical cluster analysis in seaborn to confirm that the images did not easily group together by their three classes.
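A 3-D t-SNE embedding of the kind described above might look roughly like the following in scikit-learn. This is a sketch under assumptions, not the project's actual code: it uses the small digits dataset bundled with scikit-learn as a stand-in for the cervix images, and the subset size, perplexity, and random seed are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Stand-in data: 8x8 digit images bundled with scikit-learn (not the cervix dataset).
digits = load_digits()
X = digits.data[:300]    # each row is one flattened image
y = digits.target[:300]  # class labels, useful for coloring a plot afterwards

# Embed each image as a point in 3-D.
tsne = TSNE(n_components=3, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)  # array of shape (n_images, 3)
```

Plotting the three columns of `embedding` in a 3-D scatter plot, colored by `y`, shows at a glance whether the classes separate into distinct clusters.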

Continue reading


How to Catch ‘Em All: Alumni Spotlight on Yina Gu

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Yina was a Fellow in our Winter 2017 cohort who landed a job with one of our hiring partners, Opera Solutions.

Tell us about your background. How did it set you up to be a great data scientist?

I received my PhD from The Ohio State University, majoring in computational chemistry. For my PhD research, I developed multiple predictive models and published web servers to solve various biophysics problems using machine learning and statistical methods in Python, R and MATLAB. The data science skills and experience I gained during the five years of my PhD not only allow me to solve fundamental scientific problems effectively and efficiently, but also enable my transition from academia to industry to solve real-world challenges.

What do you think you got out of The Data Incubator?

The eight-week intensive training at The Data Incubator really helped me go deeper into the data science field and prepared me fully with the essential skills for working in a big data industry with cutting-edge analytics techniques, including programming, machine learning, and data visualization, as well as a business mindset. Last but not least, I believe networking with other very talented Fellows is the most valuable thing I got out of TDI!

Continue reading


The Many Facets of Artificial Intelligence

When you think of artificial intelligence (AI), do you envision C-3PO or matrix multiplication? HAL 9000 or pruning decision trees? This is an example of ambiguous language, and for a field that has gained so much traction in recent years, it’s particularly important that we think about and define what we mean by artificial intelligence – especially when communicating between managers, salespeople, and the technical side of things. These days, AI is often used as a synonym for deep learning, perhaps because both ideas entered popular tech-consciousness at the same time. In this article I’ll go over the big picture definition of AI and how it differs from machine learning and deep learning.

Continue reading


Developing a Foundation for Data Science: Alumni Spotlight on Ryan Jadrich

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Ryan was a Fellow in our Fall 2016 cohort who landed a job at Austin-based startup OK Roger.

Tell us about your background. How did it set you up to be a great data scientist?

My PhD and postdoctoral work was in the field of statistical mechanics with a strong emphasis on the design of new colloidal materials. Such research has required me to develop a hybrid set of strong analytical math and computational skills, which have been extremely useful for bridging into Data Science. From the deeper understanding afforded by this mixed skill set, I feel well poised to leverage existing technologies as well as develop novel alternatives. As an example of the latter, my forays into the fundamentals of Machine Learning helped me to develop a super-computing application capable of inferring the inter-particle forces an experimentalist must engineer to elicit a desired material property. This required the development of both an analytical framework and an underlying large-scale molecular simulation element. Combining these general technical skills with what I learned at The Data Incubator, I feel well poised to be successful in a Data Science position.

Continue reading
