Data are becoming the new raw material of business
The Economist

3 Reasons You Should Attend a Data Conference

This piece was written by The Data Incubator’s Megan Cummings.


Virtually every week, a high-quality data conference is happening somewhere across the globe. Each year, you should take a moment to review your budget and availability and find the conference that best fits your needs. Whether you’re a computer science student, entry-level data scientist, Chief Technology Officer, or CEO, attending a conference is essential to career development, growing your professional network, and staying on top of the latest trends. So if you’ve ever been hesitant to register for a conference, here are the top three takeaways:


This is the most obvious benefit of attending a conference and the sole reason why some people choose to attend. There’s something new to learn in the data science universe every day, and many of these conferences are at the forefront of these cutting-edge changes. Conferences are where you can access the most up-to-date industry information, oftentimes from the sources themselves. With hundreds, sometimes thousands, of attendees, you can also gain exposure to a wide variety of new ideas and trends that can benefit your latest project or business venture. When planning your conference schedule for the year, be selective about which ones to dedicate your time and money to by first asking yourself what you hope to learn from each one. Take time to research the speakers, breakout sessions, and workshops, and make a schedule far in advance to ensure you’re getting as much out of the conference as possible. On top of that, make sure what you’re taking away is beneficial for your organization, professional development, or career growth.

Continue reading


Data Sources for Cool Data Science Projects: Part 3

Links to Part 1, Part 2

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project: Continue reading


Hiring Data Scientists from Outside the US: A Primer on Visas

This post was written collectively by myself, Michael J. Wildes, & Adam W. Moses. The full, original post for this piece can be found at Harvard Business Review



It’s no secret that there’s a shortage of data scientists in America’s workforce. Many companies look to hire overseas to help ease the domestic talent shortfall (in fact, one in three data scientists is born outside the U.S.), so understanding the ins and outs of visas is rapidly becoming a business necessity. Not all visas are created equal. Some are drastically more expensive, have lengthy approval processes, or have low approval rates. In a tight labor market, it’s imperative that a hiring manager understand the immigration issues that affect who they can hire and how.

At The Data Incubator, we help companies hire data scientists from our big data fellowship. Visa and immigration issues for our non-domestic fellows are among the most common topics that the companies we work with ask us about. So we’ve teamed up with Wildes & Weinberg P.C., a leading immigration law firm, to come up with a short primer for employees on technical visas. We compared the six visa categories our data scientist fellows most commonly qualify for (F1-OPT, TN, H-1B, H-1B1, E3, O1) across the six criteria employers care about (eligibility, legal fees, filing fees, quota, length of process, and chances of approval). Here’s what we found: Continue reading


Spark comparison: AWS vs. GCP

This post was written collectively by myself and Ariel M’ndange-Pfupfu. The original post for this piece can be found at O’Reilly

There’s little doubt that cloud computing will play an important role in data science for the foreseeable future. The flexible, scalable, on-demand computing power available is an important resource, and as a result, there’s a lot of competition between the providers of this service. Two of the biggest players in the space are Amazon Web Services (AWS) and Google Cloud Platform (GCP).

This article includes a short comparison of distributed Spark workloads in AWS and GCP—both in terms of setup time and operating cost. We ran this experiment with our students at The Data Incubator, a big data training organization that helps companies hire top-notch data scientists and train their employees on the latest data science skills. Even with the efficiencies built into Spark, the cost and time of distributed workloads can be substantial, and we are always looking for the most efficient technologies so our students are learning the best and fastest tools.

Submitting Spark jobs to the cloud

Spark is a popular distributed computation engine that incorporates MapReduce-like aggregations into a more flexible, abstract framework. There are APIs for Python and Java, but writing applications in Spark’s native Scala is preferable. That makes job submission simple, as you can package your application and all its dependencies into one JAR file.
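
To make the phrase “MapReduce-like aggregations” concrete, here is a minimal sketch of the pattern in plain Python. It runs on a single machine and does not use Spark’s API; the function names (map_phase, reduce_by_key, word_count) are hypothetical and chosen only for illustration. The idea it shows is the one Spark distributes across a cluster: turn each record into key/value pairs, then combine the values that share a key.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply `mapper` to every record; `mapper` yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def reduce_by_key(pairs, reducer):
    """Group values by key, then fold each group with `reducer`."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(reducer, values) for key, values in grouped.items()}


def word_count(lines):
    """Classic word count: emit (word, 1) per word, then sum per word."""
    pairs = map_phase(
        lines,
        lambda line: ((word.lower(), 1) for word in line.split()),
    )
    return reduce_by_key(pairs, lambda a, b: a + b)


if __name__ == "__main__":
    print(word_count(["spark on aws", "spark on gcp"]))
```

A distributed engine would run the map step on many workers in parallel and shuffle pairs so that each key’s values end up on the same worker before reducing; this sketch keeps everything in one process to show only the shape of the computation.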

It’s common to use Spark in conjunction with HDFS for distributed data storage, and YARN for cluster management; this makes Spark a perfect fit for AWS’s Elastic MapReduce (EMR) clusters and GCP’s Dataproc clusters. Both EMR and Dataproc clusters have HDFS and YARN preconfigured, with no extra work required.

Continue reading


The Role of Data Science in Fintech

The following piece is co-written with our friends at Kabbage.

The application of computation and statistics to the real world has opened up an entirely new paradigm. Named data science, it is in the process of transforming everything from media to business into more efficient and productive endeavors.

Nowhere has data science been more effective in creating change than in the financial world. Data science has spawned an entirely new financial animal known as fintech. Fintech has radically altered the financial landscape by facilitating the application of big data and complex calculations to financial decision making. Accuracy and predictive ability have been drastically improved across the board by the use of data science in financial decision making. At The Data Incubator, we work with hundreds of companies innovating in fintech and beyond who are looking to enroll their employees in our big data corporate training or hire data scientists who graduated from our fellowship. At Kabbage, we are dedicated to supporting the small business community and helping small business owners qualify for a line of credit in minutes with our fully automated online lending platform. Below are five applications of data science in fintech.

Continue reading


Data-Driven Solutions for Agronomy: Alumni Spotlight on Lindsay Bellani

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Lindsay was a Fellow in our Summer 2015 cohort who landed a job with one of our hiring partners, DuPont Pioneer.

Tell us about your background. How did it set you up to be a great data scientist? 

I love biology — in particular, neuroscience — and I had every intention of pursuing a career in academia. I received my BS in biology from UNC Chapel Hill, and went on to study neurogenetics at The Rockefeller University in New York City. I decided to pursue a bit of a non-traditional PhD project — I wanted to understand why mosquitoes bite some people more often than others. Though I didn’t know it at the time, it was this choice that led me to a career in data science. I began by setting up a clinical study wherein we recruited hundreds of volunteers and tested them for attractiveness to mosquitoes. We then collected a bunch of different samples from each of them — everything from blood to questionnaire results. We wanted to understand which, if any, of these factors were predictive of mosquito attractiveness. At the end of the study, I was left with a whole lot of data and not a clue what to do with it. With the help of our University’s biostatistics department, in particular Joel Correa da Rosa, I learned how to use machine learning to do predictive modeling. It was a difficult, real-world dataset, and its analysis led to many interesting debates as to what was the best way to handle its various nuances. I began coding on my own to try new ideas, and eventually Joel and I became equal thought partners in the process. I actually ended up working out of the biostatistics office instead of my own lab for a few months before my thesis defense. Through this process, I began to love the art of data science, and I was encouraged to hear from others that I had a knack for it. It was all of the rigor and analytical-thinking and puzzle-solving that I loved about bench science, but even better. Seeing my enthusiasm and aptitude, my husband recommended that I apply for The Data Incubator. I kind of applied on a whim — I think I filled out the application the same day it was due.

I’m grateful for the path that led me to a career in data science. My background in biology has given me the ability to think scientifically about a problem — to understand the nuance of data collection, and how to design a good experiment, and which analyses might provide the biggest insights. Because I ran a clinical study and none of the members of my lab had a background in machine learning, I had to practice explaining this complex data science problem to non-technical audiences, which has been an asset when presenting results to the business side of the company I work for. It’s been a very natural transition, which I think speaks to what a good fit it is for my personality and talents.
From a research perspective, working in a vibrant academic setting also meant learning how to ask bold questions, even at the risk of sounding stupid in front of a large group of mentors and peers–something I’ve done more than I care to admit. For me, finding the right question to ask is just as important as having the technical expertise to find an answer, and that’s one of the things that makes Data Science so exciting.

Continue reading


Economics to Data Science – How You Can Become a Great Data Scientist

If you have a Master’s or PhD in Economics and are looking for a career in Data Science, you have come to the right place. We love it when Economists come through our Data Incubator because we know they have the skillset to succeed. For example, Blake Boswell (right) did his Master’s in Economics at Johns Hopkins University, completed the Data Science Fellowship at The Data Incubator and now works at Cornerstone Research.

We’ve found that Economists have extensive training in articulating complex ideas, something students from other disciplines can oftentimes lack. We can give Economics students a fuzzy question, and they answer it with computer and data science and then convert the answer back into comprehensible words that a non-expert could grasp. This is a very important skill to have.

Most data scientists don’t approach problems like Econometricians. In data science there is no unifying theory; the goal is to predict outcomes given the data, not to use data to estimate model parameters as Econometricians do. Both approaches have their merits, but predictions take precedence in industry. Nonetheless, your training as an Economist will help you avoid drawing inappropriate conclusions from data, where many Data Scientists wouldn’t think through how deep structural changes can undermine predictions. Continue reading


Multi-Armed Bandits

Special thanks to Brian Farris for contributing this post.

Who should care?

Anyone who is involved in testing. Whether you are testing creatives for a marketing campaign, pricing strategies, website designs, or even pharmaceutical treatments, multi-armed bandit algorithms can help you increase the accuracy of your tests while cutting down costs and automating your process.


Where does the name come from?

A typical slot machine is a device in which the player pulls a lever arm and receives rewards at some expected rate. Because the expected rate is typically negative, these machines are sometimes referred to as “one-armed bandits”. By analogy, a “multi-armed bandit” is a machine in which there are multiple lever arms to pull, each one of which may pay out at a different expected rate.  Continue reading
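
As a rough illustration of the analogy, the sketch below simulates a bandit whose arms pay out 1 with different probabilities and plays it with an epsilon-greedy strategy. Epsilon-greedy is one common, simple approach chosen here for illustration, not necessarily the algorithm discussed in the full post. The function names, payout rates, and parameter values are all assumptions made up for this example.

```python
import random


def pull(arm_rates, arm, rng):
    """Simulate one pull: pay 1 with the arm's (hidden) probability, else 0."""
    return 1 if rng.random() < arm_rates[arm] else 0


def epsilon_greedy(arm_rates, n_pulls=5000, epsilon=0.1, seed=0):
    """Play a simulated bandit with an epsilon-greedy strategy.

    With probability `epsilon` a random arm is explored; otherwise the arm
    with the highest estimated payout so far is exploited.
    """
    rng = random.Random(seed)
    n_arms = len(arm_rates)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0

    for _ in range(n_pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = pull(arm_rates, arm, rng)
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    # Hypothetical payout rates for three arms (e.g. three ad creatives).
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated rates:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

In a testing setting, each arm would stand for one variant (a creative, a price, a page design), and the strategy gradually shifts traffic toward the variants that appear to perform best while still sampling the others occasionally.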


5 secrets for writing the perfect data scientist resume

Data scientists are in demand like never before, but nonetheless, getting a job as a data scientist requires a resume that shows off your skills. At The Data Incubator, we’ve received tens of thousands of resumes from applicants for our free Data Science Fellowship. While we work hard to read between the lines to find great candidates who happen to have lackluster CVs, many recruiters may not be as diligent. Based on our experience, here’s the advice we give to our Fellows about how to craft the perfect resume to get hired as a data scientist.

Be brief: A resume is a summary of your accomplishments. It is not the right place to put your little-league participation award. Remember, you are being judged on something a lot closer to the average of your listed accomplishments than their sum. Giving unnecessary information will only dilute your average. Keep your resume to no more than one page. Remember that a busy HR person will scan your resume for 10 seconds. Adding more content will only distract them from finding key information (as will that second page). That said, don’t play font games; keep text at 11-point font or above. Continue reading


MIT’s $75,000 Big Data finishing school (and its many rivals)

New courses target the need for managers and techies to talk to each other as data proliferate

For most students, a top degree in a field such as computer science or maths ought to be a passport to a career perfectly in tune with the relentless digitisation of work.

For the 30 graduates taking up a new one-year course at MIT’s Sloan School of Management in September, it will be only the prelude to a spell in a Big Data finishing school.

This first cohort of students will pay $75,000 in tuition fees for their Master of Business Analytics degree, with classes ranging from “Data mining: Finding the Data and Models that Create Value” to “Applied Probability”.

They will be calculating that the qualification will sprinkle their CVs with extra stardust, attracting elite employers that are trying to find meaning in the increasing volumes of data that businesses are generating. Continue reading
