Data are becoming the new raw material of business
The Economist

Data Scientist Salaries

At The Data Incubator, we’ve worked with hundreds of Fellows looking to enter industry, and our alumni work at companies including LinkedIn, Palantir, Amazon, Capital One, and the NYTimes.

Starting salary is one of the most common concerns for professionals entering any field, but because the job title “Data Scientist” has only been in use for about eight years, it can be particularly challenging for prospective data scientists to find good information on their job market. LinkedIn and Facebook were the first to give employees on their data teams the title of data scientist, but now there are thousands of data scientists working across all industries alongside data engineers, data analysts, and quantitative analysts.


Data Sources for Cool Data Science Projects: Part 5

Links to Part 1, Part 2, Part 3, Part 4

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project:


Data Sources for Cool Data Science Projects: Part 4

Links to Part 1, Part 2, Part 3

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project:


3 Reasons You Should Attend a Data Conference

This piece was written by The Data Incubator’s Megan Cummings.

 

Virtually every week, a high-quality data conference is happening somewhere across the globe. Each year, you should take a moment to analyze your budget and availability to find the conference that best fits your needs. Whether you’re a computer science student, entry-level data scientist, Chief Technology Officer, or CEO, attending a conference is essential to developing your career, growing your professional network, and staying on top of the latest trends. So if you’ve ever been hesitant to register for a conference, here are the top three takeaways:

Learning

This is the most obvious benefit of attending a conference and the sole reason why some people choose to attend. There’s something new to learn in the data science universe every day, and many of these conferences are at the forefront of these cutting-edge changes. Conferences are where you can access the most up-to-date industry information, oftentimes from the sources themselves. With hundreds, sometimes thousands, of attendees, you can also gain exposure to a wide variety of new ideas and trends that can benefit your latest project or business venture. When planning out your conference schedule for the year, be selective about which ones to dedicate your time and money to by first asking yourself what you hope to learn from each one. Take time to research the speakers, breakout sessions, and workshops, and make a schedule far in advance to ensure you’re getting as much out of the conference as possible. On top of that, make sure what you’re taking away is beneficial for your organization, professional development, or career growth.


Data Sources for Cool Data Science Projects: Part 3

Links to Part 1, Part 2

At The Data Incubator, we run a free eight-week data science fellowship to help our Fellows land industry jobs. Our hiring partners love considering Fellows who don’t mind getting their hands dirty with data. That’s why our Fellows work on cool capstone projects that showcase those skills. One of the biggest obstacles to successful projects has been getting access to interesting data. Here are some more cool public data sources you can use for your next project:


Hiring Data Scientists from Outside the US: A Primer on Visas

This post was written collectively by myself, Michael J. Wildes, & Adam W. Moses. The full, original post for this piece can be found at Harvard Business Review.

 


It’s no secret that there’s a shortage of data scientists in America’s workforce. Many companies look to hire overseas to help ease the domestic talent shortfall (in fact, one in three data scientists was born outside the U.S.), so understanding the ins and outs of visas is rapidly becoming a business necessity. Not all visas are created equal. Some are drastically more expensive, have lengthy approval processes, or have low approval rates. In a tight labor market, it’s imperative that a hiring manager understand the immigration issues that affect who they can hire and how.

At The Data Incubator, we help companies hire data scientists from our big data fellowship. Visa and immigration issues for our non-domestic fellows are one of the most common questions that the companies we work with ask us about. So we’ve teamed up with Wildes & Weinberg P.C., a leading immigration law firm, to come up with a short primer for employees on technical visas. We compared the six visa categories our data scientist fellows most commonly qualify for (F1-OPT, TN, H1-B, H-1B1, E3, O1) across the six criteria employers care about (eligibility, legal fees, filing fees, quota, length of process, and chances of approval). Here’s what we found:


Spark comparison: AWS vs. GCP

This post was written collectively by myself and Ariel M’ndange-Pfupfu. The original post for this piece can be found at O’Reilly.

There’s little doubt that cloud computing will play an important role in data science for the foreseeable future. The flexible, scalable, on-demand computing power available is an important resource, and as a result, there’s a lot of competition between the providers of this service. Two of the biggest players in the space are Amazon Web Services (AWS) and Google Cloud Platform (GCP).

This article includes a short comparison of distributed Spark workloads in AWS and GCP—both in terms of setup time and operating cost. We ran this experiment with our students at The Data Incubator, a big data training organization that helps companies hire top-notch data scientists and train their employees on the latest data science skills. Even with the efficiencies built into Spark, the cost and time of distributed workloads can be substantial, and we are always looking for the most efficient technologies so our students are learning the best and fastest tools.

Submitting Spark jobs to the cloud

Spark is a popular distributed computation engine that incorporates MapReduce-like aggregations into a more flexible, abstract framework. There are APIs for Python and Java, but writing applications in Spark’s native Scala is preferable. That makes job submission simple, as you can package your application and all its dependencies into one JAR file.
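As a rough illustration of what a "MapReduce-like aggregation" means, here is a minimal sketch in plain Python. It does not use Spark and is not how Spark is implemented; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample records are hypothetical, chosen only to show the map, group-by-key, and reduce steps on a single machine.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values for each key into one result."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(records):
    """Chain the three steps into a single word-count aggregation."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["spark on aws", "spark on gcp"]
    print(word_count(sample))
```

A distributed engine runs the same three steps, but spreads the records and the per-key groups across many machines.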

It’s common to use Spark in conjunction with HDFS for distributed data storage, and YARN for cluster management; this makes Spark a perfect fit for AWS’s Elastic MapReduce (EMR) clusters and GCP’s Dataproc clusters. Both EMR and Dataproc clusters have HDFS and YARN preconfigured, with no extra work required.


The Role of Data Science in Fintech

The following piece is co-written with our friends at Kabbage.

The application of computation and statistics to the real world has opened up an entirely new paradigm. Named data science, it is in the process of transforming everything from media to business into more efficient and productive endeavors.

Nowhere has data science been more effective in creating change than in the financial world. Data science has spawned an entirely new financial animal known as fintech. Fintech has radically altered the financial landscape by facilitating the application of big data and complex calculations to financial decision making. Accuracy and predictive ability have been drastically improved across the board by the use of data science in financial decision making. At The Data Incubator, we work with hundreds of companies innovating in fintech and beyond who are looking to enroll their employees in our big data corporate training or hire data scientists who graduated from our fellowship. At Kabbage, we are dedicated to supporting the small business community and helping small business owners qualify for a line of credit in minutes with our fully automated online lending platform. Below are five applications of data science in fintech.


Data-Driven Solutions for Agronomy: Alumni Spotlight on Lindsay Bellani

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Lindsay was a Fellow in our Summer 2015 cohort who landed a job with one of our hiring partners, DuPont Pioneer.

Tell us about your background. How did it set you up to be a great data scientist? 

I love biology — in particular, neuroscience — and I had every intention of pursuing a career in academia. I received my BS in biology from UNC Chapel Hill, and went on to study neurogenetics at The Rockefeller University in New York City. I decided to pursue a bit of a non-traditional PhD project — I wanted to understand why mosquitoes bite some people more often than others. Though I didn’t know it at the time, it was this choice that led me to a career in data science. I began by setting up a clinical study wherein we recruited hundreds of volunteers and tested them for attractiveness to mosquitoes. We then collected a bunch of different samples from each of them — everything from blood to questionnaire results. We wanted to understand which, if any, of these factors were predictive of mosquito attractiveness. At the end of the study, I was left with a whole lot of data and not a clue what to do with it. With the help of our University’s biostatistics department, in particular Joel Correa da Rosa, I learned how to use machine learning to do predictive modeling. It was a difficult, real-world dataset, and its analysis led to many interesting debates as to what was the best way to handle its various nuances. I began coding on my own to try new ideas, and eventually Joel and I became equal thought partners in the process. I actually ended up working out of the biostatistics office instead of my own lab for a few months before my thesis defense. Through this process, I began to love the art of data science, and I was encouraged to hear from others that I had a knack for it. It was all of the rigor and analytical-thinking and puzzle-solving that I loved about bench science, but even better. Seeing my enthusiasm and aptitude, my husband recommended that I apply for The Data Incubator. I kind of applied on a whim — I think I filled out the application the same day it was due.


I’m grateful for the path that led me to a career in data science. My background in biology has given me the ability to think scientifically about a problem — to understand the nuance of data collection, and how to design a good experiment, and which analyses might provide the biggest insights. Because I ran a clinical study and none of the members of my lab had a background in machine learning, I had to practice explaining this complex data science problem to non-technical audiences, which has been an asset when presenting results to the business side of the company I work for. It’s been a very natural transition, which I think speaks to what a good fit it is for my personality and talents.
From a research perspective, working in a vibrant academic setting also meant learning how to ask bold questions, even at the risk of sounding stupid in front of a large group of mentors and peers–something I’ve done more than I care to admit. For me, finding the right question to ask is just as important as having the technical expertise to find an answer, and that’s one of the things that makes Data Science so exciting.


Economics to Data Science – How You Can Become a Great Data Scientist

If you have a Master’s or PhD in Economics and are looking for a career in Data Science, you have come to the right place. We love it when Economists come through our Data Incubator because we know they have the skillset to succeed. For example, Blake Boswell (right) did his Master’s in Economics at Johns Hopkins University, completed the Data Science Fellowship at The Data Incubator, and now works at Cornerstone Research.

We’ve found that Economists have extensive training in articulating complex ideas, something students from other disciplines can oftentimes lack. We can give Economics students a fuzzy question, and they will answer it with Computer and Data Science and then convert the answer back into words a non-expert can grasp. This is a very important skill to have.

Most data scientists don’t approach problems like Econometricians. In data science there is no unifying theory; the goal is to predict outcomes given the data, not to use data to estimate model parameters as Econometricians do. Both approaches have their merits, but predictions take precedence in industry. Nonetheless, your training as an Economist will help you avoid drawing inappropriate conclusions from data, where many Data Scientists wouldn’t think through how deep structural changes can undermine predictions.
