Data are becoming the new raw material of business
The Economist

Hiring Data Scientists from Outside the US: A Primer on Visas

This post was written collectively by me, Michael J. Wildes, and Adam W. Moses. The full original post can be found at Harvard Business Review

 


It’s no secret that there’s a shortage of data scientists in America’s workforce. Many companies look to hire overseas to help ease the domestic talent shortfall (in fact, one in three data scientists is born outside the U.S.), so understanding the ins and outs of visas is rapidly becoming a business necessity. Not all visas are created equal: some are drastically more expensive, involve lengthy approval processes, or have low approval rates. In a tight labor market, it’s imperative that hiring managers understand the immigration issues that affect whom they can hire and how.

At The Data Incubator, we help companies hire data scientists from our big data fellowship. Visa and immigration issues for our non-domestic fellows are among the most common questions that the companies we work with ask us about. So we’ve teamed up with Wildes & Weinberg P.C., a leading immigration law firm, to put together a short primer for employers on technical visas. We compared the six visa categories our data scientist fellows most commonly qualify for (F-1 OPT, TN, H-1B, H-1B1, E-3, O-1) across the six criteria employers care about (eligibility, legal fees, filing fees, quota, length of process, and chances of approval). Here’s what we found:

F-1 Visa “Optional Practical Training”

Who’s eligible?
Undergraduate and graduate students with F-1 visa status who have completed or have been pursuing their degrees for more than nine months are permitted by United States Citizenship and Immigration Services (USCIS) to work for one year on a student visa to gain practical training that complements their education. This year of employment is known as “Optional Practical Training,” or “OPT.”

Typical legal fees
For the F-1 OPT there are typically no legal fees. Most F-1 students apply for their optional practical training employment cards on their own, without the assistance of an attorney. The international student departments of most colleges and universities provide hands-on instructions to help students navigate the process.

Approximate filing fees
There is a $380 fee for filing the initial OPT application for one year of work authorization. Graduates with STEM degrees can file for a two-year extension, which costs another $380 (most companies pass this cost along to the employee, but not always).

Is there a quota?
There is no quota; all F-1 graduates of degree programs at the bachelor’s level and higher are eligible.

How long does the process take from filing?
It takes up to 90 days from submission of the application, or around two to five months from the student’s graduation. F-1 visa holders may apply for work authorization in a 150-day window: from 60 days before graduation to 90 days after. USCIS, by law, must approve the application within 90 days (unless it requires additional information). If the student already has an Employment Authorization Document and is merely switching employers, the process takes one to two weeks.

What are the chances of approval?
Nearly guaranteed.

 

For a full list of visas and their descriptions and requirements, visit the original post at Harvard Business Review.

 

 


Spark comparison: AWS vs. GCP

This post was written collectively by me and Ariel M’ndange-Pfupfu. The original post can be found at O’Reilly

There’s little doubt that cloud computing will play an important role in data science for the foreseeable future. The flexible, scalable, on-demand computing power available is an important resource, and as a result, there’s a lot of competition between the providers of this service. Two of the biggest players in the space are Amazon Web Services (AWS) and Google Cloud Platform (GCP).

This article includes a short comparison of distributed Spark workloads in AWS and GCP—both in terms of setup time and operating cost. We ran this experiment with our students at The Data Incubator, a big data training organization that helps companies hire top-notch data scientists and train their employees on the latest data science skills. Even with the efficiencies built into Spark, the cost and time of distributed workloads can be substantial, and we are always looking for the most efficient technologies so our students are learning the best and fastest tools.

Submitting Spark jobs to the cloud

Spark is a popular distributed computation engine that incorporates MapReduce-like aggregations into a more flexible, abstract framework. There are APIs for Python and Java, but writing applications in Spark’s native Scala is preferable, since it makes job submission simple: you can package your application and all its dependencies into one JAR file.

It’s common to use Spark in conjunction with HDFS for distributed data storage, and YARN for cluster management; this makes Spark a perfect fit for AWS’s Elastic MapReduce (EMR) clusters and GCP’s Dataproc clusters. Both EMR and Dataproc clusters have HDFS and YARN preconfigured, with no extra work required.
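To make this concrete, here is a minimal sketch of the kind of word-count job you might submit to an EMR or Dataproc cluster. Although Scala simplifies packaging, a PySpark version is the quickest way to illustrate the shape of such a job; the bucket paths below are placeholders, not real data locations.

```python
# Minimal PySpark word count, the kind of job submitted to EMR or Dataproc.
import re


def tokenize(line):
    """Lowercase a line and split it into alphabetic words (keeping apostrophes)."""
    return re.findall(r"[a-z']+", line.lower())


def main(input_path, output_path):
    # Imported here so the tokenizer stays testable without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount").getOrCreate()
    counts = (
        spark.sparkContext.textFile(input_path)  # read lines from HDFS/S3/GCS
        .flatMap(tokenize)                       # split lines into words
        .map(lambda w: (w, 1))                   # pair each word with a count
        .reduceByKey(lambda a, b: a + b)         # MapReduce-style aggregation
    )
    counts.saveAsTextFile(output_path)
    spark.stop()


if __name__ == "__main__":
    # Placeholder paths -- substitute your cluster's storage locations.
    main("s3://my-bucket/input/*.txt", "s3://my-bucket/output/")
```

On EMR the script would typically be submitted with `spark-submit` as a cluster step; on Dataproc, as a PySpark job, with YARN handling resource allocation in both cases.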

Continue reading


The Role of Data Science in Fintech

The following piece is co-written with our friends at Kabbage.

The application of computation and statistics to the real world has opened up an entirely new paradigm. Known as data science, it is transforming everything from media to business into more efficient and productive endeavors.

Nowhere has data science been more effective in creating change than in the financial world. Data science has spawned an entirely new financial animal known as fintech. Fintech has radically altered the financial landscape by facilitating the application of big data and complex calculations to financial decision making. Accuracy and predictive ability have been drastically improved across the board by the use of data science in financial decision making.

At The Data Incubator, we work with hundreds of companies innovating in fintech and beyond who are looking to enroll their employees in our big data corporate training or hire data scientists who graduated from our fellowship. At Kabbage, we are dedicated to supporting the small business community and helping small business owners qualify for a line of credit in minutes with our fully automated online lending platform. Below are five applications of data science in fintech.

Continue reading


Data-Driven Solutions for Agronomy: Alumni Spotlight on Lindsay Bellani

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. Lindsay was a Fellow in our Summer 2015 cohort who landed a job with one of our hiring partners, DuPont Pioneer.

Tell us about your background. How did it set you up to be a great data scientist? 

I love biology — in particular, neuroscience — and I had every intention of pursuing a career in academia. I received my BS in biology from UNC Chapel Hill, and went on to study neurogenetics at The Rockefeller University in New York City. I decided to pursue a bit of a non-traditional PhD project — I wanted to understand why mosquitoes bite some people more often than others. Though I didn’t know it at the time, it was this choice that led me to a career in data science. I began by setting up a clinical study wherein we recruited hundreds of volunteers and tested them for attractiveness to mosquitoes. We then collected a bunch of different samples from each of them — everything from blood to questionnaire results. We wanted to understand which, if any, of these factors were predictive of mosquito attractiveness. At the end of the study, I was left with a whole lot of data and not a clue what to do with it. With the help of the university’s biostatistics department, in particular Joel Correa da Rosa, I learned how to use machine learning to do predictive modeling. It was a difficult, real-world dataset, and its analysis led to many interesting debates about the best way to handle its various nuances. I began coding on my own to try new ideas, and eventually Joel and I became equal thought partners in the process. I actually ended up working out of the biostatistics office instead of my own lab for a few months before my thesis defense. Through this process, I began to love the art of data science, and I was encouraged to hear from others that I had a knack for it. It was all the rigor, analytical thinking, and puzzle-solving that I loved about bench science, but even better. Seeing my enthusiasm and aptitude, my husband recommended that I apply for The Data Incubator. I kind of applied on a whim — I think I filled out the application the same day it was due.


I’m grateful for the path that led me to a career in data science. My background in biology has given me the ability to think scientifically about a problem — to understand the nuance of data collection, and how to design a good experiment, and which analyses might provide the biggest insights. Because I ran a clinical study and none of the members of my lab had a background in machine learning, I had to practice explaining this complex data science problem to non-technical audiences, which has been an asset when presenting results to the business side of the company I work for. It’s been a very natural transition, which I think speaks to what a good fit it is for my personality and talents.

Continue reading


Economics to Data Science – How You Can Become A Great Data Scientist


If you have a Master’s or PhD in Economics and are looking for a career in Data Science, you have come to the right place. We love it when Economists come through The Data Incubator because we know they have the skillset to succeed. For example, Blake Boswell did his Master’s in Economics at Johns Hopkins University, completed the Data Science Fellowship at The Data Incubator, and now works at Cornerstone Research.

We’ve found that Economists have extensive training in articulating complex ideas, something students from other disciplines often lack. We can give an Economics student a fuzzy question, and they will answer it with computer and data science, then translate the result back into comprehensible words a non-expert can grasp. This is a very important skill to have.

Continue reading


Multi-Armed Bandits


Special thanks to Brian Farris for contributing this post

Who should care?

Anyone who is involved in testing. Whether you are testing creatives for a marketing campaign, pricing strategies, website designs, or even pharmaceutical treatments, multi-armed bandit algorithms can help you increase the accuracy of your tests while cutting down costs and automating your process.
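To give a flavor of how these algorithms balance exploration against exploitation, here is a minimal epsilon-greedy bandit sketch in Python. The Bernoulli payout rates are invented for illustration, and real bandit work typically uses richer strategies (UCB, Thompson sampling) than this baseline.

```python
import random


def epsilon_greedy(pull, n_arms, n_rounds=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: with probability epsilon explore a random arm,
    otherwise exploit the arm with the best observed mean reward."""
    rng = random.Random(seed)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = pull(arm, rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return counts, values


# Hypothetical Bernoulli payout rates -- invented for this example only.
rates = [0.2, 0.5, 0.8]
counts, values = epsilon_greedy(
    lambda arm, rng: 1.0 if rng.random() < rates[arm] else 0.0, n_arms=3)
```

After a few hundred rounds the algorithm concentrates most of its pulls on the best arm, which is exactly the cost-cutting behavior that makes bandits attractive compared to fixed-allocation A/B tests.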

Continue reading


5 secrets for writing the perfect data scientist resume

Data scientists are in demand like never before, but nonetheless, getting a job as a data scientist requires a resume that shows off your skills. At The Data Incubator, we’ve received tens of thousands of resumes from applicants for our free Data Science Fellowship. While we work hard to read between the lines to find great candidates who happen to have lackluster CVs, many recruiters may not be as diligent. Based on our experience, here’s the advice we give to our Fellows about how to craft the perfect resume to get hired as a data scientist.

Continue reading


MIT’s $75,000 Big Data finishing school (and its many rivals)

New courses target the need for managers and techies to talk to each other as data proliferate

For most students, a top degree in a field such as computer science or maths ought to be a passport to a career perfectly in tune with the relentless digitisation of work.

For the 30 graduates taking up a new one-year course at MIT’s Sloan School of Management in September, it will be only the prelude to a spell in a Big Data finishing school.

This first cohort of students will pay $75,000 in tuition fees for their Master of Business Analytics degree, with classes ranging from “Data mining: Finding the Data and Models that Create Value” to “Applied Probability”.

They will be calculating that the qualification will sprinkle their CVs with extra stardust, attracting elite employers that are trying to find meaning in the increasing volumes of data that businesses are generating.

Continue reading


Turning Bold Questions into a Data Science Career at Amazon: Alumni Spotlight on David Wallace

At The Data Incubator we run a free eight-week data science fellowship to help our Fellows land industry jobs. We love Fellows with diverse academic backgrounds that go beyond what companies traditionally think of when hiring data scientists. David was a Fellow in our Winter 2016 cohort who landed a job with one of our hiring partners, Amazon.

Tell us about your background. How did it set you up to be a great data scientist? 

Before joining The Data Incubator, I completed my Ph.D. in chemistry at Johns Hopkins University, where I focused on the design and synthesis of new magnetic materials. My work gave me the opportunity to work alongside scientists in many different disciplines, and exposed me to a vast array of experimental techniques and theoretical constructs. From a data science perspective, this meant that I was constantly encountering new types of data and searching for scientifically rigorous models to explain those results. As the volume and complexity of these datasets increased, graphical data analysis tools like Excel and Origin weren’t making the cut for me, and I gradually made the transition to performing data transformation and analysis entirely in Python. That was a big technical leap that took a lot of time and frustration, but I think it ultimately made me a better researcher.

From a research perspective, working in a vibrant academic setting also meant learning how to ask bold questions, even at the risk of sounding stupid in front of a large group of mentors and peers–something I’ve done more than I care to admit. For me, finding the right question to ask is just as important as having the technical expertise to find an answer, and that’s one of the things that makes Data Science so exciting.

Continue reading


Testing Jupyter Notebooks

This was originally posted on Christian Moscardi’s blog and is a follow-up piece to another post on Embedding D3 in IPython Notebook. Christian is our Lead Developer! 

Jupyter is a fantastic tool that we use at The Data Incubator for instructional purposes. One perk of using Jupyter is that we can easily test our code samples across any language there’s a Jupyter kernel for. In this post, I’ll show you some of the code we use to test notebooks!

First, a quick discussion of the current state of testing Jupyter notebooks: there isn’t much documented about the process. ipython_nose is a really helpful extension for writing tests into your notebooks, but there’s no documentation or information about easy end-to-end testing. In particular, we want the programmatic equivalent of clicking “run all cells”. After poking around things like GitHub’s list of cool IPython notebooks and the Jupyter docs, two things became apparent to us:

  1. Most people do not test their notebooks.
  2. Automated end-to-end testing is extremely easy to implement.

Continue reading
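As a sketch of what that programmatic “run all cells” can look like, here is a small helper built on nbformat and nbconvert’s ExecutePreprocessor. The API has evolved since this post was written, so treat this as one possible approach rather than the exact code from the original article.

```python
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor


def run_notebook(path, timeout=600, kernel_name="python3"):
    """Programmatic equivalent of clicking 'run all cells'.

    Executes every cell in order; nbconvert raises CellExecutionError if
    any cell fails, so this drops straight into an automated test suite.
    Returns the executed notebook (with outputs) for further assertions.
    """
    with open(path) as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
    # The 'path' resource sets the working directory for the kernel.
    ep.preprocess(nb, {"metadata": {"path": "."}})
    return nb
```

A test can then assert on the executed cells’ outputs, for instance that a given cell printed the value the lesson expects.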