Data are becoming the new raw material of business
The Economist


GPU Cloud Computing Services Compared: AWS, Google Cloud, IBM Nimbix/Power AI, and Crestle

This technical article was written for The Data Incubator by Tim Pollio, a Fellow of our 2017 Fall cohort in Washington, DC who joined The Data Incubator team as one of our resident Data Scientist instructors.

At The Data Incubator, a data science training and placement company, we’re excited about the potential for neural networks and deep learning to transform AI and Big Data. Of course, to practically run deep learning, normal CPUs won’t suffice — you’ll need GPUs. GPUs can dramatically increase the speed of deep learning algorithms, so it’s no surprise that they’re becoming increasingly popular and accessible. Amazon, Google, and IBM all offer GPU enabled options with their cloud computing services, and newer companies like Crestle provide additional options.

We tried four different services — Amazon Web Services, Google Cloud Platform, Nimbix/PowerAI, and Crestle — to find the options with the best performance, price, and convenience. Each service was tested using the same task: 1000 training steps on a tensorflow neural network designed for text prediction. The code for this benchmark can be found here.

Continue reading


Ranking Popular Distributed Computing Packages for Data Science

At The Data Incubator, we strive to provide the most up-to-date data science curriculum available. Using feedback from our corporate and government partners, we deliver training on the most sought after data science tools and techniques in industry. We wanted to include a more data-driven approach to developing the curriculum for our corporate data science training and our free Data Science Fellowship program for PhD and master’s graduates looking to get hired as professional Data Scientists. To achieve this goal, we started by looking at and ranking popular deep learning libraries for data science. Next, we wanted to analyze the popularity of distributed computing packages for data science. Here are the results.

The Rankings

Below is a ranking of the top 20 of 140 distributed computing packages that are useful for Data Science, based on Github and Stack Overflow activity, as well as Google Search results. The table shows standardized scores, where a value of 1 means one standard deviation above average (average = score of 0). For example, Apache Hadoop is 6.6 standard deviations above average in Stack Overflow activity, while Apache Flink is close to average. See below for methods.
Continue reading