Data are becoming the new raw material of business
The Economist

Python Multi-Threading vs Multi-Processing

There is a library called threading in Python and it uses threads (rather than just processes) to implement parallelism. This may be surprising news if you know about the Python’s Global Interpreter Lock, or GIL, but it actually works well for certain instances without violating the GIL. And this is all done without any overhead — simply define functions that make I/O requests and the system will handle the rest.


Global Interpreter Lock

The Global Interpreter Lock reduces the usefulness of threads in Python (more precisely CPython) by allowing only one native thread to execute at a time. This made implementing Python easier to implement in the (usually thread-unsafe) C libraries and can increase the execution speed of single-threaded programs. However, it remains controvertial because it prevents true lightweight parallelism. You can achieve parallelism, but it requires using multi-processing, which is implemented by the eponymous library multiprocessing. Instead of spinning up threads, this library uses processes, which bypasses the GIL.

It may appear that the GIL would kill Python multithreading but not quite. In general, there are two main use cases for multithreading:

  1. To take advantage of multiple cores on a single machine
  2. To take advantage of I/O latency to process other threads

In general, we cannot benefit from (1) with threading but we can benefit from (2).

Continue reading

Ranking Popular Deep Learning Libraries for Data Science

Gold Blog
At The Data Incubator, we pride ourselves on having the most up to date data science curriculum available. Much of our curriculum is based on feedback from corporate and government partners about the technologies they are using and learning. In addition to their feedback we wanted to develop a data-driven approach for determining what we should be teaching in our data science corporate training and our free fellowship for masters and PhDs looking to enter data science careers in industry. Here are the results.

The Rankings

Below is a ranking of 23 open-source deep learning libraries that are useful for Data Science, based on Github and Stack Overflow activity, as well as Google search results. The table shows standardized scores, where a value of 1 means one standard deviation above average (average = score of 0). For example, Caffe is one standard deviation above average in Github activity, while deeplearning4j is close to average. See below for methods.

Continue reading

MIT’s $75,000 Big Data finishing school (and its many rivals)

New courses target the need for managers and techies to talk to each other as data proliferate

For most students, a top degree in a field such as computer science or maths ought to be a passport to a career perfectly in tune with the relentless digitisation of work.

For the 30 graduates taking up a new one-year course at MIT’s Sloan School of Management in September, it will be only the prelude to a spell in a Big Data finishing school.

This first cohort of students will pay $75,000 in tuition fees for their Master of Business Analytics degree, with classes ranging from “Data mining: Finding the Data and Models that Create Value” to “Applied Probability”.

They will be calculating that the qualification will sprinkle their CVs with extra stardust, attracting elite employers that are trying to find meaning in the increasing volumes of data that businesses are generating. Continue reading