Data are becoming the new raw material of business
The Economist


SQLite vs Pandas: Performance Benchmarks

This technical article was written for The Data Incubator by Paul Paczuski, a Fellow of our 2016 Spring cohort in New York City who landed a job with our hiring partner, Genentech, as a Clinical Data Scientist.

As data scientists, we all know that unglamorous data manipulation is 90% of the work. Two of the most common data manipulation tools are SQL and pandas. In this post, we’ll compare the performance of pandas and SQLite, a lightweight, embedded SQL database engine favored by data scientists.

Let’s find out which tasks each of them excels at. Below, we compare Python’s pandas to SQLite on some common data analysis operations: sort, select, load, join, filter, and group by.
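To make the comparison concrete, here is a minimal sketch of how one such operation (a filter followed by a group by) might be timed in both tools. The table name `items`, the column names, the toy data, and the `timed` helper are all assumptions made for illustration; they are not taken from the original benchmark, and timings on a dataset this small say little about real workloads.

```python
import sqlite3
import time

import pandas as pd

# Hypothetical toy data; the original benchmark's dataset is not shown here.
df = pd.DataFrame({
    "id": range(1000),
    "category": [i % 5 for i in range(1000)],
    "value": [float(i) for i in range(1000)],
})

# Load the same rows into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("items", conn, index=False)


def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# pandas: filter rows, then sum `value` within each category.
pandas_result, pandas_secs = timed(
    lambda: df[df["value"] >= 500].groupby("category")["value"].sum()
)

# SQLite: the equivalent WHERE + GROUP BY query.
sqlite_rows, sqlite_secs = timed(
    lambda: conn.execute(
        "SELECT category, SUM(value) FROM items "
        "WHERE value >= 500 GROUP BY category ORDER BY category"
    ).fetchall()
)
sqlite_result = dict(sqlite_rows)

print(f"pandas: {pandas_secs:.6f} s, SQLite: {sqlite_secs:.6f} s")
```

A single run like this is noisy; a fairer comparison would repeat each operation many times (for example with the standard library's `timeit` module) and across several dataset sizes.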



NumPy and pandas – Crucial Tools for Data Scientists

This technical article was written for The Data Incubator by Don Fox, a Fellow of our 2017 Summer cohort in New York City.

When it comes to scientific computing and data science, two key Python packages are NumPy and pandas. NumPy is a powerful Python library that extends the language by letting users create multidimensional array objects (ndarray). Beyond creating ndarray objects, NumPy provides a large set of mathematical functions that operate quickly on the entries of an ndarray without the need for explicit for loops.
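As a small illustration of that last point, the sketch below builds a two-dimensional ndarray and applies arithmetic to every entry at once, then repeats the same computation with explicit loops for comparison. The array contents and variable names are arbitrary choices for this example.

```python
import numpy as np

# A 2 x 3 ndarray holding the values 0.0 through 5.0.
values = np.arange(6, dtype=float).reshape(2, 3)

# Vectorized operations: applied to every entry without a Python loop.
scaled = values * 2.0 + 1.0
roots = np.sqrt(values)

# Reductions along an axis are also built in (axis=0 sums down each column).
col_sums = values.sum(axis=0)

# The same scaling written with explicit for loops, for comparison.
looped = np.empty_like(values)
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        looped[i, j] = values[i, j] * 2.0 + 1.0

print(np.array_equal(scaled, looped))
```

Both versions produce the same array; the vectorized form is shorter and, on large arrays, typically much faster because the looping happens in compiled code.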
