Data are becoming the new raw material of business
The Economist

As many of our readers might know, conda is a package manager for the numerical Python stack that solves many of the problems where pip falls short. While pip is great for pure Python packages (ones written exclusively in Python), most data science packages rely on C code for performance. This, unfortunately, makes installation highly system-dependent. Conda alleviates the problem by distributing compiled binaries, so a user does not need a full suite of local build tools (for example, installing NumPy no longer means building it from source, which can require a FORTRAN 77 compiler). Additionally, conda is expanding beyond Python: for example, it also manages packages for the R language. With such an ambitious scope, it’s not surprising that package coverage is incomplete and, if you’re a power user, you’ll often find yourself wanting to contribute missing packages. Here’s a primer on how:

## Installing Using Pip / PyPI in Conda as a Fallback

The first thing to notice is that you don’t necessarily need to jump to building or uploading packages. As a simple fallback, you can tell conda to install Python packages directly from PyPI using pip. For example, take a look at this simple conda environment.yaml file:

    # environment.yaml
    dependencies:
      - numpy
      - scipy
      - pip:
        - requests

This installs numpy and scipy from Anaconda but installs requests using pip. You can apply it by running:

    conda env update -f environment.yaml
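
If you generate environment files from a script, the nesting of the pip section is the detail that is easiest to get wrong. The following is a minimal sketch, not part of conda itself: `render_environment` is a hypothetical helper that writes the same structure as the file above using only string formatting.

```python
def render_environment(conda_deps, pip_deps=()):
    """Render a conda environment file as YAML text.

    Hypothetical helper for illustration; it only handles flat lists of
    package names, which is all the example above needs.
    """
    lines = ["dependencies:"]
    # Packages that conda installs from its channels
    lines.extend(f"  - {name}" for name in conda_deps)
    if pip_deps:
        # Packages handed to pip are nested one level under the "pip" entry
        lines.append("  - pip:")
        lines.extend(f"    - {name}" for name in pip_deps)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the environment.yaml shown above
    print(render_environment(["numpy", "scipy"], pip_deps=["requests"]), end="")
```

Writing the returned string to environment.yaml and running the `conda env update` command above should give the same result as the handwritten file.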

## Adding New Channels to Conda for More Packages

While the core maintainers have fairly good coverage, it isn’t complete. Also, because conda packages are version- and system-dependent, the odds of a specific version not existing for your operating system are fairly high. Third-party channels fill many of these gaps. For example, as of this writing, scrapy, a popular web scraping package, lives in the popular conda-forge channel. Similarly, many R packages are under the r channel, for example r-dplyr; by convention, R packages take their CRAN name with an “r-” prefix. You can find the channel that carries a package by Googling, for example, “scrapy conda”. To install scrapy, we’ll need to add conda-forge as a channel:

    # environment.yaml
    channels:
      - conda-forge
    dependencies:
      - scrapy
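
Besides a web search, you can ask conda itself whether a channel carries a package. The sketch below wraps `conda search` in Python. The function names are hypothetical, and the parser assumes that `conda search --json` prints an object keyed by package name whose values are lists of records with a `version` field; check this against the output of your conda version before relying on it.

```python
import json
import subprocess


def search_command(package, channel):
    """Build the conda search command for one package in one channel."""
    # --override-channels restricts the search to the channel passed with -c
    return ["conda", "search", "--override-channels", "-c", channel,
            package, "--json"]


def versions_from_search_output(package, raw_json):
    """Extract distinct versions, in order, from `conda search --json` output.

    Assumption: the output maps package names to lists of records that
    each carry a "version" key. Anything else yields an empty list.
    """
    data = json.loads(raw_json)
    records = data.get(package, []) if isinstance(data, dict) else []
    if not isinstance(records, list):
        return []
    versions = []
    for record in records:
        version = record.get("version") if isinstance(record, dict) else None
        if version and version not in versions:
            versions.append(version)
    return versions


def available_versions(package, channel):
    """Run conda search and return the versions found (empty if none)."""
    result = subprocess.run(search_command(package, channel),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # Treat a failed search as "no versions found"
        return []
    return versions_from_search_output(package, result.stdout)
```

Calling `available_versions("scrapy", "conda-forge")` requires conda on your PATH and network access; an empty list suggests the channel does not carry the package for your platform.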

## Building PyPI Conda Packages

But sometimes, even with extra channels, the package simply doesn’t exist (as evidenced by a Google query). In this case, we just have to build our own package, which is fairly easy. The first step is to create a user account on anaconda.org. The username will be your channel name.

Next, you’ll want to make sure you have conda-build and anaconda-client installed:

    conda install conda-build
    conda install anaconda-client

Then you’ll want to authenticate your anaconda client with your new anaconda.org credentials:

    anaconda login

Finally, it’s easiest to configure anaconda to automatically upload all successful builds to your channel:

    conda config --set anaconda_upload yes

With this setup, the rest is easy! Below are the commands for building and uploading the pyinstrument package:

    # download pypi data to ./pyinstrument
    conda skeleton pypi pyinstrument
    # build the package
    conda build ./pyinstrument
    # you'll see that the package is automatically uploaded
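
If you have several packages to publish, the two steps above are easy to script. Here is a minimal sketch under the assumption that conda and conda-build are on your PATH; `pypi_build_commands` and `build_from_pypi` are hypothetical names, and the default dry run only prints the commands rather than executing them.

```python
import subprocess


def pypi_build_commands(package):
    """Return the two commands from the steps above as argument lists."""
    return [
        # Writes a recipe generated from PyPI metadata to ./<package>
        ["conda", "skeleton", "pypi", package],
        # Builds the recipe; uploads too if anaconda_upload is enabled
        ["conda", "build", f"./{package}"],
    ]


def build_from_pypi(package, dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    commands = pypi_build_commands(package)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first failing step
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    build_from_pypi("pyinstrument")
```

Passing `dry_run=False` runs the real build, so try it on a single package first.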

## Building R Conda Packages

Finally, we’ll want to do the same with R packages. Fortunately, conda supports building from CRAN, R’s package repository. For example, glm2 is (surprisingly enough) not on Anaconda. We’ll run the following commands to build and automatically upload it:

    conda skeleton cran glm2
    conda build ./r-glm2
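
Note that the recipe directory is named after the conda package (r-glm2), not the CRAN package (glm2). A small sketch of that naming convention follows; `cran_to_conda_name` and `cran_recipe_dir` are hypothetical helpers, and lowercasing the CRAN name is an assumption you should confirm against the directory that `conda skeleton cran` actually creates.

```python
def cran_to_conda_name(cran_name):
    """Map a CRAN package name to its conventional conda package name.

    The "r-" prefix is the convention described in this post; the
    lowercasing is an assumption, not something the post states.
    """
    return "r-" + cran_name.lower()


def cran_recipe_dir(cran_name):
    """Directory to pass to `conda build` after `conda skeleton cran`."""
    return "./" + cran_to_conda_name(cran_name)


if __name__ == "__main__":
    print(cran_to_conda_name("glm2"))
    print(cran_recipe_dir("glm2"))
```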

Of course, glm2 now has OS X and Linux packages on The Data Incubator’s Anaconda channel as r-glm2, so you can directly include our channel:

    # environment.yaml
    channels:
      - thedataincubator
    dependencies:
      - r-glm2