Installing Python packages; useful Python packages

The easiest way to install Python packages is to use pip, the Python package installer.

pip and Anaconda on OS X

If you’re using Anaconda on OS X, you already have pip installed, but you will need to refer to the Python package installer by a full path name rather than as plain ‘pip’.


More generally, you’ll need to prefix many of the commands below with ‘~/anaconda/bin/’. You can set this as a default for your current shell by doing:

export PATH=~/anaconda/bin:$PATH

(Or you can add that command to the file ~/.bashrc using nano, so that it applies to every new shell. Ask a TA for help!)
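To check which Python and pip your shell actually picks up after changing PATH, a small sketch like the following may help. It uses only the standard library and assumes Python 3.3 or later (for shutil.which); the function name describe_tools is made up for this example.

```python
import shutil
import sys


def describe_tools():
    """Return the paths of the running interpreter and the first pip on PATH."""
    return {
        "python": sys.executable,    # interpreter running this script
        "pip": shutil.which("pip"),  # None if no pip is found on PATH
    }


if __name__ == "__main__":
    for tool, path in describe_tools().items():
        print("%s -> %s" % (tool, path))
```

If the PATH change took effect, both paths should point inside your anaconda/bin directory.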

OPTIONAL: Using virtualenv

Down the road, if you’re running on a machine where you don’t have sysadmin access, you can use a Python package called ‘virtualenv’ to set up your own installation of Python into which you can install your own packages. Once virtualenv is installed (by a sysadmin, presumably), it’s as simple as

python -m virtualenv NAME

where NAME is the name of your workspace, e.g.

python -m virtualenv env

followed by

. env/bin/activate

From that point on, you will be able to use pip to install things within this workspace, and Python (again from within that workspace) will be able to access and use those installed packages.
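If you are ever unsure whether a workspace is active, Python itself can tell you. The following is a best-effort sketch, not an official virtualenv API; the function name in_virtualenv is made up for this example, and it relies on how virtualenv and venv adjust sys.prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Python 3 venvs (and newer virtualenv) make sys.prefix differ
    # from sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active: %s" % in_virtualenv())
    print("packages install under: %s" % sys.prefix)
```

Run it once before and once after the activate step; the reported prefix should change to your workspace directory.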

Python packages

There are, literally, thousands of Python packages. The basic deal is this: Python comes with “batteries included”, which means that you can do an amazing number of things with just a basic Python install. The Anaconda install and VirtualBox virtual machine come with tons more stuff. But there’s always the need to use an updated version of something, or a little package that someone wrote that addresses just your concern... so you’ll always need to install stuff.

Here’s how to install and use some potentially useful packages from my lab, but there’s a whole world of Python packages out there. See the Python standard library documentation for packages that come included with Python, and the Python Package Index (PyPI) for third-party packages.


Screed is a little Python package from Titus’s lab that reads in DNA sequences – more explicitly, it’s a FASTA and FASTQ parser. Some documentation for it is available online.

But how do you use it?

To install screed directly from GitHub, do:

pip install git+

Using screed:

screed can read FASTA and FASTQ files, as well as gzip- or bzip2-compressed versions of those files. For example, in the python directory there is a file called ‘25k.fq.gz’.


All of the below screed commands are in the using-screed.ipynb notebook.

screed, in a nutshell, lets you read in all that data and access it in Python. Try:

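import screed
for record in screed.open('/path/to/2012-11-scripps/python/25k.fq.gz'):
   print(record.sequence)
   # quality string; newer screed releases appear to name this record.quality
   print(record.accuracy)
   break   # stop after the first record rather than printing all 25,000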

A couple of points here.

First, there are 25,000 sequences in this file. You might want to avoid printing them all out (hence the ‘break’ command at the end of the loop!). This is a typical approach to reading through big files – just put in an “if I’ve done more than 10 things, stop” check.

Second, you can use this for short read data or genomic sequences or whatever. We’ve mostly designed it for short-read data but it works fine for genome-scale data (which is, after all, rather smaller than most short-read data...)

Third, you can open any supported sequence file with this command: FASTA or FASTQ, either plain or compressed with gzip or bzip2.
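The “stop after 10 things” idea from the first point can also be written with itertools.islice, which works on any iterable. This is only a sketch: fake_records is a made-up stand-in for a big file, and first_records is a hypothetical helper name. With screed you would pass the result of screed.open(filename) instead.

```python
from itertools import islice


def first_records(records, limit=10):
    """Return at most `limit` items from any iterable of records."""
    return list(islice(records, limit))


def fake_records(count):
    """Stand-in generator for a large sequence file (not screed itself)."""
    for i in range(count):
        yield ("read%d" % i, "ACGT")


# Only the first 10 items are ever generated; the rest are never read.
sample = first_records(fake_records(25000), limit=10)
print(len(sample))
```

Because islice stops pulling items once the limit is reached, the remaining records are never parsed.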

This can be a simple and handy way to extract a particular sequence from a large file –

for record in screed.open('/path/to/2012-11-scripps/python/25k.fq.gz'):
   if record.name == '@895:1:4:1596:8538/2':
      break   # found it; stop reading

# do stuff with record

You can even pull out a list:

list_of_names = ['@895:1:4:1596:8538/2', '@895:1:4:1596:6003/2']
list_of_records = []

for record in screed.open('/path/to/2012-11-scripps/python/25k.fq.gz'):
   if record.name in list_of_names:
      list_of_records.append(record)

# do stuff with list_of_records

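(Note that you might want to use a ‘set’ rather than a list for the names here: checking membership in a set stays fast even when you are looking for many names.)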
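Here is a sketch of the set-based version. To keep it runnable without the data file, it uses a made-up Record namedtuple with invented sequences in place of real screed records; only the .name and .sequence attributes are modelled, and select_records is a hypothetical helper name.

```python
from collections import namedtuple

# Hypothetical stand-in for the records screed yields.
Record = namedtuple("Record", ["name", "sequence"])


def select_records(records, wanted_names):
    """Collect the records whose name is in wanted_names."""
    wanted = set(wanted_names)  # set membership is O(1) on average
    return [record for record in records if record.name in wanted]


# Invented example data, not taken from 25k.fq.gz.
records = [
    Record("@895:1:4:1596:8538/2", "ACGT"),
    Record("some-other-read", "GGCC"),
    Record("@895:1:4:1596:6003/2", "TTAA"),
]
picked = select_records(
    records, ["@895:1:4:1596:8538/2", "@895:1:4:1596:6003/2"]
)
print([record.sequence for record in picked])
```

The same function should work unchanged on anything that yields objects with a name attribute.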

So how is this stuff useful!?

Well, here’s one simple example –

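n = 0.   # total number of bases (float, so the division below is not truncated)
m = 0.   # number of G and C bases
for record in screed.open('/path/to/2012-11-scripps/python/25k.fq.gz'):
   n += len(record.sequence)
   m += record.sequence.count('G') + record.sequence.count('C')

print('%.3f G/C content' % (m / n,))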

You can also do your quality trimming, or analysis of the first bases, or... whatever.
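As one illustration of quality trimming, here is a deliberately simple sketch that cuts a read at the first low-quality base. It is not how any particular trimming tool works: the function name trim_at_low_quality, the threshold of 20, and the Phred+33 encoding are all assumptions made for this example, so check which encoding your own data uses.

```python
def trim_at_low_quality(sequence, quality, min_score=20, offset=33):
    """Cut the read at the first base whose Phred score is below min_score.

    Assumes `quality` is a Phred+33 encoded string of the same length as
    `sequence`; other encodings need a different `offset`.
    """
    for position, symbol in enumerate(quality):
        if ord(symbol) - offset < min_score:
            return sequence[:position], quality[:position]
    return sequence, quality


# Under Phred+33, 'I' encodes score 40 and '#' encodes score 2.
trimmed_seq, trimmed_qual = trim_at_low_quality("ACGTACGT", "IIIII###")
print(trimmed_seq)
```

Inside a screed loop you would pass the record’s sequence and its quality string to a function like this one.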

Another example –

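outfp = open('out.fa', 'w')
for record in screed.open('/path/to/2012-11-scripps/python/25k.fq.gz'):
   # FASTA format: a '>' header line with the name, then the sequence
   outfp.write('>%s\n%s\n' % (record.name, record.sequence))
outfp.close()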

This converts FASTQ to FASTA.

Also see the IPython Notebook, using-screed.ipynb.
