monoseq

monoseq is a Python library for pretty-printing DNA and protein sequences using a monospace font. It also provides a simple command-line interface.

Sequences are pretty-printed in the traditional way: letters are grouped in blocks, and each line is prefixed with the position of its first letter. User-specified regions can be highlighted, and the output format can be HTML or plain text, optionally styled with ANSI escape codes for display in a terminal.

A simple example:

>>> from monoseq import pprint_sequence
>>> sequence = 'MIMANQPLWLDSEVEMNHYQQSHIKSKSPYFPEDKHICWIKIFKAFGT' * 4
>>> print(pprint_sequence(sequence))
  1  MIMANQPLWL DSEVEMNHYQ QSHIKSKSPY FPEDKHICWI KIFKAFGTMI MANQPLWLDS
 61  EVEMNHYQQS HIKSKSPYFP EDKHICWIKI FKAFGTMIMA NQPLWLDSEV EMNHYQQSHI
121  KSKSPYFPED KHICWIKIFK AFGTMIMANQ PLWLDSEVEM NHYQQSHIKS KSPYFPEDKH
181  ICWIKIFKAF GT
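The block layout can be sketched in a few lines of plain Python. This is not monoseq's implementation: `layout_sequence` is a hypothetical helper, and its defaults (10 letters per block, 6 blocks per line, a right-aligned 1-based position column followed by two spaces) are inferred from the example output above.

```python
def layout_sequence(sequence, block_length=10, blocks_per_line=6):
    """Format a sequence in numbered blocks (hypothetical helper, not monoseq's API)."""
    line_length = block_length * blocks_per_line
    # The position column is as wide as the largest line-start position.
    last_start = max(len(sequence) - 1, 0) // line_length * line_length + 1
    width = len(str(last_start))
    lines = []
    for start in range(0, len(sequence), line_length):
        chunk = sequence[start:start + line_length]
        # Split the line's letters into blocks separated by single spaces.
        blocks = [chunk[i:i + block_length]
                  for i in range(0, len(chunk), block_length)]
        # 1-based position, right-aligned, then two spaces, then the blocks.
        lines.append('%*d  %s' % (width, start + 1, ' '.join(blocks)))
    return '\n'.join(lines)
```

Under these assumptions, `print(layout_sequence('MIMANQPLWLDSEVEMNHYQQSHIKSKSPYFPEDKHICWIKIFKAFGT' * 4))` reproduces the plain output shown above.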

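An example, admittedly contrived, with annotations. The highlighting uses ANSI escape codes, so it only shows up when the output is printed to a terminal: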

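>>> from monoseq import AnsiFormat
>>> sequence = ('cgcactcaaaacaaaggaagaccgtcctcgactgcagaggaagcaggaagctgtcggccc'
...             'agctctgagcccagctgctggagccccgagcagcggcatggagtccgtggccctgtacag'
...             'ctttcaggctacagagagcgacgagctggccttcaacaagggagacacactcaagatcct'
...             'gaacatggaggatgaccagaactggtacaaggccgagctccggggtgtcgagggatttat'
...             'tcccaagaactacatccgcgtcaag')
>>> twelves = [(p, p + 1) for p in range(11, len(sequence), 12)]
>>> conserved = [(11, 37), (222, 247)]
>>> middle = [(len(sequence) // 3, len(sequence) // 3 * 2)]
>>> print(pprint_sequence(sequence, format=AnsiFormat,
...                       annotations=[conserved, twelves, middle]))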
  1  cgcactcaaa acaaaggaag accgtcctcg actgcagagg aagcaggaag ctgtcggccc
 61  agctctgagc ccagctgctg gagccccgag cagcggcatg gagtccgtgg ccctgtacag
121  ctttcaggct acagagagcg acgagctggc cttcaacaag ggagacacac tcaagatcct
181  gaacatggag gatgaccaga actggtacaa ggccgagctc cggggtgtcg agggatttat
241  tcccaagaac tacatccgcg tcaag
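One way to picture the annotation levels is the sketch below. It is an assumption about the semantics, not monoseq's code: each `(start, stop)` pair is treated as a 0-based, half-open interval like a Python slice (consistent with `twelves` marking single positions as `(p, p + 1)`), and each annotation level gets its own ANSI style. The helper `highlight` and the bold/underline/reverse-video style choices are hypothetical.

```python
# ANSI SGR codes chosen for illustration; monoseq's own styles may differ.
STYLES = ['\033[1m', '\033[4m', '\033[7m']  # bold, underline, reverse video
RESET = '\033[0m'


def highlight(sequence, annotations):
    """Wrap annotated positions in ANSI escape codes (hypothetical helper).

    `annotations` is a list of levels; each level is a list of (start, stop)
    pairs, assumed here to be 0-based and half-open like Python slices.
    """
    # For every position, record which annotation levels cover it.
    levels = [set() for _ in sequence]
    for level, regions in enumerate(annotations):
        for start, stop in regions:
            for i in range(max(start, 0), min(stop, len(sequence))):
                levels[i].add(level)

    out = []
    previous = frozenset()
    for letter, active in zip(sequence, levels):
        active = frozenset(active)
        if active != previous:
            # Reset, then re-apply every style active at this position.
            out.append(RESET)
            out.extend(STYLES[lvl % len(STYLES)] for lvl in sorted(active))
            previous = active
        out.append(letter)
    if previous:
        # Close any styling still open at the end of the sequence.
        out.append(RESET)
    return ''.join(out)
```

For example, `print(highlight('ACGTACGT', [[(2, 4)]]))` shows `GT` in bold in a terminal. The sketch styles an unbroken string only; combining it with block layout would require inserting the escape codes during layout, since they would otherwise be counted as letters.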

A separate IPython Notebook shows how to pretty-print sequences in an IPython Notebook.

API reference

Documentation on a specific function, class or method can be found in the API reference.
