Biopython - Plotting
  • Date: 2024-12-22


This chapter explains how to plot sequence data. Before moving to this topic, let us understand the basics of plotting.

Plotting

Matplotlib is a Python plotting library that produces quality figures in a variety of formats. We can create different types of plots such as line charts, histograms, bar charts, pie charts, scatter charts, etc.

pylab is a module that belongs to Matplotlib and combines the numerical module numpy with the graphical plotting module pyplot. Biopython uses the pylab module for plotting sequences. To do this, we need to import it as shown below −

import pylab

Before importing, we need to install the matplotlib package using the pip command given below −

pip install matplotlib

Sample Input File

Create a sample file named plot.fasta in your Biopython directory and add the following sequences −

>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
>seq2
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDV
>seq4
EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
>seq5
SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
>seq6
FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
>seq7
SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
>seq8
SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq9
KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
>seq10
FDSWDEFVSKSVELFRNHPDTTRYVVKYRHCEGKLVLKVTDNHECLKFKTDQAQDAKKMEK

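In a FASTA file, each record is a header line starting with ">" followed by the sequence itself. The following is a minimal sketch, using only the standard library, of how the layout of such a file could be checked before plotting. The function name read_fasta_lengths is hypothetical, and the parser is a simplified stand-in for illustration, not Biopython's SeqIO.parse.

```python
def read_fasta_lengths(text):
    """Return {record_id: sequence_length} for FASTA-formatted text.

    Simplified stand-in for a real parser: it only looks for the ">" marker.
    """
    lengths = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record id is the first word after ">".
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            lengths[current_id] = 0
        elif current_id is not None:
            # Sequence lines may wrap, so lengths accumulate per record.
            lengths[current_id] += len(line)
    return lengths


# Two records copied from the sample file above.
sample = """>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDV
>seq9
KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
"""
print(read_fasta_lengths(sample))
```

If a record reports a length of 0, its sequence most likely ended up on the header line.
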
Line Plot

Now, let us create a simple line plot for the above FASTA file.

Step 1 − Import the SeqIO module to read the FASTA file.

>>> from Bio import SeqIO

Step 2 − Parse the input file and collect the length of each sequence.

>>> records = [len(rec) for rec in SeqIO.parse("plot.fasta", "fasta")] 
>>> len(records) 
11 
>>> max(records) 
72 
>>> min(records) 
57

Step 3 − Import the pylab module.

>>> import pylab

Step 4 − Configure the line chart by assigning x and y axis labels.

>>> pylab.xlabel("sequence length")
Text(0.5, 0, 'sequence length')

>>> pylab.ylabel("count")
Text(0, 0.5, 'count')

Step 5 − Configure the line chart by setting grid display.

>>> pylab.grid()

Step 6 − Draw a simple line chart by calling the plot method and supplying records as input.

>>> pylab.plot(records)
[<matplotlib.lines.Line2D object at 0x10b6869d0>]

Step 7 − Finally, save the chart using the command below.

>>> pylab.savefig("lines.png")

Result

After executing the above commands, you can see the following image saved in your Biopython directory.

[Image: Line Plot]
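
Steps 3 to 7 can also be collected into a single function. The sketch below is one possible arrangement, not the tutorial's own code. It assumes matplotlib.pyplot in place of pylab (current Matplotlib documentation recommends pyplot), selects the non-interactive Agg backend so that it runs without a display, and draws on its own figure. The function name save_line_plot and the file name lengths_line.png are hypothetical. The lengths list holds the eleven sequence lengths counted from plot.fasta, which in the steps above come from SeqIO.parse. Because plotting a list places the position in the list on the x axis and the value on the y axis, the sketch labels the axes accordingly.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt


def save_line_plot(values, path, xlabel, ylabel):
    """Draw values as a line chart on a fresh figure and save it to path."""
    fig, ax = plt.subplots()
    ax.plot(values)  # x axis: position in the list, y axis: value
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.grid(True)
    fig.savefig(path)
    plt.close(fig)  # release the figure so later plots start clean
    return Path(path)


# Lengths of the eleven sequences in plot.fasta, in file order.
lengths = [62, 72, 67, 57, 62, 66, 70, 65, 68, 57, 61]
out = save_line_plot(lengths, "lengths_line.png", "sequence index", "sequence length")
print(out, out.exists())
```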

Histogram Chart

A histogram is used for continuous data, where the bins represent ranges of data. Drawing a histogram is the same as drawing a line chart, except that instead of pylab.plot we call the hist method of the pylab module with records and a custom value for bins (5). The complete code is as follows −

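The bins argument splits the range between the smallest and the largest value into that many equal-width intervals. With the lengths found earlier (57 to 72) and bins=5, each bin is 3 wide. As a rough illustration, the sketch below computes such bins by hand in plain Python. It is a simplified reimplementation for explanation only, not the code that pylab.hist runs; the function name equal_width_histogram is hypothetical, and the lengths are those counted from plot.fasta.

```python
def equal_width_histogram(values, bins):
    """Return (counts, edges) for `bins` equal-width bins spanning min..max.

    Every bin is half-open [left, right) except the last one, which also
    includes its right edge so that the maximum value is counted.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # Avoid a zero-width range when all values are equal.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # clamp max into last bin
        counts[index] += 1
    return counts, edges


lengths = [62, 72, 67, 57, 62, 66, 70, 65, 68, 57, 61]
counts, edges = equal_width_histogram(lengths, bins=5)
print(counts)  # [2, 3, 1, 3, 2]
print(edges)   # [57.0, 60.0, 63.0, 66.0, 69.0, 72.0]
```
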
Step 1 − Import the SeqIO module to read the FASTA file.

>>> from Bio import SeqIO

Step 2 − Parse the input file.

>>> records = [len(rec) for rec in SeqIO.parse("plot.fasta", "fasta")] 
>>> len(records) 
11 
>>> max(records) 
72 
>>> min(records) 
57

Step 3 − Import the pylab module.

>>> import pylab

Step 4 − Configure the histogram by assigning x and y axis labels.

>>> pylab.xlabel("sequence length")
Text(0.5, 0, 'sequence length')

>>> pylab.ylabel("count")
Text(0, 0.5, 'count')

Step 5 − Configure the histogram by setting grid display.

>>> pylab.grid()

Step 6 − Draw the histogram by calling the hist method and supplying records and the number of bins as input.

>>> pylab.hist(records, bins=5)
(array([2., 3., 1., 3., 2.]), array([57., 60., 63., 66., 69., 72.]), <a list of 5 Patch objects>)

Step 7 − Finally save the chart using the below command.

>>> pylab.savefig("hist.png")

Result

After executing the above commands, you can see the following image saved in your Biopython directory.

[Image: Histogram Chart]
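
If the histogram is drawn in the same interpreter session as the line plot, the pylab calls keep drawing on the figure that is already open unless a new one is started. One way to keep the charts separate, sketched here under the assumption that matplotlib.pyplot is used instead of pylab, is to create a dedicated figure for each chart and close it after saving. The function name save_histogram and the file name lengths_hist.png are hypothetical; the lengths are those counted from plot.fasta.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt


def save_histogram(values, path, bins=5):
    """Draw a histogram on its own figure, save it, and return (counts, edges)."""
    fig, ax = plt.subplots()  # fresh figure, independent of earlier plots
    counts, edges, _patches = ax.hist(values, bins=bins)
    ax.set_xlabel("sequence length")
    ax.set_ylabel("count")
    ax.grid(True)
    fig.savefig(path)
    plt.close(fig)
    # Convert NumPy values to plain Python numbers for easy printing.
    return [int(c) for c in counts], [float(e) for e in edges]


lengths = [62, 72, 67, 57, 62, 66, 70, 65, 68, 57, 61]
hist_counts, hist_edges = save_histogram(lengths, "lengths_hist.png")
print(hist_counts)
print(Path("lengths_hist.png").exists())
```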

GC Percentage in Sequence

GC percentage is a commonly used measure for comparing sequences. We can draw a simple line chart of the GC percentage of a set of sequences and compare them at a glance. Here, we only change the data from sequence length to GC percentage. Note that GC content is defined for nucleotide sequences; the sample file contains protein sequences, so the values here serve only to illustrate the plotting steps. The complete code is given below −

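To make clear what a GC percentage measures, here is a minimal plain-Python sketch of the calculation: the share of G and C characters in a sequence, multiplied by 100. The function name gc_percent is hypothetical, and this simplified version ignores the ambiguity codes that Biopython's own functions take into account. The example sequences are made-up nucleotide strings, not data from plot.fasta.

```python
def gc_percent(sequence):
    """Return the percentage of G and C characters in a sequence string."""
    if not sequence:
        return 0.0  # avoid division by zero for empty input
    upper = sequence.upper()
    gc_count = upper.count("G") + upper.count("C")
    return 100.0 * gc_count / len(upper)


# Made-up nucleotide strings for illustration.
examples = ["ATGC", "GGCC", "ATAT", "acgtgc"]
values = sorted(gc_percent(seq) for seq in examples)
print(values)
```

Sorting the values, as the tutorial does, turns the line chart into a rising curve that shows the spread of GC percentages across the set.
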
Step 1 − Import the SeqIO module to read the FASTA file.

>>> from Bio import SeqIO

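Step 2 − Parse the input file and compute the GC percentage of each sequence. The GC function shown here is available in older Biopython releases; newer releases provide gc_fraction in the same module, which returns a fraction between 0 and 1 rather than a percentage.

>>> from Bio.SeqUtils import GC
>>> gc = sorted(GC(rec.seq) for rec in SeqIO.parse("plot.fasta", "fasta"))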

Step 3 − Import the pylab module.

>>> import pylab

Step 4 − Configure the line chart by assigning x and y axis labels.

>>> pylab.xlabel("Genes")
Text(0.5, 0, 'Genes')

>>> pylab.ylabel("GC Percentage")
Text(0, 0.5, 'GC Percentage')

Step 5 − Configure the line chart by setting grid display.

>>> pylab.grid()

Step 6 − Draw a simple line chart by calling the plot method and supplying gc as input.

>>> pylab.plot(gc)
[<matplotlib.lines.Line2D object at 0x10b6869d0>]

Step 7 − Finally save the chart using the below command.

>>> pylab.savefig("gc.png")

Result

After executing the above commands, you can see the following image saved in your Biopython directory.

[Image: GC Percentage in Sequence]