Biopython - Plotting
This chapter explains how to plot sequence data. Before moving to this topic, let us understand the basics of plotting.
Plotting
Matplotlib is a Python plotting library that produces quality figures in a variety of formats. We can create different types of plots such as line charts, histograms, bar charts, pie charts, scatter charts, etc.
pylab is a module that belongs to Matplotlib and combines the numerical module numpy with the graphical plotting module pyplot. Biopython uses the pylab module for plotting sequences. To do this, we need to import it as shown below −
import pylab
Before importing, we need to install the matplotlib package using the pip command given below −
pip install matplotlib
Sample Input File
Create a sample file named plot.fasta in your Biopython directory and add the following content −
>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
>seq2
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDV
>seq4
EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
>seq5
SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
>seq6
FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
>seq7
SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
>seq8
SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq9
KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
>seq10
FDSWDEFVSKSVELFRNHPDTTRYVVKYRHCEGKLVLKVTDNHECLKFKTDQAQDAKKMEK
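Before plotting, it can help to confirm that the file is laid out as FASTA expects: a header line starting with ">" followed by one or more sequence lines. The following is a minimal standard-library sketch of that layout check. The helper name fasta_lengths and the demo records are hypothetical, and this is not how SeqIO.parse is implemented.

```python
def fasta_lengths(lines):
    """Return the sequence length of each FASTA record, in input order."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            lengths.append(0)  # a header line starts a new record
        elif lengths:
            lengths[-1] += len(line)  # a sequence may span several lines
    return lengths


# Two hypothetical records; the second one wraps over two lines.
sample = [">demo0", "FQTWEEFSRA", ">demo1", "KYRTW", "EEFTR"]
print(fasta_lengths(sample))  # prints [10, 10]
```

Because the function accepts any iterable of lines, an open file handle for plot.fasta can be passed in directly, and the result should contain one length per record.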
Line Plot
Now, let us create a simple line plot for the above fasta file.
Step 1 − Import SeqIO module to read fasta file.
>>> from Bio import SeqIO
Step 2 − Parse the input file.
>>> records = [len(rec) for rec in SeqIO.parse("plot.fasta", "fasta")]
>>> len(records)
11
>>> max(records)
72
>>> min(records)
57
Step 3 − Let us import pylab module.
>>> import pylab
Step 4 − Configure the line chart by assigning x and y axis labels. Since pylab.plot draws each value against its position in the list, the x axis is the sequence index and the y axis is the sequence length.
>>> pylab.xlabel("sequence index")
Text(0.5, 0, 'sequence index')
>>> pylab.ylabel("sequence length")
Text(0, 0.5, 'sequence length')
Step 5 − Configure the line chart by setting grid display.
>>> pylab.grid()
Step 6 − Draw a simple line chart by calling the plot method and supplying records as input.
>>> pylab.plot(records)
[<matplotlib.lines.Line2D object at 0x10b6869d0>]
Step 7 − Finally save the chart using the below command.
>>> pylab.savefig("lines.png")
Result
After executing the above command, the chart is saved as lines.png in your Biopython directory.
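The interactive steps above can also be collected into a small script. The sketch below is one possible arrangement, not the only one: it uses matplotlib.pyplot rather than pylab (the Matplotlib documentation discourages pylab in new code), selects the non-interactive Agg backend so that no display is required, and wraps the plotting in a hypothetical helper named save_length_plot. The axis labels are an assumption about what the chart shows: position in the file against sequence length.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def save_length_plot(lengths, out_path="lines.png"):
    """Plot sequence lengths in record order and save the figure."""
    fig, ax = plt.subplots()
    ax.plot(lengths)                  # x is the list index, y is the length
    ax.set_xlabel("sequence index")
    ax.set_ylabel("sequence length")
    ax.grid(True)
    fig.savefig(out_path)
    plt.close(fig)                    # release the figure's memory
    return out_path
```

To reproduce the chart from this section, pass in the list built in Step 2, for example save_length_plot(records, "lines.png").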
Histogram Chart
A histogram groups numeric data into bins, where each bin covers a range of values and the bar height shows how many values fall in that range. Drawing a histogram follows the same steps as the line chart, except that instead of pylab.plot we call the hist method of the pylab module with records and a custom number of bins (5). The complete code is as follows −
Step 1 − Import SeqIO module to read fasta file.
>>> from Bio import SeqIO
Step 2 − Parse the input file.
>>> records = [len(rec) for rec in SeqIO.parse("plot.fasta", "fasta")]
>>> len(records)
11
>>> max(records)
72
>>> min(records)
57
Step 3 − Let us import pylab module.
>>> import pylab
Step 4 − Configure the histogram by assigning x and y axis labels.
>>> pylab.xlabel("sequence length")
Text(0.5, 0, 'sequence length')
>>> pylab.ylabel("count")
Text(0, 0.5, 'count')
Step 5 − Configure the histogram by setting grid display.
>>> pylab.grid()
Step 6 − Draw the histogram by calling the hist method and supplying records and the number of bins as input.
>>> pylab.hist(records, bins=5)
(array([2., 3., 1., 3., 2.]), array([57., 60., 63., 66., 69., 72.]), <a list of 5 Patch objects>)
Step 7 − Finally save the chart using the below command.
>>> pylab.savefig("hist.png")
Result
After executing the above command, the chart is saved as hist.png in your Biopython directory.
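To see where the two arrays returned by hist come from, the binning can be sketched in plain Python. This is an illustration of equal-width binning under the usual convention that every bin is half-open except the last, not Matplotlib's actual implementation. The helper name equal_width_histogram is hypothetical, and the lengths are those of the 11 records in plot.fasta.

```python
def equal_width_histogram(values, bins):
    """Return (counts, edges) for `bins` equal-width bins spanning the data.

    Assumes the values are not all identical (the bin width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Every bin is half-open [left, right) except the last one,
        # which also includes the maximum value.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, edges


# Lengths of the 11 records in plot.fasta, in file order.
lengths = [62, 72, 67, 57, 62, 66, 70, 65, 68, 57, 61]
counts, edges = equal_width_histogram(lengths, bins=5)
print(counts)  # [2, 3, 1, 3, 2]
print(edges)   # [57.0, 60.0, 63.0, 66.0, 69.0, 72.0]
```

With a minimum of 57, a maximum of 72 and 5 bins, each bin is 3 units wide, which gives the same edges and counts as the hist output in Step 6.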
GC Percentage in Sequence
GC percentage is one of the commonly used measures for comparing sequences. We can draw a simple line chart of the GC percentage of a set of sequences and compare them at a glance. Here, we only change the data from sequence length to GC percentage. The complete code is given below −
Step 1 − Import SeqIO module to read fasta file.
>>> from Bio import SeqIO
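Step 2 − Parse the input file and compute the GC percentage of each sequence, sorted in ascending order. (The GC function was deprecated in Biopython 1.80 and removed in later releases; in those versions, use gc_fraction from Bio.SeqUtils, which returns a fraction between 0 and 1 instead of a percentage.)
>>> from Bio.SeqUtils import GC
>>> gc = sorted(GC(rec.seq) for rec in SeqIO.parse("plot.fasta", "fasta"))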
Step 3 − Let us import pylab module.
>>> import pylab
Step 4 − Configure the line chart by assigning x and y axis labels.
>>> pylab.xlabel("Genes")
Text(0.5, 0, 'Genes')
>>> pylab.ylabel("GC Percentage")
Text(0, 0.5, 'GC Percentage')
Step 5 − Configure the line chart by setting grid display.
>>> pylab.grid()
Step 6 − Draw a simple line chart by calling the plot method and supplying gc as input.
>>> pylab.plot(gc)
[<matplotlib.lines.Line2D object at 0x10b6869d0>]
Step 7 − Finally save the chart using the below command.
>>> pylab.savefig("gc.png")
Result
After executing the above command, the chart is saved as gc.png in your Biopython directory.
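The quantity being plotted is simple to state: the share of G and C letters in a sequence, expressed as a percentage. The sketch below shows that arithmetic in plain Python. It is a simplified illustration with a hypothetical helper named gc_percent, and Biopython's own function may treat ambiguity codes differently. GC content is defined for nucleotide sequences, so the demo uses short made-up DNA fragments rather than the sample file.

```python
def gc_percent(sequence):
    """Percentage of G and C letters in a sequence (0.0 if it is empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)


# Hypothetical DNA fragments, used only to illustrate the calculation.
fragments = ["ATGC", "GGCC", "ATAT"]
values = sorted(gc_percent(s) for s in fragments)
print(values)  # [0.0, 50.0, 100.0]
```

Sorting the values, as in Step 2, makes the line chart rise monotonically, so the spread of GC percentages across the set is easy to read.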