Biopython - Motif Objects
A sequence motif is a nucleotide or amino-acid sequence pattern. Sequence motifs are formed by the three-dimensional arrangement of amino acids, which may not be adjacent in the sequence. Biopython provides a separate module, Bio.motifs, to access the functionalities of sequence motifs, as shown below −
from Bio import motifs
Creating Simple DNA Motif
Let us create a simple DNA motif sequence using the below command −
>>> from Bio import motifs
>>> from Bio.Seq import Seq
>>> DNA_motif = [ Seq("AGCT"),
...               Seq("TCGA"),
...               Seq("AACT"),
...             ]
>>> seq = motifs.create(DNA_motif)
>>> print(seq)
AGCT
TCGA
AACT
To view how often each nucleotide occurs at each position, use the below command −
>>> print(seq.counts)
        0      1      2      3
A:   2.00   1.00   0.00   1.00
C:   0.00   1.00   2.00   0.00
G:   0.00   1.00   1.00   0.00
T:   1.00   0.00   0.00   2.00
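To see where these numbers come from, the count matrix can be reproduced by hand. The following is a minimal pure-Python sketch, not Biopython's own implementation; the helper name count_matrix is hypothetical and introduced only for illustration.

```python
def count_matrix(instances, alphabet="ACGT"):
    """Count how often each letter occurs at each position.

    Hypothetical helper for illustration only; assumes all instances
    have the same length and contain only letters from `alphabet`.
    """
    length = len(instances[0])
    # One row per letter, one column per motif position.
    counts = {letter: [0] * length for letter in alphabet}
    for instance in instances:
        for position, letter in enumerate(instance):
            counts[letter][position] += 1
    return counts


# The same three instances used to build the motif above.
counts = count_matrix(["AGCT", "TCGA", "AACT"])
for letter, row in counts.items():
    print(letter, row)
```

Each row of the result corresponds to one nucleotide and each column to one position in the motif, which matches the layout printed by seq.counts.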
Use the following code to get the count of ‘A’ at each position −
>>> seq.counts["A", :]
(2, 1, 0, 1)
If you want to access the columns of counts, use the below command −
>>> seq.counts[:, 3]
{'A': 1, 'C': 0, 'T': 2, 'G': 0}
Creating a Sequence Logo
We shall now discuss how to create a Sequence Logo.
Consider the below sequences −
AGCTTACG
ATCGTACC
TTCCGAAT
GGTACGTA
AAGCTTGG
You can create your own logo using the following link −
Enter the above sequences, create a new logo, and save the image as seq.png in your biopython folder.
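Biopython can also request a logo for a motif object directly. The weblogo method submits the motif to the WebLogo service and saves the returned image to the given file, so it needs an internet connection. Run the following command −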
>>> seq.weblogo("seq.png")
This DNA sequence motif is represented as a sequence logo for the LexA-binding motif.
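In a sequence logo, the height of each column reflects how conserved that position is, conventionally measured as information content in bits. The following is a minimal pure-Python sketch for the five sequences listed in this section, assuming the uncorrected formula (log2 of the alphabet size minus the Shannon entropy of the column); tools such as WebLogo may apply additional small-sample corrections, so their column heights can differ. The function name information_content is hypothetical.

```python
import math

sequences = ["AGCTTACG", "ATCGTACC", "TTCCGAAT", "GGTACGTA", "AAGCTTGG"]


def information_content(sequences, alphabet="ACGT"):
    """Return the per-column information content in bits.

    Hypothetical helper for illustration; assumes equal-length
    sequences and no small-sample correction.
    """
    total = len(sequences)
    bits = []
    # zip(*sequences) yields one tuple of letters per alignment column.
    for column in zip(*sequences):
        entropy = 0.0
        for letter in alphabet:
            frequency = column.count(letter) / total
            if frequency > 0:
                entropy -= frequency * math.log2(frequency)
        bits.append(math.log2(len(alphabet)) - entropy)
    return bits


bits = information_content(sequences)
for position, value in enumerate(bits):
    print(position, round(value, 2))
```

A fully conserved DNA column scores 2 bits, and a column where all four nucleotides are equally frequent scores 0 bits.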
JASPAR Database
JASPAR is one of the most popular databases of transcription factor binding profiles. Biopython supports reading and writing motifs in the JASPAR formats, and scanning sequences with them. JASPAR stores meta-information for each motif. The module Bio.motifs contains a specialized class, jaspar.Motif, to represent these meta-information attributes.
It has the following notable attributes −
matrix_id − Unique JASPAR motif ID
name − The name of the motif
tf_family − The family of the motif, e.g. 'Helix-Loop-Helix'
data_type − The type of data used in the motif.
Let us create a file in JASPAR sites format named sample.sites in the biopython folder. It is defined below −
sample.sites
>MA0001 ARNT 1
AACGTGatgtccta
>MA0001 ARNT 2
CAGGTGggatgtac
>MA0001 ARNT 3
TACGTAgctcatgc
>MA0001 ARNT 4
AACGTGacagcgct
>MA0001 ARNT 5
CACGTGcacgtcgt
>MA0001 ARNT 6
cggcctCGCGTGc
In the above file, we have created motif instances. Now, let us create a motif object from the above instances −
>>> from Bio import motifs
>>> with open("sample.sites") as handle:
...     data = motifs.read(handle, "sites")
...
>>> print(data)
TF name None
Matrix ID None
Matrix:
        0      1      2      3      4      5
A:   2.00   5.00   0.00   0.00   0.00   1.00
C:   3.00   0.00   5.00   0.00   0.00   0.00
G:   0.00   1.00   1.00   6.00   0.00   5.00
T:   1.00   0.00   0.00   0.00   6.00   0.00
Here, data holds the motif built from all the instances read from the sample.sites file.
To print all the instances from data, use the below command −
>>> for instance in data.instances:
...     print(instance)
...
AACGTG
CAGGTG
TACGTA
AACGTG
CACGTG
CGCGTG
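In the sites format, the uppercase letters of each record mark the binding site and the lowercase letters are flanking sequence, which is why only six-letter instances are printed above. The following is a minimal pure-Python sketch of that extraction; it is not Biopython's parser, and the function name read_site_instances is hypothetical.

```python
import io

# Contents of sample.sites, inlined so the sketch is self-contained.
SITES_TEXT = """\
>MA0001 ARNT 1
AACGTGatgtccta
>MA0001 ARNT 2
CAGGTGggatgtac
>MA0001 ARNT 3
TACGTAgctcatgc
>MA0001 ARNT 4
AACGTGacagcgct
>MA0001 ARNT 5
CACGTGcacgtcgt
>MA0001 ARNT 6
cggcctCGCGTGc
"""


def read_site_instances(handle):
    """Return the uppercase binding site from each record.

    Hypothetical helper for illustration; header lines start with '>'
    and are skipped, lowercase flanking letters are dropped.
    """
    instances = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        instances.append("".join(c for c in line if c.isupper()))
    return instances


instances = read_site_instances(io.StringIO(SITES_TEXT))
print(instances)
```

Note that the site in the last record sits in the middle of the line rather than at the start, so the extraction relies on letter case and not on position.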
Use the below command to print the count matrix −
>>> print(data.counts)
        0      1      2      3      4      5
A:   2.00   5.00   0.00   0.00   0.00   1.00
C:   3.00   0.00   5.00   0.00   0.00   0.00
G:   0.00   1.00   1.00   6.00   0.00   5.00
T:   1.00   0.00   0.00   0.00   6.00   0.00