BIO 332 BIOINFORMATICS
EXERCISE 3: Retrieving and comparing amino acid sequences from different species
Name: ______________________________________
Abstract
Sequence alignment in bioinformatics is a method of arranging protein sequences to highlight regions of similarity that may reflect functional, structural, or evolutionary relationships among the sequences. All mammals are related because they evolved from small ancestors that lived in the shadow of the dinosaurs, but the degree of relatedness between living species varies. Amino acid sequence data are used to infer the patterns of evolutionary relationships between species.
The methods used to infer the evolutionary relationships of minke whales (Balaenoptera sp.) include the retrieval of amino acid sequences from GenBank at NCBI (National Center for Biotechnology Information). The amino acid sequences are then used in multiple sequence alignment (MSA), which compares three or more sequences, and in pairwise sequence alignment, which compares two. The alignment tool used is T-Coffee. This paper seeks to determine how well multiple sequence alignment and phylogenetic inference support the statement that minke whales are closely related to perissodactyls.
Introduction
Molecular evidence suggests that whales are closely related to hoofed mammals such as deer, cows, camels, hippos, and giraffes. Today, whales, dolphins, and porpoises make up the Cetacea group. Many features common to the land mammals believed to be related to whales have changed through evolution. Although many of these features are absent in adult whales, whale embryos grow hind limbs early in development, and these disappear as development continues. Which group is most closely related to whales is still debated, but molecular biology points to the artiodactyls as the closest relatives of the cetaceans. The hippo in this paper represents the artiodactyls. Determining the relationship between whales and perissodactyls requires amino acid sequence alignment. A protein’s amino acid sequence is determined by the DNA sequence of its gene, so a gene shared by several closely related species should possess identical or nearly identical amino acid sequences. Species found to be closely related through sequence alignment therefore share ancestry and diverged from a common ancestor fairly recently in evolutionary time, because their lineages have not had enough time to accumulate many random mutations in their genetic codes.
Methods
The first step was retrieving the amino acid sequences of the animals under study: horse, hippopotamus, naked mole rat, minke whale, and kangaroo. The sequences were retrieved from GenBank at the National Center for Biotechnology Information (NCBI). The process involved opening the NCBI website, searching for the species name followed by the phrase "pancreatic ribonuclease", and selecting the most suitable record based on the expected number of amino acids, roughly 150 residues. The FASTA-format protein record was then copied into the sequence alignment tool T-Coffee. Multiple sequence alignment was used first to identify regions of similarity among all the protein sequences. Pairwise alignment was then used to align two protein sequences at a time and infer the evolutionary relationship between them.
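To make the idea of a pairwise alignment score concrete, the sketch below computes a global alignment score with the Needleman-Wunsch recurrence. It is only an illustration and not what T-Coffee does internally; the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity, whereas real tools use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of a and b (Needleman-Wunsch)."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Short fragments taken from the minke whale and kangaroo records in the Results.
whale_fragment = "KFQRQHMDS"
kangaroo_fragment = "KFQRQHMDT"
print(global_alignment_score(whale_fragment, kangaroo_fragment))
```

A higher score means the two fragments can be lined up with more matches and fewer mismatches or gaps, which is the intuition behind reading alignment scores as a measure of relatedness.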
Results
The amino-acid sequence of Minke whale
Accession number
NCBI Reference Sequence: XP_007180555.1
Source Database
NCBI
>XP_007180555.1 ribonuclease pancreatic [Balaenoptera acutorostrata scammoni]
MAPKSLVLLPWLVLVLLVLGWVQPSLGRESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPV
NTFVHESLEDVKAVCSQKNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGD
PYVPVHFDNSV
The amino-acid sequence of kangaroo
Accession number
sp|P00686.1
Source Database
NCBI
>sp|P00686.1|RNAS1_MACRU RecName: Full=Ribonuclease pancreatic; AltName: Full=RNase 1; AltName: Full=RNase A
ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQENVTCKNGRTN
CYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEGQYVPVHFDAYV
The amino-acid sequence of naked mole rat
Accession number
EHB02901.1
Source Database
NCBI
>EHB02901.1 Ribonuclease pancreatic [Heterocephalus glaber]
MAQKQSLVLFPLLILVLLGLVNTNYCNEMMKCRNMTERCCKLVNTFMHDPLADVQAVCFQKNVTCKNAQT
NFYQSSSNMHITGCRLTSNSKYPTCSYRTRQVERSITVACEGNPYVPGHFDALWSPPPQPEQRLISSLLR
ISTPAFPSLPPKK
The amino-acid sequence of hippopotamus
Accession number
CAA06576.1
Source Database
NCBI
>CAA06576.1 pancreatic ribonuclease, partial [Hippopotamus amphibius]
KETAAEKFQRQHMDTSSSLSNDSNYCNQMMVRRNMTKDRCKPVNTFVHESEADVKAVCSQKNVTCKNGQT
NCYESNSTMHITDCRETGSSKYPNCAYKTSQLQKHIIVACEGDPYV
The amino-acid sequence of goat
Accession number
XP_005685452.1
Source Database
NCBI
>XP_005685452.1 PREDICTED: ribonuclease K6 [Capra hircus]
MGPHLLGCSSLLLLLLGMWWSVCPLCAVPKGLTKARWFEIQHIQPRLLQCNTAMSGVNNYTRHCKPENTF
LHNFFQDVTAVCNLPNIICKNGRHNCHQSPKPVNLTQCNLIAGRYPDCRYHDDAQYKFFVVACDPPQKTD
PPYHLVPVHLDKVV
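The FASTA records above can be checked programmatically before alignment. The following is a minimal sketch, assuming the records are available as plain text; the helper name `parse_fasta` is hypothetical and only the standard library is used. It joins the wrapped sequence lines of each record and reports the accession and length, which can be compared against the roughly 150 residues expected for pancreatic ribonuclease.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence lines may be wrapped
    return {h: "".join(parts) for h, parts in records.items()}


example = """>XP_007180555.1 ribonuclease pancreatic [Balaenoptera acutorostrata scammoni]
MAPKSLVLLPWLVLVLLVLGWVQPSLGRESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPV
NTFVHESLEDVKAVCSQKNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGD
PYVPVHFDNSV
"""

records = parse_fasta(example)
for header, sequence in records.items():
    accession = header.split()[0]  # first whitespace-delimited token
    print(accession, len(sequence))
```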
Figure 1. Result of multiple sequence alignment
CLUSTAL 2.1 Multiple Sequence Alignments
Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: XP_023477016.1 153 aa
Sequence 2: XP_007180555.1 151 aa
Sequence 3: sp|P00686.1|RNAS1_MACRU 122 aa
Sequence 4: EHB02901.1 153 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 15.894
Sequences (1:3) Aligned. Score: 14.7541
Sequences (1:4) Aligned. Score: 13.0719
Sequences (2:3) Aligned. Score: 64.7541
Sequences (2:4) Aligned. Score: 47.6821
Sequences (3:4) Aligned. Score: 45.082
Guide tree file created: [clustalw.dnd]
There are 3 groups
Start of Multiple Alignment
Aligning...
Group 1: Sequences: 2 Score:1840
Group 2: Sequences: 3 Score:982
Group 3: Delayed
Alignment Score 1366
CLUSTAL-Alignment file created [clustalw.aln]
clustalw.aln
CLUSTAL 2.1 multiple sequence alignment
XP_007180555.1             MAPKSLVLLPWLVLVLLVLGWVQPSLGRESPAMKFQRQHMDSGNSPGNNP
sp|P00686.1|RNAS1_MACRU    ----------------------------ETPAEKFQRQHMDTEHSTASSS
EHB02901.1                 ---------------------------MAQKQSLVLFPLLILVLLGLVNT
XP_023477016.1             -MAQAVAWLLFLQLVLEETQVVDSKLQIAIKNFRTLHIDYPMVNYPEGFQ

XP_007180555.1             NYCN--QMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQKNVLCKNGRTN
sp|P00686.1|RNAS1_MACRU    NYCN--LMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQENVTCKNGRTN
EHB02901.1                 NYCN--EMMKCRNMTERCCKLVNTFMHDPLADVQAVCFQKNVTCKNAQTN
XP_023477016.1             GYCNGLMAYVRDVKQSWYCPKTHYVVHAPWKAVREFCKYSESFCENYNEY
                           .***           .  *   : .:* .   *  .*  .:  *:* .

XP_007180555.1             CYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGDPYVPVHFD
sp|P00686.1|RNAS1_MACRU    CYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHFD
EHB02901.1                 FYQSSSNMHITGCRLTSNSKYPTCSYRTRQVERSITVACEGNPYVPGHFD
XP_023477016.1             CTLTHDSYPLTICSLGSIQPPTSCRYNGTLTNQRLYLLCSRKYDAEPIGI
                              : .   :* *   . .  ..* *.    :: : : *.    .

XP_007180555.1             NSV-----------------------------
sp|P00686.1|RNAS1_MACRU    AYV-----------------------------
EHB02901.1                 ALWSPPPQPEQRLISSLLRISTPAFPSLPPKK
XP_023477016.1             IGLY----------------------------
clustalw.dnd
(
XP_023477016.1:0.58993,
(
XP_007180555.1:0.16688,
sp|P00686.1|RNAS1_MACRU:0.18558)
:0.08060,
EHB02901.1:0.27935);
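The branch lengths in the guide tree above can be turned into tip-to-tip (patristic) distances by summing the branches on the path between two sequences. The sketch below is one way to do this with the standard library only; the function names `parse_newick`, `leaf_paths`, and `patristic_distance` are hypothetical, and the parser assumes a well-formed Newick string with unquoted labels, as in clustalw.dnd.

```python
import re

NEWICK = """(
XP_023477016.1:0.58993,
(
XP_007180555.1:0.16688,
sp|P00686.1|RNAS1_MACRU:0.18558)
:0.08060,
EHB02901.1:0.27935);"""

DELIMITERS = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in DELIMITERS:
            label = tokens[pos]
            pos += 1
        name, _, length = label.partition(":")
        return (name, float(length) if length else 0.0, children)

    return node()


def leaf_paths(tree, path=()):
    """Map each leaf name to its (node id, branch length) steps from the root."""
    name, length, children = tree
    step = path + ((id(tree), length),)
    if not children:
        return {name: step}
    paths = {}
    for child in children:
        paths.update(leaf_paths(child, step))
    return paths


def patristic_distance(paths, a, b):
    """Sum the branch lengths on the path connecting leaves a and b."""
    pa, pb = paths[a], paths[b]
    shared = 0
    while shared < min(len(pa), len(pb)) and pa[shared] == pb[shared]:
        shared += 1  # skip the part of the path both leaves have in common
    return sum(l for _, l in pa[shared:]) + sum(l for _, l in pb[shared:])


tree = parse_newick(NEWICK)
paths = leaf_paths(tree)
whale = "XP_007180555.1"
for other in sorted(paths):
    if other != whale:
        print(other, round(patristic_distance(paths, whale, other), 5))
```

On this tree the kangaroo record (sp|P00686.1) has the shortest path to the minke whale, followed by the naked mole rat (EHB02901.1), with XP_023477016.1 (taken here to be the horse record) the farthest.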
Table 2. Number of identical residues in aligned ribonuclease among species of mammals
Pairwise comparison                Number of identical residues between pairs of species
Horse and minke whale              8
Minke whale and kangaroo           9
Minke whale and naked mole rat     9
Kangaroo and naked mole rat        9
Horse and naked mole rat           8
Kangaroo and horse                 8
Minke whale and hippopotamus       10
Table 3: Percent pairwise sequence similarity matrix
                 Minke whale   Kangaroo   Naked mole rat   Horse   Hippopotamus
Minke whale      100%          99%        97%              80%     100%
Kangaroo         99%           100%       99%              83%     -
Naked mole rat   97%           99%        100%             86%     -
Horse            80%           83%        86%              100%    -
Hippopotamus     100%          -          -                -       -
A dash marks a cell with no reported value.
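Counts of identical residues and percent identity can be computed directly from two rows of an alignment. The sketch below is a minimal illustration, not the procedure used to produce the tables: the function name `percent_identity` is hypothetical, and excluding gapped columns from the denominator is an assumption (other conventions divide by the full alignment length or by the shorter sequence). The example rows are the minke whale and kangaroo rows from one 50-column block of Figure 1, so the result describes that block only.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (identical, compared, percent) for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = compared = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # assumption: columns with a gap are not counted
        compared += 1
        identical += x == y
    percent = 100.0 * identical / compared if compared else 0.0
    return identical, compared, percent


# One aligned block from Figure 1 (minke whale vs. kangaroo).
whale_row = "CYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGDPYVPVHFD"
kangaroo_row = "CYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHFD"

identical, compared, percent = percent_identity(whale_row, kangaroo_row)
print(f"{identical} identical of {compared} compared columns ({percent:.1f}%)")
```

Applying the same function to full-length aligned rows for every pair of species would fill in a complete similarity matrix.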
Discussion
Q: Based on pairwise comparison among species, what species is most closely related to Minke whale?
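Among the four species in the multiple sequence alignment, the kangaroo is the most closely related to the minke whale, with a pairwise similarity of 99% (Table 3). The hippopotamus, which was compared with the minke whale only pairwise, scored higher still at 100%.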
Q: Which species is most distant to the Minke whale?
The horse is the most distant relative of the minke whale, with a pairwise similarity of 80% (Table 3).
Phylogenetic Tree
The phylogenetic tree indicates that all four species originated from a common ancestor. The horse, however, shares no more recent common ancestor with any of the other three species and sits on the longest branch of the tree (0.58993), so it is the least related not only to the minke whale but also to the kangaroo and the naked mole rat. Following the branches backwards from the tips towards the root of the tree, the minke whale and the kangaroo converge first, at the most recent common ancestor in the tree. The naked mole rat joins this pair at the next node back, so it shares a more recent ancestor with the minke whale and the kangaroo than the horse does.