BIO 332 BIOINFORMATICS

EXERCISE 3: Retrieving and comparing amino acid sequences from different species

Name: ______________________________________

Abstract

Sequence alignment in bioinformatics is a method of arranging protein sequences to highlight regions of similarity that may reflect functional, structural, or evolutionary relationships among the sequences. All mammals are related because they descend from small ancestral mammals that lived alongside the dinosaurs, but the degree of relatedness varies between lineages. Amino acid sequence data are used to infer the patterns of evolutionary relationships between species.

The methods used to infer the evolutionary relationships of the Minke whale (Balaenoptera sp.) include the retrieval of amino acid sequences from GenBank at the NCBI (National Center for Biotechnology Information). The amino acid sequences are then used in Multiple Sequence Alignment (MSA) to study the evolutionary relationships among three or more species, and in Pairwise Sequence Alignment to infer the evolutionary relationship between two protein sequences. The alignment tool used is T-Coffee. This paper seeks to test the statement that multiple sequence alignment, through phylogenetic inference, indicates that Minke whales are closely related to perissodactyls.

Introduction

Molecular evidence suggests that whales are closely related to hoofed mammals such as deer, cows, camels, hippos, and giraffes. Today, whales, dolphins, and porpoises make up the group Cetacea. Many features common to the land mammals believed to be related to whales have changed through evolution. Although many features of land mammals are absent in whales, whale embryos grow hind limb buds early in development, and these disappear as development continues. Which group is most closely related to whales is still debated, but molecular biology points to the artiodactyls as the closest relatives of the cetaceans, the group to which whales belong. The hippo represents the artiodactyls in this paper. Determining the relationship between whales and perissodactyls requires amino acid sequence alignment. A protein’s amino acid sequence is determined by the DNA sequence, so a gene shared by several closely related species should possess nearly identical, or at least similar, amino acid sequences. Species found to be closely related through sequence alignment therefore share ancestry and diverged from a common ancestor, or from one another, fairly recently in evolutionary time. Their sequences remain similar because the species have not had enough time to accumulate random mutations in their genetic codes.

Methods

The first step was to retrieve the amino acid sequences of the animals under study: horse, hippopotamus, naked mole rat, minke whale, and kangaroo. The sequences were retrieved from GenBank at the National Center for Biotechnology Information (NCBI). The process involved opening the NCBI website, typing the name of the species followed by the phrase "pancreatic ribonuclease", and selecting the most suitable result based on the expected number of amino acids, roughly 150 residues. The FASTA-format record of each protein was then copied into the sequence alignment tool T-Coffee. Multiple sequence alignment was used first to identify regions of similarity across all the protein sequences. Pairwise alignment was then used to align two protein sequences at a time and infer the evolutionary relationship between them.
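The step of copying a FASTA record and checking its length can be sketched in Python. This is a minimal illustration under stated assumptions, not the workflow actually used in the exercise: `parse_fasta` and `near_expected_length` are hypothetical helper names, the tolerance of 35 residues is an arbitrary stand-in for "roughly 150 residues", and the record is the Minke whale entry listed in the Results.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between sequence chunks
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


def near_expected_length(sequence, expected=150, tolerance=35):
    """Hypothetical filter: is the length within `tolerance` of `expected`?"""
    return abs(len(sequence) - expected) <= tolerance


# Minke whale record as listed in the Results section.
MINKE_WHALE_FASTA = """\
>XP_007180555.1 ribonuclease pancreatic [Balaenoptera acutorostrata scammoni]
MAPKSLVLLPWLVLVLLVLGWVQPSLGRESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPV
NTFVHESLEDVKAVCSQKNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGD
PYVPVHFDNSV
"""

records = parse_fasta(MINKE_WHALE_FASTA)
for header, sequence in records.items():
    accession = header.split()[0]  # first token of the header line
    print(accession, len(sequence), near_expected_length(sequence))
    # prints: XP_007180555.1 151 True
```

The length of 151 residues agrees with the "151 aa" that the alignment program reports for this record in Figure 1.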

Results

The amino-acid sequence of the Minke whale

Accession number

NCBI Reference Sequence: XP_007180555.1

Source Database

NCBI

>XP_007180555.1 ribonuclease pancreatic [Balaenoptera acutorostrata scammoni]

MAPKSLVLLPWLVLVLLVLGWVQPSLGRESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPV

NTFVHESLEDVKAVCSQKNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGD

PYVPVHFDNSV

The amino-acid sequence of the kangaroo

Accession number

sp|P00686.1

Source Database

NCBI

>sp|P00686.1|RNAS1_MACRU RecName: Full=Ribonuclease pancreatic; AltName: Full=RNase 1; AltName: Full=RNase A

ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQENVTCKNGRTN

CYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEGQYVPVHFDAYV

The amino-acid sequence of the naked mole rat

>EHB02901.1 Ribonuclease pancreatic [Heterocephalus glaber]

MAQKQSLVLFPLLILVLLGLVNTNYCNEMMKCRNMTERCCKLVNTFMHDPLADVQAVCFQKNVTCKNAQT

NFYQSSSNMHITGCRLTSNSKYPTCSYRTRQVERSITVACEGNPYVPGHFDALWSPPPQPEQRLISSLLR

ISTPAFPSLPPKK

The amino-acid sequence of the hippopotamus

Accession Number

CAA06576.1

Source Database

NCBI

>CAA06576.1 pancreatic ribonuclease, partial [Hippopotamus amphibius]

KETAAEKFQRQHMDTSSSLSNDSNYCNQMMVRRNMTKDRCKPVNTFVHESEADVKAVCSQKNVTCKNGQT

NCYESNSTMHITDCRETGSSKYPNCAYKTSQLQKHIIVACEGDPYV

The amino-acid sequence of the goat

Accession Number

XP_005685452.1

Source Database

NCBI

>XP_005685452.1 PREDICTED: ribonuclease K6 [Capra hircus]

MGPHLLGCSSLLLLLLGMWWSVCPLCAVPKGLTKARWFEIQHIQPRLLQCNTAMSGVNNYTRHCKPENTF

LHNFFQDVTAVCNLPNIICKNGRHNCHQSPKPVNLTQCNLIAGRYPDCRYHDDAQYKFFVVACDPPQKTD

PPYHLVPVHLDKVV

Figure 1. Result of multiple sequence alignment

CLUSTAL 2.1 Multiple Sequence Alignments

Sequence type explicitly set to Protein

Sequence format is Pearson

Sequence 1: XP_023477016.1 153 aa

Sequence 2: XP_007180555.1 151 aa

Sequence 3: sp|P00686.1|RNAS1_MACRU 122 aa

Sequence 4: EHB02901.1 153 aa

Start of Pairwise alignments

Aligning...

Sequences (1:2) Aligned. Score: 15.894

Sequences (1:3) Aligned. Score: 14.7541

Sequences (1:4) Aligned. Score: 13.0719

Sequences (2:3) Aligned. Score: 64.7541

Sequences (2:4) Aligned. Score: 47.6821

Sequences (3:4) Aligned. Score: 45.082

Guide tree file created: [clustalw.dnd]

There are 3 groups

Start of Multiple Alignment

Aligning...

Group 1: Sequences: 2 Score:1840

Group 2: Sequences: 3 Score:982

Group 3: Delayed

Alignment Score 1366

CLUSTAL-Alignment file created [clustalw.aln]

clustalw.aln

CLUSTAL 2.1 multiple sequence alignment

XP_007180555.1          MAPKSLVLLPWLVLVLLVLGWVQPSLGRESPAMKFQRQHMDSGNSPGNNP
sp|P00686.1|RNAS1_MACRU ----------------------------ETPAEKFQRQHMDTEHSTASSS
EHB02901.1              ---------------------------MAQKQSLVLFPLLILVLLGLVNT
XP_023477016.1          -MAQAVAWLLFLQLVLEETQVVDSKLQIAIKNFRTLHIDYPMVNYPEGFQ

XP_007180555.1          NYCN--QMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQKNVLCKNGRTN
sp|P00686.1|RNAS1_MACRU NYCN--LMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQENVTCKNGRTN
EHB02901.1              NYCN--EMMKCRNMTERCCKLVNTFMHDPLADVQAVCFQKNVTCKNAQTN
XP_023477016.1          GYCNGLMAYVRDVKQSWYCPKTHYVVHAPWKAVREFCKYSESFCENYNEY
.*** . * : .:* . * .* .: *:* .

XP_007180555.1          CYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGDPYVPVHFD
sp|P00686.1|RNAS1_MACRU CYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHFD
EHB02901.1              FYQSSSNMHITGCRLTSNSKYPTCSYRTRQVERSITVACEGNPYVPGHFD
XP_023477016.1          CTLTHDSYPLTICSLGSIQPPTSCRYNGTLTNQRLYLLCSRKYDAEPIGI
: . :* * . . ..* *. :: : : *. .

XP_007180555.1          NSV-----------------------------
sp|P00686.1|RNAS1_MACRU AYV-----------------------------
EHB02901.1              ALWSPPPQPEQRLISSLLRISTPAFPSLPPKK
XP_023477016.1          IGLY----------------------------
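Rows of an alignment like the one above can be compared column by column to count identical residues. The sketch below is a rough illustration, not the scoring used by the alignment program: `percent_identity` is a hypothetical helper, skipping gapped columns is only one of several conventions, and the two rows are just the third 50-column block for the Minke whale (XP_007180555.1) and the kangaroo (sp|P00686.1|RNAS1_MACRU). The figure it produces therefore describes that block alone and is not comparable to the whole-sequence scores in the program log.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (matches, compared, percent) for two rows of an alignment.

    Columns where either row has a gap are skipped. This is one common
    convention; other tools may count gaps differently.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # ignore columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    percent = 100.0 * matches / compared if compared else 0.0
    return matches, compared, percent


# Third 50-column block of the alignment in Figure 1.
WHALE = "CYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGDPYVPVHFD"
KANGAROO = "CYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHFD"

matches, compared, percent = percent_identity(WHALE, KANGAROO)
print(f"{matches}/{compared} identical = {percent:.1f}%")
# prints: 36/49 identical = 73.5%
```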

clustalw.dnd

(

XP_023477016.1:0.58993,

(

XP_007180555.1:0.16688,

sp|P00686.1|RNAS1_MACRU:0.18558)

:0.08060,

EHB02901.1:0.27935);
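The guide tree above is written in Newick format, where the number after each colon is a branch length. As a rough sketch of how such a tree can be read, the Python below sums the branch lengths along the path between two leaves (the patristic distance). The parser handles only the simple syntax seen here (no quoted labels or comments), `leaf_paths` and `patristic_distance` are hypothetical helper names, and treating XP_023477016.1 as the horse record is an assumption, since that record is not listed in the Results.

```python
GUIDE_TREE = (
    "(XP_023477016.1:0.58993,"
    "(XP_007180555.1:0.16688,sp|P00686.1|RNAS1_MACRU:0.18558):0.08060,"
    "EHB02901.1:0.27935);"
)


def leaf_paths(newick):
    """Map each leaf name to the (node_id, branch_length) edges up to the root."""
    text = "".join(newick.split()).rstrip(";")
    paths = {}
    pos = 0
    counter = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos, counter
        node_id = counter
        counter += 1
        leaves = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                leaves.extend(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # consume ")"
                break
            read_label()  # skip an optional internal node label
        else:
            name = read_label()
            paths[name] = []
            leaves = [name]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_label())
        for leaf in leaves:
            paths[leaf].append((node_id, length))
        return leaves

    parse_node()
    return paths


def patristic_distance(paths, a, b):
    """Sum the branch lengths on the path joining leaves a and b."""
    edges_a, edges_b = dict(paths[a]), dict(paths[b])
    # Edges shared by both leaves lie above their common ancestor and cancel.
    only_a = sum(v for k, v in edges_a.items() if k not in edges_b)
    only_b = sum(v for k, v in edges_b.items() if k not in edges_a)
    return only_a + only_b


WHALE = "XP_007180555.1"
OTHERS = {
    "kangaroo": "sp|P00686.1|RNAS1_MACRU",
    "naked mole rat": "EHB02901.1",
    "horse (assumed)": "XP_023477016.1",
}

paths = leaf_paths(GUIDE_TREE)
for species, leaf in OTHERS.items():
    print(f"whale to {species}: {patristic_distance(paths, WHALE, leaf):.5f}")
# prints:
# whale to kangaroo: 0.35246
# whale to naked mole rat: 0.52683
# whale to horse (assumed): 0.83741
```

On these branch lengths the Minke whale sits closest to the kangaroo and farthest from the sequence assumed to be the horse.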

Table 2. Number of identical residues in aligned ribonuclease among species of mammals

Pairwise comparison               Number of identical residues between pairs of species
Horse and Minke whale             8
Minke whale and kangaroo          9
Minke whale and naked mole rat    9
Kangaroo and naked mole rat       9
Horse and naked mole rat          8
Kangaroo and horse                8
Minke whale and hippopotamus      10

Table 3: Percent pairwise sequence similarity matrix

                  Minke Whale   Kangaroo   Naked Mole Rat   Horse   Hippopotamus
Minke Whale       100%          99%        97%              80%     100%
Kangaroo          99%           100%       99%              83%
Naked Mole Rat    97%           99%        100%             86%
Horse             80%           83%        86%              100%
Hippopotamus      100%

Discussion

Q: Based on pairwise comparison among species, what species is most closely related to Minke whale?

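Among the four species in the multiple sequence alignment, the kangaroo is the most closely related to the Minke whale, with a pairwise similarity score of 99% (the hippopotamus, which was not part of that alignment, scores 100% in Table 3).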

Q: Which species is most distant to the Minke whale?

The horse is the most distant relative of the Minke whale, with a pairwise similarity score of 80%.

Phylogenetic Tree

The phylogenetic tree indicates that all four species originated from a common ancestor. However, the horse does not share a more recent common ancestor with any of the other three species, so it is the least related not only to the minke whale but also to the kangaroo and the naked mole rat. The kangaroo shares a more recent ancestor with both the naked mole rat and the minke whale than it does with the horse. Following the branches backwards towards the root of the tree, the minke whale and the kangaroo converge first, because the guide tree (clustalw.dnd) groups them together; the naked mole rat joins them at the next node.