Step 5: Aligning Your Sequence
We will now select some sequences and build an alignment from them. An alignment places sequences next to one another so you can see where they are the same and where they differ. This is useful for everything from designing PCR primers to building phylogenies to finding polymorphisms. You can align either nucleotide or amino acid sequences (but not a mixture, obviously). The first thing we need is a set of sequences. Six will do for today.
- Bring up Windows Notepad.
- Copy your FASTA-formatted amino acid sequence into a new file.
- Pick 5 more sequences from the list of returned hits from your BLASTP search. Use your browser's Find menu to locate the top hit (GI 27375553), the bottom hit (GI 2314285), and then some middle hits like GI 6561623 (a mitochondrion, like most of the returned hits!), GI 57504651 (Campylobacter coli, epsilon Proteobacteria), and GI 56417187 (Anaplasma marginale, alpha Proteobacteria).
- To retrieve the FASTA amino acid sequence for each of these, click the hyperlink that begins the entry and it will take you to the entry for that gene.
- At the top of each gene entry, there is a pair of drop-down menus that reads "Display GenPept Send All to file". Change "GenPept" to "FASTA" and "All to file" to "this page to text" and press Send.
- Copy and paste the sequence into Notepad. Leave a blank line between sequences for readability.
- When you have all six sequences in your file, save the file to your desktop as "align.txt".
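Before submitting the file, it can help to confirm that it really contains the records you expect. The following is a minimal sketch in Python, not part of the original tutorial; the function name read_fasta and the two short example records are made up for illustration. It assumes plain FASTA text in which each record starts with a ">" header line.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are only there for readability
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined into one string
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Two made-up records standing in for the contents of align.txt.
example = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK

>seq2 hypothetical protein
MKSAYLAK
"""

records = read_fasta(example)
for header, seq in records:
    print(header, len(seq))
```

To check your own file, you could pass open("align.txt").read() to read_fasta and confirm that it returns six records.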
You now have a file that's ready to be aligned. You will align it using a utility called ClustalW, which is free and one of the standard tools for multiple sequence alignment. There are two ways to use ClustalW: through the web client, or by downloading the program and running it on your own computer. For the sake of simplicity in this tutorial, we will stay with the web client, but I personally prefer to download and use the command line client.
- Open ClustalW in a new window.
- Enter a title under "Alignment Title".
- Either copy and paste your sequences into the window or upload the file from your desktop using the Browse button.
- Submit it.
- When the results come up, you can save the alignment file (.aln) and the tree file (.dnd) to your disk.
- Scrolling down the page, you can see your alignment in ASCII characters and a basic version of your produced tree.
- You can also click "JalView" to open your alignment in a Java-based viewer. You can use this viewer to find conserved regions, gaps and polymorphisms.
- You can save the alignment by going to "File" and selecting "Output alignment via text box". Clicking "Apply" will give you FASTA-formatted sequences you can save in a text file.
- Select "Calculate" and then "Average distance tree using PID". A tree like the one shown in the figure is generated automatically. You can change font sizes and add distances.
- Select "Calculate" and then "Neighbor joining tree using PID". A tree like the one shown in the figure is generated automatically. You can change font sizes and add distances.
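"PID" in the two tree options above stands for percent identity, the fraction of aligned positions at which two sequences carry the same residue. The sketch below shows one common way to compute it for a pair of aligned sequences. It is an illustration only: the function name percent_identity and the example sequences are made up, and the viewer's own definition may treat gaps differently. Here, columns with a gap in either sequence are simply skipped.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap in either sequence are skipped
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Two made-up aligned sequences: 8 ungapped columns, 6 of them identical.
seq_a = "MKT-AYIAK"
seq_b = "MKSQAYLAK"
print(percent_identity(seq_a, seq_b))  # 75.0
```

A distance-based tree method starts from a table of such pairwise values for every pair of sequences in the alignment.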
A couple of technical notes:
- Saving the JalView-generated trees with the viewer's own save function is a pain. It is easier to press the Print Screen button to copy a screenshot to the clipboard, paste the image into an image editor, and crop out your tree.
- ClustalW will also let you make Phylip-format trees (.ph) for use in other programs by selecting "Phylip" under "Tree type" when inputting your sequences into the ClustalW window.
- The downloadable version of ClustalW will also let you bootstrap your trees, producing a .phb file.
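For readers who try the downloadable version, here is a rough sketch of how the command lines might be assembled from Python. Several things in it are assumptions rather than facts from this tutorial: the executable name clustalw2 (older installs may call it clustalw), the helper names align_command, tree_command and run_all, and the default output name align.aln. The options -INFILE, -ALIGN, -TREE, -BOOTSTRAP and -OUTPUTTREE are given as I recall them from the ClustalW command-line help, so check the program's own help output before relying on them.

```python
import shutil
import subprocess


def align_command(infile, exe="clustalw2"):
    """Command line for a multiple alignment of the sequences in infile."""
    return [exe, f"-INFILE={infile}", "-ALIGN"]


def tree_command(alignment, exe="clustalw2", bootstrap=None, phylip=False):
    """Command line for a tree built from an existing alignment file."""
    cmd = [exe, f"-INFILE={alignment}"]
    # -BOOTSTRAP=n requests a bootstrapped tree; otherwise a plain tree is built.
    cmd.append("-TREE" if bootstrap is None else f"-BOOTSTRAP={bootstrap}")
    if phylip:
        cmd.append("-OUTPUTTREE=phylip")
    return cmd


def run_all(commands):
    """Run each command in turn, but only if the executable is on the PATH."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            print("not installed, skipping:", " ".join(cmd))
            continue
        subprocess.run(cmd, check=True)


commands = [
    align_command("align.txt"),                 # assumed to write align.aln
    tree_command("align.aln", phylip=True),     # Phylip-format tree
    tree_command("align.aln", bootstrap=1000),  # bootstrapped tree
]

for cmd in commands:
    print(" ".join(cmd))

# run_all(commands)  # uncomment once ClustalW is installed and align.txt exists
```

Building the commands as lists and printing them first makes it easy to inspect exactly what would be run before anything touches your files.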