Step 4: Confirming Your Frame! (Part II)
OK, so you think you have an ATP synthase... or you should by now, anyway. Now we can use the most common tool for identifying a sequence: BLAST. A BLAST search takes your sequence and compares it against all the sequences in the GenBank repository, the biggest such database on Earth. The search returns the sequences in the database most similar to the one you submitted, with the most similar hits at the top of the list.
Each hit is reported with a Score, an E-value and a % Identity. With Score, the larger the number, the better the alignment between your sequence and the hit. With E-value, the lower the number, the less likely it is that a match this good turned up by chance, so a lower number means a more significant hit. With identity, the higher the percentage, the more similar the two sequences are.
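To get a feel for how these three numbers behave, here is a minimal Python sketch. It assumes the standard textbook relationship between a bit score and an E-value, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The function names, the query length and the database size are made up for illustration, and real BLAST uses corrected "effective" lengths, so treat the output as a rough guide only.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences have the same residue.

    Both strings must already be aligned (same length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical sizes: a 500-residue query against a 1,000,000,000-residue database.
QUERY_LEN = 500
DB_LEN = 1_000_000_000

for score in (30, 50, 100, 400):
    e_value = expected_hits(score, QUERY_LEN, DB_LEN)
    print(f"bit score {score:>3} -> E-value about {e_value:.3g}")

# Toy alignment: one gap column out of six.
print(f"identity: {percent_identity('MKT-LV', 'MKTALV'):.1f}%")
```

Notice that every extra bit of score halves the E-value, which is why strong hits end up with E-values that are vanishingly small.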
There are multiple kinds of BLAST.
- BLASTN compares a nucleotide sequence against the nucleotide sequences in the database. This is a good choice for rRNA, as the best conservation is at the nucleotide level.
- BLASTP compares a protein sequence against the protein sequences and translations in the database. This is a good choice for genes that produce protein products, as the best conservation is at the amino acid level.
- BLASTX compares a nucleotide sequence against the protein sequences in the database by translating and BLASTing all six frames. This allows you to skip selecting frames manually, but the search takes much, much longer to run.
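If "all six frames" is still fuzzy, the sketch below shows what they are: three reading frames on the strand you submitted and three on its reverse complement. This is only an illustration using the standard genetic code, not how BLAST itself is implemented, and the function names and the nine-base example sequence are made up for the demo.

```python
# Standard genetic code, built from the usual T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate all six reading frames, keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Made-up example sequence; '*' marks a stop codon.
for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Six translations means roughly six protein searches instead of one, which is why this route is so much slower than picking the frame yourself and running BLASTP.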
BLAST searches can be limited to specific categories using the "Limit by entrez query" field. For instance, if you are specifically interested in reductases, you can search only for entries with "reductase" in them. You can BLAST against only a specific organism or taxon by specifying "name[Organism]", e.g. "escherichia coli[Organism]". Multiple-word terms require quotes, and the engine accepts Boolean logic when setting limits (e.g. this AND (that OR "this phrase") NOT "the other thing").
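If you build these limits often, a tiny helper can keep the quoting and the Boolean operators straight. The sketch below is only a string builder: the function names and parameters are hypothetical, it does not contact NCBI, and it writes the organism in the same name[Organism] form used in the example above.

```python
def quote_term(term: str) -> str:
    """Wrap a multiple-word term in double quotes; leave single words alone."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def entrez_limit(organism=None, require=(), exclude=()) -> str:
    """Build a limit string: required parts joined by AND, exclusions added with NOT."""
    parts = []
    if organism:
        # Same form as the example in the text: name[Organism]
        parts.append(f"{organism.strip()}[Organism]")
    parts.extend(quote_term(term) for term in require)
    if not parts:
        raise ValueError("give an organism or at least one required term")
    query = " AND ".join(parts)
    for term in exclude:
        query += f" NOT {quote_term(term)}"
    return query


print(entrez_limit(organism="escherichia coli", require=["reductase"]))
print(entrez_limit(require=["ATP synthase"], exclude=["hypothetical protein"]))
```

The resulting string is what you would paste into the "Limit by entrez query" field.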
To BLAST your protein sequence:
- Open BLAST in a new window by right-clicking
- Select BLASTP
- Paste your FASTA-formatted amino acid sequence into the window
- Click BLAST!
- Note that the resulting FORMAT window previews what your sequence is likely to be, based on the conserved domains it contains!
- Click FORMAT
- A new window will open. It will show your hits once the search is complete.
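Before pasting, it can save a failed search to check that what you have really is a FASTA-formatted amino acid sequence and not, say, the original nucleotide sequence. Here is a rough Python sketch of such a check. The function name, the set of accepted letters and the example record are assumptions made for illustration, and the nucleotide test is only a heuristic.

```python
# 20 standard amino acids, plus X (unknown) and * (stop). This is an assumed
# working set for the demo; BLAST itself accepts a few more ambiguity codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" + "X*")


def read_protein_fasta(text: str):
    """Return (header, sequence) for a single FASTA protein record, or raise ValueError."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' header line")
    header = lines[0][1:].strip()
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence found after the header line")
    bad = sorted(set(sequence) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {''.join(bad)}")
    # Heuristic: a sequence made only of A, C, G, T and N is almost certainly DNA.
    if set(sequence) <= set("ACGTN"):
        raise ValueError("this looks like a nucleotide sequence, not protein")
    return header, sequence


# Made-up example record for the demo.
record = ">my_orf frame +1\nMKTAYIAKQR\nQISFVKSHFS\n"
header, sequence = read_protein_fasta(record)
print(header, len(sequence), "residues")
```

If the check raises an error, go back and confirm you copied the translation from the frame you selected rather than the raw DNA.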
Your search will return hits that look like the ones below. Notice how the most prominent hits are all the expected type of protein, ATP synthases. Also note how many of the hits return with an E-value of zero; this means the number of matches this good expected by chance is so small that it rounds to zero, so there is essentially no chance the similarity is a coincidence. ATP synthase is an essential and strongly conserved enzyme, so it shouldn't be a big shocker to see a bunch of highly similar hits. Please do not close this window, as you will need it to select sequences for the next part.