CHAPTER 12. BLAST SEARCH
to improve performance). If you have added new databases that are not listed, you can press
Refresh Locations to clear the cache and search the database locations again.
By default a BLAST database location will be added under your home area in a folder called
CLCdatabases. This folder is scanned recursively, through all subfolders, to look for valid
databases. All other folder locations are scanned only at the top level.
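The scanning rule above can be illustrated with a minimal sketch. It is not the Workbench's own implementation: the function names `find_databases` and `scan_locations` are hypothetical, and detecting a database by its index file (`.nin` for nucleotide, `.pin` for protein, as produced by the standard NCBI formatting tools) is an assumption about what counts as a "valid database".

```python
from pathlib import Path

# Assumption: a database is recognised by its index file suffix.
# The Workbench's real validity check is not documented here.
INDEX_SUFFIXES = {".nin": "nucleotide", ".pin": "protein"}


def find_databases(location, recursive=False):
    """Return (name, type, folder) tuples for databases under one location."""
    root = Path(location)
    if not root.is_dir():
        return []
    # Recursive scan walks all subfolders; otherwise only the top level is read.
    candidates = root.rglob("*") if recursive else root.iterdir()
    found = []
    for path in candidates:
        kind = INDEX_SUFFIXES.get(path.suffix)
        if kind and path.is_file():
            found.append((path.stem, kind, str(path.parent)))
    return sorted(found)


def scan_locations(default_location, other_locations):
    """Scan the default location recursively and all others at top level only."""
    results = find_databases(default_location, recursive=True)
    for location in other_locations:
        results.extend(find_databases(location, recursive=False))
    return results
```

With this sketch, a database stored in a subfolder of the default CLCdatabases folder is found, whereas a database stored in a subfolder of any other location is not. To make such a database visible, its folder would have to be added as a location of its own.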
Below the list of locations, all the BLAST databases are listed with the following information:
• Name. The name of the BLAST database.
• Description. Detailed description of the contents of the database.
• Date. The date the database was created.
• Sequences. The number of sequences in the database.
• Type. The type can be either nucleotide (DNA) or protein.
• Total size (1000 residues). The number of residues in the database (bases or amino
acids), given in thousands.
• Location. The location of the database.
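As a rough illustration, the columns listed above could be modelled as a simple record. The class name `BlastDatabaseInfo` and its field names are hypothetical, and the sketch assumes the "Total size" column is simply the raw residue count divided by 1000; the Workbench may round or format the value differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BlastDatabaseInfo:
    """One row of the database list (hypothetical model, not the Workbench's)."""
    name: str
    description: str
    created: date
    sequences: int
    db_type: str          # "nucleotide" or "protein"
    total_residues: int   # bases or amino acids
    location: str

    def __post_init__(self):
        # The Type column only allows these two values.
        if self.db_type not in ("nucleotide", "protein"):
            raise ValueError(f"unknown database type: {self.db_type}")

    def total_size_thousands(self) -> float:
        # Assumed conversion for the "Total size (1000 residues)" column.
        return self.total_residues / 1000
```

Under this assumption, a database holding 2,500 residues in total would be listed with a total size of 2.5.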
Below the list of BLAST databases, there is a button to Remove Database. This option will delete
the database files belonging to the selected database.
12.4.1 Migrating from a previous version of the Workbench
In versions released before 2011, BLAST database management was very different from that described here.
In order to migrate from the older versions, please add the folders of the old BLAST databases
as locations in the BLAST database manager (see section 12.4). The old representations of the
BLAST databases in the Navigation Area can be deleted.
If you have saved the BLAST databases in the default folder, they will automatically appear
because the default database location used in CLC Main Workbench 6.1 is the same as the
default folder specified for saving BLAST databases in the old version.
12.5 Bioinformatics explained: BLAST
BLAST (Basic Local Alignment Search Tool) has become the de facto standard in search and
alignment tools [Altschul et al., 1990]. The BLAST algorithm is still actively being developed,
and the paper describing it is one of the most cited ever written in this field of biology. Many researchers
use BLAST as an initial screening of their sequence data from the laboratory and to get an
idea of what they are working on. Despite its name, BLAST is far from basic; it
is a highly advanced algorithm which has become very popular due to its availability, speed, and
accuracy. In short, a BLAST search identifies homologous sequences by searching one or
more databases, usually hosted by NCBI (http://www.ncbi.nlm.nih.gov/), with the query
sequence of interest [McGinnis and Madden, 2004].
BLAST is an open source program, and anyone can download and change the program code. This
has given rise to a number of BLAST derivatives, of which WU-BLAST is probably the most commonly
used [Altschul and Gish, 1996].