15.5. BLAST ISSUES
specified, Geneious will just grab the host address and port settings it specifies and use them
to fill in the fields automatically.
15.5.2 Setting up BLAST for multiple users
The correct solution is to set up a local WWWBLAST mirror of the NCBI service, mirror all the BLAST
databases, and add some of your own. Note, though, that this replaces access to the NCBI service itself. This may be too much work for some sites, which can instead consider using CustomBLAST to
achieve something similar.
One approach is to provide users with a set of sequences in FASTA format from which they can create
a CustomBLAST database, keep that set up to date, and have users replace their local
copies whenever it changes. This has the advantage that it is essentially purely parallel, so it will scale indefinitely,
but it has the disadvantage that you can’t be sure all users are searching the same database.
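One way to reduce that uncertainty is to publish a checksum alongside each release of the FASTA file so users can confirm their local copy is current. The following is a minimal sketch of that idea, not a Geneious feature; the function names and the idea of a published checksum are assumptions made for illustration.

```python
import hashlib


def fasta_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large FASTA files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_current(local_fasta, published_checksum):
    """True if the local copy matches the checksum published with the release."""
    return fasta_checksum(local_fasta) == published_checksum.strip().lower()
```

A user would run `is_current` against their local FASTA file before rebuilding the database; a mismatch means the local copy is out of date or has been modified.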
Since the Custom BLAST service accesses a folder on the user’s hard drive, it is possible to put
this folder on a network share and have each user point at it. Each user’s CPU will do the work, but the data
will be centralised. This could cause performance issues over the network,
however, and you will need to deal with file ownership and ensure that your users don’t try adding
databases themselves. You don’t need to format the databases from within Geneious;
you can use formatdb as normal to create BLAST databases and put them into the data folder.
Geneious users will then be able to see them. You could also consider doing this with symlinks
for some databases, so that users can create their own CustomBLAST databases while
still benefitting from your shared ones.
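The symlink arrangement can be sketched as follows. This is an illustration under stated assumptions rather than a documented procedure: the folder paths are supplied by the caller (the location of the Custom BLAST data folder is not given here), the function name is hypothetical, and only the core files that formatdb writes for single-volume databases are matched. Creating symlinks may also require extra privileges on some operating systems.

```python
import os
from pathlib import Path

# Core files formatdb writes for nucleotide (.n*) and protein (.p*) databases.
# Extend this set if your databases include other files.
BLAST_DB_SUFFIXES = {".nhr", ".nin", ".nsq", ".phr", ".pin", ".psq"}


def link_shared_databases(shared_dir, local_dir, names):
    """Symlink the files of the named shared databases into a local folder.

    Returns the file names of the links that were newly created.
    """
    shared = Path(shared_dir)
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    created = []
    for source in sorted(shared.iterdir()):
        wanted = source.stem in names and source.suffix in BLAST_DB_SUFFIXES
        if not (source.is_file() and wanted):
            continue
        target = local / source.name
        # lexists also detects broken links, so nothing is ever overwritten.
        if not os.path.lexists(target):
            target.symlink_to(source.resolve())
            created.append(target.name)
    return created
```

Because existing files are never overwritten, databases that a user has created locally are left untouched alongside the shared ones.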
Note that if the database is formatted manually using formatdb, there will be no annotations
on the resulting alignments. If it is formatted from within Geneious, then an extra file is created
with the annotations so Geneious can put them back onto the alignments after a search.
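For manual formatting, a call to the legacy NCBI formatdb tool might be assembled as in the sketch below. The `-i`, `-p` and `-n` options are standard formatdb arguments, but the helper names and the example file names are hypothetical, and any further options Geneious may expect are not covered here.

```python
import shutil
import subprocess


def formatdb_command(fasta_path, db_name, protein=False):
    """Build the argument list for a legacy NCBI formatdb call."""
    return [
        "formatdb",
        "-i", str(fasta_path),          # input FASTA file
        "-p", "T" if protein else "F",  # T = protein, F = nucleotide
        "-n", str(db_name),             # base name for the output files
    ]


def run_formatdb(fasta_path, db_name, protein=False):
    """Run formatdb if it is on the PATH; raise an error otherwise."""
    if shutil.which("formatdb") is None:
        raise FileNotFoundError("formatdb not found on PATH")
    subprocess.run(formatdb_command(fasta_path, db_name, protein), check=True)
```

For example, `formatdb_command("seqs.fasta", "shared_db")` yields the arguments for a nucleotide database whose files start with `shared_db`. As noted above, a database built this way will not carry annotations onto the resulting alignments.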
15.5.3 BLASTing short sequences
Users should be aware that BLAST has limitations when searching for short sequences.
It is not guaranteed to find all occurrences of a short sequence in a database, so users
should not be surprised when expected hits are missing. Statistically, even with the word size set to 7 (the minimum for DNA
searches), BLAST will miss 40% of possible hits when dealing with sequences of 20 bp. This is
why Biomatters has not implemented Primer BLAST. Users may want to use BLAST to test whether
primers match against their sequences, because ‘Test with Saved Primers’ requires 5’ extensions
to be annotated so that the test will ignore them. This is also a bad idea, since any matches BLAST
does produce will be local alignments rather than full-length matches, potentially truncated at
both ends, not just where the 5’ extension is. It is possible to repurpose the assembler to do this,
though; see the chapter on primers.
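The reason short matches get missed can be illustrated with a simplified model of seeding: a hit is only considered if the query and the database sequence share at least one exact word of the chosen word size. The sketch below is not BLAST's actual implementation, and the primer and binding site are made-up sequences. It shows a 20 bp site that matches the primer at 18 of 20 positions yet shares no exact 7-mer with it, so a word size of 7 would never seed an alignment there.

```python
def shared_words(query, subject, word_size):
    """Return the exact words of length word_size present in both sequences."""
    def words(seq):
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}
    return words(query.upper()) & words(subject.upper())


# Hypothetical 20 bp primer and a binding site that differs from it only at
# positions 7 and 14 (1-based), leaving three exact runs of 6 bases each.
primer = "ACGTTGCAAGCTTAGGCTCA"
site = "ACGTTGGAAGCTTTGGCTCA"

identical = sum(a == b for a, b in zip(primer, site))  # matching positions
seeds = shared_words(primer, site, 7)                  # candidate seed words

print(f"{identical}/20 positions match; shared 7-mers: {len(seeds)}")
```

Two well-placed mismatches are enough to remove every possible seed, which is why a near-perfect primer site can go unreported.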
If the primer has a 5’ extension this should be annotated onto the sequence correctly and then