Caution: fastacmd is not case-sensitive

I have used the NCBI BLAST toolkit to extract sequences from a FASTA-formatted file. Briefly, the FASTA file is first formatted into a database with the program formatdb, and sequences are then extracted from that database using fastacmd.

Note that fastacmd matches sequence names without regard to case, so the following two commands retrieve the same sequence, ‘p53’:

fastacmd -d testdb -s 'p53'
fastacmd -d testdb -s 'P53'

This feature, however, causes a problem when two different sequences in a database have names that differ only in case. For example, if ‘P53’ and ‘p53’ in a database represent two different sequences, then the above two commands will always extract the same sequence (whichever one appears earlier in the database), and the other one cannot be retrieved by name.
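Before relying on fastacmd, it may be worth checking whether a FASTA file contains such names at all. The following is a minimal sketch that uses only standard Unix tools; the function name find_case_collisions is made up for illustration, and it assumes the sequence name is the first word on each header line.

```shell
# List sequence names that collide once case is ignored.
# Usage: find_case_collisions FILE.fa
# (hypothetical helper, not part of the BLAST toolkit)
find_case_collisions() {
    # 1. keep header lines and strip the leading '>'
    # 2. keep only the first word (the sequence name)
    # 3. drop exact duplicates so only case variants remain
    # 4. lowercase everything and report names seen more than once
    grep '^>' "$1" \
        | sed 's/^>//' \
        | awk '{print $1}' \
        | LC_ALL=C sort -u \
        | tr '[:upper:]' '[:lower:]' \
        | LC_ALL=C sort \
        | uniq -d
}
```

Any name this prints (in lower case) has at least two case variants in the file, so fastacmd would only be able to return one of them.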

If this is an issue, one can extract sequences with UCSC’s twoBitToFa or with samtools faidx instead; both require an exact, case-sensitive match on the sequence name.
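To illustrate what exact matching means here, below is a rough sketch in plain awk that extracts a record only when the name matches character for character. It is not how either tool is implemented; extract_exact is a hypothetical helper, and it assumes the name is the first word directly after the ‘>’.

```shell
# Print the FASTA record whose name equals NAME exactly (case-sensitive).
# Usage: extract_exact NAME FILE.fa
# (hypothetical helper for illustration only)
extract_exact() {
    awk -v want="$1" '
        # On a header line, decide whether this record is the one we want
        /^>/ { keep = (substr($1, 2) == want) }
        # Print every line (header and sequence) of the selected record
        keep
    ' "$2"
}
```

With this, extract_exact p53 test.fa and extract_exact P53 test.fa return different records when both names are present. With samtools installed, the corresponding call should be samtools faidx test.fa p53.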

Hope this tip helps.
