Caution: fastacmd is not case-sensitive

I have been using the NCBI BLAST toolkit to extract sequences from a FASTA-formatted file. Briefly, a FASTA file is first turned into a BLAST database with the program formatdb, and sequences are then extracted from that database with fastacmd.

Note that fastacmd is not case-sensitive when matching sequence names, so the following two commands retrieve the same sequence, ‘p53’:

fastacmd -d testdb -s 'p53'
fastacmd -d testdb -s 'P53'

This feature, however, causes a problem when two different sequences in a database have names that differ only in letter case. For example, if ‘P53’ and ‘p53’ in a database represent two different sequences, then the above two commands will both extract the same sequence (the one nearer the beginning of the database), and the other one cannot be retrieved by name.
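Before relying on fastacmd, it may be worth checking whether a FASTA file contains such names at all. Here is a minimal Python sketch of that check. The function name find_case_collisions and the example records are made up for illustration, and it assumes the sequence name is the first whitespace-delimited word on each header line.

```python
from collections import defaultdict


def find_case_collisions(fasta_lines):
    """Group sequence names that differ only by letter case.

    Assumption: the name is the first word after '>' on a header line.
    Returns a dict mapping the lower-cased name to the sorted list of
    distinct spellings, for names with more than one spelling.
    """
    groups = defaultdict(set)
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip headers with no name at all
                groups[fields[0].lower()].add(fields[0])
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical records: 'p53' and 'P53' are different sequences.
example = [">p53", "ACGT", ">P53", "TTGA", ">myc", "GGCC"]
print(find_case_collisions(example))  # {'p53': ['P53', 'p53']}
```

For a real file, the lines can be passed in directly, for example find_case_collisions(open("test.fa")), where test.fa is a placeholder file name. An empty result means no two names differ only by case.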

If this is an issue, one can use UCSC’s twoBitToFa or samtools faidx to extract sequences instead; both require an exact match on the sequence name, including letter case.
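To illustrate what exact matching means here, the following is a small pure-Python sketch of case-sensitive extraction. It is not how twoBitToFa or samtools faidx work internally; extract_exact is a hypothetical helper, and it again assumes the name is the first word on the header line.

```python
def extract_exact(fasta_lines, name):
    """Return the sequence whose name equals `name` exactly, or None.

    The comparison is case-sensitive, so 'p53' and 'P53' are
    treated as two different records.
    """
    found = False
    chunks = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if found:
                break  # reached the next record, so stop reading
            fields = line[1:].split()
            found = bool(fields) and fields[0] == name
        elif found:
            chunks.append(line)  # sequence may span several lines
    return "".join(chunks) if found else None


# Hypothetical records for illustration only.
records = [">p53", "ACGT", "AC", ">P53", "TTGA"]
print(extract_exact(records, "p53"))  # ACGTAC
print(extract_exact(records, "P53"))  # TTGA
```

Unlike the fastacmd behaviour described above, each spelling of the name returns its own sequence.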

Hope this tip helps.

