Distribute Perl Modules to CPAN

I have used Perl in my research for 10 years, but I had never distributed my code to CPAN, partly because distributing a package to CPAN takes more effort than simply uploading files.

Last week, I contributed a package, Bio-CUA, to CPAN. It computes codon usage bias metrics, which are important in genetics studies. The most valuable features of this package are its flexibility and comprehensiveness: it lets users compute metrics for any species or tissue, and it computes all the metrics (CAI, tAI, Fop, ENC).

Through the distribution process, I also learned the basics of making a package distributable on CPAN, and I would like to share them here. Since there are already many posts discussing how to create Perl modules (see the references at the end), I will describe the process briefly, focusing on the steps and some noteworthy points.

The steps are:

1. Use module-starter to create the skeleton structure of a package:

module-starter --module=Bio::CUA::CUB --distro=Bio-CUA --author="Zhenguo Zhang" --email="zhangz.sci@gmail.com" --builder="ExtUtils::MakeMaker" --ignores=git,manifest --license=gpl3

Check the documentation of module-starter for usage. Here I specify --distro=Bio-CUA because I want to put everything in the directory Bio-CUA. One can also give more --module options, which simply create the module files in the corresponding directories; this is not essential, because you can create them yourself by copying a generated module file and modifying it.

The resulting files and folders include lib/, t/, xt/, Makefile.PL, and other information files. If you have Perl scripts to distribute, make a directory bin/ there to store them.

2. Modify the module files under lib/ in the Bio-CUA directory to meet your needs. If you have already written the modules, just copy their content into the new module files. Writing POD documentation in each module takes some time, but it lets CPAN produce nicely formatted HTML documentation.
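As a rough illustration of what a module file with POD can look like, here is a minimal sketch. The package name My::Example and the function gc_content are hypothetical and are not part of Bio-CUA; they only show where '$VERSION' and the POD sections usually go.

```perl
package My::Example;    # hypothetical module name, for illustration only

use strict;
use warnings;

# The version that 'VERSION_FROM' in Makefile.PL can read.
our $VERSION = '0.01';

=head1 NAME

My::Example - one-line summary of the module

=head1 SYNOPSIS

    use My::Example;
    my $gc = My::Example::gc_content('ATGC');

=head1 FUNCTIONS

=head2 gc_content($sequence)

Returns the fraction of G and C bases in C<$sequence>.

=cut

# Hypothetical function: fraction of G/C bases in a sequence.
sub gc_content {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ( $seq =~ tr/GCgc// );    # tr/// returns the match count
    return $gc / length $seq;
}

1;    # a module file must end with a true value
```

Running 'perldoc' on such a file shows how the documentation is likely to read before it is uploaded.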

3. Write tests for the package. All files in the directory t/ ending with '.t' are run during the 'make test' step. Good tests help make sure the modules run well in users' computer environments. The module used to write tests is 'Test::More', which is very convenient; check the examples in its documentation.
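A test file under t/ might look like the following minimal sketch. The function split_codons is a hypothetical stand-in defined inside the test file so that the example runs on its own; in a real distribution the tests would load the module with 'use' and call its functions instead.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a function from the module under test.
sub split_codons {
    my ($seq) = @_;
    return $seq =~ /(.{3})/g;    # list of consecutive 3-letter chunks
}

my @codons = split_codons('ATGGCCTAA');

is( scalar @codons, 3, 'sequence is split into three codons' );
is( $codons[0], 'ATG', 'first codon is ATG' );
is_deeply( \@codons, [qw(ATG GCC TAA)], 'codons are returned in order' );

done_testing();
```

Saved as, say, t/01-codons.t (a hypothetical file name), it can be run alone with 'prove -l t/01-codons.t' or together with the other tests by 'make test'.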

Note that when you need to read and write files during testing, keep in mind that the tests are run from the top directory (here Bio-CUA), so you need 't/file.txt' to refer to a file under the directory 't/', and similarly for files in other directories. Related to this, you need a system-independent way to locate files, because the path separators differ: '/' on Linux and '\' on Windows. For safety, use File::Spec->catfile('t', 'file.txt') to create a correct file path.
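A small sketch of the File::Spec approach follows. The helper name test_data_path is hypothetical; File::Spec itself ships with Perl.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the path of a data file under t/,
# relative to the distribution's top directory.
sub test_data_path {
    my ($name) = @_;
    return File::Spec->catfile( 't', $name );
}

my $path = test_data_path('file.txt');
print "Test data file: $path\n";
```

The same call yields the appropriate separator on each platform, so the test file does not need to hard-code '/' or '\'.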

4. Modify Makefile.PL. This is the most important step for distributing the modules. Running 'perl Makefile.PL' generates a 'Makefile' suited to the user's system environment. Basically, this file contains one call to the function 'WriteMakefile', which takes a hash of attributes as its parameter. Check the ExtUtils::MakeMaker documentation for the supported attributes. Some important attributes are 'VERSION_FROM' (read the version number from the variable '$VERSION' in the specified module), 'BUILD_REQUIRES' (modules required to build the distribution during installation but not needed to run it), and 'PREREQ_PM' (modules the distribution depends on at run time).
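To make the attributes concrete, here is a minimal sketch of a Makefile.PL. It is not the actual file shipped with Bio-CUA: the module path lib/Bio/CUA.pm and the listed prerequisites are assumptions chosen for illustration, and it only does something useful when run from the top directory of a distribution with that layout.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'Bio::CUA',
    # Assumed path: the file that defines $VERSION.
    VERSION_FROM => 'lib/Bio/CUA.pm',
    AUTHOR       => 'Zhenguo Zhang <zhangz.sci@gmail.com>',
    LICENSE      => 'gpl_3',
    # Needed only while building and testing (placeholder list).
    BUILD_REQUIRES => {
        'Test::More' => 0,
    },
    # Needed at run time (placeholder list).
    PREREQ_PM => {
        'File::Spec' => 0,
    },
);
```

A version of 0 means any version of the prerequisite is acceptable; a specific minimum can be given instead.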

In Makefile.PL, one can add a package 'MY' at the end and define a function 'postamble' in it. The return value of this function is simply written into the generated Makefile, so make sure it can be understood by different make programs, such as dmake/nmake on Windows.
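A sketch of such a 'postamble' is shown below. The target name cleandata and the file t/output.tmp are hypothetical. The fragment uses the $(RM_F) macro that MakeMaker defines in the generated Makefile, rather than a literal 'rm -f', which is one way to stay portable across make programs.

```perl
use strict;
use warnings;

package MY;

# Whatever this returns is appended verbatim to the generated Makefile.
sub postamble {
    # Hypothetical extra target; recipe lines must start with a tab.
    return "cleandata :\n"
         . "\t\$(RM_F) t/output.tmp\n";
}
```

With this in place, 'make cleandata' would run the recipe after 'perl Makefile.PL' has regenerated the Makefile.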

5. Package the distribution. Before packing everything into a 'tar.gz' file, you need to edit the file MANIFEST to indicate which files are included in the final distribution. You can also use MANIFEST.SKIP to skip unwanted files. Any file can be included in the list, not just .pm files and those generated by module-starter.
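One way to check that MANIFEST and the files on disk agree before packaging is the ExtUtils::Manifest module, which ships with Perl. The sketch below is only an illustration; the wrapper name check_manifest is hypothetical.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use ExtUtils::Manifest qw(manicheck filecheck);

# Hypothetical wrapper: compare MANIFEST in $dist_dir with the files there.
# Returns two array references: files listed in MANIFEST but missing on
# disk, and files on disk that are neither listed nor skipped.
sub check_manifest {
    my ($dist_dir) = @_;
    my $old_dir = getcwd();
    chdir $dist_dir or die "Cannot chdir to $dist_dir: $!";

    local $ExtUtils::Manifest::Quiet = 1;    # suppress per-file warnings
    my @missing  = manicheck();
    my @unlisted = filecheck();

    chdir $old_dir or die "Cannot chdir back to $old_dir: $!";
    return ( \@missing, \@unlisted );
}
```

Calling check_manifest('.') from the top directory of the distribution and printing both lists shows what would be left out of, or be missing from, the tarball.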

Then run

perl Makefile.PL
make
make test
make dist

This produced a file, Bio-CUA-1.02.tar.gz; 1.02 was my version number, read from the module 'Bio::CUA'. Bio-CUA-1.02.tar.gz is ready to upload to PAUSE, the upload server for CPAN.

6. Apply for a PAUSE account. You need to register for a PAUSE account if you do not have one. After logging in, click 'Upload a file to CPAN' to upload your file. You can also try cpan-upload-http if you want to upload from the command line.

7. Correct errors. The files appear at search.cpan.org in about one hour. Mine are at http://search.cpan.org/~fortune/, where 'fortune' is my PAUSE account name. After that, CPAN testers (automated machines) test the distribution by installing it on different systems, including Linux, Windows, Mac, FreeBSD, and so on. They generate a report for each test: PASS, FAIL, or UNKNOWN. You can click on the reports to find out what caused any failures.

Finally, good luck. Comments are welcome.

Here are some references for distributing modules to CPAN:

http://www.perlmonks.org/?node_id=158999
http://www.slideshare.net/brian_d_foy/create-and-upload-to-cpan (more about PAUSE)
http://perldoc.perl.org/perlnewmod.html (including some tips on error reporting)
http://www.perlmonks.org/?node_id=879515 (step-by-step introduction to making a distribution with Build.PL)
http://www.ibm.com/developerworks/cn/linux/sdk/perl/makemaker/ (a long tutorial on MakeMaker in Chinese)
