ABC_GWH
A cookbook to study Genome Wide Heterogeneity in introgression rates
The following archive contains the three scripts I used in the Ciona instestinalis study.
Priorgen
Developped by Jeffrey Ross-Ibarra in C++, and somewhat modified to investigate GWH.
This script requires two C++ libraries Boost and GSL.
You can compile it by typing in the priorgen folder something like :
g++ -o priorgen priorgen.cc -I /home/roux/Programmes/boost_1_47_0/ -L /home/roux/Programmes/boost_1_47_0/libs -lboost_program_options-mt -lgsl -lgslcblas -lm -static -O3 -I /home/roux/Programmes/gsl-1.15/
I know, it's frustrating, I had the same feeling the first time. You just have to adapt this line according the username of your session (mine is roux), according to the path to Boost.
If it doesn't work, you can try to use the compiled file I provided by :
chmod +x priorgen
./priorgen --help or ./prior --version
If the released priorgen works, please create a symbolic link into your bin :
sudo ln -s WholeCurrentPathToPriorgen /usr/bin/priorgen
In the archive you will find a folder named example_files. To run priorgen, for example, you can use the following command line :
priorgen --arg-file argfile_halleri_lyrata.txt --bpfile bpfile_halleri_lyrata.txt --seed 143 --prior parameters.txt
Please, play with the different arguments in the argfile :
model = iso, sym, allo or island
mig = homo or hetero
dir = sym or asym
The created parameters.txt contains multilocus parameters while the standard output will be read by msnsam
Msnsam
The lifesaving version by Ross-Ibarra of the lifesaving program MS by Hudson.
To compile it, just write :
./clms
You can simply test it :
./msnsam 10 1 -t 5
If it works :
sudo ln -s WholeCurrentPathToMsnsam /usr/bin/msnsam
priorgen | msnsam
If you have selected the 'iso' model in argfile:
priorgen --arg-file argfile_halleri_lyrata.txt --bpfile bpfile_halleri_lyrata.txt --seed 143 --prior parameters.txt | msnsam tbs 200 -t tbs -r tbs tbs -I 2 tbs tbs 0 -m 1 2 tbs -m 2 1 tbs -n 1 tbs -n 2 tbs -ej tbs 2 1 -eN tbs tbs
If you have selected the 'island' model in argfile:
priorgen --arg-file argfile_halleri_lyrata.txt --bpfile bpfile_halleri_lyrata.txt --seed 143 --prior parameters.txt | msnsam tbs 200 -t tbs -r tbs tbs -I 2 tbs tbs 0 -m 1 2 tbs -m 2 1 tbs -n 1 tbs -n 2 tbs -ej tbs 2 1 -eN tbs tbs
If you have selected the 'sym' model in argfile:
priorgen --arg-file argfile_halleri_lyrata.txt --bpfile bpfile_halleri_lyrata.txt --seed 143 --prior parameters.txt | msnsam tbs 200 -t tbs -r tbs tbs -I 2 tbs tbs 0 -m 1 2 0 -m 2 1 0 -n 1 tbs -n 2 tbs -ema tbs 2 0 tbs tbs 0 -ej tbs 2 1 -eN tbs tbs
If you have selected the 'allo' model in argfile:
priorgen --arg-file argfile_halleri_lyrata.txt --bpfile bpfile_halleri_lyrata.txt --seed 143 --prior parameters.txt | msnsam tbs 200 -t tbs -r tbs tbs -I 2 tbs tbs 0 -m 1 2 tbs -m 2 1 tbs -n 1 tbs -n 2 tbs -eM tbs 0 -ej tbs 2 1 -eN tbs tbs
Why 200 ? Because 10 multilocus simulations of 20 loci.
mscalc
To compile it, just write :
gcc *.c -o mscalc -lm
Please, don't pay any attention to the warning messages :
spinout.c: In function ‘get_initial_conditions_dynamics’:
spinout.c:448: warning: format ‘%ld’ expects type ‘long int’, but argument 4 has type ‘int’
spinout.c:547: warning: format ‘%ld’ expects type ‘long int’, but argument 4 has type ‘int’
If you succesfully compiled mscalc :
sudo ln -s WholeCurrentPathToMscalc /usr/bin/mscalc
It requires two input files : spinput.txt + the output from ms/msnsam
The released version compute summary statistics without orientate mutations by using an outgroup. If you want to take into account an outgroup, just send me an e-mail
The whole stuff
mknod myfifo p
mscalc <myfifo &
./priorgen --arg-file argfile_halleri_lyrata.txt --bpfile bpfile_halleri_lyrata.txt --seed 143 --prior parameters.txt | msnsam tbs 200 -t tbs -r tbs tbs -I 2 tbs tbs 0 -m 1 2 tbs -m 2 1 tbs -n 1 tbs -n 2 tbs -ej tbs 2 1 -eN tbs tbs >myfifo &
The output containing the summary statistics is ABCstat.txt
Reminder
If you change the number of multilocus simulations, change the argfile, the spinput.txt and the second argument of msnsam.
Personnally, I run on a cluster of CPUs hundreds of jobs of 100,000 multilocus iterations. So, If you plan to do the same thing, don't forget to change the seed value of priorgen among different jobs ;)
The R package 'abc' is essential for the rest of the analyses [link]