DRAT - Directed terminal Restriction Analysis Tool
- Conditions of use
- Detailed description
- Using DRAT
DRAT (Directed terminal Restriction Analysis Tool) is a software tool, developed to aid selection of restriction enzyme(s) to differentially identify targeted species/groups within complex communities based on user-supplied sets of sequences. It is applicable to any organisms/gene targets and can be directed to search for enzymes, or combinations of enzymes, to identify diagnostic Terminal Restriction Fragments (TRFs). It can use input files with multiple examples of each type, thus allowing understanding of intra-group variation.
DRAT is aimed at users designing diagnostic or monitoring tools for specific species or ‘functional groups’ within complex communities.
DRAT is available for use by any academic group for non-commercial research. If you would like to use DRAT for commercial work please contact Jonathan Snape. Please note that The James Hutton Institute does not give any representation or warranty, express or implied, as to the suitability, fitness for purpose or accuracy of DRAT and does not accept any liability for any losses incurred as a result of the use of DRAT.
DRAT is a computer application written in Perl and C++. To run from the command line in Windows it requires the latest version of Cygwin (version 1.7.7-1) and either ActivePerl or Strawberry Perl. These can be downloaded by following the links.
DRAT consists of two separate programs, a fragment length generating program (FLGP) and an enzyme scoring program (ESP) that automatically communicate through the creation of a temporary file.
The FLGP is a Perl script that borrows heavily from Tisdall 2003 (Mastering Perl for Bioinformatics. O’Reilly Media). It reads the input file and the REBASE bionetc restriction enzyme data file. The program uses regular expression matching to find cut sites in each sequence for each of the enzymes in the enzyme file (unless a single enzyme is specified). It then generates an m × n table of 5' and 3' terminal fragment lengths, where m is the number of sequences and n the number of enzymes, and writes this table to a delimited text file. The FLGP then calls the compiled C++ ESP, passing the name of the delimited text file as a parameter; the ESP scores enzyme performance.
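The regular-expression step can be illustrated with a minimal Python sketch. This is not the FLGP code itself (which is Perl). The function names, the handling of uncut sequences (reported here as full length) and the restriction to the top strand are all assumptions made for illustration, and the recognition site and cut offset in the usage line are example values rather than entries read from the REBASE file.

```python
import re

# IUPAC ambiguity codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Convert a recognition site such as 'GANTC' to a regex string."""
    return "".join(IUPAC[base] for base in site.upper())


def terminal_fragments(seq, site, cut_offset):
    """Return (5' fragment length, 3' fragment length) for one sequence.

    cut_offset is the number of bases from the start of the site to the cut.
    Assumption: a sequence with no cut site yields its full length for both.
    """
    # A lookahead is used so that overlapping sites are also found.
    pattern = re.compile("(?=(" + site_to_regex(site) + "))")
    cuts = [m.start() + cut_offset for m in pattern.finditer(seq.upper())]
    cuts = [c for c in cuts if 0 < c < len(seq)]
    if not cuts:
        return len(seq), len(seq)
    return min(cuts), len(seq) - max(cuts)


# Hypothetical example: site CATG cut after the first base.
print(terminal_fragments("AAACATGTTTTCATGGG", "CATG", 1))
```

Calling `terminal_fragments` for every sequence and every enzyme would fill one row and one column entry, respectively, of the m × n table described above.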
The ESP reads the temporary file created by the FLGP. It filters out isoschizomers by testing for enzymes or enzyme combinations that create unique terminal fragment length combinations from the sequences. These enzymes or enzyme combinations are then scored for the ability to resolve groups using their unique fragment length combinations. The score has three components:
1. The inter-group distinguishing score: the number of pairwise group combinations the enzyme can distinguish, out of the maximum number of pairs G(G-1)/2, where G is the number of groups;
2. The percentage of sequence pairs that fail in groups that cannot be resolved; and
3. The fidelity of fragment lengths within groups.
The program uses the concept of nucleotide distances between the terminal fragments of pairs of input sequences. Within each group, distances are calculated from every sequence to all other sequences in that group to give the intra-group distances. Distances are then calculated between each sequence in a group and each sequence in the other groups to give the inter-group distances. If any inter-group distance is less than the specified minimum distance threshold, that group combination is considered irresolvable for that enzyme or enzyme combination and the maximum inter-group resolving score is decremented.
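The inter-group part of the score can be sketched in Python as follows. This is only an illustration of the rule described above, not the ESP source (which is C++). It assumes a single enzyme, a single fragment end, and that the "distance" between two sequences is the absolute difference between their terminal fragment lengths. The function name and the input layout are hypothetical.

```python
from itertools import combinations


def group_pairs_resolved(frags_by_group, min_dist):
    """Return (total group pairs, resolved group pairs).

    frags_by_group maps a group code to the list of terminal fragment
    lengths (in bp) of the sequences in that group.
    """
    groups = sorted(frags_by_group)
    total = len(groups) * (len(groups) - 1) // 2  # G(G-1)/2
    resolved = total
    for g1, g2 in combinations(groups, 2):
        closest = min(
            abs(a - b)
            for a in frags_by_group[g1]
            for b in frags_by_group[g2]
        )
        # Any inter-group distance below the threshold makes the pair
        # irresolvable, so the score is decremented.
        if closest < min_dist:
            resolved -= 1
    return total, resolved


example = {
    "AAA": [100, 100],
    "BBB": [150, 151],
    "CCC": [152, 152],
    "DDD": [300],
}
print(group_pairs_resolved(example, 4))
```

In this made-up example four groups give six possible pairs, and only the BBB/CCC pair falls below the 4 bp threshold.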
Every attempt has been made to ensure this is a clean and error/virus free release of DRAT. The James Hutton Institute accepts no responsibility for any issues arising from the use of this free program. The .zip file contains both MS Windows and Mac OS X versions plus all libraries and associated files necessary for DRAT.
Unzip the contents of the compressed file into a local folder, e.g. C:\DRAT. The following files should be present:
- DRAT.pl – a 6KB PL file
- Drat – a 970KB application file
- Drat – an 18KB CPP file
- Bionetc.512 – a 32KB file
- Lib – a new sub-folder
- TestA.fasta – a small fasta file used for training and troubleshooting. It is worth storing a copy of this file (see Troubleshooting).
Access DRAT through the command line (from the Windows Start menu, click “Run”, type cmd into the box, then click OK).
From here navigate to the DRAT folder (e.g. type C: <return>, then cd DRAT <return>). This should take you to your DRAT folder.
It is in this command window that you run DRAT using the correct script (see Using DRAT).
If the appropriate software is installed (see the requirements section for details), the DRAT program can be run from the command line in Windows.
DRAT reads sequences in Fasta format. To ensure accurate fragment sizing, sequences should be trimmed to include the full primer sequence at both ends of the PCR fragment to be digested. The first 3 letters of the sequence name define the taxonomic grouping for that sequence. The aim of DRAT is to determine terminal restriction fragments (TRFs) that are common within a taxonomic group and that separate these taxonomic groups.
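The grouping rule can be sketched in Python as below. This is an illustration only, not DRAT's own parser: the function name is hypothetical, the sequence names in the example are invented, and it assumes the group code is simply the first three characters after the ">" of each header line.

```python
def read_groups(fasta_text):
    """Return {group code: {sequence name: sequence}} from Fasta text."""
    groups = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            # The first 3 letters of the name designate the group.
            groups.setdefault(name[:3], {})[name] = ""
        elif name is not None:
            # Sequence lines are appended to the most recent header.
            groups[name[:3]][name] += line
    return groups


example = ">BacA_1\nACGT\nAC\n>BacB_1\nGGCC\n>Fun_1\nTTAA\n"
for code, members in sorted(read_groups(example).items()):
    print(code, sorted(members))
```

Here `BacA_1` and `BacB_1` fall into the same group because their names share the first three letters, which is why the group designator needs to be chosen carefully when naming sequences.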
Sequences should be compiled into a single .fasta document with the taxonomic group-designators and submitted to the DRAT program using the following script:
"Perl drat.pl -fasta=*A* -maxenz=*B* -mindist=*C* -topn=*D* -sense=*E* -enzfile=bionetc.512 -enzname=*F*"
A description of parameters is given in table A.
|*A*||the name of the fasta file containing sequences (with group …|
|*B*||the maximum number of enzymes to try in combination|
|*C*||the minimum distance in bp threshold that is required to …|
|*D*||the number of enzymes to report|
|*E*||whether f (forward, 5'), r (reverse, 3') or b (both) fragments will be scored|
|enzfile||the name of the file containing the enzyme cut data|
|*F*||all or the name of a specific enzyme to test|
Using the supplied TestA.fasta file, the script would be (for a single enzyme digest):
Perl drat.pl -fasta=TestA -maxenz=1 -mindist=4 -topn=10 -sense=f -enzfile=bionetc.512 -enzname=all
There are three primary outputs from the DRAT tool indicating the ability of the selected restriction enzymes to resolve the groups designated in the FASTA file. These are the .Scores file, the individual enzyme files and the .Cuts file.
The .Scores file. This is a summary table of the top N enzymes (top 10 in our example), ranked by the number of group combinations resolved and by the percentage of sequence pairs that fail in groups that cannot be resolved. The .Scores file (Table 2) has five components.
1. The enzyme name.
2. The total number of taxonomic group combinations.
3. Success: the number of pairwise group combinations the enzyme can distinguish given the minimum distance specified by the user.
4. The percentage of sequence pairs that fail in groups that cannot be resolved.
5. The fidelity of fragment lengths within groups.
Using the supplied TestA.fasta file, the .Scores file should look like this:
|Enzyme name(s)||total group combs||group combs success||Ave % seq fails||Ave group fidelity|
In this example, CviAII, FatI and Hin1II are able to fully resolve the 4 species-groups in the supplied input file with a minimum of 4 bp between the diagnostic peaks, whilst the remaining enzymes fail both to resolve between groups and to produce single diagnostic TRFs within groups.
Individual files are produced for each of the TopN enzymes. These files indicate the ability of the restriction enzyme to resolve all the possible group-combinations within the submitted FASTA file.
If using the supplied TestA.fasta file, the CviAII file should look like this, indicating that CviAII produces diagnostic fragments for all the possible group-combinations with 100% group fidelity:
Table 3: Individual file for enzyme CviAII
The predicted TRFs for the Top N are given in the .cuts file. From these outputs it is a simple matter to select suitable candidate restriction enzymes.
If using the supplied TestA.fasta file, the .cuts file data for the first 3 enzymes will look like this:
Table 4: The TestA.cuts file
GID = Group Identifier, this is used by DRAT
Length = total uncut length
CviAII_5 = the 5 prime TRFs
CviAII_3 = the 3 prime TRFs
DRAT was developed by:
Pietà Schofield, Wellcome Trust Biocentre, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, Scotland, UK
David M. Roberts, Environment Plant Interactions, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA
Suzanne Donn, CSIRO Plant Industry, GPO Box 1600, Canberra, ACT 2601, Australia
Tim J. Daniell, Environment Plant Interactions, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA
DRAT is supplied free-of-charge and as such no formal technical support is offered.
Whilst every attempt has been made to ensure that DRAT is bug-free and stable on all Windows and Mac platforms, it must be recognised that issues may arise on occasion. Experience has shown that the vast majority of errors encountered using DRAT during its development and subsequent use are due to errors in compiling the Fasta file. DRAT is unable to process any inconsistencies in the Fasta file and great care must be taken to ensure no unexpected characters are present (typically N-characters, spaces or unexpected paragraph characters).
If a submitted Fasta file returns an error message the first recommended action is to check the Fasta file for unexpected characters.
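A quick way to look for such characters is sketched below in Python. It is not part of DRAT. The function name is hypothetical, and the sketch assumes that only the letters A, C, G and T are acceptable in sequence lines and that blank lines count as unexpected paragraph characters.

```python
def check_fasta(text):
    """Return a list of (line number, problem) tuples for Fasta text."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            # Header line: the name must supply a 3-letter group code.
            if len(line) < 4:
                problems.append((number, "name shorter than 3 letters"))
            continue
        if line == "":
            problems.append((number, "blank line"))
            continue
        # Anything other than A, C, G or T is reported (assumption).
        bad = sorted(set(line.upper()) - set("ACGT"))
        if bad:
            problems.append(
                (number, "unexpected characters: " + repr("".join(bad)))
            )
    return problems


example = ">AAA_seq1\nACGT\nACNT\nAC GT\n"
for number, problem in check_fasta(example):
    print("line", number, "-", problem)
```

An empty result does not guarantee that DRAT will accept the file, but any line reported here is worth correcting before re-running.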
If an error message is still returned despite checking for errors in the Fasta file we recommend you re-run the supplied TestA.fasta file. It is possible that your version of DRAT could become corrupted during use and if so you will be unable to run the TestA.fasta file. If this is the case, simply delete your current version of DRAT and download again.
Issues with Cygwin.
Information is passed from Perl to Cygwin to complete the DRAT process. It is essential that Perl is able to access Cygwin. If an error is reported that Perl is unable to find Cygwin1.dll, this indicates that Cygwin is not included within the PATH environment variable.
You may need to add c:\cygwin\bin (assuming Cygwin was installed there) to the PATH environment variable as described below.
Editing environment variables under XP is done as follows; under Vista it is very similar. Windows users MUST ensure they are logged in as Administrator to effect these changes.
Navigate to Control Panel, System and Security (for classic view; otherwise select Performance and Maintenance), System. Select Advanced system settings (or Advanced and then Environment Variables). Look within the System Variables window, select the PATH variable, select Edit and add the following extension to the end of the variable [insert all between “” without spaces]
If you are uncertain about changing your computer's settings, contact your IT department.
Within the command line it is possible to check whether Cygwin is included in the PATH environment variable by typing SET <return> and scrolling up to view the PATH details.
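The same check can be scripted. The Python sketch below is an illustration only: the function name is hypothetical and c:\cygwin\bin is the assumed installation folder mentioned above. It compares folder names case-insensitively and ignores trailing slashes.

```python
import os


def path_contains(directory, path_value=None, sep=os.pathsep):
    """Return True if directory is one of the entries in a PATH string.

    With no path_value the current PATH environment variable is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = directory.rstrip("\\/").lower()
    entries = [
        entry.strip().rstrip("\\/").lower()
        for entry in path_value.split(sep)
    ]
    return wanted in entries


# On Windows, PATH entries are separated by semicolons.
sample = r"C:\Windows;C:\cygwin\bin\;C:\Perl\bin"
print(path_contains(r"c:\cygwin\bin", sample, sep=";"))
```

Run on the Windows machine itself, `path_contains(r"c:\cygwin\bin")` with no further arguments checks the live PATH value.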