[Piggy] MaxEntScan::build for Short Sequence Motifs

Burge Lab

MaxEntScan::build is a general way of building distributions over short sequence motifs that takes into account non-neighboring dependencies.

To score splice sites, or candidate splice sites within a given sequence, using the different models described in the paper (below), refer to MaxEntScan::score splice.


How to use MaxEntScan::build

All sequences must be the same length. Provide the input sequences as a FastA file with each sequence on a single line (no line breaks within a sequence). Sequences containing non-ACGT characters will not be processed.

Example Positive File ("5" is the positive label convention used in this program):
> 5
aagattg
> 5
cagaata
> 5
aagaaaa
...
Example Negative File ("0" is the negative label convention used in this program):
> 0
tttaata
> 0
caaagtg
> 0
gtatgac
...
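The input rules above (one sequence per line, equal lengths, ACGT only) can be checked before submitting a file. The following is a minimal sketch, not the program's own parser: the function name read_motifs is hypothetical, and upper-casing the sequences and silently dropping non-ACGT sequences are assumptions about how such input might be handled.

```python
from typing import List

VALID_BASES = set("ACGT")


def read_motifs(lines: List[str]) -> List[str]:
    """Collect sequences from FastA-style lines, one sequence per line."""
    seqs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines such as "> 5"
        seq = line.upper()  # assumption: case is not significant
        if set(seq) <= VALID_BASES:
            seqs.append(seq)  # sequences with non-ACGT characters are dropped
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return seqs


positive = read_motifs(["> 5", "aagattg", "> 5", "cagaata", "> 5", "aagaaaa"])
print(positive)  # ['AAGATTG', 'CAGAATA', 'AAGAAAA']
```

The same function can be applied to the negative file; the header label ("5" or "0") is ignored here because each file holds only one class.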


Maximum Entropy Distribution for Short Sequence Motifs

Select POSITIVE SEQUENCES filename: (REQUIRED)

Select NEGATIVE SEQUENCES filename: (REQUIRED)

Parameters

Input distribution parameters below:

sequence length (required)       The program currently does not handle lengths greater than 8, since these are short motifs. All sequences (test and training) must be the same length.
marginal order (default=1)       The order of the dependencies taken into account. Can be at most length-2.
marginal skip (default=0)       Applies only when marginal order equals 2, i.e. pair-wise dependencies.
maximum skip (default=0)       Applies only when marginal order equals 2.
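The parameter constraints listed above can be summarized as a small pre-submission check. This is only a sketch of the stated rules: the function name parameter_problems and its argument names are hypothetical, and the program itself may enforce these limits differently.

```python
from typing import List


def parameter_problems(length: int, order: int = 1,
                       skip: int = 0, max_skip: int = 0) -> List[str]:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if length > 8:
        problems.append("sequence length greater than 8 is not handled")
    if order > length - 2:
        problems.append("marginal order exceeds length-2")
    if order != 2 and (skip != 0 or max_skip != 0):
        problems.append("marginal skip and maximum skip apply only "
                        "when marginal order is 2")
    return problems


print(parameter_problems(7, order=2, skip=1, max_skip=1))  # []
print(parameter_problems(9))
```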

Distributions

Do you want the program to return Distributions? (Check if YES, default is NO)
Distributions are listed in lexicographic order: AAAAA, AAAAC, AAAAG ... TTTTT
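To match a returned value with its motif, the lexicographic ordering can be reproduced directly. The sketch below assumes the alphabet order A, C, G, T shown in the example above; the helper names lexicographic_motifs and motif_index are hypothetical.

```python
from itertools import product
from typing import List

ALPHABET = "ACGT"


def lexicographic_motifs(length: int) -> List[str]:
    """All motifs of the given length in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=length)]


def motif_index(motif: str) -> int:
    """Zero-based position of a motif in that ordering (a base-4 number)."""
    index = 0
    for base in motif.upper():
        index = index * 4 + ALPHABET.index(base)
    return index


motifs = lexicographic_motifs(5)
print(motifs[:3], motifs[-1], len(motifs))
# ['AAAAA', 'AAAAC', 'AAAAG'] TTTTT 1024
```

With this, the value for a given motif is found at line motif_index(motif) + 1 of the output, assuming one value per line.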

Test Performance

Do you want to test the performance on test sequences? (Check if YES, default is NO).
Results file: contains 6 columns, with the following column headings:
Thresholds, True Positive rate, False Positive rate, Specificity, Approximate Correlation, Correlation Coefficient.
Enter TEST SEQUENCES filename:

Enter the RESULTS filename:
Enter the SCORE filename to output scored test sequences:
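For readers who want to recompute some of the results columns from a score file, the following sketch derives the true positive rate, false positive rate and a correlation coefficient at a single threshold. Several things here are assumptions: that a sequence is called positive when its score is at or above the threshold, that the correlation coefficient is the Matthews form, and the example scores, which are invented for illustration. The program's exact definitions of Specificity and Approximate Correlation are not given on this page, so those columns are not reproduced.

```python
import math
from typing import Dict, List


def rates_at_threshold(pos_scores: List[float], neg_scores: List[float],
                       threshold: float) -> Dict[str, float]:
    """Confusion-matrix rates, calling a sequence positive if score >= threshold."""
    tp = sum(s >= threshold for s in pos_scores)
    fn = len(pos_scores) - tp
    fp = sum(s >= threshold for s in neg_scores)
    tn = len(neg_scores) - fp
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"threshold": threshold, "tpr": tpr, "fpr": fpr, "cc": cc}


# Invented scores for three positive and three negative test sequences.
result = rates_at_threshold([4.2, 1.5, 3.1], [-2.0, 0.5, 2.0], threshold=1.0)
print(result)
```

Sweeping the threshold over all observed scores and collecting one such row per threshold gives a table of the same general shape as the results file.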


This algorithm was developed by Gene Yeo (geneyeo@mit.edu) and Christopher Burge (cburge@mit.edu). This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 0218506. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.