Website of Frank Rügheimer

Automated STructure identification for REgulatory networks

Description

Astre is a command-line tool that implements a collection of association measures, discretization operators and ranking methods suited to network induction with quantitative data sets in biology and other fields. The measures draw on information theory as well as classical and robust statistics. They can be used, e.g., for clustering, classification and network induction tasks. The software is implemented in C and can run in batch mode or as an interactive query processor. Although astre output can be filtered to obtain simple relevance networks, it was designed to operate in conjunction with higher-level tools such as scoreKO.

Download

The program and sample workflows use libraries and tools from the table utility package written by Christian Borgelt (included in the compressed source code). That package has been released under the GNU Lesser General Public License, version 2.1.

If you are planning to use the table utilities in your own programs, consider visiting Christian's website to obtain the most recent version of the package and its documentation.

Command-line interface

General usage and arguments are explained in the help text that comes with the program. It can be accessed from the command line by calling the program without any arguments or by including the "-?" option:

USAGE    : ./astre [OPTIONS] <tabfile>
DESCRIPTION:
   Determine association between pairs of items represented by feature
   vectors
INPUT:
   Astre loads a table of feature vector definitions from <tabfile>.
   It proceeds to read queries consisting of identifier pairs
   <id1> <id2> separated by whitespace from stdin or a file indicated
   via the "-q" option. If the query is syntactically correct and
   both <id1> and <id2> identify valid feature vectors, the program
   computes an association score f_assoc(<id1>, <id2>) and prints a
   copy of the query with the score attached at the end of the line
   to its output stream. By default output is sent to stdout. An
   alternative output file can be specified via the "-o" option.
   If an incorrect query is parsed the program triggers a warning
   message, prints a response indicating missing values. Processing
   will be resumed at the start of the next query.
ARGUMENTS:
   <tabfile>  name of a file containing feature vectors
   Feature vectors are specified as a line containing an item
   identifier followed by a sequence of values that represent
   the elements of the feature vector. The  symbols "?" or "*"
   can be used as placeholders for missing values. All values are
   separated from the identifier and from each other by tabulators.
   In the default configuration the first line of the table is inter-
   preted as a header, consisting of column names for the identifier
   data columns (this behavior can be changed using -h or -H options).
OPTIONS:
  -?           show this help screen
  -a           report absolute value rather than signed one for 
               measures that extend to negative range
  -m <key>     set association measures according to <key>;
               values for <key> and their associated measures are:
        corr - Pearson correlation coefficient (numerical vectors only)
        rcorr- Spearman rho rank correlation        (all vectors)
        d2   - d^2 (sum of element wise squared deviation of
               contingency table from expectation under independence)
        mi   - mutual information (=Shannon information gain) 
        igain- Shannon information gain (equivalent to mi)
        igr  - Shannon information gain ratio
        tau  - Kendall tau - (note: for efficiency it is advisable to
               use this measure with a coarse binning of the input)
        (default corr)
        Note:   for numerical feature vectors a discretization based
                on quantiles under fractional ranking will be applied
  -h <hdrfile> read table header from separate file (default tabfile)
  -H           indicate headerless expression file  (default disabled)
  -q <qryfile> read queries from <qryfile> instead of stdin
  -l <number>  set number of discretization levels to <number>
               value must be positive integer       (default 3)
  -o <outfile> set the name of the output file to <outfile>
                                                    (default <stdout>)
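
The note on discretization in the help text can be illustrated with a small Python sketch. It is not taken from the astre sources: the function names are hypothetical, the exact binning rule astre applies is an assumption, and the vectors are assumed to contain no missing values. The sketch assigns fractional ranks (tied values share the mean of their positions), maps ranks to equally populated levels as with "-l", and computes mutual information on the resulting symbols as with "-m mi".

```python
import math
from collections import Counter


def fractional_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def quantile_bins(values, levels=3):
    """Map each value to a level in 0..levels-1 (assumed binning rule)."""
    n = len(values)
    # (rank - 0.5) / n lies strictly between 0 and 1, so the index is < levels.
    return [int((r - 0.5) / n * levels) for r in fractional_ranks(values)]


def mutual_information(xs, ys):
    """Mutual information in bits of two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
score = mutual_information(quantile_bins(a), quantile_bins(b))
print(score)
```

Because the binning depends only on ranks, any strictly increasing transformation of a vector leaves its discretized form unchanged, which is why a and b above receive identical level sequences.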


Running the ./demo script from the source code package starts a tutorial/demonstration of astre and its interface.
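
For readers who want to prepare their own input before running the demo, the table and query formats described in the help text can be mimicked with a short Python sketch. This is an illustration only, not astre's implementation: the identifiers (geneA etc.) are made up, the tab separator before the score and the "?" response for unanswerable queries are assumptions, and so is the handling of missing values (positions missing in either vector are skipped).

```python
import io
import math

MISSING = {"?", "*"}  # placeholders for missing values named in the help text


def parse_table(stream, has_header=True):
    """Read tab-separated feature vectors into {identifier: [float or None]}."""
    vectors = {}
    for lineno, line in enumerate(stream):
        line = line.rstrip("\n")
        if not line or (has_header and lineno == 0):
            continue  # skip blank lines and the header row
        ident, *fields = line.split("\t")
        vectors[ident] = [None if f in MISSING else float(f) for f in fields]
    return vectors


def pearson(u, v):
    """Pearson correlation over positions where both vectors have a value."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def answer_queries(vectors, queries):
    """Yield each query with a score appended, or "?" if it cannot be scored."""
    for query in queries:
        ids = query.split()
        score = None
        if len(ids) == 2 and ids[0] in vectors and ids[1] in vectors:
            score = pearson(vectors[ids[0]], vectors[ids[1]])
        shown = "?" if score is None else format(score, ".4f")
        yield f"{query.strip()}\t{shown}"


table = io.StringIO(
    "id\ts1\ts2\ts3\ts4\n"
    "geneA\t1.0\t2.0\t3.0\t4.0\n"
    "geneB\t2.0\t4.0\t?\t8.0\n"
    "geneC\t4.0\t3.0\t2.0\t1.0\n"
)
vectors = parse_table(table)
results = list(answer_queries(vectors, ["geneA geneB", "geneA geneC", "geneA geneX"]))
for line in results:
    print(line)
```

The last query names an identifier that is not in the table, so the sketch answers it with the missing-value placeholder and continues, mirroring the behavior the help text describes for incorrect queries.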