Automated STructure identification for REgulatory networks
Astre is a command-line tool that implements a collection of association measures, discretization operators and ranking methods suited to network induction with quantitative data sets in biology and other fields. The measures draw on information theory and on classical and robust statistics. They can be used, for example, for clustering, classification and network induction tasks. The software is implemented in C and can run in batch mode or as an interactive query processor. Although astre output can be filtered to obtain simple relevance networks, the tool was designed to operate in conjunction with higher-level tools such as scoreKO.
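Filtering astre output into a simple relevance network could look like the minimal sketch below. It assumes that each output line holds the two queried identifiers followed by a numeric score, separated by whitespace, and that a line whose last field is not a number marks a failed query. The function name relevance_edges, the threshold of 0.8 and the sample lines are hypothetical and chosen only for illustration.

```python
import math


def relevance_edges(lines, threshold):
    """Keep identifier pairs whose absolute score reaches the threshold."""
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # not an "<id1> <id2> <score>" line
        try:
            score = float(fields[-1])
        except ValueError:
            continue  # non-numeric score, assumed to mark a missing value
        if math.isnan(score):
            continue  # treat NaN like a missing value
        if abs(score) >= threshold:
            edges.append((fields[0], fields[1], score))
    return edges


# Hypothetical output lines; identifiers and scores are made up.
sample_output = [
    "geneA geneB 0.91",
    "geneA geneC -0.12",
    "geneB geneC -0.85",
    "geneA geneX ?",
]

for id1, id2, score in relevance_edges(sample_output, threshold=0.8):
    print(id1, id2, score)
```

Thresholding on the absolute value keeps strong negative associations as edges, which matters for signed measures such as the correlation coefficients.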
The program and sample workflows use libraries and tools from the table utility package written by Christian Borgelt (included in the compressed source code). That package has been released under the GNU Lesser General Public License, version 2.1.
If you are planning to use the table utilities in your own programs, consider visiting Christian's website to obtain the most recent version of the package and its documentation.
General usage and arguments are explained in the help text that comes with the program. It can be accessed from the command line by calling the program without any arguments or by including the "-?" option:
    USAGE      : ./astre [OPTIONS] <tabfile>
    DESCRIPTION: Determine association between pairs of items represented
                 by feature vectors

    INPUT:
      Astre loads a table of feature vector definitions from <tabfile>.
      It proceeds to read queries consisting of identifier pairs
          <id1> <id2>
      separated by whitespace from stdin or a file indicated via the "-q"
      option. If the query is syntactically correct and both <id1> and
      <id2> identify valid feature vectors, the program computes an
      association score
          f_assoc(<id1>, <id2>)
      and prints a copy of the query with the score attached at the end
      of the line to its output stream. By default output is sent to
      stdout. An alternative output file can be specified via the "-o"
      option. If an incorrect query is parsed the program triggers a
      warning message, prints a response indicating missing values.
      Processing will be resumed at the start of the next query.

    ARGUMENTS:
      <tabfile>     name of a file containing feature vectors
                    Feature vectors are specified as a line containing an
                    item identifier followed by a sequence of values that
                    represent the elements of the feature vector. The
                    symbols "?" or "*" can be used as placeholders for
                    missing values. All values are separated from the
                    identifier and from each other by tabulators. In the
                    default configuration the first line of the table is
                    interpreted as a header, consisting of column names
                    for the identifier data columns (this behavior can be
                    changed using -h or -H options).

    OPTIONS:
      -?            show this help screen
      -a            report absolute value rather than signed one for
                    measures that extend to negative range
      -m <key>      set association measures according to <key>;
                    values for <key> and their associated measures are:
                      corr  - Pearson correlation coefficient
                              (numerical vectors only)
                      rcorr - Spearman rho rank correlation (all vectors)
                      d2    - d^2 (sum of element wise squared deviation
                              of contingency table from expectation under
                              independence)
                      mi    - mutual information
                              (=Shannon information gain)
                      igain - Shannon information gain (equivalent to mi)
                      igr   - Shannon information gain ratio
                      tau   - Kendall tau
                              (note: for efficiency it is advisable to use
                              this measure with a coarse binning of the
                              input)
                    (default corr)
                    Note: for numerical feature vectors a discretization
                    based on quantiles under fractional ranking will be
                    applied
      -h <hdrfile>  read table header from separate file (default tabfile)
      -H            indicate headerless expression file (default disabled)
      -q <qryfile>  read queries from <qryfile> instead of stdin
      -l <number>   set number of discretization levels to <number>
                    value must be positive integer (default 3)
      -o <outfile>  set the name of the output file to <outfile>
                    (default <stdout>)
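The input formats described in the help text can be illustrated with a short sketch that builds a tab-separated table with a header line and a query list covering every unordered pair of identifiers. This is only an illustration under stated assumptions: the helper names format_table and format_queries, the column names and the gene identifiers are hypothetical, and missing values are written as "?", one of the two placeholders the help text allows.

```python
import itertools

MISSING = "?"  # the help text also accepts "*" as a placeholder


def format_table(header, rows):
    """Return tab-separated text: a header line, then one line per vector."""
    lines = ["\t".join(header)]
    for identifier, values in rows:
        cells = [MISSING if v is None else str(v) for v in values]
        lines.append("\t".join([identifier] + cells))
    return "\n".join(lines) + "\n"


def format_queries(identifiers):
    """Return one "<id1> <id2>" query line per unordered identifier pair."""
    pairs = itertools.combinations(identifiers, 2)
    return "".join(f"{a} {b}\n" for a, b in pairs)


# Hypothetical data: three feature vectors with three values each.
header = ["id", "s1", "s2", "s3"]
rows = [
    ("geneA", [0.5, 1.2, None]),
    ("geneB", [2.0, 1.9, 0.3]),
    ("geneC", [0.1, None, 0.7]),
]

table_text = format_table(header, rows)
query_text = format_queries([identifier for identifier, _ in rows])

print(table_text, end="")
print(query_text, end="")
```

Saved to files with hypothetical names such as example.tab and queries.txt, the two texts could then be passed to the program using the options listed in the help text, for instance ./astre -q queries.txt -o scores.txt example.tab.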
Running the ./demo script from the source code package starts a tutorial and demonstration of astre and its interface.