scoreKO regulatory network search tool
- Rue_jobim_2011.pdf: short introduction to the tool and its application context (presented at JOBIM 2011)
The scoreKO program extracts plausible regulatory pathways from a network of weighted potential interactions. It does so by conducting a graph search on the given network, scoring each path by aggregating the plausibility of its edges. The tool can be configured to report the families of up to the n most plausible pathways. The currently provided aggregation operators are associative, commutative and non-increasing, i.e. extending a path by another edge never raises its score. These properties are exploited to speed up the search: pathways that can no longer meet the selection criteria are not explored further.
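The pruning idea described above can be illustrated with a minimal Python sketch. It is not the scoreKO implementation: the function name `best_path_scores`, the `threshold` parameter and the toy network are assumptions made for illustration. The sketch runs a best-first search that keeps, for every node, the highest aggregated plausibility reachable from any source. Because the operator is non-increasing, a partial path that has already dropped below the threshold can be discarded safely.

```python
import heapq

def best_path_scores(edges, sources, aggregate=min, threshold=0.0):
    """Best aggregated plausibility from any source to each reachable node.

    edges:     dict mapping (node_a, node_b) -> weight in [0, 1]
    aggregate: non-increasing binary operator (hypothetical default: min)
    threshold: partial paths scoring below this value are pruned
    """
    graph = {}
    for (u, v), w in edges.items():
        graph.setdefault(u, []).append((v, w))

    best = {s: 1.0 for s in sources}        # the empty path has full plausibility
    heap = [(-1.0, s) for s in sources]     # max-heap via negated scores
    heapq.heapify(heap)

    while heap:
        neg_score, u = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(u, 0.0):
            continue                        # stale entry, a better path was found
        for v, w in graph.get(u, []):
            cand = aggregate(score, w)
            if cand < threshold:
                continue                    # can never recover: prune this branch
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best

# Toy network (made up for this sketch, not the bundled example data).
toy_edges = {("S", "A"): 0.9, ("A", "T"): 0.5, ("S", "B"): 0.6, ("B", "T"): 0.8}
scores = best_path_scores(toy_edges, ["S"])
pruned = best_path_scores(toy_edges, ["S"], threshold=0.7)
```

With the minimum as aggregator, the route via B gives T a score of 0.6, which beats the route via A (0.5). With a threshold of 0.7, both routes to T are cut off early and only S and A remain in the result.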
The program is intended to be used in conjunction with interaction measures that generate networks of potential interactions from large-scale empirical data (optionally modified by prior knowledge from literature or protein interaction data). It aggregates predictions about local network structure into testable regulatory hypotheses linking perturbations to observable effects. It can also be used iteratively: its output is used to select experiments that develop a set of regulatory hypotheses into an increasingly refined and validated regulatory structure.
General usage and arguments are explained in the help text that comes with the program. It can be directly accessed from the command line by including the "-?" option:
USAGE:
    scoreKO [OPTIONS] sourcelist targetlist edgelist

CONTENTS:
    score regulatory network node by path hypotheses.

OPTIONS:
    '-?' or '-h'  show this help screen
    '-x'          enable output mode with additional columns
    '-p'          print top scoring families of pathway hypotheses rather
                  than node scores. If this option is specified the program
                  acts as a filter to the input graph that extracts the top
                  scoring putative signaling pathways connecting the given
                  sub-networks. This option may not be used in conjunction
                  with -e or -x. (default=disabled)
    '-e'          print list of non-zero edge scores rather than node scores.
                  This option may not be combined with -p or -x.
                  (default=disabled)
    '-l #'        set maximum plausibility rank to be considered in pathway
                  hypothesis family mode; used in conjunction with -p option.
                  Output will be based on the top # score ranks (default=1)
    '-a <op>'     set aggregation operator to be used for path scoring.
                  <op> is a selector for an aggregation method. Currently
                  supported values are:
                    min   - minimum (smallest edge weight on path)
                    prod  - product (multiply edge weights on path)
                    hprod - Hamacher product (used when aggregating several
                            edges with compar. low scores)
                    lsum  - sum (used with logarithmic weights from (-inf,0];
                            equivalent to prod on untransformed data)
                  (default: hprod)

ARGUMENTS:
    sourcelist:   File with node identifiers serving as points of origin
                  (optional source of reg. signal). Several nodes may be
                  specified, but each identifier must be given on a
                  separate line.
    targetlist:   File with node identifiers serving as regulation targets.
                  Several nodes may be specified, but each identifier must
                  be given on a separate line.
    edgelist:     File containing edge specifications of the form
                  "nodeA<TAB>nodeB<TAB>value", where value is a number from
                  the real interval [0,1] which designates the weight of the
                  directed edge nodeA->nodeB. Several edges may be specified,
                  but each specification must be given on a separate line.
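To make the edge list format from the help text concrete, the following Python sketch reads lines of the form "nodeA<TAB>nodeB<TAB>value" and checks that each value lies in [0,1]. It is only an illustration and not the parser used by scoreKO. The function name `parse_edgelist`, the skipping of blank lines and the error handling are assumptions. Log-transformed weights, as used with the lsum operator, are not covered.

```python
def parse_edgelist(lines):
    """Parse 'nodeA<TAB>nodeB<TAB>value' lines into {(nodeA, nodeB): weight}."""
    edges = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 tab-separated fields")
        node_a, node_b, value = fields
        weight = float(value)
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"line {lineno}: weight {weight} outside [0,1]")
        edges[(node_a, node_b)] = weight  # directed edge nodeA -> nodeB
    return edges

# Two made-up edges in the documented format.
example = parse_edgelist(["A\tB\t0.8\n", "B\tC\t0.5\n"])
```

A file can be passed directly, e.g. `parse_edgelist(open("exam/edges.tab"))`, since a file object iterates over its lines.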
After unpacking the archive file, a directory exam containing sample input
files will be created. The file srcnodes.lst contains a list of nodes
considered as the source of a perturbation. The file dstnodes.lst lists the
target nodes for which the effect of the perturbation can be observed.
Finally, the file edges.tab contains specifications of directed edges in the
regulatory network and their respective plausibility scores.
To assign each node a score that reflects the plausibility of the best pathway through that node, using the minimum as path aggregator, type:
> scoreKO -a min -x exam/srcnodes.lst exam/dstnodes.lst exam/edges.tab
ID  path_score  con2src  con2trg
A   0.8         1.0      0.8
B   0.67        1.0      0.67
C   0.8         1.0      0.8
I   0.8         0.8      1.0
E   0.67        0.8      0.67
F   0.12        0.75     0.12
H   0.8         0.8      0.97
D   0.48        0.8      0.48
G   0.79        0.79     0.8
The -x option enables two additional columns in the output, which reflect the connectivity of each node to the nearest nodes of the source set (con2src) and of the target set (con2trg). To view output in pathway mode instead, using the default Hamacher product as pathway aggregator, type:
> scoreKO -pl3 exam/srcnodes.lst exam/dstnodes.lst exam/edges.tab
from  to  weight
A     H   0.8
C     A   0.83
C     G   0.79
H     I   0.97
G     A   0.9
Note that the -p and -l options were combined to obtain the pathway representations for the top 3 plausibility levels.
The Hamacher product is the default operator because it can distinguish paths that would be assigned the same score under the minimum aggregation operator. In the example this becomes evident when re-running the analysis with the minimum as pathway aggregation operator and comparing the results.
> scoreKO -pl3 -a min exam/srcnodes.lst exam/dstnodes.lst exam/edges.tab
from  to  weight
A     E   0.68
A     H   0.8
B     E   0.8
C     A   0.83
C     B   0.81
C     G   0.79
E     A   0.67
H     I   0.97
G     A   0.9
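The tie-breaking behaviour discussed above can be reproduced with a small Python sketch of the four aggregation operators named in the help text. It uses the standard Hamacher product H(a,b) = ab / (a + b - ab), with H(0,0) = 0. Whether scoreKO uses exactly this parametrization is an assumption. The names `AGGREGATORS` and `path_score` and the two example paths are hypothetical.

```python
import math
from functools import reduce

def hamacher(a, b):
    """Standard Hamacher product on [0, 1]; defined as 0 when both inputs are 0."""
    if a == 0.0 and b == 0.0:
        return 0.0
    return (a * b) / (a + b - a * b)

# Operator names follow the -a selector values from the help text.
AGGREGATORS = {
    "min": min,
    "prod": lambda a, b: a * b,
    "hprod": hamacher,
    "lsum": lambda a, b: a + b,   # expects log-transformed weights from (-inf, 0]
}

def path_score(weights, op):
    """Fold the edge weights of one path with the selected operator."""
    return reduce(AGGREGATORS[op], weights)

# Two made-up paths that share the same weakest edge (0.8).
path_1 = [0.8, 0.9]
path_2 = [0.8, 0.8]

tie_under_min = path_score(path_1, "min") == path_score(path_2, "min")
h1 = path_score(path_1, "hprod")
h2 = path_score(path_2, "hprod")

# lsum on logarithmic weights corresponds to prod on the untransformed ones.
log_total = path_score([math.log(w) for w in path_1], "lsum")
```

Both paths score 0.8 under the minimum, whereas the Hamacher product ranks path_1 (about 0.735) above path_2 (about 0.667) because it takes the second edge into account.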
To convert the output of the pathway mode to popular graph language formats,
two shell scripts are provided with the program package. Both scripts operate
as filters. This allows output to be piped directly from scoreKO to a filter
script, to cascade filters, or to pass the processed output directly to other
programs. net2dot converts edges from the resulting table format to
.dot format (used e.g. with the graphviz tool suite). Once converted to .dot
format, the script dot2reg can be applied
to create output readable, e.g., by Cytoscape. Since all output formats are
text-based, simple stream processing can be used to adapt the scripts for
output in other formats if desired.
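As an illustration of such stream processing, the following Python sketch turns the from/to/weight table of the pathway mode into a DOT digraph. It is not the bundled net2dot script, whose exact output is not shown here. The function name `table_to_dot`, the graph name and the choice to write the weight as an edge label are assumptions.

```python
def table_to_dot(lines, graph_name="pathways"):
    """Convert 'from to weight' rows into a Graphviz DOT digraph string."""
    out = [f"digraph {graph_name} {{"]
    for raw in lines:
        fields = raw.split()
        # Skip the header row and anything that is not a 3-column record.
        if len(fields) != 3 or fields == ["from", "to", "weight"]:
            continue
        src, dst, weight = fields
        out.append(f'    "{src}" -> "{dst}" [label="{weight}"];')
    out.append("}")
    return "\n".join(out)

# First rows of the pathway-mode output shown earlier.
dot_text = table_to_dot(["from to weight", "A H 0.8", "C A 0.83"])
```

To use the function as a filter in a pipe, it can be called on standard input, e.g. `print(table_to_dot(sys.stdin))` after importing `sys`.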