NetworkAnalyzer Settings
The following plugin settings can be configured by the user:
Store node / edge parameters in node / edge attributes
For every node in a network, NetworkAnalyzer computes its degree (in- and out-degrees for directed networks), its clustering coefficient, the number of self-loops, and a variety of other parameters. NetworkAnalyzer also computes edge betweenness for each edge in the network. If the respective options are enabled, NetworkAnalyzer stores the computed values as attributes of the corresponding nodes and edges. This enables the users to apply different visualizations or to filter nodes or edges based on the values of the computed attributes. For a complete list of the computed node and edge attributes, please see the section Attributes.
Use expandable interface for the dialog that displays analysis results
If this option is enabled, analysis results are presented in a window in which all charts are placed below each other in expandable boxes. If this option is disabled, analysis results are presented in a window that contains tabs for the group of simple parameters and for every complex parameter. Users who wish to view simultaneously two or more complex parameters of one network, should enable this option.
Change colors for parameter visualization
This options allow the user to change the default colors of parameter visualization. Please see the section Visualizing Computed Parameters for more details.
-
Background color
The color of the background in the network view. It is initially set to the default Cytoscape background color. -
Bright color
This color defines the brightest color that parameters can be mapped to. By default its value is green. -
Middle color
This color defines the intermediate color, that parameters can be mapped to. By default its value is yellow. -
Dark color
This color defines the darkest color that parameters can be mapped to. By default its value is red.
Location of the help documents
URL of this help web page for NetworkAnalyzer. This also enables the local download and storage of this help page.
Network Interpretations
NetworkAnalyzer can perform topological analysis on directed networks (containing only directed edges) as well as on undirected networks (containing only undirected edges). An example of a directed network and an undirected network is given in Figure 1a and Figure 1b, respectively.
In Cytoscape, a network may contain only directed edges even if they are undirected in their biological context. Moreover, one network may contain both directed and undirected edges if the network is compiled by combining data from different sources.
In the situations described above, NetworkAnalyzer needs user input how to interpret the edges. Figure 2 depicts two examples of small networks in Cytoscape and their interpretations.
In (a), the network contains solely directed edges. Here, NetworkAnalyzer provides three possible interpretations of the edge directions in the network. The user has to select one of the interpretations for further processing of the network.
In (b), the network contains both undirected and directed edges. Note that undirected edges cannot be converted unambiguously to directed ones. Therefore, networks with mixed edges are handled as undirected ones.
Simple Network Parameters
Number of connected components
In undirected networks, two nodes are connected if there is a path of edges between them. Within a network, all nodes that are pairwise connected form a connected component. The number of connected components indicates the connectivity of a network – a lower number of connected components suggests a stronger connectivity.
Parameters related to shortest paths
The length of a path is the number of edges forming it. There may be multiple paths connecting two given nodes. The shortest path length, also called distance, between two nodes n and m is denoted by L(n,m). The network diameter is the largest distance between two nodes. If a network is disconnected, its diameter is the maximum of all diameters of its connected components. The diameter can also be described as the maximum node eccentricity (eccentricity is defined in the Attributes section). The network radius, on the other hand, is the minimum among the non-zero eccentricites of the nodes in the network. The average shortest path length, also known as the characteristic path length, gives the expected distance between two connected nodes.
Parameters related to neighborhood
The neighborhood of a given node n is the set of its neighbors. The connectivity of n, denoted by kn, is the size of its neighborhood. The average number of neighbors indicates the average connectivity of a node in the network. A normalized version of this parameter is the network density. The density is a value between 0 and 1. It shows how densely the network is populated with edges (self-loops and duplicated edges are ignored). A network which contains no edges and solely isolated nodes has a density of 0. In contrast, the density of a clique is 1.
The number of isolated nodes may provide insight how the network density is distributed. Another related parameter is the network centralization [6]. Networks whose topologies resemble a star have a centralization close to 1, whereas decentralized networks are characterized by having a centralization close to 0. The network heterogeneity [6] reflects the tendency of a network to contain hub nodes.
In addition, the number of multi-edge node pairs indicates how often neighboring nodes are linked by more than one edge.
Clustering coefficient
In undirected networks, the clustering coefficient Cn of a node n is defined as Cn = 2en/(kn(kn-1)), where kn is the number of neighbors of n and en is the number of connected pairs between all neighbors of n [17, 2]. In directed networks, the definition is slightly different: Cn = en/(kn(kn-1)).
In both cases, the clustering coefficient is a ratio N / M, where N is the number of edges between the neighbors of n, and M is the maximum number of edges that could possibly exist between the neighbors of n. The clustering coefficient of a node is always a number between 0 and 1.
The network clustering coefficient is the average of the clustering coefficients for all nodes in the network. Here, nodes with less than two neighbors are assumed to have a clustering coefficient of 0.
Complex Network Parameters
Degree distributions
In undirected networks, the node degree of a node n is the number of edges linked to n. A self-loop of a node is counted like two edges for the node degree [5]. The node degree distribution gives the number of nodes with degree k for k = 0,1,….
In directed networks, the in-degree of a node n is the number of incoming edges and the out-degree is the number of outgoing edges. Similar to undirected networks, there are an in-degree distribution and an out-degree distribution.
Barabási and Albert used the node degree distribution to distinguish between random (as defined by Erdős and Rényi [7, 3]) and scale-free network topologies [2].
Neighborhood connectivity
The connectivity of a node is the number of its neighbors. The neighborhood connectivity of a node n is defined as the average connectivity of all neighbors of n [10]. The neighborhood connectivity distribution gives the average of the neighborhood connectivities of all nodes n with k neighbors for k = 0,1,…. Figure 3 shows the neighborhood connectivity distribution for the network presented in Figure 1b.
NetworkAnalyzer computes similar parameters for directed networks. In analogy to the in- and out-degree, every node n in a directed network has in- and out-connectivity. Thus, in directed networks, a node has the following types of neighborhood connectivity:
- only in - the average out-connectivity of all in-neighbors of n;
- only out - the average in-connectivity of all out-neighbors of n;
- in and out - the average connectivity of all neighbors of n (edge direction is ignored).
Based on the three definitions given above, there are three neighborhood connectivity distributions - "only in", "only out" and "in and out".
If the neighborhood connectivity distribution is a decreasing function in k, edges between low connected and highly connected nodes prevail in the network [10].
Shortest paths
The length of the shortest path between two nodes n and m is L(n,m). The shortest path length distribution gives the number of node pairs (n,m) with L(n,m) = k for k = 1,2,….
The network diameter is the maximum length of shortest paths between two nodes. If a network is disconnected, its diameter is the maximum of all diameters of its connected components.
The network diameter and the shortest path length distribution may indicate small-world properties of the analyzed network [17].
Clustering coefficients
In undirected networks, the clustering coefficient Cn of a node n is defined as Cn = 2en/(kn(kn-1)), where kn is the number of neighbors of n and en is the number of connected pairs between all neighbors of n [17, 2]. In directed networks, the definition is slightly different: Cn = en/(kn(kn-1)).
In both cases, the clustering coefficient is a ratio N / M, where N is the number of edges between the neighbors of n, and M is the maximum number of edges that could possibly exist between the neighbors of n. The clustering coefficient of a node is always a number between 0 and 1.
The average clustering coefficient distribution gives the average of the clustering coefficients for all nodes n with k neighbors for k = 2,…. NetworkAnalyzer also computes the network clustering coefficient that is the average of the clustering coefficients for all nodes in the network.
The clustering coefficient of a node is the number of triangles (3-loops) that pass through this node, relative to the maximum number of 3-loops that could pass through the node.
For example, in Figure 4, there is one triangle that passes through node b (the triangle bcd). The maximum number of triangles that could pass through b is three (in this case, the pairs (a, c) and (a, d) would be connected additionally). This yields a clustering coefficient of Cb = 1 / 3.
Ravasz et al. used the average clustering coefficient distribution to identify a modular organization of metabolic networks [13].
Shared neighbors
P(n,m) is the number of partners shared between the nodes n and m, that is, nodes that are neighbors of both n and m. The shared neighbors distribution gives the number of node pairs (n,m) with P(n,m) = k for k = 1,….
If a motif like the one presented in Figure 5 is over-represented in a network, this can be inferred from the shared neighbors distribution.
Topological coefficients
The topological coefficient [15] Tn
of a node n with kn neighbors is computed as follows:
Tn = avg ( J(n,m) ) / kn.
Here, J(n,m) is defined for all nodes m that share at least one
neighbor with n. The value J(n,m) is the number of neighbors shared
between the nodes n and m, plus one if there is a direct link between n and
m.
For example, in Figure 6, J(b,c) = J(b,d) = J(b,e) = 2. Therefore, Tb = 2 / 3.
The topological coefficient is a relative measure for the extent to which a node shares neighbors with other nodes. NetworkAnalyzer computes the topological coefficients for all nodes with more than one neighbor in the network. Nodes that have one or no neighbors are assigned a topological coefficient of 0 (zero).
The chart of the topological coefficients can be used to estimate the tendency of the nodes in the network to have shared neighbors.
Stress distribution
The stress [4, 14] of a node n is the number of shortest paths passing through n . A node has a high stress if it is traversed by a high number of shortest paths. This parameter is defined only for networks without multiple edges.
The stress distribution gives the number of nodes with stress s for different values of
s. The values for the stress are grouped into bins whose size grows exponentially by a
factor of 10. The bins used for this distribution are {0}; [1, 10); [10, 100);
...
Betweenness centrality
The betweenness centrality [4] Cb(n)
of a node n is computed as follows:
Cb(n) = ∑s≠n≠t (σst
(n) / σst),
where s and t are nodes in the network different from n,
σst denotes the number of shortest paths from s to t, and
σst (n) is the number of shortest paths from s to t
that n lies on.
Betweenness centrality is computed only for networks that do not contain multiple edges. The betweenness value for each node n is normalized by dividing by the number of node pairs excluding n: (N-1)(N-2)/2, where N is the total number of nodes in the connected component that n belongs to. Thus, the betweenness centrality of each node is a number between 0 and 1.
For example, the betweenness centrality of node b in Figure 7
is computed as follows:
Cb(b) =
((σac(b) / σac) +
(σad(b) / σad) +
(σae(b) / σae) +
(σcd(b) / σcd) +
(σce(b) / σce) +
(σde(b) / σde)) / 6 =
((1 / 1) + (1 / 1) + (2 / 2) + (1 / 2) + 0 + 0) / 6 = 3.5 / 6 ≈ 0.583
The betweenness centrality of a node reflects the amount of control that this node exerts over the interactions of other nodes in the network [19]. This measure favors nodes that join communities (dense subnetworks), rather than nodes that lie inside a community.
NetworkAnlayzer uses the fast algorithm by Brandes [4] for the computation of node betweenness centrality. This algorithm has a complexity of O(NM), N being the number of nodes and M - the number of edges in the network.
Closeness centrality
The closeness centrality [11] Cc(n)
of a node n is defined as the reciprocal of the average shortest path length and is computed
as follows:
Cc(n) = 1 / avg( L(n,m) ),
where L(n,m) is the length of the shortest path between two nodes
n and m. The closeness centrality of each node is a number between 0 and 1.
NetworkAnalyzer computes the closeness centrality of all nodes and plots it against the number of neighbors. The closeness centrality of isolated nodes is equal to 0.
Closeness centrality is a measure of how fast information spreads from a given node to other reachable nodes in the network [11].
For example, the closeness centrality of node b in Figure 7
is computed as follows:
Cc(b) = 1/ ( (L(b, a) + L(b, c)
+ L(b, d) + L(b, e)) / 4) = 4/ (1 + 1 + 1 + 2) = 4/5 = 0.8
Fitting a Line
NetworkAnalyzer provides another useful feature - fitting a line on the data points of some complex parameters. The method applied is the least squares method for linear regression [18]. NetworkAnalyzer gives the correlation between the given data points and the corresponding points on the fitted line. In addition, the R-squared value (also known as coefficient of determination) is reported.
Fitting a line can be used to identify linear dependencies between the values of the x and y coordinates in a complex parameter. Figure 8 shows the fitted line on a neighborhood connectivity distribution. The correlation between the data points and corresponding points on the line is approximately 0.969. The R-squared value is 0.939, giving a relatively high confidence that the underlying model is indeed linear.
Fitting a Power Law
The degree distribution of many biological networks approximates a power law: DD(k) ~ kα for some negative constant α. Several studies have reported similar properties of the average clustering coefficient distribution [13] and the topological coefficients [15].
NetworkAnalyzer can fit a power law to some topological parameters. Please note that NetworkAnalyzer uses the least squares method [18], and only points with positive coordinate values are considered for the fit. This approach fits a line on logarithmized data and may be inappropriate for supporting certain hypotheses [8, 16].
NetworkAnalyzer gives the correlation between the given data points and
the corresponding points on the fitted curve. In addition, the R-squared value
(also known as coefficient of determination) is reported. This coefficient gives the
proportion of variability in a data set, which is explained by a fitted linear model.
Therefore, the R-squared value is computed on logarithmized data, where the power-law
curve:
y = β xα
is transformed into linear model:
ln y = ln β + α ln x,
as shown in Figure 9.
Attributes
Node Attributes
While iterating over the connected components of a network, NetworkAnalyzer computes the following topological measures for each node n:
-
AverageShortestPathLength
Average length of a shortest path between n and any other node. If n is an isolated node, the value of this attribute is zero. -
BetweennessCentrality
Betweenness centrality of n as explained in the section Betweenness centrality. -
ClosenessCentrality
Closeness centrality of n as described in the section Closeness centrality. -
ClusteringCoefficient
This numerical attribute stores the clustering coefficient of n, as defined in [2]. Nodes with less than 2 neighbors have a clustering coefficient of zero. -
Degree
The degree of n as explained in the section Degree Distributions. -
Eccentricity
The maximum non-infinite length of a shortest path between n and another node in the network. If n is an isolated node, the value of this attribute is zero. -
IsSingleNode
This boolean attribute indicates if n is an isolated node, that is, if n has no neighbors. -
NeighborhoodConnectivity
The neighborhood connectivity of n as explained in the section Neighborhood Connectivity. -
NumberOfDirectedEdges
This attribute counts the number of directed edges that are connected to n. -
NumberOfUndirectedEdges
This attribute counts the number of undirected edges that are connected to n. -
PartnerOfMultiEdgedNodePairs
This attribute indicates if n is a partner of node pairs with multiple edges. -
Radiality
This attribute is a node centrality index [4] computed by subtracting the average shortest path length of a node n from the diameter of the connected component plus 1. The radiality of each node is divided by the diameter of the connected component. Thus it is a number between 0 and 1. -
SelfLoops
This attribute counts the number of self-loops at n. -
Stress
This attribute counts the number of shortest paths passing through a node. -
TopologicalCoefficient
This numerical attribute stores the topological coefficient of n, as defined in [15]. Nodes with less than 2 neighbors have a topological coefficient of zero.
Edge Attributes
While iterating over the connected components of a network, NetworkAnalyzer computes the following topological measures for each edge e:
-
EdgeBetweenness
This attribute stores the edge betweenness of each edge normalized by dividing by (M-1)(M-2), where M is the number of edges in the connected component that the edge belongs to. The edge betweenness of e=(v,w) is defined as the number of shortest paths between two nodes s and t that go through e divided by the total number of shortest paths that go from s to t [12], [19].
Edge betweenness is computed only for networks that do not contain multiple edges. In addition, self-loops are neglected in the calculation.
Visualizing Computed Parameters
For a visual inspection of the results of a topological analysis, NetworkAnalyzer includes an option to visualize the computed parameters by mapping them to node/edge size and color. The visualization can be applied to the attributes computed by NetworkAnalyzer, as well as to all other numerical attributes assigned to the nodes and edges of the network of interest.
The Visualize Computed Parameters dialog is shown on Figure 10. The visualization offers two ways of mapping computed parameters.
-
Map to node / edge size
The computed parameter is mapped to the size of the nodes or edges. Mapping can be straight or inverse, that is, low parameter values can be mapped to small sizes or to large sizes. The smallest node size is set to 10 and the largest one to 100. Regarding edges, size reflects the edge line width and varies between 1 and 8. Refer to the documentation of Cytoscape's VizMapperTM if interested in fine-tuning the mapping parameters. -
Map to node / edge color
A computed parameter is mapped to the color of the nodes or edges. Two mapping styles are possible - mapping low parameter values to bright colors or to dark colors. By default, the brightest color is green and the darkest color is red. The mapping also uses a middle (intermediate) color, which allows for fine-grained perception of differing values through the color gradient. The default middle color is yellow.
The default values for background, bright, middle and dark colors can be changed in the NetworkAnalyzer Settings menu option.
Note that visual mapping is possible only when the computed parameters for the selected
network are stored as node and edge attributes. Make sure the option "Store node / edge
parameters in node / edge attributes" in NetwowkAnalyzer Settings is enabled. Parameters
loaded from a .netstats
file cannot be visualized because the network itself is not
stored in the network statistics file. If, after performing topological analysis, the network is
modified by introducing or removing nodes or edges, it is recommended (and sometimes required)
to run NetworkAnalyzer again before visualizing any parameters.
Network Modifications
In addition to topological analysis, NetworkAnalyzer offers a useful set of network modifications.
Remove Duplicated Edges
NetworkAnalyzer can remove duplicated edges in a network. After applying this operation to a network, every node pair (a, b) is connected by at most three edges - a directed edge from a to b, a directed edge from b to a, and an undirected edge. For each of these edge types, if there are more than one edge, one is randomly chosen and all others are removed from the network. Any attributes of the considered edges, if present, are ignored. If the option "Ignore edge direction" is selected, all edges are treated as undirected and thus only one edge per node pair is retained.
This option does not affect self-loops in the network. All duplicated self-loops, if any, are retained.
Please note that this option effectively modifies the selected network(s) and the operations performed cannot be undone.
Remove Self-Loops
NetworkAnalyzer can remove self-loops in a network. A self-loop is an edge which connects a node to itself. By applying this operation to a network, all directed and undirected self-loops are removed.
This option does not affect any other edges present in the network.
Please note that this option effectively modifies the selected network(s) and the operations performed cannot be undone.
Network Operations
NetworkAnalyzer can compute the union, intersection and difference of any two networks. The result of such operation is stored as a new network in Cytoscape.
Connected Components
NetworkAnalyzer can list all its connected components of a disconnected network. The size (number of nodes) of each component is reported. By selecting a specific connected component, it can be exported as a separate network in Cytoscape. This feature allows users to perform topological analysis on the largest connected component of a network.
References
[1] | Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286 (1999) 509-512 |
[2] | Barabási, A.L., Oltvai, Z.N.: Network biology: understanding the cell's functional organization. Nat Rev Genet 5 (2004) 101-113 |
[3] | Bollobas, B.: Random graphs. Cambridge University Press, Cambridge (2001), ISBN: 0-521-80920-7 |
[4] | Brandes, U.: A faster algorithm for betweenness centrality. J Math Sociol 25 (2001) 163-177 |
[5] | Diestel, R.: Graph theory. Springer-Verlag, Heidelberg (2005), ISBN 3-540-26182-6 |
[6] | Dong, J., Horvath, S.: Understanding network concepts in modules. BMC Syst Biol 24 (2007) |
[7] | Erdős, P., Renyi, A.: Publ Math Inst Hung Acad Sci 5 (1960) 17 |
[8] | Goldstein, M.L., Morris, S.A., Yena, G.G.: Problems with fitting to the power-law distribution. European Physical Journal B 41 (2004) 255-258 |
[9] | Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer-Verlag (2001), ISBN 0-387-95284-5 |
[10] | Maslov, S., Sneppen, K: Specificity and stability in topology of protein networks. Science 296 (2002) 910-913. |
[11] | Newman, M.E.J.: A measure of betweenness centrality based on random walks. arXiv (2003) cond-mat/0309045 |
[12] | Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Soc Networks 27 (2005) 39-54 |
[13] | Ravasz, E., et al.: Hierarchical organization of modularity in metabolic networks. Science 297 (2002) 1551-1555 |
[14] | Shimbel, A.: Structural parameters of communication networks. Bull Math Biophys 15 (1953) 501-507. |
[15] | Stelzl, U., et al.: A human protein-protein interaction network: a resource for annotating the proteome. Cell 122 (2005) 957-968. |
[16] | Tanaka, R., Yi, T.M, Doyle, J.: Some protein interaction data do not exhibit power law statistics. FEBS Letters 579 (2005) 5140-5144 |
[17] | Watts D.J., Strogatz, S.H.: Collective dynamics of 'small-world' networks. Nature 393 (1998) 440-442 |
[18] | Weisstein, E. W.: Least Squares Fitting - Power Law. MathWorld - A Wolfram Web Resource. (http://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html) |
[19] | Yoon, J., Blumer, A., Lee, K.: An algorithm for modularity analysis of directed and
weighted biological networks based on edge-betweenness centrality. Bioinformatics, 22 (2006) 3106-8 |