Link analysis is an analysis technique that focuses on relationships and connections in a dataset. Link analysis gives you the ability to calculate centrality measures (degree, betweenness, closeness, and eigenvector) and view the connections on a link chart or link map.

## About link analysis

Link analysis uses a network of interconnected links and nodes to identify and analyze relationships that are not easily seen in raw data. Common types of networks include the following:

- Social networks that show who talks to whom.
- Semantic networks that show which topics are related to each other.
- Conflict networks that show alliances between players.
- Airline networks that show which airports have connecting flights.

## Examples

A crime analyst is investigating a criminal network. Data from cell phone records can be used to determine the relationships and hierarchy among members of the network.

A credit card company is developing a new system to detect credit card theft. The system uses the known patterns of transactions for each client, such as the city, stores, and types of transactions, to identify anomalies and alert the client to a potential theft.

A public health analyst is researching the opioid crisis in North America. The analyst uses data on prescriptions and demographics to identify new patterns that are emerging as the crisis spreads.

## How link analysis works

The following table provides an overview of the terminology in link analysis:

Term | Description | Examples |
---|---|---|
Network | A set of interconnected nodes and links. | An online social network that uses a network of profiles and relationships to connect users. Airline networks that use a network of airports and flights to transport travelers from their origin to their destination. |
Node | A point or vertex that represents an object, such as a person, place, crime type, or tweet. The node may also include associated properties. | The profiles in a social network. Associated properties may include the user's name, home town, or employer. The airports in an airline network. Associated properties may include the airport name. |
Link | The relationships or connections between nodes. The link may also include associated properties. | The relationship between profiles in the network, such as friend, follower, or connection. Associated properties may include the length of the relationship. The flights between airports in an airline network. Associated properties may include the number of flights between airports. |
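To make the terms concrete, here is a minimal Python sketch of one way to hold a network in memory, assuming nodes and links are plain objects that each carry a `properties` dictionary. The class names (`Node`, `Link`, `Network`), the `neighbors` helper, and the airport data are hypothetical and do not reflect how Insights stores link data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A point in the network, plus any associated properties."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Link:
    """A connection between two nodes, plus any associated properties."""
    source: str
    target: str
    properties: dict = field(default_factory=dict)


@dataclass
class Network:
    """A set of interconnected nodes and links."""
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    links: list = field(default_factory=list)   # list of Link objects

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_link(self, source, target, **properties):
        # A link can only connect nodes that are already in the network.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in link {source!r} -> {target!r}")
        self.links.append(Link(source, target, properties))

    def neighbors(self, node_id):
        """IDs of the nodes that share a link with node_id, ignoring direction."""
        found = set()
        for link in self.links:
            if link.source == node_id:
                found.add(link.target)
            if link.target == node_id:
                found.add(link.source)
        return found


# Hypothetical airline network: airports are nodes, flights are links.
airline = Network()
airline.add_node("A", name="Airport A")
airline.add_node("B", name="Airport B")
airline.add_node("C", name="Airport C")
airline.add_link("A", "B", flights=4)   # illustrative property values
airline.add_link("B", "C", flights=7)
```

The centrality measures that follow only need the node identifiers and which nodes are linked; the properties travel along for display and filtering.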

### Centrality

Centrality is a measure of importance for nodes in a network.

Centrality is used for the following purposes:

- To evaluate the influence of a node over other nodes in the network. For example, which user will reach the most other users when sharing a piece of news or a job opportunity?
- To identify the nodes that are most influenced by other nodes. For example, which airport will be most affected by cancelled flights from a storm in a different region?
- To observe the flow or spread of something throughout the network, including information, objects, or phenomena. For example, how does a package move from the warehouse to the delivery address?
- To understand which nodes spread phenomena through the network most efficiently. For example, which newspaper or channel should be contacted so the story reaches the most people?
- To locate nodes that can block or prevent the spread of phenomena. For example, where should vaccination clinics be located to stop the spread of a virus?

There are four ways to measure centrality in Insights: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality.

#### Degree centrality

Degree centrality is based on the number of direct connections a node has. Degree centrality should be used when you want to determine which nodes have the most direct influence. For example, in a social network, the users with the most connections would have a higher degree centrality.

Degree centrality of node x is calculated using the following equation:

$$\text{degCentrality}(x) = \frac{\deg(x)}{\text{Nodes}_{\text{Total}} - 1}$$

where:

- $\text{Nodes}_{\text{Total}}$ = The number of nodes in the network
- $\deg(x)$ = The number of nodes connected to node x
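As an illustration, the following Python sketch applies the degree centrality equation to a small, hypothetical friendship network. It assumes an undirected network stored as a dictionary that maps each node to the set of nodes it is linked to; the function name `degree_centrality` is illustrative, not an Insights API.

```python
def degree_centrality(adjacency):
    """Degree centrality deg(x) / (Nodes_Total - 1) for every node x.

    `adjacency` maps each node to the set of nodes it is linked to
    (undirected, so every link appears in both nodes' sets).
    """
    n_total = len(adjacency)
    if n_total < 2:
        # With fewer than two nodes the denominator would be zero.
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n_total - 1)
            for node, neighbors in adjacency.items()}


# Hypothetical social network of four users.
friends = {
    "Ana": {"Ben", "Cal", "Dee"},
    "Ben": {"Ana", "Cal"},
    "Cal": {"Ana", "Ben"},
    "Dee": {"Ana"},
}
scores = degree_centrality(friends)
```

Ana has a direct link to every other user, so her score is 1.0; Dee, with a single connection, scores 1/3.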

If the links are directed, meaning that information flows between nodes in one direction only, the degree centrality can be measured either as indegree or outdegree. In the case of a social network, the indegree would be based on the number of profiles the user is following, whereas the outdegree would be based on the number of followers the user has.

Indegree centrality is calculated using the following equation:

$$\text{indegCentrality}(x) = \frac{\text{indeg}(x)}{\text{Nodes}_{\text{Total}} - 1}$$

where:

- $\text{Nodes}_{\text{Total}}$ = The number of nodes in the network
- $\text{indeg}(x)$ = The number of nodes connected to node x with flow directed toward node x

Outdegree centrality is calculated using the following equation:

$$\text{outdegCentrality}(x) = \frac{\text{outdeg}(x)}{\text{Nodes}_{\text{Total}} - 1}$$

where:

- $\text{Nodes}_{\text{Total}}$ = The number of nodes in the network
- $\text{outdeg}(x)$ = The number of nodes connected to node x with flow directed away from node x
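The indegree and outdegree equations can be sketched the same way. This hypothetical Python example assumes each directed link is a `(source, target)` pair with flow from source to target. Because the equations count connected nodes rather than links, it counts each connected node once, ignoring repeated links and self-loops (an assumption about how such data is cleaned).

```python
from collections import Counter


def in_out_degree_centrality(nodes, links):
    """Return (indegree, outdegree) centrality dictionaries for a directed network.

    `links` is a list of (source, target) pairs, with flow directed from source to target.
    """
    n_total = len(nodes)
    if n_total < 2:
        return {x: 0.0 for x in nodes}, {x: 0.0 for x in nodes}
    # Count connected nodes, not links: drop repeated links and self-loops (an assumption).
    unique = {(s, t) for s, t in links if s != t}
    indeg = Counter(t for _, t in unique)    # nodes with flow directed toward t
    outdeg = Counter(s for s, _ in unique)   # nodes with flow directed away from s
    indegree = {x: indeg[x] / (n_total - 1) for x in nodes}
    outdegree = {x: outdeg[x] / (n_total - 1) for x in nodes}
    return indegree, outdegree


# Hypothetical network: posts flow from a profile to the profiles that follow it.
# The repeated Ana -> Ben link is counted once.
people = ["Ana", "Ben", "Cal", "Dee"]
posts_flow = [("Ana", "Ben"), ("Ana", "Cal"), ("Ana", "Dee"), ("Ben", "Cal"), ("Ana", "Ben")]
indegree, outdegree = in_out_degree_centrality(people, posts_flow)
```

Here Ana sends to every other profile, so her outdegree centrality is 1.0 and her indegree centrality is 0.0.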

For directed graphs, Insights sizes nodes by outdegree centrality by default.

#### Betweenness centrality

Betweenness centrality is based on the extent to which a node is part of the shortest paths between other nodes. Betweenness centrality should be used when you want to determine which nodes are used to connect other nodes to each other. For example, a user in a social network with connections to multiple groups of friends will have a higher betweenness centrality than users with connections in only one group.

Betweenness centrality of node x is calculated using the following equation:

$$\text{btwCentrality}(x) = \sum_{a,b \in \text{Nodes}} \frac{\text{paths}_{a,b}(x)}{\text{paths}_{a,b}}$$

where:

- Nodes = All the nodes in the network
- $\text{paths}_{a,b}$ = The number of shortest paths between nodes a and b
- $\text{paths}_{a,b}(x)$ = The number of shortest paths between nodes a and b that pass through node x

The betweenness centrality equation above does not account for the size of the network, so large networks will tend to have greater betweenness centrality values than small networks. To allow comparisons between networks of different sizes, the betweenness centrality equation must be normalized by dividing by the number of node pairs in the chart.

The following equation is used to normalize an undirected chart:

$$\frac{1}{2}(\text{Nodes}_{\text{Total}} - 1)(\text{Nodes}_{\text{Total}} - 2)$$

where:

- $\text{Nodes}_{\text{Total}}$ = The number of nodes in the network

The following equation is used to normalize a directed chart:

$$(\text{Nodes}_{\text{Total}} - 1)(\text{Nodes}_{\text{Total}} - 2)$$

where:

- $\text{Nodes}_{\text{Total}}$ = The number of nodes in the network
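A minimal Python sketch of the betweenness equation and its normalization follows, for unweighted networks stored as dictionaries of neighbor sets (out-neighbors for a directed chart). It counts shortest paths with breadth-first search. Whenever x lies on a shortest path from a to b, the number of shortest a-to-b paths through x equals the number of shortest paths from a to x multiplied by the number from x to b. As an assumption consistent with the normalization factors above, it excludes pairs that contain x and sums over each unordered pair once in an undirected chart and over each ordered pair in a directed chart. The function names and data are hypothetical.

```python
from collections import deque
from itertools import combinations, permutations


def bfs_counts(adjacency, source):
    """Shortest-path length and number of shortest paths from source to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:                 # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:        # v is the last step on a shortest path to w
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness_centrality(adjacency, directed=False, normalized=True):
    nodes = list(adjacency)
    n_total = len(nodes)
    counts = {s: bfs_counts(adjacency, s) for s in nodes}
    # Undirected: each unordered pair {a, b} once; directed: each ordered pair (a, b).
    pairs = list(permutations(nodes, 2) if directed else combinations(nodes, 2))
    scores = {}
    for x in nodes:
        dist_x, sigma_x = counts[x]
        total = 0.0
        for a, b in pairs:
            if x in (a, b):
                continue
            dist_a, sigma_a = counts[a]
            if x not in dist_a or b not in dist_x:
                continue                      # no path from a to b passes through x
            if dist_a[x] + dist_x[b] == dist_a[b]:
                # paths_{a,b}(x) = (shortest a->x paths) * (shortest x->b paths)
                total += sigma_a[x] * sigma_x[b] / sigma_a[b]
        scores[x] = total
    if normalized and n_total > 2:
        pair_count = (n_total - 1) * (n_total - 2)
        if not directed:
            pair_count /= 2                   # undirected: 1/2 (N - 1)(N - 2)
        scores = {x: s / pair_count for x, s in scores.items()}
    return scores


# Hypothetical network: Eve links two otherwise separate friend groups.
groups = {
    "Ana": {"Ben", "Eve"},
    "Ben": {"Ana", "Eve"},
    "Cal": {"Dee", "Eve"},
    "Dee": {"Cal", "Eve"},
    "Eve": {"Ana", "Ben", "Cal", "Dee"},
}
scores = betweenness_centrality(groups)
```

Eve lies on the only shortest path for each of the four pairs that span the two groups, so her raw score is 4 and her normalized score is 4 / 6, about 0.67. Every other user scores 0.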

#### Closeness centrality

Closeness centrality is based on the average shortest-path distance between a node and the other nodes in the network. Closeness centrality should be used when you want to determine which nodes are most closely associated with the other nodes in the network. For example, a user with more direct connections in the social network will have a higher closeness centrality than a user who is connected through other people (in other words, a friend of a friend).

##### Note:

The distance between nodes refers to the number of links separating them, not the geographical distance.

Closeness centrality of node x is calculated using the following equation:

$$\text{closeCentrality}(x) = \frac{\text{nodes}(x,y)}{\text{Nodes}_{\text{Total}} - 1} \times \frac{\text{nodes}(x,y)}{\text{dist}(x,y)_{\text{Total}}}$$

where:

- $\text{Nodes}_{\text{Total}}$ = The number of nodes in the network
- $\text{nodes}(x,y)$ = The number of nodes that are connected to node x
- $\text{dist}(x,y)_{\text{Total}}$ = The sum of the shortest path distances from node x to other nodes
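The closeness equation can be sketched in Python as follows, assuming an unweighted, undirected network stored as a dictionary of neighbor sets. Distances are counted in links with breadth-first search, and nodes(x,y) is taken to be the number of nodes reachable from x, which is an interpretation of "connected to node x". Function names and data are hypothetical.

```python
from collections import deque


def shortest_distances(adjacency, source):
    """Number of links on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adjacency):
    n_total = len(adjacency)
    scores = {}
    for x in adjacency:
        dist = shortest_distances(adjacency, x)
        reachable = len(dist) - 1          # nodes(x, y): nodes connected to x, excluding x
        dist_total = sum(dist.values())    # dist(x, y)_Total; the distance from x to itself is 0
        if reachable == 0:
            scores[x] = 0.0                # isolated node: no paths to measure
            continue
        scores[x] = (reachable / (n_total - 1)) * (reachable / dist_total)
    return scores


# Hypothetical social network: Dee knows Ben and Cal only through Ana.
friends = {
    "Ana": {"Ben", "Cal", "Dee"},
    "Ben": {"Ana", "Cal"},
    "Cal": {"Ana", "Ben"},
    "Dee": {"Ana"},
}
scores = closeness_centrality(friends)
```

Ana is one link from everyone and scores 1.0. Dee reaches Ben and Cal only through Ana (friends of a friend), so her distances sum to 5 and she scores 3/5 = 0.6. The first factor scales the score down when a node can reach only part of a disconnected network.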

#### Eigenvector centrality

Eigenvector centrality is based on the idea that important nodes are connected to other important nodes. Eigenvector centrality should be used when you want to determine which nodes are part of a cluster of influence. For example, a user in a social network with many connections to other users with many connections will have a higher eigenvector centrality than a user with few connections, or who is connected to other users with few connections.

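Eigenvector centrality is calculated using power iteration to find the eigenvector that corresponds to the largest eigenvalue in the following equation. The centrality of each node is its entry in that eigenvector:

$$Ax = \lambda x$$

where:

- $\lambda$ = The eigenvalue
- $x$ = The eigenvector
- $A$ = The adjacency matrix of the network, which describes the linear transformation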
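A minimal power-iteration sketch in Python follows, assuming an undirected network stored as a dictionary of neighbor sets. Under that representation, multiplying by the adjacency matrix amounts to summing each node's neighbor scores. As a common variant (an assumption, not necessarily what Insights does), it iterates with A + I rather than A. This variant has the same eigenvectors and avoids oscillating on bipartite networks. Function names and data are hypothetical.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    nodes = list(adjacency)
    x = {v: 1.0 for v in nodes}                       # uniform starting vector
    for _ in range(max_iter):
        # One multiplication by (A + I): each node's own score plus its neighbors' scores.
        # Adding I shifts every eigenvalue by 1 without changing the eigenvectors.
        new = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in nodes}
        norm = math.sqrt(sum(s * s for s in new.values())) or 1.0
        new = {v: s / norm for v, s in new.items()}   # rescale to unit length
        if sum(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical social network of four users.
friends = {
    "Ana": {"Ben", "Cal", "Dee"},
    "Ben": {"Ana", "Cal"},
    "Cal": {"Ana", "Ben"},
    "Dee": {"Ana"},
}
scores = eigenvector_centrality(friends)
```

In this hypothetical network, Ana has the highest score. Ben and Cal are linked to Ana and to each other, so they score higher than Dee, who is linked only to Ana.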

## Next steps

Try this scenario-based exercise from Learn ArcGIS for hands-on experience with link analysis: