Since the debut of Facebook, Twitter, Snapchat and WeChat, social networks have taken center stage for innovation and new business models, and have become a new form of media that deeply affects our lives. Some even claim they can change election results. Relationships in a social network are usually represented as a directed or undirected graph. Given such a graph, what can we do with it, and what value can we get from it?

Before diving into what we can do, I’ll first go over some graph terminology, using this illustration as an example:

Vertex: The fundamental unit of a graph. In this example, the vertices (also called nodes) are the people.

Edge: A connection between a pair of nodes. It can be directed (with a direction, in which case it is also called an arc) or undirected.

Indegree, outdegree, hub and authority scores: These are measures of the centrality (“importance”) of a node in a network.

Path: A sequence of vertices such that from each vertex there is an edge to the next vertex in the sequence.
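To make these terms concrete, here is a minimal base R sketch on a made-up directed network. The people, the "follows" edges and all variable names are hypothetical and exist only for illustration. It counts indegree and outdegree from an edge list and checks that a sequence of vertices is a valid path.

```r
# Hypothetical directed "follows" network (illustrative data, not from the post).
edges <- data.frame(
  from = c("Ann", "Ann", "Bob", "Cat", "Dan"),
  to   = c("Bob", "Cat", "Cat", "Ann", "Cat"),
  stringsAsFactors = FALSE
)

# Vertices: every distinct name that appears at either end of an edge.
vertices <- sort(unique(c(edges$from, edges$to)))

# Outdegree counts the edges leaving a vertex, indegree the edges arriving at it.
outdegree <- setNames(tabulate(match(edges$from, vertices), nbins = length(vertices)), vertices)
indegree  <- setNames(tabulate(match(edges$to,   vertices), nbins = length(vertices)), vertices)
print(rbind(indegree, outdegree))

# A path must follow an existing edge at every step: Dan -> Cat -> Ann -> Bob.
path <- c("Dan", "Cat", "Ann", "Bob")
steps_taken <- paste(head(path, -1), tail(path, -1))
is_path <- all(steps_taken %in% paste(edges$from, edges$to))
print(is_path)
```

In this toy network Cat has the highest indegree, which is the simplest sign of an "important" node.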

Now that we know the terminology, what can we do with a given graph? Here are some starting points, with R code to illustrate each:

**Measure connectedness of points**

Connectedness measures how many other vertices each vertex is connected to. The number of edges attached to a vertex is called the degree of that vertex.

Example: Node 1 has degree 3, and node 2 has the highest degree, 6.

> library('igraph')

> # 'sna' is not loaded here: attaching it after igraph masks igraph's degree() and betweenness()

> graph1 <- sample_pa(15, power = 1, directed = FALSE)

> plot(graph1)

> degree(graph1)

[1] 3 6 3 1 1 5 1 1 1 1 1 1 1 1 1
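As a small follow-up, the degree vector printed above can be summarised in base R. This is only a sketch: the vector is copied by hand from the output, and the variable names are my own.

```r
# Degree vector copied from the degree(graph1) output above.
deg <- c(3, 6, 3, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1)

# The best-connected vertex (the "hub").
hub <- which.max(deg)

# How many vertices have each degree value.
degree_counts <- table(deg)

# Vertices attached to the rest of the graph by a single edge.
leaves <- sum(deg == 1)

cat("hub vertex:", hub, "\n")
cat("vertices with degree 1:", leaves, "\n")
print(degree_counts)
```

A few well-connected vertices and many vertices of degree 1 is the shape you would expect from a preferential attachment model such as `sample_pa`.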

**Measure betweenness of points**

This metric measures how much an individual node acts as a bridge between other nodes or groups: it counts the shortest paths between other pairs of nodes that pass through it. Generally, the higher the betweenness score, the more important that individual is. As seen here, node 2 bridges the most among all the nodes.

> betweenness(graph1)

[1] 25 75 25 0 0 46 0 0 0 0 0 0 0 0 0
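To show what a betweenness score counts, here is a minimal base R sketch for a small unweighted, undirected graph. It is not igraph's implementation: `betweenness_sketch` and the six-vertex example graph are hypothetical, and the triple loop is only practical for tiny graphs.

```r
# Betweenness for a small unweighted, undirected graph given as an adjacency matrix.
betweenness_sketch <- function(adj) {
  n <- nrow(adj)
  d     <- matrix(Inf, n, n)  # shortest-path lengths
  sigma <- matrix(0, n, n)    # number of shortest paths between each pair

  # Breadth-first search from every source vertex s.
  for (s in seq_len(n)) {
    d[s, s] <- 0
    sigma[s, s] <- 1
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in which(adj[v, ] != 0)) {
        if (is.infinite(d[s, w])) {        # w is reached for the first time
          d[s, w] <- d[s, v] + 1
          queue <- c(queue, w)
        }
        if (d[s, w] == d[s, v] + 1) {      # v precedes w on a shortest path
          sigma[s, w] <- sigma[s, w] + sigma[s, v]
        }
      }
    }
  }

  # For each vertex v, add the share of shortest s-u paths that pass through v.
  score <- numeric(n)
  for (v in seq_len(n)) {
    for (s in seq_len(n)) {
      for (u in seq_len(n)) {
        if (s < u && s != v && u != v && is.finite(d[s, u]) &&
            d[s, v] + d[v, u] == d[s, u]) {
          score[v] <- score[v] + sigma[s, v] * sigma[v, u] / sigma[s, u]
        }
      }
    }
  }
  score
}

# Hypothetical graph: vertices 1 and 2 hang off 3, vertices 5 and 6 hang off 4,
# and the edge 3-4 is the only bridge between the two groups.
edge_list <- rbind(c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6))
adj <- matrix(0, 6, 6)
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1   # mirror each edge, since the graph is undirected

print(betweenness_sketch(adj))
```

Only vertices 3 and 4 get a non-zero score here, because every shortest path between the two groups has to pass through them.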

**Measure network density**

Network density is defined as the number of connections divided by the number of all possible connections. A completely connected network has a density of 1. In this graph, it is 0.133.

> density <- edge_density(graph1, loops = FALSE)

> density

[1] 0.1333333
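The same number can be reproduced by hand, which makes the definition easier to see. This is a sketch in base R: the degree vector is copied from the `degree(graph1)` output earlier, and the variable names are my own.

```r
# Degree vector copied from the degree(graph1) output.
deg <- c(3, 6, 3, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1)

n_vertices <- length(deg)

# Every edge adds 1 to the degree of both of its endpoints,
# so the number of edges is half the degree sum.
n_edges <- sum(deg) / 2

# In an undirected graph without loops, every pair of vertices could be connected.
possible_edges <- n_vertices * (n_vertices - 1) / 2

density_by_hand <- n_edges / possible_edges
print(density_by_hand)
```

With 15 vertices there are 105 possible edges, and 14 of them are present, which gives 14 / 105 ≈ 0.133.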

**Find cliques**

A clique is a group of nodes in which every node is directly connected to every other node. Cliques are useful for identifying tightly knit groups, which often share similar traits. The example above has no clique with more than two members, so I’ll create another graph.

> graph2 <- sample_gnp(20,0.3)

> plot(graph2)

> cliques(graph2, min = 4) # minimum of 4 members

[[1]]

+ 4/20 vertices, from da697b6:

[1] 1 3 19 20

[[2]]

+ 4/20 vertices, from da697b6:

[1] 6 13 17 19

[[3]]

+ 4/20 vertices, from da697b6:

[1] 6 13 16 17

[[4]]

+ 4/20 vertices, from da697b6:

[1] 2 3 16 17

[[5]]

+ 4/20 vertices, from da697b6:

[1] 3 11 15 16

[[6]]

+ 4/20 vertices, from da697b6:

[1] 3 11 14 16

[[7]]

+ 4/20 vertices, from da697b6:

[1] 3 11 16 17

[[8]]

+ 4/20 vertices, from da697b6:

[1] 5 7 13 15
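To check what makes a set of vertices a clique, here is a minimal base R sketch. The helper `is_clique` and the five-vertex graph are hypothetical; they are not part of igraph.

```r
# TRUE if every pair of distinct vertices in `members` is connected in `adj`.
is_clique <- function(adj, members) {
  sub <- adj[members, members, drop = FALSE]
  diag(sub) <- 1   # ignore the diagonal: a vertex need not link to itself
  all(sub != 0)
}

# Hypothetical graph: vertices 1 to 4 are fully connected, vertex 5 only knows vertex 4.
edge_list <- rbind(c(1, 2), c(1, 3), c(1, 4), c(2, 3), c(2, 4), c(3, 4), c(4, 5))
adj <- matrix(0, 5, 5)
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1   # mirror each edge, since the graph is undirected

print(is_clique(adj, 1:4))          # all six pairs among 1..4 are connected
print(is_clique(adj, c(3, 4, 5)))   # 3 and 5 are not connected
```

A clique of k members always contains k * (k - 1) / 2 edges, so each 4-member clique listed above accounts for 6 edges of the graph.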

**Find components of a graph**

In a graph, some nodes may not be connected to the rest, so the graph can have multiple components that are not connected to each other. Here is how to identify the components. First, create a sparse graph.

> comp_graph <- sample_gnp(30, 0.05, directed = FALSE, loops = FALSE)

> plot(comp_graph)

> components(comp_graph)

$membership

[1] 1 2 3 4 1 5 6 5 5 1 7 8 1 9 1 6 1 1 5 10 1 1 1 1 1 2 1 11

[29] 1 1

$csize

[1] 15 2 1 1 4 2 1 1 1 1 1

$no

[1] 11

So there are 11 components ($no). $membership gives the component each vertex belongs to, and $csize gives the size of each component.
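The three parts of this output are tied together, which a few lines of base R can show. This is a sketch: the membership vector is copied by hand from the output above, and the variable names are my own.

```r
# Membership vector copied from the components(comp_graph) output.
membership <- c(1, 2, 3, 4, 1, 5, 6, 5, 5, 1, 7, 8, 1, 9, 1, 6, 1, 1, 5, 10,
                1, 1, 1, 1, 1, 2, 1, 11, 1, 1)

# Counting how often each component id occurs reproduces $csize.
csize <- tabulate(membership)

# The number of distinct component ids reproduces $no.
n_components <- length(csize)

# Vertices in the largest component.
largest <- which.max(csize)
members_of_largest <- which(membership == largest)

# Isolated vertices: those whose component has size 1.
isolated <- which(membership %in% which(csize == 1))

print(csize)
print(members_of_largest)
print(isolated)
```

Half of the 30 vertices sit in one large component, and seven vertices are isolated, which is typical for a random graph this sparse.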

**Take a random walk on a graph**

Some graphs represent processes or paths in which the active node changes over time. In a random walk, each edge leaving the current node has an equal chance of being chosen for the next step. The walk starts at a given node, runs for a given number of steps, and reports which nodes were visited. Let’s look at the code (starting at node 29, for 8 steps):

> random_walk(comp_graph, 29, 8, stuck = "return")

+ 8/30 vertices, from 9a4a115:

[1] 29 13 29 1 5 1 29 13

The path is: 29, 13, 29, 1, 5, 1, 29, 13.
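For readers who want to see the mechanics, here is a minimal base R sketch of an unweighted random walk. It is not igraph's implementation: `random_walk_sketch`, the adjacency list and the seed are hypothetical. Like the output above, it counts the starting vertex as the first of the reported vertices, and it returns the partial walk if it lands on a vertex with no neighbours.

```r
# Random walk over an adjacency list (element i holds the neighbours of vertex i).
random_walk_sketch <- function(adj_list, start, steps) {
  stopifnot(steps >= 1)
  walk <- numeric(steps)
  walk[1] <- start
  for (i in seq_len(steps)[-1]) {
    nbrs <- adj_list[[walk[i - 1]]]
    if (length(nbrs) == 0) {
      # Stuck on a vertex without neighbours: return the walk so far.
      return(walk[seq_len(i - 1)])
    }
    # Pick one neighbour uniformly at random. Indexing with sample.int() avoids
    # the sample() pitfall where a single number n is treated as the range 1:n.
    walk[i] <- nbrs[sample.int(length(nbrs), 1)]
  }
  walk
}

# Hypothetical graph: edges 1-2, 1-3 and 3-4, with vertex 5 isolated.
adj_list <- list(c(2, 3), 1, c(1, 4), 3, numeric(0))

set.seed(42)   # arbitrary seed, only so that repeated runs give the same walk
walk <- random_walk_sketch(adj_list, start = 1, steps = 8)
print(walk)
```

Because each step only looks at the neighbours of the current vertex, a walk can never leave the component it starts in.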

**Ref:**

Comparison of Translational Patterns in Two Nutrient-Disease Associations: Nutritional Research Series, Vol. 5.