Saturday, March 27, 2021

Strongly Connected Components and Kosaraju's Algorithm

Some presentations of Kosaraju’s Algorithm don’t provide a detailed explanation of why the algorithm works. Here’s my attempt to explain it.

The story begins with depth-first search (DFS). To review, DFS goes deeper at each step, following an out-edge from the current vertex to a never-before-seen vertex. If there are no out-edges to never-before-seen vertices, then the search backtracks to the last visited vertex with out-edges to never-before-seen vertices and continues from there.

The following graph shows the result of DFS on a small graph. The edges traversed by the DFS are marked in green and form a depth-first tree.

Example of Depth First Search.

We can categorizes the edges of the graph with respect to the depth-first tree in the following way:

  • tree edge: an edge on the tree, e.g., g → c in the graph above.
  • back edge: an edge that connects a descendent to an ancestor with respect to the tree, e.g., f → g.
  • forward edge: an edge that connects an ancestor to a descendent wrt. the tree, e.g., f → e
  • cross edge: all other edges, e.g., k → l.

Graph with edges categorized by a Depth-First Search. The tree edges are in green, back edges in red, forward edges in blue, and cross edges in black.

Theorem A graph has a cycle if and only if there is a back edge.

As we shall see, it is useful to record timestamps on a vertex when it is first discovered during a DFS and when it is finished (after visiting all of its descendants).

A graph with discover and finish times from a Depth-First Search.

Theorem

  • If u → v is a tree, forward or cross edge, then the finish_time[v] < finish_time[u].
  • If u → v is a back edge, then finish_time[u] < finish_time[v].

For reference, here is the code for DFS and its auxiliary function DFS_visit.

static void DFS_visit(List<List<Integer>> G, Integer u, 
                      ArrayList<Integer> parent,
                      ArrayList<Boolean> visited, List<Integer> finish) {
    visited.set(u, true);
    for (Integer v : G.get(u)) {
        if (! visited.get(v)) {
            parent.set(v, u);
            DFS_visit(G, v, parent, visited, finish);
        }
    }
    finish.add(u);
}

static List<Integer> DFS(List<List<Integer>> G) {
    ArrayList<Integer> parent = new ArrayList<>();
    ArrayList<Boolean> visited = new ArrayList<>();
    for (int u = 0; u != G.size(); ++u) {
      parent.add(u);
      visited.add(false);
    }
    ArrayList<Integer> finish = new ArrayList<>();
    for (int u = 0; u != G.size(); ++u) {
        visited.set(u, false);
    }
    for (int u = 0; u != G.size(); ++u) {
        if (! visited.get(u))
            DFS_visit(G, u, parent, visited, finish);
    }
    return finish;
}

Now we turn to discussing the problem of computing strongly connected components.

Definition A strongly connected component is a maximum subset of the vertices in a graph such that every vertex in the subset is reachable from all the other vertices in the subset.

For example, the following graph

has these strongly connected components:

Definition The component graph C of another digraph G has 1) a vertex for each SCC in G. For each vertex u in C, we write SCC(u) for it’s SCC in G. 2) an edge between u and v if there is an edge from any vertex in SCC(u) to any vertex in SCC(v).

Here is the component graph of the example.

Theorem A component graph is acyclic.

Otherwise, the vertices in the cycle represent connected components that are not maximal. They could have been combined into a larger SCC.

Kosaraju’s Algorithm for SCC

Suppose we do a DFS_visit from a random node in the graph G. We’ll visit all of the other nodes in its SCC (that’s good) but we may also visit nodes in other SCCs (that’s bad).

How can we cause DFS_visit to stop before visiting nodes in other SCCs?

If we run DFS_visit on a node in an SCC that has no out-edges to other SCCs, then we’d just visit the nodes in that SCC and no other. We could then remove those nodes from the graph and repeat.

That’s a lot like a topological ordering on the component graph C, but with out-edges instead of in-edges. So what we need is a topological ordering on the transposed component graph C^T.

Definition The transpose of a graph G, written G^T, has the same vertices as G but the edges are reversed.


For examaple,

D, B, A, E

is a topological ordering of C^T.

But we don’t have C yet… that’s what we’re trying to compute!

Recall that DFS finish times are related to topological ordering. We can apply DFS to G^T to obtain finish times. Here’s the transposed graph of the example with the root and edges of each DFS tree highlighted in green.

and here are the vertices ordered by finish time:

3, 0, 1, 5, 4, 2

The vertex that finished last (vertex 2) must be in a SCC (D) that does not have any in-edges in C^T. Why is that? If there were an in-edge from another SCC, then the source of that in-edge would have finished later, but that contradicts vertex 2 being the last to finish. The only way the source of the in-edge could finish earlier would be if it was a back edge, but then the two vertices would be in a cycle and in the same SCC, which contradicts them being in different SCC.

Since the SCC (D) does not have any in-edges in C^T, it doesn’t have any out-edges in the C.

So running DFS_visit on vertex 2 in the original graph will only reach other vertices in its SCC (D). DFS_visit will mark all of those vertices as visited, so later runs of DFS_visit will ignore them.

We continue running DFS_visit on each vertex according to the reverse order of finish time (i.e. 4,5,1,0,3). Each tree in the resulting DFS forest is a SCC.

So here’s the algorithm

  1. Transpose the graph G to obtain G^T.
  2. Apply DFS to G^T to obtain the order in which the vertices finished.
  3. For each vertex u in the reversed finish list, apply DFS_visit to u in G.
  4. Each of the resulting DFS trees is a SCC. (The trees are encoded in the parent array.)

(The above differs from the standard presentation of Kosaraju’s algorithm, which instead applies DFS to G to get an ordering, and then applies DFS_visit to G^T repeatedly to get the DFS forest in which each tree is an SCC.)

static ArrayList<Integer> SCC(List<List<Integer>> G) {
  List<List<Integer>> GT = transpose(G);
  List<Integer> finished = DFS(GT);
  Collections.reverse(finished);

  ArrayList<Integer> parent = new ArrayList<>();
  ArrayList<Boolean> visited = new ArrayList<>();
  for (int u = 0; u != G.size(); ++u) {
    parent.add(u);
    visited.add(false);
  }
  ArrayList<Integer> ignore = new ArrayList<>();
  for (Integer u : finished) {
    if (! visited.get(u))
      DFS_visit(G, u, parent, visited, ignore);
  }
  return parent;
}