Some presentations of Kosaraju’s Algorithm don’t provide a detailed explanation of why the algorithm works. Here’s my attempt to explain it.

The story begins with depth-first search (DFS). To review, DFS goes deeper at each step, following an out-edge from the current vertex to a never-before-seen vertex. If there are no out-edges to never-before-seen vertices, then the search backtracks to the last visited vertex with out-edges to never-before-seen vertices and continues from there.

The following graph shows the result of DFS on a small graph. The edges traversed by the DFS are marked in green and form a *depth-first tree*.

**Example of Depth First Search.**

We can categorizes the edges of the graph with respect to the depth-first tree in the following way:

- tree edge: an edge on the tree, e.g., g → c in the graph above.
- back edge: an edge that connects a descendent to an ancestor with respect to the tree, e.g., f → g.
- forward edge: an edge that connects an ancestor to a descendent wrt. the tree, e.g., f → e
- cross edge: all other edges, e.g., k → l.

**Graph with edges categorized by a Depth-First Search. ****The tree edges are in green, back edges in red, forward edges in blue, and cross edges in black.**

**Theorem** A graph has a cycle if and only if there is a back edge.

As we shall see, it is useful to record timestamps on a vertex when it is first *discovered* during a DFS and when it is *finished* (after visiting all of its descendants).

**A graph with discover and finish times from a Depth-First Search.**

**Theorem**

- If u → v is a tree, forward or cross edge, then the finish_time[v] < finish_time[u].
- If u → v is a back edge, then finish_time[u] < finish_time[v].

For reference, here is the code for `DFS`

and its auxiliary function `DFS_visit`

.

```
static void DFS_visit(List<List<Integer>> G, Integer u,
ArrayList<Integer> parent,
ArrayList<Boolean> visited,
```

List<Integer> finish) {

```
.set(u, true);
visitedfor (Integer v : G.get(u)) {
if (! visited.get(v)) {
.set(v, u);
parentDFS_visit(G, v, parent, visited, finish);
}
}
.add(u);
finish}
static List<Integer> DFS(List<List<Integer>> G) {
ArrayList<Integer> parent = new ArrayList<>();
ArrayList<Boolean> visited = new ArrayList<>();
for (int u = 0; u != G.size(); ++u) {
.add(u);
parent.add(false);
visited}
ArrayList<Integer> finish = new ArrayList<>();
for (int u = 0; u != G.size(); ++u) {
.set(u, false);
visited}
for (int u = 0; u != G.size(); ++u) {
if (! visited.get(u))
DFS_visit(G, u, parent, visited, finish);
}
return finish;
```

}

**Definition** A *strongly connected component* is a maximum subset of the vertices in a graph such that every vertex in the subset is reachable from all the other vertices in the subset.

For example, the following graph

has these strongly connected components:

**Definition** The *component graph* C of another digraph G has 1) a vertex for each SCC in G. For each vertex u in C, we write SCC(u) for it’s SCC in G. 2) an edge between u and v if there is an edge from any vertex in SCC(u) to any vertex in SCC(v).

Here is the component graph of the example.

**Theorem** A component graph is acyclic.

Otherwise, the vertices in the cycle represent connected components that are not maximal. They could have been combined into a larger SCC.

## Kosaraju’s Algorithm for SCC

Suppose we do a `DFS_visit`

from a random node in the graph G. We’ll visit all of the other nodes in its SCC (that’s good) but we may also visit nodes in other SCCs (that’s bad).

How can we cause `DFS_visit`

to stop before visiting nodes in other SCCs?

If we run `DFS_visit`

on a node in an SCC that has no out-edges to other SCCs, then we’d just visit the nodes in that SCC and no other. We could then remove those nodes from the graph and repeat.

That’s a lot like a topological ordering on the component graph C, but with out-edges instead of in-edges. So what we need is a topological ordering on the transposed component graph C^T.

**Definition** The *transpose* of a graph G, written G^T, has the same vertices as G but the edges are reversed.

For examaple,

`D, B, A, E`

is a topological ordering of C^T.

But we don’t have C yet… that’s what we’re trying to compute!

Recall that DFS finish times are related to topological ordering. We can apply `DFS`

to G^T to obtain finish times. Here’s the transposed graph of the example with the root and edges of each DFS tree highlighted in green.

and here are the vertices ordered by finish time:

`3, 0, 1, 5, 4, 2`

The vertex that finished last (vertex 2) must be in a SCC (D) that does not have any in-edges in C^T. Why is that? If there were an in-edge from another SCC, then the source of that in-edge would have finished later, but that contradicts vertex 2 being the last to finish. The only way the source of the in-edge could finish earlier would be if it was a back edge, but then the two vertices would be in a cycle and in the same SCC, which contradicts them being in different SCC.

Since the SCC (D) does not have any in-edges in C^T, it doesn’t have any out-edges in the C.

So running `DFS_visit`

on vertex 2 in the original graph will only reach other vertices in its SCC (D). `DFS_visit`

will mark all of those vertices as visited, so later runs of `DFS_visit`

will ignore them.

We continue running `DFS_visit`

on each vertex according to the reverse order of finish time (i.e. 4,5,1,0,3). Each tree in the resulting DFS forest is a SCC.

So here’s the algorithm

- Transpose the graph G to obtain G^T.
- Apply
`DFS`

to G^T to obtain the order in which the vertices finished. - For each vertex u in the reversed finish list, apply
`DFS_visit`

to u in G. - Each of the resulting DFS trees is a SCC. (The trees are encoded in the
`parent`

array.)

(The above differs from the standard presentation of Kosaraju’s algorithm, which instead applies `DFS`

to G to get an ordering, and then applies `DFS_visit`

to G^T repeatedly to get the DFS forest in which each tree is an SCC.)

```
static ArrayList<Integer> SCC(List<List<Integer>> G) {
List<List<Integer>> GT = transpose(G);
List<Integer> finished = DFS(GT);
Collections.reverse(finished);
ArrayList<Integer> parent = new ArrayList<>();
ArrayList<Boolean> visited = new ArrayList<>();
for (int u = 0; u != G.size(); ++u) {
.add(u);
parent.add(false);
visited}
ArrayList<Integer> ignore = new ArrayList<>();
for (Integer u : finished) {
if (! visited.get(u))
DFS_visit(G, u, parent, visited, ignore);
}
return parent;
}
```