Table of Contents

Graph Algorithm Basics, BFS, and DFS

SailGP SF Grand Final 2023. Used on a post about graph algorithms.

Graphs provide a representation of relationships between entities. I utilize graph algorithms to model and analyze relationships, as well as identify patterns, within interconnected data. Graph algorithms are commonly used to help solve problems relating to network performance, resource allocation, financial portfolio optimization, and risk management. 

This blog post covers graph algorithms and their significance in solving complex computational problems.

The quote formatting (like this sentence) is used to provide TLDR definitions for each concept.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Graphs

In a graph, vertices are the individual points or nodes, and edges are the connections or links between those points or nodes.

Before examining graph algorithms, let’s explore graph basics.

A graph is a collection of nodes (or vertices) and edges that connect these nodes. The edges represent relationships between the nodes. Note that throughout this post, I refer to vertices and nodes interchangeably.

edges and vertexes. Used on a post about graph algorithms.

Graphs can be directed or undirected, weighted or unweighted, and cyclic or acyclic, depending on the nature of their edges.

Directed graphs have edges with a specified direction, while undirected graphs have edges without a specified direction; weighted graphs assign numerical values to edges, whereas unweighted graphs do not; cyclic graphs contain cycles or closed paths, while acyclic graphs do not have any cycles.

Directed vs undirected graph

Source

Directed or undirected

Two fundamental types of graphs are directed and undirected. They differ in how edges between vertices are defined.

Directed graphs

Each edge has a direction in a directed graph or “digraph,” indicating a one-way relationship between two vertices. These graphs are often represented by arrows on the edges, indicating the direction of the relationship. 

Note that if there is an edge from vertex A to vertex B in a digraph, it does not imply the existence of an edge from B to A. However, both relationships can exist concurrently if there are edges pointing in both directions.

Undirected graphs

In an undirected graph, edges have no direction; they represent a two-way relationship between vertices. If there is an edge between vertices A and B, it implies that there is also an edge between B and A.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Weighted or unweighted

Weighted graphs assign numerical values (weights) to edges, representing the cost or distance between nodes, whereas unweighted graphs lack assigned edge values.

Weighted vs Unweighted Graph

Source 

Cyclic or acyclic

Cyclic graphs have at least one cycle or closed loop, while acyclic graphs lack cycles or closed loops.

Cyclic vs acyclic graphs

Source

Connected components 

Sometimes a graph is split into multiple disjoint components. It is can helpful to identify and count these components depending on the problem I am solving.

Connected Components. Used on a post about graph algorithms.

Source

Effects on algorithm behavior 

Directed or undirected effects

The choice between directed and undirected graphs can significantly impact the design and behavior of algorithms that operate on graphs. 

  1. Traversal algorithms
    1. Undirected graphs: traversal algorithms (e.g., Depth-First Search (DFS) or Breadth-First Search (BFS)) are generally more straightforward because there is no need to consider edge directions.
    2. Directed graphs: traversal algorithms need to account for the direction of edges. For example, in a directed graph, traversing an edge from A to B differs from traversing it from B to A.
  2. Connectivity
    1. Undirected graphs: connectivity is symmetric. If there is a path from vertex A to vertex B, there is also one from B to A.
    2. Directed graphs: connectivity may not be symmetric. There could be a directed path from A to B without a corresponding path from B to A.
  3. Cycles
    1. Undirected graphs: cycles are straightforward to detect using standard algorithms.
    2. Directed graphs: may have directed cycles (cycles where the edges have a specific direction), and detecting them may involve more complex algorithms (e.g., algorithms for detecting strongly connected components – Kosaraju, Tarjan, and Gabow).
  4. Topological sorting
    1. Undirected graphs: not possible.
    2. Directed graphs: directed acyclic graphs (DAGs) are essential for task scheduling. Topological sorting, which orders vertices based on directed edges, is specific to directed graphs and is used in such scenarios.
  5. Degree and in/out-degree
    1. Undirected graphs: the degree of a vertex is the count of edges connected to it.
    2. Directed graphs: vertices have both in-degree (number of incoming edges) and out-degree (number of outgoing edges), providing information about the graph’s structure.

🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Cyclic or acyclic

  1. Traversal
    1. Acyclic: algorithms for acyclic graphs are often more straightforward since there are no concerns about infinite loops during traversal.
    2. Cyclic: algorithms that traverse graphs (e.g., DFS or BFS) may encounter infinite loops if handled improperly in cyclic graphs.
  2. Connectivity and sorting
    1. Acyclic: can be topologically sorted, which is helpful in scheduling or dependency resolution.
    2. Cyclic: cycles can create alternate paths between nodes, influencing connectivity and the reachability of nodes.
  3. Shortest path
    1. Acyclic: finding the shortest paths becomes more straightforward in acyclic graphs, as there are no cycles to complicate the process.
    2. Cyclic: detecting cycles or finding the shortest path becomes more complex in cyclic graphs.

Connected components 

  1. Graph traversal
    1. Connected components as subgraphs
      1. Each connected component can be treated as a subgraph. When performing graph traversal algorithms, these algorithms may be applied separately to each connected component.
    2. Traversal boundaries
      1. Connected components serve as natural boundaries during traversal. Traversal algorithms typically do not cross between connected components, allowing for “focused” exploration within each component.
  2. Connectivity and reachability
    1. Isolation
      1. Vertices within the same connected component are mutually reachable. However, vertices in different connected components are not reachable from each other directly.
    2. Graph connectivity
      1. The number of connected components influences the overall connectivity of the graph. A graph with a single connected component is fully connected, while multiple connected components indicate a fragmented structure.
  3. Analysis and decomposition
    1. Component size
      1. The size (number of vertices) of connected components can provide insights into the graph’s structure. Small connected components may represent isolated clusters or outliers.
    2. Component identification
      1. Identifying connected components is helpful in community detection or identifying weakly connected regions in a network.
  4. Algorithmic efficiency with component-wise processing
    1. In some cases, algorithms can process connected components independently. Independent processing can lead to more efficient solutions, especially when the behavior within each connected component is relatively independent of others.
  5. Graph connectivity problems with bridges and cut-edges
    1. Identifying connected components is related to the analysis of bridges (edges whose removal increases the number of connected components) and cut-edges. This can help when analyzing the resilience of a graph to edge removal.
  6. Graph visualization
    1. Connected components can be visually represented as separate subgraphs, which can display the graph’s structure.

Graphs representations

Now that we understand how each part of a graph affects the graph’s overall behavior, let’s examine and compare graph representations. 

The two primary representations are adjacency matrices and adjacency lists.

Adjacency matrices

In an adjacency matrix representation, a matrix depicts the connections between nodes. If there is an edge between nodes i and j, the matrix entry at position (i, j) will be 1; otherwise, it will be 0. This representation is efficient for dense graphs, but may be space-inefficient for sparse graphs.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Adjacency lists

In an adjacency list representation, each node maintains a list of its neighboring nodes. Adjacency lists are more space-efficient, especially for sparse graphs. However, it may take longer to determine if there is an edge between two specific nodes.

DFS vs BFS graph gif

Source 

Common graph algorithms

  1. Breadth-First Search (BFS)
    1. BFS explores a graph level by level, visiting all the neighbors of a node before moving on to the next level. It is often used to find the shortest path in unweighted graphs and can assist with network broadcasting.
  2. Depth-First Search (DFS)
    1. DFS explores a graph by going as deep as possible along each branch before backtracking. It is valuable for topological sorting, cycle detection, and maze solving.
  3. Dijkstra’s algorithm
    1. Dijkstra’s algorithm finds the shortest path in weighted graphs. It helps with routing and network optimization problems.
  4. Bellman-Ford algorithm
    1. Like Dijkstra’s algorithm, the Bellman-Ford algorithm finds the shortest path in weighted graphs. However, it can handle graphs with negative edge weights, making it suitable for a broader range of problems.
BFS graph gif

Source

Breadth-First Search (BFS)

Traverse a graph level by level. Applications include finding the shortest path in an unweighted graph, connected components, and network broadcasting.

BFS explores a graph level by level, starting from a specified source vertex. 

It starts at the tree root (or some arbitrary graph node) and explores the neighbor nodes at the present depth before moving to nodes at the next depth level. Steps: 

  1. Enqueue the root node (initial start node) into a queue.
  2. Mark the initial node as visited.
  3. While the queue is not empty:
    1. Dequeue a node from the queue.
    2. Process the node (perform any desired operation).
    3. Enqueue all unvisited neighbors of the dequeued node.
    4. Mark the dequeued node as visited.

Unlike DFS, which goes as deep as possible along a branch before backtracking, BFS explores the neighbor vertices of the current level before moving on to the next level. This strategy discovers the shortest path to each reachable vertex first, making BFS useful for shortest route problems or when someone needs to explore all vertices within a certain distance from the source.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Algorithmic mechanics

  1. Queue-based structure
    1. At the heart of BFS lies a queue data structure. The algorithm maintains a queue to keep track of unexplored vertices. Starting with the source vertex, the algorithm dequeues a vertex, explores its neighbors, and enqueues them for further exploration.
  2. Marking visited nodes
    1. To prevent revisiting already explored vertices, BFS uses a marking mechanism. Each vertex is marked as visited once dequeued, ensuring the algorithm avoids cycle entrapment.
  3. Level-order traversal
    1. One strength of BFS is its ability to perform level-order traversal. Level-order traversal explores all vertices at a certain distance from the source before moving to vertices at a greater distance. This property makes BFS a great choice for finding the shortest path or determining connectivity.

Program example

A financial instrument is a tradable asset or contract representing a financial value, such as stocks, bonds, derivatives, or currencies. This program could theoretically help with analysis on how quickly a shock (a type of financial change) would propagate through a network.

from collections import deque
class financialGraph:
def __init__(self):
self.graph = {}
def add_instrument(self, instrument, dependencies):
# add financial instrument and its dependencies to graph
self.graph[instrument] = dependencies
def bfs(self, start_instrument):
visited = set()
queue = deque([start_instrument])
print("bfs traversal:")
while queue:
current_instrument = queue.popleft()
if current_instrument not in visited:
print(current_instrument)
# mark current instrument as visited
visited.add(current_instrument)
# enqueue dependencies for exploration
if current_instrument in self.graph:
queue.extend(self.graph[current_instrument])
# example usage:
if __name__ == "__main__":
# create financial graph
financial_graph = financialGraph()
# add instruments and dependencies to graph
financial_graph.add_instrument("stock A", ["stock B", "option C"])
financial_graph.add_instrument("stock B", ["bond D"])
financial_graph.add_instrument("option C", ["stock D"])
financial_graph.add_instrument("bond D", [])
# perform bfs traversal starting from specific instrument
financial_graph.bfs("stock A")
# bfs traversal:
# stock A
# stock B
# option C
# bond D
# stock D
view raw bfs2.py hosted with ❤ by GitHub

BFS applications

1. Shortest path in unweighted graphs 

As I stated previously, BFS excels at finding the shortest path between two vertices in an unweighted graph. Its level-order traversal ensures that the first occurrence of the destination vertex yields the shortest route. 

Here is how BFS accomplishes this:

  1. Initialization
    1. Start BFS from the source vertex.
    2. Initialize a queue and enqueue the source vertex.
    3. Mark the source vertex as visited.
  2. Exploration
    1. While the queue is not empty:
      1. Dequeue a vertex from the front of the queue.
      2. Explore all unvisited neighbors of the dequeued vertex.
      3. Enqueue each unvisited neighbor and mark it as visited.
  3. Recording the path
    1. Maintain a data structure (e.g., an array or dictionary) to keep track of the parent of each vertex in the current traversal.
    2. When exploring a neighbor, set its parent as the current dequeued vertex.
  4. Termination
    1. Continue until reaching the destination vertex or the entire graph is traversed.
  5. Shortest path reconstruction
    1. Upon reaching the destination vertex, backtrack from the destination to the source vertex using the recorded parent information to reconstruct the shortest path.

The key idea is that BFS explores the graph breadth-first, ensuring that vertices closer to the source are visited before vertices farther away. As a result, the first occurrence of the destination vertex during the BFS traversal represents the shortest path.

Since the graph is unweighted, each edge is treated as having the same weight. Therefore, the first time encountering a vertex is guaranteed to be the shortest path during the BFS traversal. (Level-order exploration results in discovering longer paths later). 


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


2. Maze solving

By modeling the maze as a graph and applying BFS, we can quickly determine the shortest path from the entrance to the exit. Here is how BFS accomplishes this:

  1. Graph representation
    1. Represent the maze as a graph, where each cell is a vertex, and edges represent possible movements between adjacent cells (up, down, left, right).
    2. If a cell is blocked or contains an obstacle, it is not considered a valid vertex or edge in the graph.
  2. Initialization
    1. Start BFS from the maze entrance.
    2. Initialize a queue and enqueue the entrance cell.
    3. Mark the entrance cell as visited.
  3. Exploration
    1. While the queue is not empty:
      1. Dequeue a cell from the front of the queue.
      2. Explore all unvisited neighboring cells accessible (not blocked) from the current cell.
      3. Enqueue each unvisited and accessible neighbor and mark it as visited.
  4. Recording the path
    1. Maintain a data structure (e.g., an array or dictionary) to keep track of the parent of each cell in the current traversal.
    2. When exploring a neighbor, set its parent as the current dequeued cell.
  5. Termination
    1. Keep traversing the maze until it reaches the exit cell or traverses the entire maze.
  6. Shortest path reconstruction
    1. Once reaching the exit cell, reconstruct the shortest path by backtracking from the exit to the entrance using the recorded parent information.

By exploring the maze using BFS, the algorithm ensures that it discovers the shortest path first. Because BFS traverses the maze level by level, it ensures the exit path is the shortest possible when it first encounters the exit cell during the traversal.

3. Network routing

BFS is crucial in discovering the optimal path between routers in network routing algorithms. Its efficiency makes it an attractive choice for scenarios where minimizing latency is essential. Here is how BFS achieves this in the context of network routing.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Steps
  1. Graph representation
    1. Represent the network as a graph, where routers are vertices, and connections between routers are edges.
    2. Assign weights to edges based on metrics like latency, bandwidth, or any other relevant criteria.
  2. Initialization
    1. Start BFS from the source router.
    2. Initialize a queue and enqueue the source router.
    3. Mark the source router as visited.
  3. Exploration
    1. While the queue is not empty:
      1. Dequeue a router from the front of the queue.
      2. Explore all unvisited neighboring routers.
      3. Enqueue each unvisited neighbor and mark it as visited.
      4. Consider the weights on edges to determine the optimal path.
  4. Recording the path and metrics
    1. Maintain a data structure (e.g., an array or dictionary) to keep track of the parent of each router in the current traversal.
    2. Record metrics (e.g., cumulative latency) for each router along the way.
  5. Termination
    1. Keep traversing the network until the algorithm reaches the destination router or traverses the entire network.
  6. Optimal path reconstruction
    1. Upon reaching the destination router, reconstruct the optimal path by backtracking from the destination to the source using the recorded parent information.
    2. Extract relevant metrics to determine the quality of the optimal path.

By exploring the network using BFS, the algorithm discovers the optimal path with minimal latency. Since BFS traverses the network breadth-first, it naturally prioritizes shorter paths before exploring longer ones. This also explains why BFS is so helpful when minimizing latency is a priority (e.g., real-time applications and video streaming).

4. Connected components

BFS also aids in finding connected components in an undirected graph. By starting BFS from unvisited vertices, it identifies and labels different connected components. Here is how BFS accomplishes this:

  1. Initialization
    1. Start BFS from an unvisited vertex in the undirected graph.
    2. Initialize a queue and enqueue the chosen vertex.
    3. Mark the vertex as visited.
  2. Exploration
    1. While the queue is not empty:
      1. Dequeue a vertex from the front of the queue.
      2. Explore all unvisited neighbors of the dequeued vertex.
      3. Enqueue each unvisited neighbor and mark it as visited.
  3. Labeling connected components
    1. Continue the exploration until the queue is empty. Labeling ensures that the BFS process covers an entire connected component.
    2. All the vertices visited during this BFS traversal belong to the same connected component.
    3. Assign a label or identifier to this connected component.
  4. Repeat for unvisited vertices
    1. If there are still unvisited vertices in the graph, select another unvisited vertex and start BFS from that vertex.
    2. Assign a new label to the vertices visited during this BFS traversal, indicating a new connected component.
  5. Termination
    1. Continue this process until all vertices in the graph are visited.

The BFS algorithm identifies and labels different connected components in the undirected graph. Each BFS traversal from an unvisited vertex forms a connected component, and the process repeats until all vertices are covered. The result is a decomposition of the graph into distinct connected components.

Connected components help me analyze the structure of undirected graphs by showing the graph’s connectivity patterns. BFS accomplishes this task by visiting all vertices within a connected component before exploring other components. It guarantees that each connected component is identified correctly and labeled uniquely.

5. Web crawling 

BFS web crawling starts with the initial page, and explores its links. Then, it explores the links of those pages, and so on.

BFS is used in web crawling and social network analysis to explore the connectivity of websites and social networks. Here is how BFS applies to these tasks:

  1. Initialization
    1. Start BFS from an initial web page or a set of initial pages.
    2. Initialize a queue and enqueue the initial pages.
    3. Mark the initial pages as visited.
  2. Exploration
    1. While the queue is not empty:
      1. Dequeue a web page from the front of the queue.
      2. Retrieve the links on the dequeued page.
      3. Enqueue each unvisited link and mark it as visited.
  3. Repeat
    1. Continue the process, exploring the links of newly dequeued pages.
    2. The BFS traversal makes sure to visit pages closer to the initial page before exploring deeper pages.
  4. Termination
    1. Keep crawling until the algorithm reaches a specified depth, crawls a specific number of pages, or traverses the entire website.

6. Social network analysis

BFS social network analysis illustrates the degrees of separation between individuals.

  1. Initialization
    1. Start BFS from an initial individual or a set of initial individuals in the social network.
    2. Initialize a queue and enqueue the initial individuals.
    3. Mark the initial individuals as visited.
  2. Exploration
    1. While the queue is not empty:
      1. Dequeue an individual from the front of the queue.
      2. Retrieve the friends or connections of the dequeued individual.
      3. Enqueue each unvisited friend and mark them as visited.
  3. Degree of separation
    1. As BFS progresses, each level of traversal represents a degree of separation.
    2. The first level consists of the initial individuals, the second level their immediate connections, and so on.
  4. Repeat
    1. Continue the process, exploring the connections of newly dequeued individuals.
    2. The BFS traversal ensures that it visits individuals closer in terms of degrees of separation before exploring those farther away.
  5. Termination
    1. Keep traversing the social network until the algorithm reaches a specified depth (degree of separation) or traverses the entire network.

BFS is suitable for this task because it explores pages or individuals level by level. In the context of social networks, the levels correspond to degrees of separation between individuals, showing the structure and connectivity of the network. In web crawling, BFS helps discover and index pages in a structured manner, ensuring that pages are visited early in the process.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


7. Garbage collection

Lastly, BFS can help with garbage collection algorithms to discover and free memory that is no longer reachable. Here is how BFS achieves this in garbage collection algorithms:

  1. Graph representation
    1. Represent the memory space as a graph, where nodes represent allocated memory blocks, and edges represent references or pointers between these blocks.
    2. Each node in the graph corresponds to a memory block, and edges represent references from one block to another.
  2. Initialization
    1. Start BFS from a set of root nodes. These root nodes are typically the starting points for memory references, such as global variables or objects referenced by the program.
  3. Exploration
    1. While the queue is not empty:
      1. Dequeue a memory block from the front of the queue.
      2. Explore all references or pointers from the dequeued block to other blocks.
      3. Enqueue each referenced unvisited block.
  4. Mark-and-sweep
    1. During the exploration, mark each visited block as reachable.
    2. After the BFS traversal is complete, perform a sweep phase to identify and reclaim memory blocks marked as unreachable.
    3. The unreached memory blocks are considered garbage and can be freed.
  5. Repeat for unvisited roots
    1. If there are additional roots or starting points for memory references, start BFS from these unvisited roots.
    2. Repeat the mark-and-sweep process until all reachable memory blocks are marked.
  6. Garbage collection
    1. The marked memory blocks are considered live or reachable, while the unmarked blocks are considered garbage.
    2. Free the memory occupied by the garbage blocks, making it available for reuse.
Garbage Collection conclusion

BFS is well-suited for garbage collection because it ensures that memory blocks are visited breadth-first. Here, blocks closer to the root nodes are marked and collected first, and the algorithm gradually moves outward through the memory graph. As a result, it identifies and collects garbage efficiently, reclaiming memory that is no longer in use.

Garbage collection helps manage memory in programming languages with automatic memory management, since it helps prevent leaks and ensures efficient use of available resources. Using BFS in garbage collection algorithms contributes to the systematic identification and collection of unreferenced memory, helping maintain the health and performance of a program.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Depth-First Search (DFS)

While BFS focuses on exploring neighboring nodes before moving deeper into the graph, DFS takes a different approach. DFS traverses a graph by exploring as far as possible along each branch before backtracking, delving deeply into the structure. This method is particularly valuable in topological sorting, identifying strongly connected components, and solving select puzzles.

DFS traverses a graph deeply, exploring as far as possible along each branch before backtracking. Applications include topological sorting, strongly connected components, and solving puzzles.

Mechanics 

  1. Definition and concept
    1. DFS is a graph traversal algorithm that explores as far as possible along each branch before backtracking. The ‘depth’ in DFS refers to how deep the algorithm explores a particular branch before considering other paths.
  2. Stacks and recursion
    1. DFS can utilize a stack or recursion. The stack mimics a recursion call stack, maintaining information about the nodes to be visited. Recursive DFS, on the other hand, expresses the algorithm’s “backtracking nature.”

Example program

1. Pseudocode

Understanding DFS starts with learning its pseudocode. A simple representation might look like this:

DFS(node):

      if node is not visited:

         mark node as visited

         for each neighbor in neighbors(node):

            DFS(neighbor)

2. Traversing a graph

Here is a step-by-step example GIF of how DFS traverses a graph. Visualizing the algorithm in action helps comprehend its depth-first nature.

DFS gif

Source

DFS applications 

DFS’s deep exploration before backtracking makes it a valuable tool in many applications. Here are some of those applications.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


1. Graph traversal

DFS can explore and visit all the vertices and edges of a graph. This application is prominent in social network analysis, network routing, and recommendation systems.

  1. Start from a source node
    1. Choose a starting node in the graph as the initial point for traversal.
  2. Mark the node as visited
    1. Mark the chosen node as visited to keep track of the explored nodes.
  3. Explore adjacent nodes
    1. Move to an adjacent, unvisited node from the current node. If there are multiple adjacent nodes, choose one and repeat the process.
  4. Recursive exploration
    1. If there are unvisited nodes adjacent to the current one, repeat step 3 by recursively applying DFS to the chosen adjacent node.
  5. Backtrack
    1. If there are no unvisited nodes from the current one, backtrack to the previous node (if possible) and repeat step 3.
  6. Repeat until all nodes are visited
    1. Continue steps 3-5 until all nodes in the graph are visited.

2. Maze solving

DFS is an excellent choice for solving mazes. The algorithm can navigate through the maze, exploring paths until it finds a solution.

  1. Initialization
    1. Start at the maze entrance.
    2. Mark the starting point as visited.
    3. Create a stack to keep track of the path.
  2. Explore neighbors
    1. Look at the current cell’s neighbors (e.g., up, down, left, right).
    2. Choose one unvisited neighbor and move to that cell.
    3. Mark the new cell as visited and push it onto the stack.
  3. Backtrack if necessary
    1. If there are no unvisited neighbors, backtrack by popping the last cell from the stack.
    2. Repeat this process until it finds a cell with unvisited neighbors or the stack is empty.
  4. Goal check
    1. Repeat steps 2-3 until the algorithm reaches the exit of the maze.
    2. The algorithm finds a solution when it reaches the exit.

3. Topological sorting

In directed acyclic graphs (DAGs), DFS can perform topological sorting. DFS with DAGs is often used to schedule tasks with dependencies (e.g., project management), where the order of tasks matters.

Topological sorting orders the vertices of a directed graph such that for every directed edge (u, v), vertex u comes before v in the ordering. Here are steps for DFS-based topological sorting:

  1. Initialization
    1. Start with an empty result list or stack to store the topological order.
    2. Initialize a set or array to keep track of visited vertices.
  2. DFS traversal
    1. Perform a DFS traversal starting from any unvisited vertex.
    2. During the traversal, recursively visit the neighbors of the current vertex.
  3. Mark visited
    1. Mark the current vertex as visited before pushing it to the result list or stack.
    2. This step ensures that a vertex is only added to the topological order after exploring all its neighbors.
  4. Backtrack and record
    1. After visiting all neighbors of a vertex, backtrack to the previous vertex.
    2. Add the current vertex to the result list or stack.
  5. Repeat for unvisited vertices
    1. Repeat steps 2-4 for any unvisited vertices in the graph.
  6. Result
    1. The topological order is the reverse of the order in which vertices were added to the result list or stack.

4. Connected components

DFS can identify connected components in an undirected graph. This application is helpful in network analysis to learn about the structure and dynamics of a network.

  1. Initialize data structures
    1. Create a boolean array to mark visited nodes.
    2. Initialize a counter for connected components.
  2. DFS algorithm
    1. Start DFS from an unvisited node.
    2. Mark the current node as visited.
    3. Explore all adjacent nodes that are not visited, and recursively apply DFS on each.
    4. Repeat until all nodes are visited.
  3. Count connected components
    1. After DFS completes for the first connected component, increment the counter.
    2. Repeat the DFS for the next connected component if unvisited nodes remain.
  4. Result
    1. The counter now holds the total number of connected components in the graph.
    2. You can also maintain a data structure to store nodes belonging to each connected component.

🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


5. Cycle detection

DFS can be employed to detect cycles in a graph. This application helps in deadlock detection in operating systems, where cycles can lead to system instability.

  1. Start DFS from any node in the graph.
  2. Mark the current node as visited.
  3. Explore each adjacent node:
    1. If the adjacent node is not visited, recursively apply DFS on it.
    2. A cycle is detected if the adjacent node is visited and not the current node’s parent.
  4. Repeat steps 2-3 for all unvisited nodes.
  5. If unvisited nodes exist, choose one and repeat steps 2-4.
  6. If no cycles are detected, the graph is acyclic.

This approach identifies cycles by checking for back edges during the DFS traversal. If an edge leads to a visited node that is not the current node’s parent, it indicates a cycle’s presence.

6. Pathfinding algorithms

While DFS might not be the primary choice for finding the shortest path in weighted graphs, its principles inspire other path-finding algorithms. Algorithms like Depth-Limited Search and Iterative Deepening Depth-First Search use DFS principles to find optimal paths in complex scenarios.

General path finding with DFS 
  1. Initialize data structures
    1. Create a stack to keep track of the nodes to be visited.
    2. Create a boolean array or set to mark visited nodes.
  2. Start node
    1. Push the starting node onto the stack.
    2. Mark the starting node as visited.
  3. Explore neighbors
    1. While the stack is not empty:
      1. Pop a node from the stack.
      2. Explore its neighbors.
      3. For each unvisited neighbor, mark it as visited and push it onto the stack.
  4. Termination
    1. Continue until the stack is empty or the goal node is found.

Remember, DFS does not guarantee the shortest path in weighted graphs, but variations like Iterative Deepening DFS can find optimal paths in some of these scenarios.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Iterative Deepening DFS (IDDFS)
  1. Initialize
    1. Set the depth limit to 0.
    2. Repeat these steps until the algorithm finds the goal or explores the entire search space.
  2. Depth-first search with depth limit
    1. Perform a DFS with the current depth limit.
    2. Mark the starting node as visited.
    3. Explore neighbors up to the current depth limit.
  3. Check for goal
    1. Stop and return the solution if the DFS discovers the goal node.
    2. If the algorithm does not find the goal, increase the depth limit.
  4. Repeat
    1. Repeat steps 2-3 with the increased depth limit.
  5. Termination
    1. Continue this process until DFS finds the goal or explores the entire search space.

By incrementally increasing the depth limit in each iteration, Iterative Deepening DFS combines the benefits of DFS with the completeness of BFS. It explores the search space depth-first while gradually increasing the depth limit to find the optimal solution. This approach is instrumental in scenarios where the depth of the solution is unknown. 

Depth-Limited Search 

Depth-Limited Search is a variant of DFS where the search is restricted to a fixed depth. In traditional DFS, the algorithm explores as far as possible along each branch before backtracking. However, in Depth-Limited Search, the exploration is limited to a specific depth level, and the search backtracks if the goal is not found within that depth limit.

  1. Initialize
    1. Start from the initial state.
    2. Set the depth limit, which restricts the depth of the Search.
  2. Perform Depth-Limited Search
    1. Explore nodes at the current depth level.
    2. Apply the actions applicable to the current state to generate successor states.
    3. Return the solution if the algorithm finds the goal state.
  3. Recursion
    1. If the depth limit is not reached, recursively apply the depth-limited search to the successor states with a reduced depth limit.
  4. Backtrack
    1. If the current depth level does not yield a solution, backtrack to the previous state and explore other branches.
  5. Repeat
    1. Repeat steps 2-4 until the algorithm finds a solution or searches the entire space within the specified depth limit.
  6. Handle cutoffs
    1. If a cutoff occurs due to reaching the depth limit, it means the solution might exist beyond the current depth. Handle this appropriately, considering the solution might be deeper in the search space.
  7. Return solution
    1. Once the algorithm reaches the goal state, return the solution path.
  8. Termination
    1. If the algorithm explores the entire search space without finding a solution, terminate the Search and conclude that no solution exists within the specified depth limit.

Remember that Depth-Limited Search is a variant of DFS with a limited exploration depth to prevent infinite loops and improve efficiency. Adjusting the depth limit affects the trade-off between completeness and efficiency.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


7. Compiler design

Compiler design utilizes DFS for syntax analysis and code generation. It helps in traversing and analyzing the syntax tree or abstract syntax tree representing the structure of a program.

  1. Construct syntax tree
    1. Begin by constructing the syntax tree or abstract syntax tree (AST) from the program’s source code. Each node in the tree represents a syntactic construct in the code.
  2. Initialize DFS
    1. Start DFS from the root of the syntax tree. Initialize a stack to keep track of the visited nodes. 
  3. Visit nodes
    1. Perform the syntax analysis or code generation operations for each visited node. This could involve checking for language constructs, handling variable declarations, managing control flow structures, etc.
  4. Traverse children
    1. Traverse the children of the current node in a depth-first manner. Push each child onto the stack for further exploration.
  5. Backtrack
    1. If a leaf node is reached or all children of a node have been visited, backtrack by popping the current node from the stack.
  6. Continue DFS
    1. Continue the DFS process until all nodes in the syntax tree have been visited. This ensures a comprehensive analysis of the program’s structure.
  7. Generate code
    1. During the DFS traversal, generate code based on the syntactic constructs encountered. This may involve emitting assembly or machine code instructions corresponding to the high-level language constructs.
  8. Handle errors
    1. Implement error-handling mechanisms within the DFS traversal. This is crucial for identifying and reporting syntax errors or other issues in the source code.
  9. Output
    1. Once the DFS traversal and code generation is complete, produce the final output, which could be an executable file, intermediate code, or other forms, depending on the compiler’s design.

Applying DFS in compiler design ensures a systematic and efficient traversal of the syntax tree, enabling the analysis and generation of code for the given programming language.


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, academics, boats, software freedom, you get the idea.


Conclusion on graphs

Graph algorithms are the backbone of social network analysis, route planning, network latency. Understanding the fundamentals of graph theory, like BFS and DFS, allows us to create efficient programs. 

Remember, BFS systematically explores and visits a graph’s vertices level by level, starting from a specified source vertex. It is particularly effective in finding the shortest path and exploring the nearest neighbors in graph-based structures or networks. Conversely, BFS’s main weakness lies in its potential inefficiency when dealing with large graphs or networks, as it explores nodes level by level without considering the specific characteristics of the problem, leading to increased time and space complexity.

On the other hand, DFS explores as far as possible along each branch before backtracking, effectively traversing the depth of a graph. DFS has a relatively low space complexity compared to its breadth-first counterpart. Its memory requirements depend on the depth of the recursion or the size of the stack. While DFS is excellent for exploration, it may not be ideal for certain tasks. For example, it might not be the most efficient algorithm for finding the shortest path in a weighted graph.

If you enjoyed this post on graph algorithm basics, consider reading Using Greedy and Dynamic Optimization Algorithms.

Portrait of Olivia Gallucci in garden, used in LNP article.

Written by Olivia Gallucci

Olivia is senior security engineer, certified personal trainer, and freedom software advocate. She writes about offensive security, open source software, and professional development.

Discover more from [ret]2read

An OS Internals Newsletter by Olivia Gallucci. Subscribe now to keep reading and get access to the full archive.

Continue reading