Chapter 32 — 🧭 Search & Problem Solving

📖 All chapters | ← 31 · 🧰 Tools & Frameworks | 33 · 📖 Knowledge Representation & Reasoning →

📚 Jump to any chapter

🧮 Mathematical Foundations

🧭 The ML Workflow

🧩 Classical Machine Learning

🎲 Probabilistic Models

13 · 🕸️ Probabilistic Graphical Models

🧠 Deep Learning

🗣️ Applied AI: Vision, Language, Audio & Time

🕹️ Reinforcement Learning

25 · 🕹️ Reinforcement Learning

🛠️ Applied ML Systems & Industries

🚀 Production, Tooling & Infrastructure

📚 Classical & Symbolic AI

⚖️ Responsible AI & Frontier

🎓 Advanced & Specialized Topics

🎚️ Post-Training & Fine-Tuning

🚢 Model Serving & Deployment

47 · 🚢 Model Serving & Deployment in Production

Search is the oldest and most general problem-solving engine in AI: when you can describe where you are, what you can do, and what counts as done, search systematically explores the space of possibilities to find a sequence of actions that gets you to a goal. It sits at the root of Classical & Symbolic AI and quietly powers planners, theorem provers, route finders, puzzle solvers, and the move-selection logic inside game-playing programs. Master it once and you have the scaffolding for half of GOFAI.

🧭 In context: Classical & Symbolic AI · used to find action sequences from a start state to a goal · the one key idea: turn any problem into a graph of states and explore it smartly.

💡 Remember this: Once you can name a state, the legal actions, a goal test, and a step cost, almost any problem becomes “explore this graph smartly” — and the heuristic you bolt on decides how much of the graph you can skip.

32.1 — Problem formulation (states, actions, goal test, path cost)

Before you can search, you must formulate the problem — translate a messy real-world task into a few precise pieces. Think of it like describing a board game to someone who has never seen it: you tell them where the pieces start, what moves are legal, and how you know the game is won. Everything else in this chapter assumes this step is done well.

A search problem is defined by these pieces:

State space — the set of all configurations the world can be in. A state is one snapshot (e.g., “the agent is in city Arad”).
Initial state — where you begin.
Actions — given a state, which moves are legal. The function \(\text{Actions}(s)\) returns the applicable actions in state \(s\).
Transition model — \(\text{Result}(s, a)\) gives the new state after doing action \(a\) in state \(s\). Together the states plus transitions form a directed graph.
Goal test — a predicate that says whether a state is a goal. It can be one explicit state (“be in Bucharest”) or a property (“the queens don’t attack each other”).
Path cost — a number summing the cost of each step along a path. The step cost \(c(s, a, s')\) is the cost of one action; we usually want the cheapest path.

A solution is a path (a sequence of actions) from the initial state to a goal state; an optimal solution is one of least path cost.

The path cost of a solution is just the running total of step costs along it:

\[g(\text{path}) = \sum_{i=1}^{k} c(s_{i-1}, a_i, s_i)\]

In words: the total cost of a path is the sum of the costs of each individual action taken along it, from the first step to the last. Also written: \(g = c_1 + c_2 + \dots + c_k\) — add up the per-step costs one after another (a cumulative sum).

Worked example — the 8-puzzle. A 3×3 grid holds tiles 1–8 and one blank. A state is an arrangement of tiles; the blank can slide Up/Down/Left/Right (swapping with a neighbor). The goal test checks for the sorted arrangement. Each move costs 1, so the path cost equals the number of moves.

initial          goal
7 2 4          1 2 3
5 0 6   ==>    4 5 6
8 3 1          7 8 0      (0 = blank)

The state space has \(9!/2 = 181{,}440\) reachable states — far too many to eyeball, which is exactly why we need systematic search rather than guessing.

The flow from a real-world task to a runnable search is always the same handful of steps:

flowchart LR
  A["Real problem"] --> B["State representation"]
  B --> C["Actions + Result(s,a)"]
  C --> D["Goal test"]
  D --> E["Path cost"]
  E --> F["Search algorithm explores the graph"]

A small but important idea is abstraction: the state you write down should keep only what matters. For route-finding you record the current city, not the weather or the color of the car. A good abstraction is one where every abstract solution can still be carried out in the real world, and the abstraction is as coarse as you can get away with — fewer details mean a smaller graph to search.

32.1.1 — The search tree vs. the state graph

Here is a distinction that trips up almost everyone the first time: the state space and the search tree are not the same object. The state space is the underlying map — each distinct configuration appears once, and roads (transitions) may loop back on themselves. The search tree is the record of paths the algorithm has explored — and the same state can show up many times in that tree if there are several routes to reach it.

Intuition: think of a city map (the state graph) versus all the different walking routes you could trace through it (the tree). The city has one “Main Square,” but a list of routes might mention Main Square a dozen times because a dozen different walks pass through it. A search algorithm builds nodes — bookkeeping objects, each holding a state, its parent, the action that produced it, and the path cost \(g\) — and a single state can hide behind many nodes.

flowchart TB
  subgraph graph["State graph (the map)"]
    a((A)) --- b((B))
    b --- c((C))
    a --- c
  end
  subgraph tree["Search tree from A (the routes)"]
    A2((A)) --> B2((B))
    A2 --> C2((C))
    B2 --> C2b((C))
    C2 --> B2b((B))
  end

This is exactly why graph search keeps a seen/explored set: without it, the same state gets re-expanded through every path that reaches it, and on a cyclic map the tree is infinite. Tree search omits that set (cheaper memory, risk of loops); graph search pays for a visited set to guarantee each state is expanded at most once. Every code skeleton in this chapter that tracks seen is doing graph search.

Tip

The hardest part of search is usually the formulation, not the algorithm. Choosing a compact state representation and the right action granularity (abstracting away irrelevant detail) can shrink a hopeless space to a tractable one before any code runs.

32.2 — Uninformed search (BFS, DFS, uniform-cost, iterative deepening)

Uninformed (or blind) search uses only the problem definition — no extra hints about which states look promising. All these algorithms share one skeleton: keep a frontier (the set of generated-but-not-yet-expanded nodes), repeatedly pull a node off it, test for the goal, and expand it (generate its successors). What differs is the order in which the frontier is processed — and that single choice changes everything about the algorithm’s behavior.

We judge each strategy on four properties:

Completeness — is it guaranteed to find a solution if one exists?
Optimality — is it guaranteed to find the cheapest solution?
Time complexity — how many nodes does it generate?
Space complexity — how many nodes does it hold in memory at once?

Let \(b\) be the branching factor (successors per node), \(d\) the depth of the shallowest goal, and \(m\) the maximum depth of the tree.

Breadth-first search (BFS) expands the shallowest node first, using a FIFO queue. It explores level by level, like a ripple spreading outward. It is complete and, when every step costs the same, optimal — it reaches the goal at the smallest depth first. Its curse is memory: it must store the entire frontier, \(O(b^d)\) nodes. With \(b=10\) and \(d=12\) that is \(10^{12}\) nodes — memory, not time, is what kills BFS in practice.

Depth-first search (DFS) expands the deepest node first, using a LIFO stack (or recursion). It plunges down one branch, backtracking only at dead ends. Its virtue is memory: it stores only the current path plus unexpanded siblings, \(O(bm)\) — linear. Its vices: it is not optimal, and on infinite or cyclic graphs it may run forever (not complete) unless you track visited states.

Uniform-cost search (UCS) is BFS’s cost-aware cousin: it expands the node with the lowest path cost \(g(n)\) so far, using a priority queue ordered by \(g\). When step costs vary, this — not BFS — is what guarantees the cheapest solution. It is complete and optimal given non-negative step costs. It is essentially Dijkstra’s algorithm rediscovered as a search strategy — the same shortest-path machinery that resurfaces in graph machine learning.

Iterative deepening search (IDS) is the clever hybrid: run a depth-limited DFS with limit 0, then 1, then 2, and so on until the goal is found. Each round rediscovers the shallow nodes, but because the number of nodes grows exponentially with depth, the wasted re-work is negligible — the deepest level dominates the count. IDS gets BFS’s completeness and optimality (with equal step costs) at DFS’s \(O(bd)\) linear memory. It is the go-to uninformed method when the search depth is unknown.

Consider this tiny tree to see how the order of expansion differs:

flowchart TB
  R(("Start"))
  R --> A(("A"))
  R --> B(("B"))
  A --> C(("C"))
  A --> D(("D"))
  B --> E(("E"))
  B --> G(("Goal"))
  classDef goal fill:#cdeccd,stroke:#2a2;
  class G goal;

For this tree, BFS visits Start, A, B, C, D, E, Goal — finding Goal at depth 2 only after exploring the whole level above it. DFS visits Start, A, C, D, B, E, Goal — diving all the way into A’s subtree before it ever looks at B.

You can feel the difference in how the two sweep a tree: BFS fills a whole level before dropping down (a ripple spreading outward), while DFS plunges straight down one branch and only backtracks at a dead end. The two highlighted nodes below pulse to show the order each strategy reaches them:

This “where does the next node come from?” choice is the whole story, and a single picture captures it — the only difference between BFS, DFS, and UCS is which end of the frontier you remove from:

The shared skeleton in code is short; only the data structure you pull from changes:

from collections import deque
import heapq

def bfs(start, goal, succ):
    frontier = deque([(start, [start])])      # (node, path)
    seen = {start}
    while frontier:
        node, path = frontier.popleft()       # FIFO -> shallowest first
        if node == goal: return path
        for nxt in succ(node):
            if nxt not in seen:
                seen.add(nxt); frontier.append((nxt, path + [nxt]))

def ucs(start, goal, succ_cost):              # succ_cost(n) -> [(nxt, step_cost)]
    frontier = [(0, start, [start])]          # (g, node, path), heap ordered by g
    best = {start: 0}
    while frontier:
        g, node, path = heapq.heappop(frontier)
        if node == goal: return g, path
        for nxt, c in succ_cost(node):
            ng = g + c
            if ng < best.get(nxt, float('inf')):
                best[nxt] = ng
                heapq.heappush(frontier, (ng, nxt, path + [nxt]))

# tiny self-check
graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E", "G"],
         "C": [], "D": [], "E": [], "G": []}
assert bfs("S", "G", lambda n: graph[n]) == ["S", "B", "G"]

The four blind strategies compared on the properties that matter:

Strategy	Complete?	Optimal?	Time	Space
BFS	Yes	Yes (equal costs)	\(O(b^d)\)	\(O(b^d)\)
DFS	No (∞/cyclic)	No	\(O(b^m)\)	\(O(bm)\)
Uniform-cost	Yes	Yes	\(O(b^{1+\lfloor C^*/\varepsilon\rfloor})\)	same
Iterative deepening	Yes	Yes (equal costs)	\(O(b^d)\)	\(O(bd)\)

Here \(C^*\) is the optimal cost and \(\varepsilon\) the smallest step cost.

32.2.1 — Using a real library (networkx)

In practice you rarely hand-roll BFS or Dijkstra on a real graph — a battle-tested library has the edge cases (cycles, disconnected components, ties) already handled. networkx gives you the same algorithms in one call, which is the lazy and correct choice for anything beyond a teaching example:

import networkx as nx

# Romania-style weighted road map
G = nx.Graph()
G.add_weighted_edges_from([
    ("S", "A", 4), ("A", "Z", 3),    # S->A->Z costs 7
    ("S", "B", 2), ("B", "G", 6), ("G", "Z", 3),   # S->B->G->Z costs 11
])

print(nx.shortest_path(G, "S", "Z"))                       # BFS (unweighted): ['S', 'A', 'Z']
print(nx.shortest_path(G, "S", "Z", weight="weight"))      # Dijkstra/UCS: ['S', 'A', 'Z']
print(nx.shortest_path_length(G, "S", "Z", weight="weight"))  # 7

# A* with a heuristic: networkx takes h(u, v) as a function
h = {"S": 7, "A": 3, "B": 6, "G": 3, "Z": 0}     # straight-line estimate to Z
print(nx.astar_path(G, "S", "Z", heuristic=lambda u, v: h[u], weight="weight"))

The from-scratch versions above are for understanding the mechanism; networkx (or scipy.sparse.csgraph for big numeric graphs) is what you ship.

Warning

Plain DFS on a graph with cycles can loop forever. Always track visited states (graph search) or impose a depth limit. And don’t reach for BFS on deep problems — its \(O(b^d)\) memory exhausts RAM long before time runs out; use iterative deepening instead.

32.3 — Informed / heuristic search (greedy best-first, A*)

Blind search treats every frontier node as equally promising. Informed search adds a heuristic \(h(n)\) — a cheap estimate of the cost from node \(n\) to the nearest goal — to steer toward the goal and skip whole regions of the space. The heuristic is domain knowledge injected as a function; for road travel, the straight-line (“as the crow flies”) distance to the goal city is a classic example.

Greedy best-first search expands the node that looks closest to the goal: it orders the frontier by \(h(n)\) alone. It is often fast but myopic — by ignoring the cost already paid, it can charge toward a goal that turns out to be expensive, or even get stuck in dead ends. It is not optimal and not complete in general.

A* (“A-star”) fixes this by combining both numbers:

\[f(n) = g(n) + h(n)\]

In words: the estimated total cost of the best path through node \(n\) equals what you have already spent getting to \(n\) plus a guess of what it will still cost to reach the goal from \(n\). Also written: \(f(n) = \underbrace{g(n)}_{\text{cost behind you}} + \underbrace{h(n)}_{\text{estimated cost ahead}}\) — split the total into “past” and “future” halves.

where \(g(n)\) is the actual cost from the start to \(n\), and \(h(n)\) is the estimated cost from \(n\) to the goal. So \(f(n)\) estimates the total cost of the cheapest path through \(n\). A* always expands the lowest-\(f\) frontier node. Intuitively it balances “how far I’ve come” against “how far I think I still have to go.” UCS is the special case \(h=0\), greedy is the special case \(g=0\), and A* is the principled middle between them.

The three informed strategies are really one knob — how much do you trust the past versus the future?

Worked example — A* on a road map. Cities connected by roads with travel costs, plus a straight-line heuristic \(h\) to the goal Z:

Node	\(g\) (cost so far)	\(h\) (to Z)	\(f=g+h\)
S	0	7	7
A (via S)	4	3	7
B (via S)	2	6	8
Z (via A)	7	0	7

From S, A* expands the lowest-\(f\) node. A (with \(f=7\)) beats B (with \(f=8\)), so it goes S→A and then reaches Z with \(f=7\). Greedy search would also try A first (small \(h\)), but the crucial difference is that A* would reconsider B if pursuing A turned out costlier — greedy never looks back.

import heapq
def astar(start, goal, succ_cost, h):
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal: return g, path
        for nxt, c in succ_cost(node):
            ng = g + c
            if ng < best.get(nxt, float('inf')):
                best[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))

The only change from UCS is the priority key: \(g\) becomes \(g + h\). That one extra term is the entire difference between blind and informed search.

Where this is used. A* is not a textbook curiosity — it is the workhorse behind GPS and map routing, where the straight-line distance to your destination is the heuristic that keeps the search from fanning out across the whole continent. It also drives pathfinding for units in video games (often on a navigation grid), robot motion planning, and network packet routing. Anywhere you need the cheapest route through a graph and you have even a rough sense of “which way is the goal,” A* is usually the first thing reached for.

Tip

A useful mental picture: UCS searches in expanding circles around the start, while A* searches in stretched ellipses pointing toward the goal. A good heuristic squeezes the ellipse tighter, so fewer nodes get explored.

32.4 — Admissible and consistent heuristics (why A* is optimal)

A*’s optimality is not automatic — it depends on the heuristic obeying one of two conditions. Get the heuristic wrong and A* can confidently return a path that isn’t the cheapest.

A heuristic is admissible if it never overestimates the true remaining cost: \(h(n) \le h^*(n)\) for every node, where \(h^*(n)\) is the real cost from \(n\) to the goal. An admissible heuristic is optimistic — it may guess too low but never too high. Straight-line distance is admissible for road travel because no road is ever shorter than the straight line between two points.

In words: the heuristic’s guess at any node is allowed to be too cheap, but never too expensive — it must stay at or below the genuine cost-to-go. Also written: \(\forall n:\ 0 \le h(n) \le h^*(n)\) — the estimate is pinned between zero and the true remaining distance.

A heuristic is consistent (or monotone) if it obeys a triangle inequality: for every node \(n\) and successor \(n'\) reached by action \(a\),

\[h(n) \le c(n, a, n') + h(n')\]

In words: moving one step toward the goal can shrink your remaining estimate by at most what that step actually cost — the heuristic can never “drop too fast.” Also written: \(h(n) - h(n') \le c(n, a, n')\) — the decrease in the estimate across an edge is bounded by that edge’s cost.

Consistency means the estimate never drops by more than the step actually costs. It is the stronger condition: every consistent heuristic is admissible, but not every admissible one is consistent.

Why A* is optimal (the intuition). Suppose a suboptimal goal \(G_2\) with cost \(C_2 > C^*\) sits on the frontier alongside a node \(n\) that lies on an optimal path to the true goal. Because \(h\) is admissible, \(f(n) = g(n) + h(n) \le C^*\) — the estimate through \(n\) never exceeds the true optimum. And \(f(G_2) = g(G_2) = C_2 > C^*\), since \(h\) at any goal is 0. Therefore \(f(n) < f(G_2)\), so A* always expands \(n\) before it would ever expand the suboptimal goal. The shoddy goal can never sneak out of the frontier first. With a consistent heuristic you get a bonus: the \(f\) values along any path are non-decreasing, so the first time A* expands any node it has already found that node’s cheapest path — no node ever needs re-expansion, and graph-search A* is optimal without extra bookkeeping.

def is_admissible(nodes, h, h_star):   # h_star: true cost-to-goal oracle
    return all(h(n) <= h_star(n) for n in nodes)

def is_consistent(edges, h):           # edges: list of (n, cost, n_prime)
    return all(h(n) <= c + h(np) for (n, c, np) in edges)

# demo self-check: straight-line vs true road distance
assert is_consistent([("A", 3, "Z")], lambda x: {"A":3,"Z":0}[x])     # 3 <= 3+0 ok
assert not is_consistent([("A", 1, "Z")], lambda x: {"A":3,"Z":0}[x]) # 3 <= 1+0 fails

Dominance. Between two admissible heuristics, the one that is always larger (closer to \(h^*\) without exceeding it) is better — it expands fewer nodes because its estimates prune more aggressively. So we want heuristics as high as possible while staying admissible. A common way to invent one is relaxation: drop a constraint of the real problem and solve the easier version exactly. For the 8-puzzle, allowing a tile to teleport to its target gives the misplaced-tiles heuristic (count the tiles out of place); allowing a tile to slide through others gives the stronger Manhattan-distance heuristic (sum each tile’s grid distance to its target). Both are admissible, and Manhattan distance dominates, so it explores far fewer nodes.

Worked example — the two 8-puzzle heuristics, numerically. Take a state where tiles 1, 2, and 5 are out of place: tile 1 is one row above its target, tile 2 is two columns away, tile 5 is one step away. The misplaced-tiles heuristic just counts them: \(h_1 = 3\). The Manhattan heuristic sums each tile’s grid distance: \(h_2 = 1 + 2 + 1 = 4\). Both are \(\le\) the true number of moves, so both are admissible — but \(h_2 = 4 \ge h_1 = 3\) at this state (and at every state), so A* with \(h_2\) prunes harder and expands fewer nodes. That single point of dominance can mean exploring thousands fewer nodes on a hard puzzle.

def manhattan(state, goal):
    """8-puzzle Manhattan-distance heuristic. state/goal: tuples of 9 ints, 0=blank."""
    dist = 0
    for tile in range(1, 9):
        i, j = divmod(state.index(tile), 3)
        gi, gj = divmod(goal.index(tile), 3)
        dist += abs(i - gi) + abs(j - gj)
    return dist

def misplaced(state, goal):
    return sum(1 for k in range(9) if state[k] and state[k] != goal[k])

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)
s = (1, 2, 3, 4, 0, 6, 7, 5, 8)      # tiles 5 and 8 swapped-ish
assert manhattan(s, GOAL) >= misplaced(s, GOAL)   # Manhattan dominates

flowchart TB
  R["Real 8-puzzle<br/>(tiles block each other)"]
  R -->|"drop: tiles can teleport"| M["Misplaced-tiles h<br/>(weaker, admissible)"]
  R -->|"drop: tiles slide through"| H["Manhattan-distance h<br/>(stronger, admissible)"]
  M -.->|"H dominates M"| H

Warning

If your heuristic overestimates — say, multiplying distances by 1.5 to speed things up — A* may return a non-optimal path. That can be a deliberate tradeoff (weighted A*, \(f = g + w\cdot h\) with \(w>1\), trades optimality for speed), but never assume A* is optimal unless you have proven \(h\) admissible.

32.5 — Local search (hill climbing, simulated annealing, beam search)

Everything so far returns a path. But for many problems — scheduling, layout, n-queens, hyperparameter tuning — we only care about the final configuration, not how we got there. Local search exploits this: keep just the current state (and maybe a few), and iteratively nudge it toward better neighbors using an objective function \(f\) to maximize (or a cost to minimize). It uses almost no memory and scales to huge or even continuous spaces, but it sees only the local neighborhood — it climbs the landscape blindfolded, feeling only which nearby step goes up.

Hill climbing is the greedy local mover: from the current state, look at all neighbors and step to the best one; stop when no neighbor is better. It is fast and memory-trivial, but it gets trapped at a local optimum (a peak lower than the global one), stalls on plateaus (flat regions where no neighbor looks better), and struggles on ridges (where progress needs an awkward zig-zag). Restarting from many random starts — random-restart hill climbing — mitigates this surprisingly effectively, because a few of the starts will land on slopes leading to the global peak.

Worked example — 4-queens by hill climbing. Place one queen per column and count attacking pairs as the cost to minimize (0 means solved). From a random placement with, say, 3 attacking pairs, hill climbing tries moving each queen within its column and takes whichever single move drops the count the most — perhaps to 1, then to 0. If instead every move leaves the count unchanged or higher, it is stuck in a local minimum and a random restart is the cheap fix.

Simulated annealing escapes local optima by sometimes accepting worse moves. It borrows from metallurgy: heat metal so its atoms jiggle freely, then cool it slowly so they settle into a low-energy crystal. The algorithm picks a random neighbor and accepts it if it’s better; if it’s worse by an amount \(\Delta\), it accepts anyway with probability \(e^{-\Delta / T}\). The temperature \(T\) starts high (almost any move accepted — lots of exploration) and is lowered on a cooling schedule toward 0 (only improving moves survive — pure hill climbing). Cool slowly enough and it provably converges to the global optimum.

\[P(\text{accept a worse move}) = e^{-\Delta / T}\]

In words: when a candidate move makes things worse by \(\Delta\), accept it anyway with a chance that shrinks fast for big setbacks and shrinks as the system cools. Also written: \(\ln P = -\Delta / T\), i.e. \(P = \exp(-\Delta/T)\) — the log-probability of accepting a bad move is just the negative penalty scaled by temperature (the Boltzmann factor).

The temperature controls how reckless the search is. When \(T\) is large, \(e^{-\Delta/T}\) is close to 1, so even bad moves get through; as \(T \to 0\), the same probability collapses toward 0 and the search hardens into plain hill climbing.

A small numeric feel: with \(\Delta = 2\), at a hot \(T = 10\) the accept probability is \(e^{-0.2} \approx 0.82\) (very willing to backtrack), but at a cold \(T = 0.5\) it is \(e^{-4} \approx 0.018\) (almost never). Same bad move, opposite behavior — that is the cooling schedule doing its job.

Picture a ball on a bumpy landscape. When the system is hot, the ball jitters wildly and can bounce out of a shallow trap; as it cools, the jitter dies down and it settles into the deepest nearby valley. The marker below shakes hard at first, then calms — exactly the annealing schedule in motion:

Beam search (the local version) is the parallel compromise: instead of one current state, keep the \(k\) best states (the beam width). At each step, generate all successors of all \(k\) states and keep the best \(k\) overall. With \(k=1\) it is hill climbing; with \(k=\infty\) it approaches breadth-first behavior. The danger is that all \(k\) beams crowd into the same promising region and lose diversity; stochastic beam search keeps successors with probability rising in their quality, which spreads the beams out and avoids that collapse.

import math, random
def simulated_annealing(start, neighbors, energy, schedule):
    cur = start; cur_e = energy(cur)
    t = 1
    while True:
        T = schedule(t); t += 1
        if T <= 1e-9: return cur          # frozen
        nxt = random.choice(neighbors(cur))
        d = energy(nxt) - cur_e           # >0 means worse (we minimize energy)
        if d < 0 or random.random() < math.exp(-d / T):
            cur, cur_e = nxt, energy(nxt) # accept improving OR (probabilistically) worse

In real projects you usually reach for a library rather than tuning a cooling schedule by hand. scipy.optimize ships a global optimizer (dual annealing, a modern descendant of simulated annealing) that handles the schedule, restarts, and a local-refinement step for you:

import numpy as np
from scipy.optimize import dual_annealing

# minimize a bumpy 2-D function full of local minima
def f(x):
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)   # Rastrigin

result = dual_annealing(f, bounds=[(-5.12, 5.12)] * 2, seed=0)
print(result.x, result.fun)     # lands near the global minimum at (0, 0), f≈0

The landscape below is exactly what local search navigates — and why naive hill climbing fails. A climber starting on the left slope walks up to the small bump and stops, never seeing that a far higher peak lies beyond a valley it would have to descend into first:

Tip

Rule of thumb: if you want the path, use A*; if you only want a good final state in a vast space, use local search. And when hill climbing keeps stalling, reach for random restarts before anything fancier — it is one line of code and often beats elaborate methods.

32.6 — Genetic algorithms (population-based search)

Local search keeps one state (or \(k\) states) and nudges it. A genetic algorithm (GA) keeps a whole population of states and lets them breed. The everyday picture is animal breeding: keep a herd, let the fittest individuals reproduce, mix their traits, sprinkle in the occasional random mutation, and over many generations the herd drifts toward whatever “fit” rewards. It is local search with sex — recombination lets two decent partial solutions combine into a better whole, which plain hill climbing can never do.

The loop has four moves, repeated each generation:

Fitness — score every individual with the objective function (higher is better).
Selection — pick parents with probability rising in fitness (good states breed more).
Crossover — splice two parents into a child, taking part of the state string from each.
Mutation — flip a small random piece of the child, to keep injecting fresh diversity.

The probability of selecting an individual is just its share of the total fitness:

\[P(\text{select } i) = \frac{\text{fit}(i)}{\sum_{j} \text{fit}(j)}\]

In words: an individual’s chance of becoming a parent is its own fitness divided by the fitness of the entire population — fitter individuals get a proportionally bigger slice. Also written: \(P(i) = \text{fit}(i) / Z\) where \(Z = \sum_j \text{fit}(j)\) is the normalizing total (a softmax-free, fitness-proportional weighting, a.k.a. “roulette-wheel” selection).

Worked example — 8-queens as a string. Encode a board as 8 digits, one per column, giving each queen’s row (e.g. 24748552). Fitness = number of non-attacking pairs (max 28). Two parent strings cross over at a random cut point — 2474|8552 and 3275|1641 produce child 2474|1641 — and a mutation might bump one digit. Run a few hundred generations and a population reliably finds a 28-fitness (zero-conflict) board, even though the space has \(8^8 \approx 16.7\) million arrangements.

flowchart LR
  P["Population<br/>(many states)"] --> F["Score fitness"]
  F --> S["Select parents<br/>(fitter = likelier)"]
  S --> C["Crossover<br/>(splice two parents)"]
  C --> M["Mutate<br/>(random tweak)"]
  M --> P

A compact GA in NumPy, solving 8-queens:

import numpy as np
rng = np.random.default_rng(0)

def fitness(ind):                       # non-attacking pairs, max 28
    clashes = 0
    for i in range(8):
        for j in range(i + 1, 8):
            if ind[i] == ind[j] or abs(ind[i] - ind[j]) == j - i:
                clashes += 1
    return 28 - clashes

pop = rng.integers(0, 8, size=(100, 8))
for _ in range(500):
    scores = np.array([fitness(ind) for ind in pop])
    if scores.max() == 28:
        break
    probs = scores / scores.sum()                       # roulette-wheel selection
    parents = pop[rng.choice(len(pop), size=(100, 2), p=probs)]
    cut = rng.integers(1, 8, size=100)
    kids = np.array([np.concatenate([a[:c], b[c:]])     # crossover
                     for (a, b), c in zip(parents, cut)])
    m = rng.random((100, 8)) < 0.1                       # 10% mutation rate
    kids[m] = rng.integers(0, 8, size=m.sum())
    pop = kids

assert max(fitness(ind) for ind in pop) == 28

GAs are one corner of a larger family — evolutionary computation and metaheuristics (genetic programming, evolution strategies, particle swarms, ant colonies) — all of which generalize the population-based idea. They shine when the landscape is rugged and you have no usable gradient or heuristic, at the cost of many fitness evaluations and a fistful of hyperparameters (population size, mutation rate, selection pressure) to tune.

Tip

Reach for a GA only after simpler local search (with random restarts) has visibly failed. The crossover operator is what justifies the extra machinery — if your problem’s partial solutions don’t combine meaningfully, a GA buys you little over annealing.

32.7 — Memory / time tradeoffs

Search algorithms live on a spectrum between two scarce resources: time (nodes generated) and memory (nodes stored). The exponential frontiers of BFS and A* are usually limited by memory first — you run out of RAM long before you run out of patience. The whole art of practical search is buying back memory by spending a little extra time.

Iterative deepening is the canonical trade: it pays to regenerate the shallow nodes each round in exchange for DFS-level \(O(bd)\) memory instead of BFS’s \(O(b^d)\). The re-work is cheap because the deepest level holds the overwhelming majority of the nodes anyway, so re-expanding the small upper levels barely registers.

Iterative-deepening A* (IDA*) applies the same idea to A*: do a cost-bounded DFS where the bound is on \(f\), and after each failed round, raise the bound to the smallest \(f\) value that exceeded it last time. It keeps A*’s optimality with linear memory — it’s the standard fix when A*’s frontier won’t fit in RAM.

Other memory-bounded variants — recursive best-first search (RBFS) and SMA* (simplified memory-bounded A*) — use whatever memory you give them and, when that fills up, forget the least-promising nodes (remembering their best \(f\) so they can be regenerated if the search comes back). They trade time (re-expansion) for a hard memory ceiling you set in advance.

A small numeric feel for why iterative deepening’s re-work is cheap: with branching factor \(b=10\), a tree of depth \(d=5\) has about \(10^5 = 100{,}000\) nodes at the bottom level but only \(11{,}111\) in all the levels above combined. Re-generating those upper levels a handful of times adds only single-digit percentage overhead — a bargain for cutting memory from exponential to linear.

Why is the re-work so cheap? A search tree is bottom-heavy: each level is \(b\) times bigger than the one above it, so the very last level holds more nodes than all the levels above it combined. Iterative deepening re-visits the shallow levels many times — but those levels are tiny, so the repeats barely add up. The bottom level, which gets generated only once, dominates the whole count.

The total nodes generated by iterative deepening, counting the re-work, is dominated by the last round:

\[N_{\text{IDS}} = \sum_{i=0}^{d} (d - i + 1)\, b^i = (d{+}1)b^0 + d\, b^1 + \dots + 1\cdot b^d \approx \frac{b}{b-1}\, b^d\]

In words: add up every level’s node count, weighting each by how many times it gets re-visited (shallow levels more, the bottom level once). Because the bottom level is so much larger than all the rest, the whole sum lands within a small constant factor of a single BFS sweep. Also written: \(N_{\text{IDS}} = O(b^d)\), the same big-O as BFS — for \(b=10\) the overhead factor \(b/(b-1)\) is about \(1.11\), i.e. roughly 11% extra work.

flowchart LR
  subgraph "more memory, less recomputation"
    BFS["BFS / A*  —  O(b^d) memory"]
  end
  subgraph "less memory, more recomputation"
    IDS["IDS / IDA*  —  O(bd) memory"]
  end
  BFS -. "swap to save RAM" .-> IDS
  IDS -. "swap to save time" .-> BFS

Algorithm	Memory	Re-expands nodes?	Optimal?
BFS / A*	Exponential	No	Yes
Iterative deepening / IDA*	Linear	Yes (each round)	Yes
RBFS / SMA*	Bounded (tunable)	Yes (when memory full)	Yes

Warning

“Just use A*” fails silently on large problems: it doesn’t crash with a clear error, it thrashes and exhausts memory mid-search. If your state space is big, budget for a memory-bounded variant (IDA*) from the start rather than discovering the ceiling at runtime.

32.8 — How search underlies planning and games

Search is not an island — it is the substrate beneath two large branches of AI, and recognizing the same machinery under different names is one of the most useful unifications in the field.

Planning is search through a space of world states, where actions have preconditions (what must be true to apply them) and effects (how they change the world). A planner formulates the task as “from the initial state, find a sequence of actions whose result satisfies the goal” — which is exactly a search problem, just with states described by logical facts rather than tile positions. Heuristics for planners are often derived automatically by relaxing the action model (for instance, ignoring the “delete” effects of actions), which is the very same relaxation trick from §32.4 applied to logic instead of puzzles.

Game playing is search through a space of moves where an adversary also gets to move. The state space becomes a game tree that alternates your moves and the opponent’s. Minimax search assumes the opponent plays optimally and propagates values up the tree — you maximize your score, they minimize it — while alpha–beta pruning is the game-tree analogue of skipping unpromising branches, cutting away parts of the search that cannot change the result. Modern engines layer heuristic evaluation functions and Monte-Carlo Tree Search — the same lookahead idea that shows up in reinforcement learning — on top, but the spine is still tree search.

flowchart TB
  S["State-space search (this chapter)"]
  S --> P["Planning: states = logical facts,<br/>actions = preconditions + effects"]
  S --> G["Games: alternating moves,<br/>minimax + alpha-beta over a game tree"]
  S --> O["Optimization: local search over<br/>configurations (no path needed)"]

The takeaway is that once you can describe states, actions, a goal, and costs, an astonishing range of problems collapses into “explore this graph well.” Planning, games, and combinatorial optimization are all search wearing different costumes.

Which search should I use? A quick decision path through this chapter’s methods:

flowchart TB
  Q1{"Do you need the<br/>actual path, or just a<br/>good final state?"}
  Q1 -->|"final state only"| L{"Landscape<br/>rugged with traps?"}
  L -->|"no — single peak"| HC["Hill climbing<br/>(+ random restarts)"]
  L -->|"yes"| SA{"Partial solutions<br/>combine meaningfully?"}
  SA -->|"no"| ANN["Simulated annealing<br/>/ beam search"]
  SA -->|"yes"| GA["Genetic algorithm"]
  Q1 -->|"need the path"| H{"Have a usable<br/>heuristic h(n)?"}
  H -->|"no"| C{"Step costs equal?"}
  C -->|"yes"| IDS["Iterative deepening<br/>(or BFS if RAM allows)"]
  C -->|"no"| UCS["Uniform-cost / Dijkstra"]
  H -->|"yes"| M{"A* frontier<br/>fits in memory?"}
  M -->|"yes"| AST["A*"]
  M -->|"no"| IDA["IDA* / SMA*"]

Tip

When facing a new AI problem, first ask the four formulation questions — What is a state? What are the actions? When am I done? What does a step cost? If you can answer them, you can almost certainly attack the problem with the machinery in this chapter.

32.9 — Quick reference

Term / method	What it means	When / why to reach for it
Problem formulation	States, initial state, actions, \(\text{Result}(s,a)\), goal test, path cost	The first move on any search problem; get it right and the algorithm is easy
State graph vs. search tree	Graph = the map (each state once); tree = the routes (a state may recur)	Add a `seen` set (graph search) to expand each state at most once and avoid loops
BFS	Expand shallowest first via a FIFO queue; \(O(b^d)\) time and memory	Shallow goal, equal step costs, and the frontier fits in RAM
DFS	Expand deepest first via a LIFO stack; \(O(bm)\) memory	Memory is tight and you don’t need optimality; guard cycles with a visited set
Uniform-cost search (UCS)	Expand lowest \(g(n)\) via a priority queue; = Dijkstra	Varying step costs and you need the cheapest path, no heuristic available
Iterative deepening (IDS)	Depth-limited DFS with a rising limit; \(O(bd)\) memory, \(O(b^d)\) time	Default uninformed method when depth is unknown — BFS guarantees at DFS memory
Heuristic \(h(n)\)	Cheap estimate of cost-to-go from \(n\) to the goal	Domain knowledge that lets informed search skip whole regions
Greedy best-first	Order frontier by \(h(n)\) alone	Fast when “looks closest” usually is closest; not complete, not optimal
A* — \(f(n)=g(n)+h(n)\)	Balance cost-behind (\(g\)) against cost-ahead (\(h\))	The default when you need the cheapest path and have an admissible \(h\)
Admissible \(h\)	\(h(n)\le h^*(n)\) — never overestimates	Required for A* optimality; straight-line distance for road travel
Consistent \(h\)	\(h(n)\le c(n,a,n')+h(n')\) (triangle inequality)	Stronger: graph-search A* needs no node re-expansion
Dominance / relaxation	Higher admissible \(h\) prunes more; relax a constraint to invent \(h\)	Manhattan dominates misplaced-tiles on the 8-puzzle — explores far fewer nodes
Hill climbing	Step to the best neighbor; stop at a peak	Vast space, final state only; pair with random restarts to escape local optima
Simulated annealing	Accept worse moves with prob. \(e^{-\Delta/T}\), cool \(T\to 0\)	Rugged landscape with traps; cool slowly to converge to the global optimum
Beam search (width \(k\))	Keep the best \(k\) states each step	Middle ground between hill climbing (\(k{=}1\)) and breadth-first
Genetic algorithm	Population + selection + crossover + mutation	Gradient-free, rugged spaces where partial solutions combine meaningfully
IDA* / RBFS / SMA*	Memory-bounded A* variants that re-expand nodes	A*’s frontier won’t fit in RAM — trade time for a hard memory ceiling

32.10 — Key takeaways

Formulate first: every search problem reduces to states, an initial state, actions, a transition model, a goal test, and a path cost — get these right (and abstract away irrelevant detail) and the algorithm is the easy part. Keep the state graph (the map) distinct from the search tree (the routes through it); graph search adds a visited set so each state is expanded once.
Uninformed search explores blindly: BFS is complete and optimal (equal costs) but memory-hungry; DFS is memory-light but neither optimal nor (on cyclic graphs) complete; UCS handles varying costs optimally; iterative deepening is the practical default — BFS’s guarantees at DFS’s memory.
Informed search adds a heuristic \(h(n)\): greedy is fast but myopic; A* uses \(f = g + h\) to balance cost-so-far against estimated cost-to-go.
A* is optimal exactly when \(h\) is admissible (never overestimates); consistency additionally guarantees no node is ever re-expanded. Higher (dominant) admissible heuristics expand fewer nodes; derive them by relaxing the problem.
Local search (hill climbing, simulated annealing, beam search) keeps only the current state(s) and optimizes a final configuration with minimal memory — great when you don’t need the path, but watch for local optima, plateaus, and ridges.
Genetic algorithms carry a whole population and use crossover + mutation to combine partial solutions — population-based global search for rugged, gradient-free landscapes.
Time vs memory is the central tradeoff: IDA* and SMA* buy back A*’s exponential memory by re-expanding nodes.
Search is the engine under planning, game playing, and combinatorial optimization — different costumes, same graph exploration.

32.11 — See also

Optimization — gradient-based and continuous optimization, the numerical cousin of local search.
Planning, Constraint Satisfaction & Game Playing — adversarial search (minimax, alpha–beta), planning systems, and CSPs built directly on this chapter.
Knowledge Representation & Reasoning — how states and goals get expressed as logical facts for planners.
Evolutionary Computation & Metaheuristics — population-based global search (genetic algorithms, swarms) that generalizes beam and local search.
Reinforcement Learning — sequential decision-making when the transition model is learned rather than given.

↪ The thread continues → Chapter 33 · 📖 Knowledge Representation & Reasoning

Search explores a space of possibilities; but first you must represent what you know. Knowledge representation and reasoning is how symbolic AI encodes facts and draws conclusions.

📖 All chapters | ← 31 · 🧰 Tools & Frameworks | 33 · 📖 Knowledge Representation & Reasoning →