Chapter 33 — 📖 Knowledge Representation & Reasoning

📖 All chapters | ← 32 · 🧭 Search & Problem Solving | 34 · 🗺️ Planning, Constraint Satisfaction & Game Playing →

📚 Jump to any chapter

🧮 Mathematical Foundations

🧭 The ML Workflow

🧩 Classical Machine Learning

🎲 Probabilistic Models

13 · 🕸️ Probabilistic Graphical Models

🧠 Deep Learning

🗣️ Applied AI: Vision, Language, Audio & Time

🕹️ Reinforcement Learning

25 · 🕹️ Reinforcement Learning

🛠️ Applied ML Systems & Industries

🚀 Production, Tooling & Infrastructure

📚 Classical & Symbolic AI

⚖️ Responsible AI & Frontier

🎓 Advanced & Specialized Topics

🎚️ Post-Training & Fine-Tuning

🚢 Model Serving & Deployment

47 · 🚢 Model Serving & Deployment in Production

Knowledge Representation and Reasoning (KRR) is the branch of AI that asks how to store what a system knows in a form a machine can manipulate, and how to derive new facts from that store by rule rather than by statistics. It is the symbolic heart of “Good Old-Fashioned AI”: instead of learning patterns from data, you write down truths and let a logical engine grind out consequences. It matters because it gives something deep learning struggles with — answers you can explain, audit, and guarantee — and the modern frontier is stitching the two traditions back together.

🧭 In context: Classical & Symbolic AI · used to encode facts and rules so machines can deduce, explain, and prove answers · the one key idea — represent knowledge explicitly as symbols, then reason over it with sound inference rules.

💡 Remember this: write knowledge down as explicit symbols and let a sound inference engine derive guaranteed, auditable conclusions — the exact thing statistical learning cannot promise.

33.1 — Propositional & First-Order Logic (Syntax, Inference, Resolution)

The foundation of symbolic reasoning is formal logic: a language with a precise syntax and a notion of truth that lets us derive conclusions mechanically, without hand-waving.

Think of it like a board game with no luck. The connectives are the rules, the knowledge base is the starting position, and entailment is “every legal way the game can play out ends the same way.” Nothing is left to chance — if the rules force a conclusion, it is forced in every world, not just the likely ones.

Propositional logic is the simplest such language. Its atoms are whole statements that are either true or false — call them \(P\) (“it is raining”), \(Q\) (“the ground is wet”). You combine them with connectives: \(\neg\) (not), \(\wedge\) (and), \(\vee\) (or), \(\rightarrow\) (implies), \(\leftrightarrow\) (iff). A knowledge base (KB) is just a set of such formulas asserted true. Entailment, written \(KB \models \alpha\), means: in every world where the KB is true, \(\alpha\) is also true. That is the whole game — we want to know what is guaranteed, not merely likely.

\[KB \models \alpha \quad\Longleftrightarrow\quad \text{for every model } m,\; (m \models KB) \Rightarrow (m \models \alpha)\]

In words: \(\alpha\) follows from the KB exactly when there is no possible world that makes everything in the KB true while making \(\alpha\) false. Also written: \(KB \models \alpha \iff \{m : m \models KB\} \subseteq \{m : m \models \alpha\}\) — the worlds satisfying the KB are a subset of the worlds satisfying \(\alpha\).

The plainest way to check entailment is a truth table: enumerate every assignment of true/false to the atoms and verify \(\alpha\) holds wherever the KB does. With \(n\) atoms there are \(2^n\) rows, so this is exponential, but it is mechanically certain.

Worked example. Let \(KB = \{P \rightarrow Q,\; P\}\). Does \(KB \models Q\)?

\(P\)	\(Q\)	\(P\rightarrow Q\)	\(P\)	both KB true?	\(Q\)
T	T	T	T	yes	T
T	F	F	T	no	—
F	T	T	F	no	—
F	F	T	F	no	—

Only the first row satisfies the whole KB, and there \(Q\) is true. So \(KB \models Q\). This single-step pattern — from \(P\rightarrow Q\) and \(P\) conclude \(Q\) — is the ancient rule modus ponens.

\[\frac{\alpha \rightarrow \beta \qquad \alpha}{\beta}\]

In words: if “\(\alpha\) implies \(\beta\)” is true and \(\alpha\) is true, then \(\beta\) must be true. Also written: the inference \(\{\alpha \rightarrow \beta,\ \alpha\} \vdash \beta\), equivalently the tautology \(\big((\alpha \rightarrow \beta) \wedge \alpha\big) \rightarrow \beta\).

First-order logic (FOL) adds the machinery propositional logic lacks: objects, relations, and quantifiers. Now we can say something about every member of an infinite domain. The pieces are constants (\(socrates\)), variables (\(x\)), predicates that denote relations (\(Human(x)\), \(Loves(x,y)\)), functions (\(motherOf(x)\)), and two quantifiers: \(\forall\) (“for all”) and \(\exists\) (“there exists”). The classic syllogism becomes:

\[\forall x\,\big(Human(x) \rightarrow Mortal(x)\big), \quad Human(socrates) \;\models\; Mortal(socrates).\]

In words: if everything human is mortal, and Socrates is human, then Socrates is mortal. Also written: instantiating \(x := socrates\) gives \(Human(socrates) \rightarrow Mortal(socrates)\), then modus ponens with \(Human(socrates)\) yields \(Mortal(socrates)\).

FOL is vastly more expressive, but that power costs us: entailment in FOL is only semi-decidable — if \(\alpha\) follows, a procedure can confirm it in finite time, but if it does not follow, the procedure may run forever.

The dominant mechanical inference procedure is resolution, and it rests on one beautiful trick: instead of proving \(KB \models \alpha\) directly, prove that \(KB \wedge \neg\alpha\) is unsatisfiable — that assuming the negation of your goal leads to a contradiction. This is proof by refutation. To run it, everything is first converted to conjunctive normal form (CNF): a conjunction of clauses, where each clause is a disjunction of literals (an atom or its negation). The single resolution rule then says: if one clause contains a literal \(L\) and another contains \(\neg L\), you may combine them, cancelling that pair:

\[\frac{(A \vee L) \qquad (B \vee \neg L)}{(A \vee B)}\]

In words: two clauses that disagree on exactly one literal can be merged into a new clause with that contradictory pair struck out. Also written: since \(A \vee L \equiv \neg A \rightarrow L\) and \(B \vee \neg L \equiv L \rightarrow B\), resolution is just chaining \(\neg A \rightarrow L \rightarrow B\), i.e. \(\neg A \rightarrow B \equiv A \vee B\).

Keep resolving until you derive the empty clause \(\square\) (a clause with no literals, meaning false) — that is your contradiction, and the original entailment is proved.

# Propositional resolution refutation. Clauses are frozensets of literals;
# a literal is a string, negation prefixed with "-".
def negate(lit): return lit[1:] if lit.startswith("-") else "-"+lit

def resolve(ci, cj):                 # all resolvents of two clauses
    out = []
    for l in ci:
        if negate(l) in cj:          # complementary pair found
            out.append((ci - {l}) | (cj - {negate(l)}))
    return out

def entails(kb, query):              # kb |= query ?
    clauses = set(kb) | {frozenset({negate(query)})}   # assume NOT query
    while True:
        new = set()
        cl = list(clauses)
        for i in range(len(cl)):
            for j in range(i+1, len(cl)):
                for r in resolve(cl[i], cl[j]):
                    # r == frozenset() is the EMPTY CLAUSE (false), our goal —
                    # not a "falsy guard". An empty resolvent can ONLY arise from
                    # two complementary unit clauses like {P} and {-P}.
                    if r == frozenset(): return True   # contradiction derived
                    new.add(frozenset(r))
        if new <= clauses: return False            # nothing new -> no proof
        clauses |= new

# Smallest decisive test: complementary unit clauses {P}, {-P} must resolve to []
assert entails([frozenset({"P"})], "P")            # {P} + {-P}  ->  []  -> True
KB = [frozenset({"-P","Q"}), frozenset({"P"})]     # P->Q , P
assert entails(KB, "Q")                            # derives []  ->  KB |= Q
print("KB entails Q:", entails(KB, "Q"))

A note on the empty-clause test: the only way resolve returns frozenset() is when both input clauses are unit clauses holding a complementary pair — {P} resolved with {-P} removes P from one and -P from the other, leaving nothing. Larger clauses always leave at least one surviving literal, so an empty resolvent unambiguously means “contradiction derived,” never “no resolvent.” We test it with r == frozenset() rather than if not r, so the intent is explicit and never relies on the empty set merely being falsy.

For full FOL, resolution needs one more ingredient — unification, the process of finding a substitution of variables that makes two literals identical so a complementary pair can be cancelled. Let us run the Socrates syllogism end-to-end. First clausify the premises and the negated goal:

\(\neg Human(x) \vee Mortal(x)\) — from \(\forall x\,(Human(x)\rightarrow Mortal(x))\)
\(Human(socrates)\) — the fact
\(\neg Mortal(socrates)\) — the negated goal we want to refute

Now resolve. Clauses (2) and (1) contain \(Human(socrates)\) and \(\neg Human(x)\); unify them with the substitution \(\{x/socrates\}\), which makes \(Human(x)\) identical to \(Human(socrates)\), and cancel the pair — leaving \(Mortal(socrates)\). That new clause and clause (3), \(\neg Mortal(socrates)\), are a complementary unit pair: resolving them yields the empty clause \(\square\). Contradiction reached, so \(Mortal(socrates)\) is entailed. The substitution \(\{x/socrates\}\) is exactly the unification step that propositional resolution never needs — and it is the engine inside the logic-programming language Prolog.

flowchart LR
    A["KB + ¬query"] --> B["Convert to CNF<br/>(clauses)"]
    B --> C["Pick two clauses with<br/>complementary literals"]
    C --> D["Unify + Resolve → new clause"]
    D --> E{"Empty<br/>clause □?"}
    E -- yes --> F["Contradiction:<br/>KB ⊨ query ✓"]
    E -- no --> G{"New clauses<br/>produced?"}
    G -- yes --> C
    G -- no --> H["No proof:<br/>query not entailed"]

The same loop animated — clauses keep merging until the empty clause \(\square\) falls out, the moment the contradiction lands:

Tip

Intuition: resolution is “proof by trying to break it.” You assume the opposite of what you want and show the assumption explodes. If the only way to keep the KB consistent is to accept your goal, the goal is proved.

Doing it in practice — Prolog and a Python SAT check

From-scratch resolution is great for understanding, but in practice you reach for a mature engine. The logic-programming language Prolog runs SWI-Prolog’s resolution engine for you; from Python you can call it via pyswip, or you can hand a propositional question to a SAT solver:

# Propositional entailment via a real SAT solver (pip install python-sat).
# KB |= Q  iff  KB ∧ ¬Q is UNSATISFIABLE.
from pysat.solvers import Glucose3      # CDCL solver (see §33.12)

# Vars: P=1, Q=2.  KB = (¬P ∨ Q) ∧ P ;  goal = Q.
solver = Glucose3()
solver.add_clause([-1, 2])             # P -> Q
solver.add_clause([1])                 # P
solver.add_clause([-2])                # ¬Q  (negated goal)
print("KB |= Q ?", not solver.solve()) # UNSAT -> entailed -> True
solver.delete()

% The same syllogism in Prolog (SWI-Prolog). Backward resolution is built in.
mortal(X) :- human(X).      % rule: ∀X human(X) -> mortal(X)
human(socrates).            % fact
% ?- mortal(socrates).      % query -> true

33.2 — Rule-Based / Expert Systems (Forward vs Backward Chaining, the Inference Engine)

An expert system packages human expertise as a large set of if–then rules and lets an inference engine apply them to specific cases. The 1970s–80s systems MYCIN (diagnosing blood infections) and DENDRAL (identifying molecules) showed that, in a narrow domain, a few hundred rules could rival a specialist. The architecture cleanly separates three parts: the knowledge base (rules), the working memory (known facts about the current case), and the inference engine (the reasoning loop). That separation is the whole appeal — domain experts edit rules without touching code.

flowchart TB
    subgraph ES["Expert System"]
        KB["Knowledge Base<br/>(IF…THEN rules)"]
        WM["Working Memory<br/>(current facts)"]
        IE["Inference Engine<br/>(matches & fires rules)"]
    end
    U["User / sensors"] --> WM
    KB --> IE
    WM <--> IE
    IE --> EXP["Explanation<br/>(why? how?)"]

The engine runs the rules in one of two directions. Forward chaining is data-driven: start from the known facts and fire any rule whose conditions are satisfied, adding its conclusions to working memory, and repeat until nothing new fires. It is the natural choice when you have data and want to know everything it implies (monitoring, alarms). Backward chaining is goal-driven: start from a hypothesis you want to confirm and work backwards, asking “what would I need to prove this?”, recursively turning each premise into a sub-goal until you bottom out in known facts. It is the natural choice for diagnosis — you have a specific question and want only the facts relevant to answering it.

Everyday analogy. Forward chaining is a cook who looks at whatever is in the fridge and figures out every dish they could make. Backward chaining is a cook who wants lasagna and works backwards to the exact shopping list. Same kitchen, same recipes — opposite starting point.

Worked example. Rules and facts:

R1: IF barks AND has_fur THEN dog
R2: IF dog THEN mammal
Facts: barks, has_fur.

Forward from the facts: R1’s conditions both hold → add dog. Now R2 fires → add mammal. Working memory is now {barks, has_fur, dog, mammal}; done. Backward for goal mammal: R2 concludes mammal, so prove dog; R1 concludes dog, so prove barks and has_fur — both are facts. Goal confirmed. Same answer, opposite search direction.

# Tiny forward-chaining engine. Rules: (set_of_conditions, conclusion).
rules = [({"barks","has_fur"}, "dog"), ({"dog"}, "mammal")]
facts = {"barks", "has_fur"}

changed = True
while changed:                         # repeat until a full pass adds nothing
    changed = False
    for conds, concl in rules:
        if conds <= facts and concl not in facts:
            facts.add(concl); changed = True   # fire the rule
print(sorted(facts))                   # ['barks','dog','has_fur','mammal']
assert "mammal" in facts

The two directions visualised — the same rule graph, traversed opposite ways:

	Forward chaining	Backward chaining
Driven by	available data	a goal/hypothesis
Direction	facts → conclusions	goal → required facts
Best for	monitoring, deriving all consequences	diagnosis, answering one query
Risk	derives many irrelevant facts	may re-explore the same sub-goals
Example system	CLIPS production rules	Prolog, MYCIN (with certainty factors)

Warning

Common mistake: assuming forward and backward chaining give different answers. For a given rule set and facts they prove exactly the same things — they differ only in efficiency and which intermediate facts they bother to compute. Pick the direction that touches fewer irrelevant rules.

Production rules in practice — the Rete idea and a real engine

Firing rules by re-scanning every rule on every change (as the toy loop above does) is \(O(\text{rules} \times \text{passes})\) — fine for a handful of rules, hopeless for thousands. Real production systems (CLIPS, Drools, the experta Python port) use the Rete algorithm, which builds a network that remembers partial matches so only the rules affected by a changed fact are re-evaluated. The trade is memory for speed — it caches match state so the engine reacts incrementally instead of recomputing from scratch.

# Forward chaining with a real rule engine (pip install experta).
from experta import KnowledgeEngine, Rule, Fact, DefFacts

class Animals(KnowledgeEngine):
    @DefFacts()
    def init(self):
        yield Fact(barks=True); yield Fact(has_fur=True)

    @Rule(Fact(barks=True), Fact(has_fur=True))
    def is_dog(self):    self.declare(Fact(dog=True))

    @Rule(Fact(dog=True))
    def is_mammal(self): self.declare(Fact(mammal=True)); print("mammal derived")

engine = Animals(); engine.reset(); engine.run()   # -> "mammal derived"

33.3 — Logic Programming: Prolog, Datalog & Answer Set Programming

The rule engines above are procedural — you write the matching loop. Logic programming flips this: you declare what is true and let a built-in proof procedure work out how to answer queries. You state facts and rules; the engine does the search. It is the most widely deployed form of symbolic reasoning, quietly running inside databases, network configuration tools, and program analyzers.

Think of it as a database that can reason. An ordinary SQL table stores rows; a logic program stores rows plus rules that derive new rows on demand. Ask a question and the system chains rules together until it hits stored facts — no loops, no joins written by hand.

Prolog is the classic. A program is a set of Horn clauses — rules with at most one positive conclusion, written head :- body. The engine answers a query by backward chaining with unification (§33.2, §33.11), trying clauses top-to-bottom and backtracking on failure. A Horn clause is the sweet spot of logic: expressive enough for real programs, restricted enough that inference is efficient.

\[\underbrace{H}_{\text{head}} \;\leftarrow\; \underbrace{B_1 \wedge B_2 \wedge \dots \wedge B_n}_{\text{body}} \qquad\equiv\qquad H \vee \neg B_1 \vee \dots \vee \neg B_n\]

In words: the head is true whenever every condition in the body is true — a single rule with one conclusion and a conjunction of premises. Also written: as a disjunctive clause \(H \vee \neg B_1 \vee \dots \vee \neg B_n\) (exactly one un-negated literal), which is precisely what makes it a Horn clause and lets resolution stay efficient.

% Family relations in Prolog. Facts, then a recursive rule.
parent(tom, bob).   parent(bob, ann).   parent(bob, pat).
ancestor(X, Y) :- parent(X, Y).                  % base case
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).  % recursive case
% ?- ancestor(tom, ann).   -> true
% ?- ancestor(tom, W).     -> W = bob ; W = ann ; W = pat

The single most important — and most surprising — feature is negation as failure (NAF): Prolog concludes \+ Goal (“not Goal”) whenever it fails to prove Goal. This is the closed-world assumption (§33.8) made operational: unproven means false. It is not the same as classical negation — \+ raining says “I have no proof it is raining,” not “it is provably dry.”

Datalog is Prolog with the sharp edges filed off: no function symbols, no clause ordering, no NAF traps in its pure form. That restriction buys a guarantee Prolog lacks — every query terminates, and evaluation can run bottom-up (forward chaining) to a fixpoint. This is why Datalog powers modern systems where decidability is non-negotiable: program analysis (the Doop and Soufflé frameworks), cloud-network reachability checks at AWS, and graph databases.

# Datalog in Python (pip install pyDatalog) — transitive closure that always halts.
from pyDatalog import pyDatalog
pyDatalog.create_terms('parent, ancestor, X, Y, Z')
+parent('tom', 'bob'); +parent('bob', 'ann')
ancestor(X, Y) <= parent(X, Y)
ancestor(X, Y) <= parent(X, Z) & ancestor(Z, Y)
print(ancestor('tom', Y))      # -> bob, ann   (all descendants of tom)

Answer Set Programming (ASP) pushes logic programming toward hard combinatorial search. Instead of one proof, an ASP solver (clingo, DLV) computes stable models — complete consistent worlds satisfying every rule — and can enumerate all solutions. With “choice rules” and constraints it solves graph colouring, scheduling, and planning declaratively: you state the constraints, the solver finds every assignment that satisfies them, much like a SAT solver (§33.12) but with a richer rule language.

flowchart LR
  HC["Horn clauses<br/>head :- body"] --> PL["Prolog<br/>top-down + NAF<br/>(may loop)"]
  HC --> DL["Datalog<br/>bottom-up, no functions<br/>(always terminates)"]
  HC --> ASP["Answer Set Prog.<br/>stable models<br/>(all solutions)"]
  PL --> USE1["theorem proving,<br/>NLP, config"]
  DL --> USE2["program analysis,<br/>graph queries"]
  ASP --> USE3["scheduling,<br/>combinatorial search"]

Warning

Common mistake: assuming a Prolog query always terminates. Clause order and left-recursion matter — swapping the two ancestor rules, or writing ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z), can send the engine into an infinite loop. Datalog avoids this by construction; in Prolog you must order clauses with the search strategy in mind.

Tip

Intuition: logic programming is “say what’s true, ask questions, get answers.” Prolog searches top-down and can recurse forever; Datalog trades function symbols for guaranteed termination; ASP enumerates whole consistent worlds. Same Horn-clause DNA, three points on the expressiveness-vs-guarantees curve.

33.4 — Semantic Networks, Frames, Ontologies, the Semantic Web (RDF/OWL), and Knowledge Graphs

Logic stores knowledge as flat formulas; this family stores it as structure — nodes and links you can draw — which is often closer to how people organise concepts.

A semantic network is a graph whose nodes are concepts and whose labelled edges are relations. The signature edge is is-a (subclass/instance), which gives inheritance: if Canary is-a Bird and Bird can-Fly, then a canary inherits flight without our restating it. Frames (Minsky) repackage the same idea object-by-object: a frame is a named record with slots (attributes) and fillers (values), and slots can carry defaults that a more specific frame overrides — the direct ancestor of object-oriented classes.

Note

Default inheritance and its bite. Slots with defaults make networks compact but introduce non-monotonic reasoning: adding a fact can retract a conclusion. The textbook case is Tweety the penguin — Bird has default can-Fly = true, so we infer Tweety flies; then we learn Penguin overrides can-Fly = false, and the earlier conclusion is withdrawn. Classical logic is monotonic (new facts never cancel old conclusions); default/inheritance reasoning deliberately is not, which is both its convenience and its hazard. §33.9 treats this head-on.

An ontology is a formal, shared specification of the concepts in a domain and the constraints among them — the vocabulary plus the rules of the game. Scaled to the whole web, this became the Semantic Web, whose data model is RDF (Resource Description Framework): every fact is a triple of (subject, predicate, object), with things named by URIs so they are globally unambiguous. OWL (Web Ontology Language) layers expressive class axioms on top (e.g. “every Parent is a Person who hasChild some Person”), and SPARQL is the query language for triple stores.

A knowledge graph is the pragmatic, industrial descendant: a large graph of real-world entities and their relationships, powering Google’s search panels, Wikidata, and enterprise data integration. The same triple shows up everywhere:

:Socrates  rdf:type     :Human .       # Socrates is a Human
:Human     rdfs:subClassOf :Mortal .   # every Human is a Mortal
# a reasoner now infers:  :Socrates rdf:type :Mortal .

That last line was never stated — a reasoner derives it from the subclass axiom. This is the bridge to the next section: the inference power of these structures comes from an underlying description logic.

flowchart LR
    SN["Semantic networks<br/>(nodes + is-a links)"] --> FR["Frames<br/>(slots + defaults)"]
    FR --> ONT["Ontologies<br/>(formal vocabularies)"]
    ONT --> RDF["RDF triples<br/>(s, p, o)"]
    RDF --> OWL["OWL / Description Logic<br/>(class axioms)"]
    OWL --> KG["Knowledge Graphs<br/>(Wikidata, Google)"]

Building and querying triples in practice

The Python library rdflib lets you build a graph of triples, attach the RDFS subclass axiom, run a transitive-closure reasoner, and query with SPARQL — the whole Semantic Web stack in a dozen lines:

# pip install rdflib
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

EX = Namespace("http://ex.org/")
g = Graph()
g.add((EX.Socrates, RDF.type, EX.Human))            # :Socrates a :Human
g.add((EX.Human, RDFS.subClassOf, EX.Mortal))       # :Human ⊑ :Mortal

# Materialise entailments (RDFS rule: type + subClassOf -> type)
from rdflib.plugins import sparql  # noqa
import owlrl
owlrl.RDFSClosure.RDFS_Semantics(g, False, False, False).closure()

q = "SELECT ?x WHERE { ?x a <http://ex.org/Mortal> }"
print([str(r.x) for r in g.query(q)])   # -> ['http://ex.org/Socrates']  (derived!)

Tip

Intuition: a triple store is just “subject–predicate–object” sentences at web scale. Reasoning is the act of writing down all the unstated sentences those sentences force to be true.

33.5 — Reasoning Under Uncertainty

Pure logic is brittle in a specific way: it assumes facts are certainly true or certainly false. The real world is full of “probably,” “usually,” and “I’m 80% sure.” Reasoning under uncertainty extends KRR to handle degrees of belief, and it is where the symbolic and probabilistic traditions meet.

The cleanest framework is probability: replace “\(P\) is true” with “\(P\) holds with probability \(0.8\),” and replace logical implication with conditional probability. The early expert system MYCIN used ad-hoc certainty factors (numbers in \([-1, 1]\) combined by hand-tuned formulas); these worked but lacked a rigorous semantics, and the field moved to Bayesian networks, which represent variables as graph nodes and use the laws of probability to propagate evidence soundly. (Certainty factors here are a numeric layer on top of the same backward-chaining rule engine from §33.2 — the engine still fires rules; the factors just attach a confidence to each conclusion.)

Worked example — one step of Bayes’ rule. A medical test is 90% sensitive (\(P(\text{pos}\mid\text{disease})=0.9\)) and has a 10% false-positive rate (\(P(\text{pos}\mid\neg\text{disease})=0.1\)); the disease affects 1% of people (\(P(\text{disease})=0.01\)). A patient tests positive — how worried should they be?

\[P(\text{disease}\mid\text{pos}) = \frac{P(\text{pos}\mid\text{disease})\,P(\text{disease})}{P(\text{pos})} = \frac{0.9 \times 0.01}{0.9\times0.01 + 0.1\times0.99} = \frac{0.009}{0.108} \approx 0.083.\]

In words: the chance you are sick given a positive test is the fraction of all positive tests that come from genuinely sick people — true positives divided by all positives. Also written: \(P(D\mid +) = \dfrac{P(+\mid D)\,P(D)}{P(+\mid D)\,P(D) + P(+\mid \neg D)\,P(\neg D)}\) (denominator expanded by the law of total probability).

Despite a positive result on a fairly accurate test, the probability of disease is only about 8% — because the disease is rare, most positives are false alarms. The clearest way to see this is to drop the fractions and reason over 1000 actual people: only 10 have the disease (and ~9 of them test positive), while 990 are healthy (and ~99 of those still test positive). So of the ~108 positives, only ~9 are truly sick:

# The same base-rate calculation, the way you'd actually write it.
P_pos_given_D, P_pos_given_notD, P_D = 0.9, 0.1, 0.01
P_notD = 1 - P_D
P_pos  = P_pos_given_D*P_D + P_pos_given_notD*P_notD     # law of total prob.
post   = P_pos_given_D*P_D / P_pos                       # Bayes
print(round(post, 3))                                   # 0.083
assert abs(post - 0.083) < 1e-3

A logical engine cannot express this nuance; a probabilistic one captures it exactly. Beyond probability there are other formalisms — fuzzy logic for vague predicates (“tall”), Dempster–Shafer theory for representing ignorance — but Bayesian methods dominate. The deep treatment of these graphical models lives in its own chapter; here the point is only that KRR is not confined to certainty.

Warning

Common mistake: treating a high-accuracy test as near-certain proof. The base rate dominates: when a condition is rare, even a good test produces mostly false positives. Logic alone hides this trap; only carrying the probabilities through reveals it.

33.6 — Description Logics

Description logics (DLs) are the formal engine underneath ontologies and OWL. They occupy a deliberate sweet spot: a carefully chosen fragment of first-order logic that is expressive enough to define rich concepts yet restricted enough that inference stays decidable — guaranteed to terminate with a yes/no answer. This is the central design trade of the whole field: give up some of FOL’s power to win back computational guarantees.

A DL knowledge base has two parts. The TBox (“terminology”) holds the schema — definitions of concepts and how they relate. The ABox (“assertions”) holds the data — facts about specific individuals. From these, concepts (classes, like \(Person\)), roles (binary relations, like \(hasChild\)), and individuals (instances, like \(alice\)) are assembled with constructors: intersection \(\sqcap\), union \(\sqcup\), negation \(\neg\), and quantified role restrictions \(\exists hasChild.\top\) (“has some child”) and \(\forall hasChild.Doctor\) (“all children are doctors”).

For example, the concept “a parent of only doctors” is written:

\[HappyParent \equiv Person \sqcap \exists hasChild.\top \sqcap \forall hasChild.Doctor.\]

In words: a HappyParent is exactly anyone who is a person, and has at least one child, and whose every child is a doctor. Also written: in plain FOL, \(HappyParent(x) \iff Person(x) \wedge \big(\exists y\, hasChild(x,y)\big) \wedge \forall y\,\big(hasChild(x,y) \rightarrow Doctor(y)\big)\).

Given that definition (TBox) plus facts about individuals (ABox), a DL reasoner can automatically deduce class memberships nobody stated. Here is the derivation drawn out — TBox on the left, ABox facts feeding in, and the conclusion the reasoner adds:

Step by step: bob is a \(Person\) (given); bob has some child, namely carol, satisfying \(\exists hasChild.\top\); and every child bob has (just carol) is a \(Doctor\), satisfying \(\forall hasChild.Doctor\). All three conjuncts of the \(HappyParent\) definition hold, so the reasoner concludes \(bob : HappyParent\) — a fact never asserted in the ABox. This is instance checking; alongside it the standard tasks are subsumption (is concept \(A\) necessarily a subclass of \(B\)?), consistency (does the KB contradict itself?), and classification (build the whole subclass hierarchy).

Task	Question it answers
Subsumption	Is every \(A\) necessarily a \(B\)?
Consistency	Is the KB free of contradiction?
Instance checking	Does individual \(x\) belong to concept \(C\)?
Classification	Compute the full subclass hierarchy of the TBox

The naming convention encodes expressiveness: \(\mathcal{ALC}\) is the base logic; adding letters (\(\mathcal{S}\), \(\mathcal{H}\), \(\mathcal{O}\), \(\mathcal{I}\), \(\mathcal{N}\), \(\mathcal{Q}\)) for features like role hierarchies and number restrictions yields \(\mathcal{SHOIN}\) (≈ OWL DL) and \(\mathcal{SROIQ}\) (≈ OWL 2 DL). Each added feature buys expressiveness at a known cost in worst-case complexity — the trade made explicit and tunable.

Authoring a TBox/ABox and letting a reasoner classify it

The Python library owlready2 lets you declare classes, restrictions, and individuals as ordinary Python, then runs the HermiT or Pellet DL reasoner to materialise inferred memberships:

# pip install owlready2  (needs Java for the HermiT reasoner)
from owlready2 import *

onto = get_ontology("http://ex.org/onto.owl")
with onto:
    class Person(Thing): pass
    class Doctor(Person): pass
    class hasChild(Person >> Person): pass
    class HappyParent(Person):                       # TBox: equivalent-to definition
        equivalent_to = [Person & hasChild.some(Person) & hasChild.only(Doctor)]

    carol = Doctor("carol")
    bob   = Person("bob"); bob.hasChild = [carol]    # ABox: bob's only child is carol

sync_reasoner(onto)                                  # run HermiT
print(HappyParent in bob.is_a or bob in HappyParent.instances())  # True (derived)

Tip

Intuition: description logics are “object-oriented modelling with a proof engine.” You define classes by their necessary and sufficient conditions, and the reasoner figures out the class hierarchy and memberships for you — even ones you never wrote down.

33.7 — Strengths (Interpretability, Guarantees) vs the Brittleness / Knowledge-Acquisition Bottleneck

Symbolic KRR earns its place through two strengths that data-driven learning struggles to match. First, interpretability: every conclusion comes with a proof trace — a chain of rules and facts you can read, audit, and challenge. When MYCIN recommended an antibiotic, it could answer “why?” by replaying its reasoning. For a regulated domain (medicine, law, finance, safety), that auditability is not a luxury. Second, guarantees: sound inference means a derived fact is provably entailed by the KB — not probable, not usually-right, but certain given the premises. A neural network offers neither by default.

But these strengths are bought at a steep price, and the price is why symbolic AI fell out of fashion. The crippling problem is the knowledge-acquisition bottleneck: every fact and rule must be hand-authored by experts, which is slow, expensive, and error-prone. Cyc, the most ambitious attempt to hand-encode common sense, consumed decades of effort and still could not cover the open world. Worse, symbolic systems are brittle: they handle exactly the cases their rules anticipate and fail abruptly — not gracefully — outside them. A rule base with no rule for an unseen situation simply stalls. And formalising messy, intuitive human knowledge (“this looks like a cat”) into crisp logical predicates is often impossible — exactly the perceptual tasks where neural networks excel.

flowchart LR
    subgraph S["Symbolic KRR"]
        SP["✓ Interpretable<br/>✓ Provable guarantees<br/>✓ Data-efficient"]
        SM["✗ Brittle outside rules<br/>✗ Manual knowledge entry<br/>✗ Poor with perception"]
    end
    subgraph N["Neural / Statistical"]
        NP["✓ Learns from raw data<br/>✓ Robust to noise<br/>✓ Great at perception"]
        NM["✗ Opaque 'black box'<br/>✗ No guarantees<br/>✗ Data-hungry"]
    end
    S -. "complementary<br/>strengths" .- N

The table makes the contrast precise — and notice the strengths are almost exactly complementary, which is the entire premise of the next section.

Dimension	Symbolic KRR	Neural / statistical ML
Explainability	transparent proof traces	opaque, post-hoc only
Correctness	provable guarantees	empirical, probabilistic
Data needs	knowledge hand-authored	learns from large datasets
Perception (vision, speech)	weak	strong
Behaviour on novel inputs	brittle, may stall	degrades gracefully
Acquiring knowledge	expert bottleneck	automatic from data

Warning

Common mistake: reading this list as “symbolic AI lost.” It didn’t — it became infrastructure. Knowledge graphs, ontologies, and rule engines quietly run search, fraud checks, and clinical decision support today. The lesson is that hand-authored knowledge does not scale to the open world alone, not that it is worthless.

33.8 — The Frame Problem and Closed- vs Open-World Assumptions

Before reasoning systems can be trusted, two foundational assumptions have to be pinned down — and getting them wrong is the source of a surprising amount of grief. Both are about what a system is entitled to conclude from silence.

The closed-world assumption (CWA) says: anything not provable from the KB is taken to be false. The open-world assumption (OWA) says the opposite: anything not stated is merely unknown, neither true nor false. The choice is not cosmetic — it changes the answers.

Everyday analogy. A flight-booking database uses CWA: if “Flight 42 to Tokyo” is not in the table, the app confidently says “no such flight.” A medical ontology uses OWA: if the record does not say a patient is allergic to penicillin, the reasoner must not conclude they are safe — absence of a stated allergy is not evidence of no allergy. Databases and Prolog (via negation-as-failure) assume a closed world; OWL and the Semantic Web assume an open one, precisely because the web is never complete.

The frame problem is the other foundational headache, and it bites whenever a system reasons about change (planning, robotics, action). When an action happens, you must specify not only what it changes but the vast number of things it leaves alone. If a robot moves a cup, the logic must somehow know the colour of the wall, the time of day, and the positions of every other object are all unaffected — yet stating every such non-effect explicitly is hopeless. The practical fix is the frame axiom / commonsense law of inertia: assume every fact persists across an action unless a rule explicitly changes it.

\[\text{Holds}(f, s) \wedge \neg \text{Affects}(a, f) \;\rightarrow\; \text{Holds}(f, \text{do}(a, s))\]

In words: a fact \(f\) that held in situation \(s\) still holds after doing action \(a\), as long as \(a\) does not affect \(f\). Also written: in event-calculus style, \(\text{HoldsAt}(f, t) \wedge \neg\exists e\,[\text{Happens}(e,t) \wedge \text{Terminates}(e,f,t)] \rightarrow \text{HoldsAt}(f, t{+}1)\) — a fluent persists unless some event terminates it.

This single idea — inertia by default, change only when forced — is exactly what underlies the STRIPS action representation used by planners and is why frame, closed-world, and non-monotonic reasoning all show up together: they are the machinery for reasoning sensibly about an incompletely-specified, changing world.

Warning

Common mistake: silently importing a database’s closed-world habit into an ontology. Asking an OWL reasoner “is Bob not a doctor?” returns unknown, not yes — and code that treats “not returned” as “false” will quietly fabricate conclusions the KB never licensed.

33.9 — Non-Monotonic Reasoning & Default Logic

Classical logic has a property that sounds harmless but is actually a straitjacket: it is monotonic. Once you prove something, no new fact can ever take it back — the set of conclusions only ever grows. Human reasoning is nothing like that. We jump to sensible default conclusions and retract them the moment we learn an exception. Non-monotonic reasoning is the formal study of that: logics where adding a premise can remove a conclusion.

Think of it as reasoning with an eraser. Monotonic logic writes in permanent ink — every line stays forever. Non-monotonic logic writes in pencil: “birds fly, so Tweety flies” goes down, then “Tweety is a penguin” arrives and you erase the flying conclusion. The pencil is what lets you act on incomplete information instead of freezing until you know everything.

The canonical example is the one §33.4 hinted at. We believe birds fly — but only typically. Encode it as a default rule: “if \(x\) is a bird, and it is consistent to assume \(x\) flies, then conclude \(x\) flies.” That middle clause — consistent to assume — is what classical logic cannot express, and it is the heart of default logic (Reiter, 1980).

\[\frac{Bird(x) \;:\; Flies(x)}{Flies(x)}\]

In words: given that \(x\) is a bird, if assuming \(x\) flies does not contradict anything you already know, then go ahead and conclude \(x\) flies. Also written: a Reiter default \(\dfrac{\alpha : \beta}{\gamma}\) reads “if prerequisite \(\alpha\) holds and justification \(\beta\) is consistent with the current beliefs, derive consequent \(\gamma\)”; here \(\alpha=\beta=\gamma=\) “\(x\) flies” given \(Bird(x)\).

Now watch the non-monotonicity. With KB \(=\{Bird(tweety)\}\), the default fires and we conclude \(Flies(tweety)\). Add the fact \(\neg Flies(tweety)\) (Tweety is a penguin): now assuming \(Flies(tweety)\) is inconsistent, the default no longer fires, and the earlier conclusion vanishes. More information, fewer conclusions — the exact opposite of monotonic logic.

Several formalisms tame this. Default logic computes extensions — maximal consistent sets of beliefs you can build by applying defaults. Circumscription (McCarthy) minimises the abnormal cases: “everything is normal unless forced otherwise,” formalising “birds fly except abnormal ones.” Negation as failure in Prolog and the stable models of Answer Set Programming (§33.3) are the operational, runnable cousins — \+ flies(X) succeeds precisely when flight cannot be proved. And the frame axiom of §33.8 is itself a non-monotonic default: facts persist unless an action is known to change them.

# Defeasible "birds fly" with an explicit exception list — the operational core
# of negation-as-failure. A conclusion holds unless a known exception blocks it.
birds       = {"tweety", "robin", "opus"}
cannot_fly  = {"opus"}                       # opus is a penguin (the exception)

def flies(x):
    # default: a bird flies UNLESS we can show it can't (closed-world on exceptions)
    return x in birds and x not in cannot_fly

assert flies("tweety")          # default fires
assert not flies("opus")        # exception retracts the default conclusion
# Adding "opus" to cannot_fly REMOVED a conclusion -> non-monotonic
print({b: flies(b) for b in birds})   # {'tweety': True, 'robin': True, 'opus': False}

flowchart TB
  M["Monotonic logic<br/>conclusions only grow"] -->|too rigid for<br/>incomplete info| NM["Non-monotonic reasoning"]
  NM --> DL["Default logic<br/>(extensions)"]
  NM --> CIRC["Circumscription<br/>(minimise abnormality)"]
  NM --> NAF["Negation as failure<br/>(Prolog, ASP)"]
  NM --> FR["Frame axiom<br/>(inertia by default)"]

Warning

Common mistake: treating a default conclusion as a proven fact downstream. Defaults are defeasible — a system that caches “Tweety flies” and never rechecks it when new facts arrive will act on a belief the logic has already retracted. Non-monotonic conclusions must be re-derived, not memoised as permanent.

Tip

Intuition: non-monotonic reasoning is how you act decisively without complete information. Assume the normal case, commit to it, and stay ready to take it back. It is the formal bridge between brittle classical logic and the flexible “usually, but…” reasoning people use every minute.

33.10 — The Modern Neurosymbolic Revival

If symbolic systems are interpretable but brittle, and neural networks are robust but opaque, the obvious move is to combine them — and that is exactly what neurosymbolic AI sets out to do. The goal is a system that perceives with neural networks and reasons with symbolic logic, getting data-driven robustness and rule-based guarantees in one pipeline.

The patterns vary. In a common perceive-then-reason arrangement, a neural network maps raw input (pixels, text) into symbols, and a logic engine reasons over those symbols. A neuro-symbolic question answerer, for instance, uses a vision network to detect objects in an image, then runs a symbolic program over the detections to answer “are there more red cubes than blue spheres?” — perception where neural nets shine, counting and comparison where logic shines. Other approaches push logic into the network: differentiable reasoning relaxes hard logical operations into smooth functions so that rules can be trained by gradient descent; knowledge-graph embeddings turn symbolic entities into vectors so a network can reason approximately over a graph; and logic-as-regularisation penalises a model during training whenever it violates known constraints.

flowchart LR
    IMG["Raw input<br/>(image / text)"] --> NN["Neural perception<br/>(extract symbols)"]
    NN --> SYM["Symbolic facts<br/>(red, cube, left-of)"]
    SYM --> REASON["Logic / KG reasoner<br/>(apply rules)"]
    REASON --> ANS["Answer + proof trace"]
    KB["Knowledge base /<br/>constraints"] --> REASON
    KB -. "regularise" .-> NN

This is also where KRR meets today’s frontier. Large language models absorb vast knowledge statistically but hallucinate — they have no guarantee of factual correctness. Pairing an LLM with a knowledge graph or a symbolic verifier (the spirit of retrieval-augmented generation and tool-use) grounds its fluent output in checkable facts, a direct modern echo of the old logical guarantee. The pendulum that swung from symbolic to neural is swinging back toward the middle.

Worked illustration — logic as a constraint on a classifier. Suppose a network outputs probabilities for is-cat and is-dog, and we know the rule “an animal cannot be both.” We turn that rule into a penalty — \(p_{cat}\cdot p_{dog}\) — which is positive whenever both fire and zero as soon as either drops to zero. Adding it to the training loss therefore pushes the model away from the impossible “both” corner:

\[\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda\, p_{cat}\, p_{dog}\]

In words: train on the usual data loss, plus a tunable penalty \(\lambda\) times the “both-fired” product, so the optimiser is pushed away from logically impossible answers. Also written: as a soft (product/Gödel) t-norm reading of the rule \(\neg(cat \wedge dog)\), the constraint loss is \(\lambda \cdot t\text{-}norm(p_{cat}, p_{dog})\) with \(t\text{-}norm(a,b)=ab\).

import numpy as np
# Model's raw beliefs for one image:
p_cat, p_dog = 0.7, 0.6          # both high -> violates "not both"
# Logical constraint:  NOT (cat AND dog).  Penalise the product.
violation = p_cat * p_dog                      # 0.42  -> nonzero penalty
loss_logic = violation                          # added to the usual training loss

# The penalty is POSITIVE exactly when both fire, and 0 if either is 0.
# Minimising it therefore discourages the joint "cat AND dog" outcome.
assert loss_logic > 0                                  # constraint is violated here
assert p_cat * 0.0 == 0 and 0.0 * p_dog == 0           # satisfied if either drops to 0
print(f"constraint violation penalty = {loss_logic:.2f}")

The constraint as a gentle force — both beliefs lit means a glowing penalty pulls them apart; let either fade and the pull vanishes:

Where this shows up: the same trick — turn a known rule into a differentiable penalty — powers real systems. Self-driving perception stacks penalise physically impossible label combinations; PSL and DeepProbLog train networks under logical constraints; and an LLM re-ranker can down-weight any answer that violates a knowledge-graph fact. The rule is not enforced as a hard gate; it is a soft force nudging the network toward consistency.

Note what the code does and does not show: it computes the penalty, not a gradient step. The point is that the penalty is positive precisely when the constraint is violated and vanishes when it is satisfied — so minimising it (via whatever optimiser trains the network) steers the model away from logically impossible answers. The network still learns from data; the symbolic rule just adds a force pulling it toward consistency — neural perception and symbolic knowledge pulling in the same harness.

The same idea, written as a real training step in PyTorch — the logic term is just another addend in the loss, fully differentiable:

import torch, torch.nn.functional as F

logits = model(x)                       # raw scores for [cat, dog]
p = torch.sigmoid(logits)               # independent probabilities
data_loss  = F.binary_cross_entropy(p, targets)
logic_loss = (p[:, 0] * p[:, 1]).mean() # penalise P(cat)·P(dog): "not both"
loss = data_loss + 0.5 * logic_loss     # λ = 0.5 weights the constraint
loss.backward()                         # gradients flow through the rule too
optimizer.step()

Tip

Intuition: let each tradition do what it is good at. Neurons handle the fuzzy, high-bandwidth job of turning the world into symbols; logic handles the crisp, auditable job of reasoning over them. Neurosymbolic AI is the wiring that connects the two.

33.11 — Resolution and Proof by Refutation

The honest way to prove a statement is to assume it is false and watch the world catch fire. If assuming “the opposite” forces a contradiction, the original statement had to be true. This is proof by refutation, and it is the engine inside almost every automated theorem prover.

Resolution is the single inference rule that makes refutation mechanical. The intuition: if you know “it is raining OR the ground is wet” and you also know “it is NOT raining OR I brought an umbrella,” then in every world the two clauses agree on, you can cancel the rain and conclude “the ground is wet OR I brought an umbrella.” Two clauses that disagree on exactly one literal can be merged, with that literal struck out.

Formally, working in conjunctive normal form (CNF) — an AND of clauses, where each clause is an OR of literals — the resolution rule is:

\[\frac{(A \lor \ell) \qquad (B \lor \lnot \ell)}{(A \lor B)}\]

In words: from two clauses where one has \(\ell\) and the other has its negation, derive a new clause containing everything else from both, with the clashing pair removed. Also written: as a set operation on clauses, \(\text{resolve}(C_1, C_2) = (C_1 \setminus \{\ell\}) \cup (C_2 \setminus \{\lnot\ell\})\) whenever \(\ell \in C_1\) and \(\lnot\ell \in C_2\).

The new clause \((A \lor B)\) is the resolvent. If \(A\) and \(B\) are both empty, the resolvent is the empty clause \(\square\), which is unsatisfiable by definition — there is no literal left to make true. Deriving \(\square\) is the contradiction we were hunting for.

The refutation recipe is always the same:

flowchart LR
  KB["Knowledge base (true facts)"] --> C["Add ¬(goal)"]
  C --> N["Convert all to CNF"]
  N --> R["Resolve clause pairs<br/>repeatedly"]
  R --> Q{"Derived □ ?"}
  Q -- yes --> P["goal is PROVEN"]
  Q -- no, no new clauses --> F["goal does NOT follow"]

Tip

You never try to prove the goal directly. You add its negation to the facts you already trust, then show the enlarged set is impossible. If the facts plus “not the goal” cannot all be true at once, the goal must follow from the facts.

Worked example

Suppose we know three things and want to prove the patient is healthy (\(H\)).

#	English	Clause (CNF)
1	If feverish, then infected	\(\lnot F \lor I\)
2	If infected, then not healthy is false — i.e. treatment makes healthy	\(\lnot I \lor H\)
3	The patient is feverish	\(F\)

Goal: prove \(H\). Negate it and add \(\lnot H\) as clause 4. Now resolve:

# clauses as frozensets of literals; +x = x, -x = not x
clauses = [frozenset({-1, 2}),   # ¬F ∨ I     (F=1, I=2, H=3)
           frozenset({-2, 3}),   # ¬I ∨ H
           frozenset({1}),       # F
           frozenset({-3})]      # ¬H  (negated goal)

def resolve(ci, cj):                       # all resolvents of two clauses
    out = []
    for lit in ci:
        if -lit in cj:                     # complementary literal found
            r = (ci - {lit}) | (cj - {-lit})
            if not any(l in r and -l in r for l in r):  # skip tautologies
                out.append(frozenset(r))
    return out

known = set(clauses)
frontier = list(clauses)
while frontier:
    a = frontier.pop()
    for b in list(known):
        for r in resolve(a, b):
            if len(r) == 0:                # empty clause = ⊥
                print("Contradiction -> H proven"); raise SystemExit
            if r not in known:
                known.add(r); frontier.append(r)
print("H not provable")

Tracing by hand: clause 3 (\(F\)) resolves with clause 1 (\(\lnot F \lor I\)) to give \(I\). That \(I\) resolves with clause 2 (\(\lnot I \lor H\)) to give \(H\). And \(H\) resolves with the negated goal \(\lnot H\) to give the empty clause \(\square\). Contradiction reached, so \(H\) is proven.

Resolution is refutation-complete: if a set of clauses is genuinely unsatisfiable, resolution will eventually derive \(\square\). The catch is eventually — the search space of clause pairs explodes, which is exactly the pressure that motivates the smarter solvers below.

Warning

Resolution proves unsatisfiability, not satisfiability. If it never derives \(\square\) and runs out of new clauses, the goal simply does not follow — but on large problems “runs out” can take astronomically long, and naive resolution generates oceans of useless tautological clauses if you do not prune them.

33.12 — Unification: Pattern-Matching for First-Order Logic

Propositional resolution cancels \(\ell\) against \(\lnot \ell\) — an exact match. First-order logic has variables, so \(Loves(x, \text{Mary})\) and \(Loves(\text{John}, y)\) ought to cancel too: there is clearly a way to make them the same statement. Unification is the algorithm that finds that way, computing the substitution that makes two expressions identical.

The intuition is jigsaw matching. A variable is a blank that can take any shape; a constant is a fixed shape; a function term is a shape with its own blanks. Unification slides the pieces together and records what each blank had to become. The result is the most general unifier (MGU) — the least committal substitution that works, leaving everything not forced to be equal still free.

\[\text{unify}(Loves(x,\text{Mary}),\; Loves(\text{John},y)) = \{\,x/\text{John},\; y/\text{Mary}\,\}\]

In words: to make “x loves Mary” and “John loves y” the same sentence, x has to be John and y has to be Mary — and nothing more is forced. Also written: a unifier \(\theta\) is any substitution with \(s\theta = t\theta\); the MGU is the \(\theta\) such that every other unifier \(\sigma\) factors as \(\sigma = \theta\circ\gamma\) for some further substitution \(\gamma\).

The rules are short:

variable vs. anything → bind the variable to that term (record it)
constant vs. same constant → succeed, no binding
constant vs. different constant → fail
\(f(\dots)\) vs. \(g(\dots)\) with different functor/arity → fail
\(f(s_1..s_n)\) vs. \(f(t_1..t_n)\) → unify arguments pairwise, threading the substitution forward

One rule has teeth: the occurs check. You may not bind \(x\) to a term that contains \(x\), such as \(x = f(x)\) — that describes an infinite term \(f(f(f(\dots)))\) and no finite substitution satisfies it. Skipping the occurs check is what lets a logic engine quietly produce garbage; many Prolog systems omit it for speed and accept the risk.

def unify(s, t, sub):
    s, t = walk(s, sub), walk(t, sub)          # follow existing bindings
    if s == t:                  return sub                       # same const/var
    if isvar(s):                return bind(s, t, sub)           # x = t
    if isvar(t):                return bind(t, s, sub)           # t = x
    if isfun(s) and isfun(t):                                    # f(...) vs g(...)
        if s.op != t.op or len(s.args) != len(t.args): return None
        for a, b in zip(s.args, t.args):                         # args pairwise
            sub = unify(a, b, sub)
            if sub is None: return None
        return sub
    return None                                                  # const clash

def bind(v, t, sub):
    if occurs(v, t, sub): return None          # occurs check: no x = f(x)
    return {**sub, v: t}

Worked example

Unify \(P(a,\; y,\; f(y))\) with \(P(x,\; b,\; z)\) (lowercase letters \(x,y,z\) are variables; \(a,b\) are constants).

Step	Compare	Action	Substitution so far
1	\(P\) vs \(P\), arity 3	match functor, recurse on args	\(\{\}\)
2	\(a\) vs \(x\)	bind \(x\) to \(a\)	\(\{x/a\}\)
3	\(y\) vs \(b\)	bind \(y\) to \(b\)	\(\{x/a,\, y/b\}\)
4	\(f(y)\) vs \(z\)	apply \(y/b\) then bind \(z\)	\(\{x/a,\, y/b,\, z/f(b)\}\)

The MGU is \(\{x/a,\ y/b,\ z/f(b)\}\). Apply it to either term and both become \(P(a, b, f(b))\).

With unification in hand, first-order resolution generalizes the propositional rule: to resolve two clauses, find a literal in one and a negated literal in the other whose atoms unify, apply the MGU to both clauses, and drop the matched pair. This is how a prover reasons over “all men are mortal, Socrates is a man” without ever enumerating every individual — the variable does the quantifying.

A runnable, self-contained MGU on simple nested tuples — handy for testing the algorithm without a full term library:

# Variables are strings starting uppercase; constants/functors are everything else.
# Terms: 'a' (atom) or ('f', arg1, arg2, ...) (compound).
def walk(t, s):
    while isinstance(t, str) and t[:1].isupper() and t in s: t = s[t]
    return t

def unify(x, y, s):
    if s is None: return None
    x, y = walk(x, s), walk(y, s)
    if x == y: return s
    if isinstance(x, str) and x[:1].isupper(): return {**s, x: y}   # var = y
    if isinstance(y, str) and y[:1].isupper(): return {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y) and x[0] == y[0]:
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
        return s
    return None                                                     # clash

mgu = unify(('P', 'a', 'Y', ('f', 'Y')), ('P', 'X', 'b', 'Z'), {})
assert mgu == {'X': 'a', 'Y': 'b', 'Z': ('f', 'b')}
print(mgu)

Warning

Variables in two different clauses are unrelated even if spelled the same. Always standardize apart (rename variables uniquely per clause) before unifying, or you will accidentally force \(x\) in one fact to equal \(x\) in an unrelated fact and derive nonsense.

33.13 — DPLL and CDCL: Inside Modern SAT Solvers

The Boolean satisfiability problem (SAT) asks: given a CNF formula, is there any assignment of true/false to the variables that makes every clause true? It was the first problem proven NP-complete, and in the worst case nothing beats trying assignments. Yet today’s solvers routinely crack formulas with millions of variables. The gap between “exponential in theory” and “fast in practice” is the story of two algorithms.

DPLL: backtracking search with two shortcuts

The naive approach is a binary tree over variables: try \(x_1 = \text{true}\), recurse, backtrack, try false. DPLL (Davis–Putnam–Logemann–Loveland) keeps that backtracking skeleton but adds two cheap deductions that prune enormous subtrees:

Unit propagation: a clause with all-but-one literal already false forces the last literal true. \((\lnot x_1 \lor x_2)\) with \(x_1 = \text{true}\) forces \(x_2 = \text{true}\). Then propagate again — one decision can cascade into dozens of forced assignments, a chain reaction called Boolean Constraint Propagation (BCP), where solvers spend most of their time.
Pure literal elimination: if a variable appears only positively (or only negatively) across all remaining clauses, assign it the satisfying polarity for free.

flowchart TD
  S["Pick unassigned var<br/>(decision)"] --> U["Unit-propagate<br/>all forced literals"]
  U --> C{"Conflict?<br/>(a clause all false)"}
  C -- no --> D{"All clauses<br/>satisfied?"}
  D -- yes --> SAT(["SAT"])
  D -- no --> S
  C -- yes --> B{"Any decision<br/>to undo?"}
  B -- yes --> J["Backtrack, flip<br/>last decision"] --> U
  B -- no --> UN(["UNSAT"])

Worked example: unit propagation does the work

Formula: \((x_1 \lor x_2) \land (\lnot x_1 \lor x_3) \land (\lnot x_3)\).

Clause 3 is a unit clause: it forces \(x_3 = \text{false}\).
Now clause 2 \((\lnot x_1 \lor x_3)\) has \(x_3\) false, so it becomes a unit forcing \(x_1 = \text{false}\).
Now clause 1 \((x_1 \lor x_2)\) has \(x_1\) false, forcing \(x_2 = \text{true}\).
Every clause is satisfied with no branching at all: \(x_1{=}F,\ x_2{=}T,\ x_3{=}F\). SAT.

Propagation alone solved it — that is why it dominates solver runtime.

DPLL’s search is a token threading down a decision tree: decide, propagate, and on a conflict (the red dead-end) backtrack and try the other branch until a leaf satisfies every clause:

# The same formula handed to a production CDCL solver (pip install python-sat).
from pysat.formula import CNF
from pysat.solvers import Glucose3
cnf = CNF(from_clauses=[[1, 2], [-1, 3], [-3]])      # (x1∨x2)∧(¬x1∨x3)∧(¬x3)
with Glucose3(bootstrap_with=cnf.clauses) as s:
    print(s.solve())          # True  (SAT)
    print(s.get_model())      # e.g. [-1, 2, -3]  ->  x1=F, x2=T, x3=F

CDCL: learning from every mistake

Plain DPLL, on hitting a conflict, just flips the most recent decision — it forgets why the conflict happened and will often rediscover the same dead end elsewhere. CDCL (Conflict-Driven Clause Learning) is the upgrade that powers every serious modern solver. When propagation hits a conflict, CDCL analyzes the implication graph — which decisions and forced assignments led here — and distills the root cause into a brand-new learned clause that it adds permanently to the formula. That clause forbids the entire class of assignments that caused the conflict, so the solver never repeats it.

CDCL then performs non-chronological backtracking (backjumping): instead of undoing one level, it leaps straight back to the decision level that actually caused the conflict, potentially skipping many irrelevant levels. Combined with activity-based variable heuristics (VSIDS, which favors variables involved in recent conflicts) and periodic restarts, this is what turns SAT from a toy into industrial infrastructure.

Feature	DPLL	CDCL
Backtracking	chronological (one level)	non-chronological (backjump)
Conflict response	flip last decision	learn a clause, never repeat
Memory of failures	none	accumulates learned clauses
Typical scale	thousands of vars	millions of vars

Warning

SAT solvers expect CNF. Converting an arbitrary formula to CNF by distributing \(\lor\) over \(\land\) can blow up exponentially. Real tools use the Tseitin transformation, which introduces fresh auxiliary variables to keep the encoding linear in size — never hand-distribute a large formula into CNF.

33.14 — Model Checking: Verifying Systems Exhaustively

Testing shows the presence of bugs, never their absence — you only ever check the inputs you thought to try. Model checking flips this: it explores every reachable state of a system and proves a property holds in all of them, or hands you a concrete counterexample trace showing exactly how it breaks. Where a theorem prover reasons about formulas, a model checker reasons about a finite (or finitely-abstracted) state machine.

The intuition: model the system as states and transitions, write the desired property in a temporal logic, then ask “is there any path that violates it?” If the search comes back empty, the property is verified for all behaviors; if not, the violating path is the bug report.

Properties are written in temporal logic, which adds operators for “over time” to ordinary logic:

\(\mathbf{G}\,\varphi\) — globally: \(\varphi\) holds in every state along every path (a safety/invariant property, e.g. “two trains are never on the same track”).
\(\mathbf{F}\,\varphi\) — finally: \(\varphi\) eventually holds (a liveness property, e.g. “every request is eventually served”).
\(\mathbf{X}\,\varphi\) — \(\varphi\) holds in the next state.
\(\varphi\,\mathbf{U}\,\psi\) — \(\varphi\) holds until \(\psi\) becomes true.

A small timeline makes the four operators concrete — each row marks where \(\varphi\) (●) must hold along a path:

Worked example: a two-process mutual-exclusion check

Two processes each cycle through idle → trying → critical. The safety property we want: \(\mathbf{G}\,\lnot(\text{crit}_1 \land \text{crit}_2)\) — never both in the critical section at once. The model checker enumerates the reachable combined states:

stateDiagram-v2
  [*] --> I_I
  I_I: idle , idle
  T_I: trying , idle
  I_T: idle , trying
  C_I: crit , idle
  I_C: idle , crit
  BAD: crit , crit
  I_I --> T_I
  I_I --> I_T
  T_I --> C_I
  I_T --> I_C
  C_I --> I_I
  I_C --> I_I
  BAD:::err
  classDef err fill:#922,color:#fff

If the lock logic is correct, the BAD state \((\text{crit},\text{crit})\) simply has no incoming transition — it is unreachable, and \(\mathbf{G}\,\lnot(\text{crit}_1\land\text{crit}_2)\) holds. If a buggy protocol let both processes move from trying to crit independently, the checker would find a path \(I\_I \to T\_I \to \dots \to BAD\) and print it as the counterexample.

The fundamental enemy is state-space explosion: \(n\) Boolean variables give \(2^n\) states, and concurrency multiplies process states together. Two families of techniques fight it, and both lean directly on the SAT machinery of the previous section:

Symbolic model checking represents huge state sets compactly — classically with Binary Decision Diagrams (BDDs) — instead of enumerating states one by one.
Bounded model checking (BMC) asks “is there a counterexample of length \(\le k\)?” by unrolling the transition relation \(k\) steps and encoding the whole question as one giant CNF formula, then handing it to a CDCL SAT solver. A satisfying assignment is a counterexample trace; UNSAT means no bug up to depth \(k\).

# Bounded model checking, in spirit: unroll k steps, ask SAT for a bad path.
# state vars s0..sk; Init, Trans, Bad are CNF predicates over them.
formula = Init(s[0])                                  # start in an initial state
for i in range(k):
    formula &= Trans(s[i], s[i+1])                    # each step is legal
formula &= Or(*[Bad(s[i]) for i in range(k+1)])       # some state violates safety
# if SAT  -> assignment is a concrete counterexample trace
# if UNSAT -> property holds for all paths up to length k

The BMC unrolling expressed as a single propositional formula:

\[\text{Init}(s_0) \;\wedge\; \bigwedge_{i=0}^{k-1}\text{Trans}(s_i, s_{i+1}) \;\wedge\; \bigvee_{i=0}^{k}\text{Bad}(s_i)\]

In words: start in a legal initial state, take \(k\) legal steps, and require that some state along the way is bad — if a solver can satisfy this, that satisfying run is a real counterexample. Also written: SAT of this formula \(\equiv\) “\(\exists\) a length-\(\le k\) path from Init to a Bad state”; its negation being valid (UNSAT) \(\equiv\) the bounded safety property \(\text{Init} \rightarrow \mathbf{G}_{\le k}\,\lnot\text{Bad}\).

Warning

BMC with bound \(k\) only proves “no bug within \(k\) steps.” Declaring a system correct outright requires a completeness threshold (a \(k\) large enough to cover the whole state space) or an unbounded method like BDD-based checking or k-induction. A clean BMC run at small \(k\) is reassuring, not a proof.

33.15 — Where This Powers Planning and Verification

These four ideas are not museum pieces — they are the substrate under tools that ship real software and fly real spacecraft. The unifying trick is reduction: take a problem that does not look like logic, encode it as satisfiability or a reachability query, and let a decades-optimized solver do the work.

Planning as satisfiability. Classical AI planning — find a sequence of actions from an initial state to a goal — maps cleanly onto SAT. You unroll time into \(T\) steps, create a Boolean variable for every fact-at-time and every action-at-time, and add clauses encoding the rules: preconditions must hold before an action, effects hold after, and only one action fires per step. A satisfying assignment is a valid plan; the true literals spell out which action happens when. This is exactly the BMC unrolling pattern, reused for action sequences instead of bug traces.

flowchart LR
  P["Planning / verification<br/>problem"] --> E["Encode as CNF<br/>(unroll over time)"]
  E --> S["CDCL SAT solver"]
  S --> R{"SAT?"}
  R -- yes --> A["Read off plan /<br/>counterexample"]
  R -- no --> U["No plan / no bug<br/>within bound"]

Hardware and software verification. Every modern CPU is model-checked before tape-out — a single arithmetic bug recalled in silicon costs hundreds of millions, so chipmakers prove properties like “this cache protocol never deadlocks” exhaustively rather than testing samples. The same SAT/SMT engines drive software verifiers that prove a function never dereferences null or overflows a buffer on any input.

Domain	Question posed	Engine	Output
AI planning	reach goal in \(\le T\) steps?	SAT (encode + CDCL)	the action sequence
Chip design	can two writers collide?	model checker	counterexample waveform
Program proof	can this assert ever fail?	BMC / SMT	failing input, or “safe”
Protocol design	does every request get served?	temporal model check	violating execution

Beyond plain SAT: SMT. Most real problems mix Boolean logic with arithmetic, arrays, and bit-vectors. SMT (Satisfiability Modulo Theories) solvers — Z3, CVC5 — wrap a CDCL core with dedicated theory solvers that understand “\(x + y \le 10\)” or “array \(a\) at index \(i\).” The Boolean engine handles the logical skeleton; the theory solvers check that the chosen literals are consistent in their domain. This combination is what lets a verifier reason about actual program semantics, and it is the bridge from the pure logic of this chapter to the messy, typed, arithmetic world of the systems we actually build.

A real SMT query in Python — the z3-solver library proves a tiny arithmetic fact by failing to find a counterexample:

# pip install z3-solver
from z3 import Int, Solver, Not, Implies, And, sat

x, y = Int('x'), Int('y')
s = Solver()
# Claim: x>0 ∧ y>0  →  x+y>1.  Prove by checking the negation is UNSAT.
s.add(Not(Implies(And(x > 0, y > 0), x + y > 1)))
print(s.check())          # unsat  -> the claim holds for all integers
assert s.check() != sat   # no counterexample exists

Tip

The throughline of this whole topic: you rarely build a bespoke reasoner. You translate your question — a plan, an invariant, a proof obligation — into clauses, and ride the enormous engineering investment poured into SAT/SMT solvers and model checkers over the last thirty years. Knowing how to encode is now more valuable than knowing how to search.

33.16 — Quick reference

Term / method	One-line meaning	When / why you reach for it
Entailment \(KB \models \alpha\)	\(\alpha\) is true in every world the KB allows	The gold standard of “follows logically” — what every sound reasoner must respect
Modus ponens	From \(\alpha\!\to\!\beta\) and \(\alpha\), conclude \(\beta\)	The atomic forward step in chaining and proofs
Resolution + refutation	Assume \(\neg\)goal, derive the empty clause \(\square\)	The one mechanical rule behind theorem provers (§33.1, §33.11)
CNF (conjunctive normal form)	AND of clauses, each an OR of literals	The input format every resolver and SAT solver expects
Unification / MGU	Substitution making two FOL terms identical	Lets resolution cancel literals with variables; the engine inside Prolog
Forward chaining	Facts → all conclusions (data-driven)	Monitoring, alarms, deriving everything implied
Backward chaining	Goal → needed facts (goal-driven)	Diagnosis, answering one specific query
Horn clause	Rule with ≤1 positive literal, `head :- body`	The efficient sweet spot powering Prolog/Datalog/ASP
Negation as failure	Unproven ⇒ false (closed-world)	Defaults and exceptions in Prolog/ASP — beware: not classical \(\neg\)
RDF triple \((s,p,o)\)	Atomic fact in a knowledge graph	Web-scale knowledge; reasoners derive unstated triples
Description logic (TBox/ABox)	Decidable FOL fragment: schema + data	OWL ontologies; guaranteed-terminating subsumption/instance checks
Bayes’ rule + base rate	\(P(D\mid+)=\frac{P(+\mid D)P(D)}{P(+)}\)	Reasoning under uncertainty; exposes rare-disease false-positive traps
Open vs closed world	Silence ⇒ unknown vs silence ⇒ false	OWL/web use OWA; databases/Prolog use CWA — picking wrong fabricates facts
Frame axiom (inertia)	A fact persists unless an action changes it	Reasoning about change without listing every non-effect
Non-monotonic reasoning	New facts can retract old conclusions	“Birds fly, except penguins” — acting on incomplete info
DPLL	Backtracking + unit propagation + pure-literal	The classic SAT search skeleton
CDCL	DPLL + learned clauses + backjumping	Industrial SAT: millions of variables, never repeat a conflict
SMT (Z3, CVC5)	CDCL core + theory solvers (arithmetic, arrays)	Verifying real programs mixing logic and arithmetic
Model checking (G, F, X, U)	Exhaustively check a temporal property over all states	Hardware/protocol verification; returns a counterexample trace
Neurosymbolic AI	Neural perception feeds symbolic reasoning	Robustness + guarantees in one pipeline; grounds LLMs against hallucination

33.17 — Key Takeaways

Logic is the foundation: propositional logic reasons over whole statements; first-order logic adds objects, relations, and quantifiers. Resolution mechanises proof by refutation — assume the negation of the goal and derive a contradiction (the empty clause). For FOL, unification (e.g. \(\{x/socrates\}\)) finds the substitution that lets a complementary pair cancel.
Expert systems separate a rule knowledge base, working memory, and an inference engine. Forward chaining is data-driven (facts → conclusions); backward chaining is goal-driven (goal → needed facts). They prove the same things, differing only in efficiency.
Logic programming turns Horn clauses into runnable reasoners: Prolog (top-down, with negation-as-failure, may loop), Datalog (no functions, always terminates — used in program analysis and graph queries), and Answer Set Programming (stable models, enumerates all solutions to combinatorial problems).
Structured representations — semantic networks, frames, ontologies, RDF triples, OWL, and knowledge graphs — store knowledge as a graph and derive unstated facts via inheritance and axioms.
Description logics are decidable FOL fragments (TBox schema + ABox data) that power OWL; reasoners perform subsumption, consistency, and instance checking with termination guarantees.
Uncertainty is handled by moving from truth values to probabilities; Bayes’ rule and base rates expose traps that pure logic hides.
Foundational assumptions matter: the closed-world vs open-world choice changes what silence means, and the frame problem forces a default law of inertia for reasoning about change — both underpin planning and non-monotonic reasoning.
Non-monotonic reasoning (default logic, circumscription, negation-as-failure) lets conclusions be retracted when new facts arrive — the formal account of “birds fly, except penguins” and the way to act on incomplete information.
Symbolic AI’s strengths are interpretability and provable guarantees; its weaknesses are brittleness and the knowledge-acquisition bottleneck — complementary to neural networks’ profile.
Mechanised reasoning scales via reduction: resolution, DPLL/CDCL SAT solvers, SMT, and model checking turn proofs, plans, and verification into satisfiability or reachability queries you hand to a decades-optimised engine.
Neurosymbolic AI fuses the two: neural perception feeds symbolic reasoning, and logic constrains or grounds neural models — the modern antidote to LLM hallucination.

33.18 — See Also

Probabilistic Graphical Models — Bayesian networks and the rigorous treatment of reasoning under uncertainty referenced in §33.5.
Search & Problem Solving — the state-space search machinery underlying chaining and theorem proving.
Planning, Constraint Satisfaction & Game Playing — logic-based action representation and constraint reasoning, the sibling of KRR in symbolic AI.
Large Language Models — statistical knowledge, hallucination, and retrieval-augmented grounding that §33.10 pairs with symbolic methods.
Explainable AI & Interpretability — the modern formalisation of the transparency that symbolic systems offer natively.
Frontier & Emerging Directions — where the neurosymbolic program is heading next.

↪ The thread continues → Chapter 34 · 🗺️ Planning, CSP & Game Playing

With knowledge represented, an agent can plan action sequences, satisfy constraints, and outthink an opponent — the domain of planning, CSPs, and game-playing.

📖 All chapters | ← 32 · 🧭 Search & Problem Solving | 34 · 🗺️ Planning, Constraint Satisfaction & Game Playing →