Chapter 41 — 🤖 Robotics & Autonomy

📖 All chapters | ← 40 · 🔗 Graph Machine Learning | 42 · 📐 Learning Theory →

📚 Jump to any chapter

🧮 Mathematical Foundations

🧭 The ML Workflow

🧩 Classical Machine Learning

🎲 Probabilistic Models

13 · 🕸️ Probabilistic Graphical Models

🧠 Deep Learning

🗣️ Applied AI: Vision, Language, Audio & Time

🕹️ Reinforcement Learning

25 · 🕹️ Reinforcement Learning

🛠️ Applied ML Systems & Industries

🚀 Production, Tooling & Infrastructure

📚 Classical & Symbolic AI

⚖️ Responsible AI & Frontier

🎓 Advanced & Specialized Topics

🎚️ Post-Training & Fine-Tuning

🚢 Model Serving & Deployment

47 · 🚢 Model Serving & Deployment in Production

A robot is a machine that senses, decides, and acts in a physical world that never quite matches the model in its head. The hard part is not the motors or the math in isolation — it is closing the loop between them fast enough, and reliably enough, that the robot does the right thing while sensors lie and gears slip. This chapter walks the autonomy stack from raw sensing to motor commands, leaning on intuition first and the precise equations second.

Note

🧭 In context: autonomous systems — mobile robots, manipulators, self-driving — all run the same loop: perceive → estimate → plan → act, closed continuously under uncertainty. Perception leans on computer vision (Ch 16–18), estimation on probabilistic models (Ch 12), control and learning on RL & control (Ch 25), and SLAM back-ends on optimization (Ch 06). This chapter is the glue that wires them into something that moves.

💡 Remember this: autonomy is one closed loop — perceive → estimate → plan → act — run forever under uncertainty, and a robot’s reliability comes from how fast and how surely it closes that loop, not from any single algorithm in it.

41.1 The autonomy stack: perceive → estimate → plan → act

The intuition is a loop, not a pipeline. A robot does not perceive once and then drive forever; it re-senses every few milliseconds and corrects. Think of walking across a dark room: you take a step, feel where you are, adjust, step again. Robots formalize that instinct into four repeating stages.

Perceive turns raw sensor bytes (camera pixels, LiDAR returns, wheel encoder ticks, IMU accelerations) into features — edges, obstacles, landmarks. Estimate fuses those noisy features over time into a belief about the world state: where am I, where is everything else, how fast am I going. Plan chooses a sequence of actions to reach a goal given that belief. Act sends torques and velocities to actuators — and immediately feeds the resulting motion back into perception.

The loop below runs forever; watch the active stage travel around it — that traveling pulse is the robot’s heartbeat, and a real system runs several of these nested at different speeds.

flowchart LR
    W((Physical<br/>world)) -->|sensors| P[Perceive<br/>pixels → features]
    P --> E[Estimate<br/>features → belief]
    E --> PL[Plan<br/>belief → actions]
    PL --> A[Act<br/>actions → torques]
    A -->|motion| W
    E -.->|state feedback| A
    style W fill:#1a2a3a,stroke:#4a90d9,color:#fff

Each stage maps to later sections: perception leans on computer vision (and the CNN backbones behind detection and depth); estimation is §41.4–41.5; planning is §41.6–41.7; acting is §41.3. The loop runs at different rates — a low-level control loop at 1 kHz, a planner at 10 Hz, perception at 30 Hz — and a real system is really nested loops, a fast inner control loop wrapped by slow outer planning.

Tip

The single most useful robotics question is “what rate does this loop close at, and what is its latency?” A perfect plan delivered 200 ms late can be worse than a rough plan delivered now. Autonomy is a real-time discipline before it is an AI discipline.

Sensors and coordinate frames: the substrate everything sits on

Before any algorithm runs, the robot must answer two physical questions: what can I measure, and measured relative to what. Think of frames like the way you give directions — “two meters to my left” only means something once you know where “I” am facing. Robots make this explicit.

Each sensor has strengths and blind spots, and good autonomy fuses complementary ones rather than trusting any single sensor:

Sensor	Measures	Strength	Weakness
Camera	light / color	rich, cheap, dense	no native depth, fails in dark/glare
LiDAR	range via laser ToF	precise 3-D geometry	expensive, sparse, struggles in rain/fog
Radar	range + velocity (Doppler)	works in weather, direct speed	low resolution
IMU	acceleration + angular rate	fast (kHz), no external dependence	drifts badly when integrated
Wheel encoder	wheel rotation	simple odometry	slips, drifts (§41.2)
GPS/GNSS	global position	absolute, no drift	~meters, blocked indoors/urban canyons

A coordinate frame is a local set of axes attached to something — the world, the robot body, the camera, a gripper. A pose is always of one frame, expressed in another: “the camera frame in the world frame.” The bookkeeping that relates them is a rigid transform from linear algebra — a rotation \(R\) plus a translation \(t\) — composed as you hop from frame to frame. A point \(p\) known in the camera frame becomes, in the world frame,

\[p^{\text{world}} = R^{\text{world}}_{\text{cam}}\, p^{\text{cam}} + t^{\text{world}}_{\text{cam}}.\]

In words: to move a point from the camera’s viewpoint into world coordinates, rotate it to align the axes, then shift it by where the camera sits. Also written: in homogeneous form one \(4\times4\) matrix does both, \(\tilde p^{\text{world}} = T^{\text{world}}_{\text{cam}}\, \tilde p^{\text{cam}}\) where \(T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}\) and \(\tilde p = (x,y,z,1)\).

Why bother stuffing rotation and translation into one \(4\times4\) matrix? Because then chaining frames is just multiplication. Picture the frames as a family tree: world → base → arm → camera. To turn a point the camera sees into world coordinates, you walk up that tree, and each hop is one matrix. Multiply them in order and you get a single transform \(T^{\text{world}}_{\text{cam}} = T^{\text{world}}_{\text{base}}\, T^{\text{base}}_{\text{arm}}\, T^{\text{arm}}_{\text{cam}}\) that does the whole trip at once. Want the reverse trip (world → camera)? Just invert the matrix. That is the entire reason robotics code is full of \(4\times4\) matrices. ROS keeps this whole tree of frames updated live (the tf system), and getting one hop wrong — a flipped sign, a swapped pair — is the single most common robotics bug (§41.10).

41.2 Configuration space and kinematics

To plan motion you need coordinates for “what the robot is doing.” The configuration \(q\) is the minimal set of numbers that pins down every part of the robot. A point robot in a plane has \(q=(x,y)\); a planar mobile robot adds heading, \(q=(x,y,\theta)\); a 7-joint arm has \(q=(\theta_1,\dots,\theta_7)\). The set of all legal configurations is the configuration space (C-space), and its dimension is the robot’s number of degrees of freedom (DOF).

The key mental shift: plan in C-space, not in the world. A fat L-shaped robot squeezing through a doorway is hard to reason about as a shape sliding among shapes. But map each obstacle into C-space — inflating it by the robot’s footprint — and the robot becomes a single point navigating among “configuration-space obstacles.” Planning a point through free space is far easier (this is exactly what §41.7 exploits).

The doodle below shows that trick: a disk robot near a wall (left) becomes a point near a thickened wall (right), grown by the robot’s radius.

Kinematics relates joint/wheel coordinates to where things end up in the world, ignoring forces and mass (that’s dynamics). Two directions:

Forward kinematics (FK): given \(q\), where is the end-effector / robot? Always a well-defined function, \(x = f(q)\).
Inverse kinematics (IK): given a desired end-effector pose \(x\), what \(q\) achieves it? Often nonlinear, with zero, one, many, or infinitely many solutions.

The Jacobian: from joint speeds to tip speeds

There is a layer between FK and full dynamics that does the day-to-day work of controlling an arm: the Jacobian. Intuition first — FK tells you where the hand is for a given pose; the Jacobian tells you which way the hand drifts when you nudge each joint a little. It is the dictionary that translates joint-space velocities into workspace velocities, built from the partial derivatives of calculus.

\[\dot{x} = J(q)\,\dot{q}, \qquad J(q) = \frac{\partial f}{\partial q}.\]

In words: the end-effector’s velocity equals the Jacobian matrix times the joint velocities — each column of \(J\) is the tip motion caused by spinning one joint at unit speed. Also written: componentwise, \(\dot{x}_i = \sum_j \dfrac{\partial f_i}{\partial q_j}\,\dot{q}_j\).

Running it backwards gives a practical, iterative way to do inverse kinematics: to move the tip by a small desired step \(\Delta x\), solve \(\Delta q = J^{+}\Delta x\) (using the pseudo-inverse \(J^{+}\) for redundant arms), step, recompute \(J\), repeat. When \(J\) loses rank the arm is at a singularity — a stretched-straight elbow, say — where some tip directions need infinite joint speed; controllers must damp these or steer around them.

The differential-drive / unicycle model

Most wheeled robots (and, abstracted, most cars) are modeled as a unicycle: a body at \((x,y)\) pointing at angle \(\theta\), commanded by a forward speed \(v\) and turn rate \(\omega\). Its kinematics are

\[\dot{x} = v\cos\theta, \qquad \dot{y} = v\sin\theta, \qquad \dot{\theta} = \omega.\]

In words: the robot always moves in the direction it is facing at speed \(v\), and its heading turns at rate \(\omega\) — it can drive and spin, but never slide sideways. Also written: in vector form \(\dot{q} = \begin{bmatrix}\cos\theta & 0\\ \sin\theta & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix}v\\ \omega\end{bmatrix}\), exposing the two control columns of this \(3\times2\) map.

A real differential-drive robot has two wheels of radius \(r\) a distance \(L\) apart, spinning at \(\omega_L, \omega_R\). Convert wheel speeds to body commands:

\[v = \frac{r(\omega_R + \omega_L)}{2}, \qquad \omega = \frac{r(\omega_R - \omega_L)}{L}.\]

In words: average the two wheel surface speeds to get forward speed; their difference, divided by the track width, gives turn rate — equal wheels drive straight, opposite wheels spin in place. Also written: \(\begin{bmatrix} v \\ \omega \end{bmatrix} = \dfrac{r}{2}\begin{bmatrix} 1 & 1 \\ 2/L & -2/L \end{bmatrix}\begin{bmatrix}\omega_R\\ \omega_L\end{bmatrix}\).

This model is nonholonomic: the robot cannot slide sideways. There are 3 configuration DOF \((x,y,\theta)\) but only 2 controls \((v,\omega)\). You can reach any pose, but not by any path — hence parallel parking takes a wiggle, not a sideways glide.

Worked example — dead reckoning. A differential-drive robot with \(r=0.05\) m wheels, \(L=0.3\) m, runs \(\omega_R=10\), \(\omega_L=8\) rad/s for \(0.5\) s starting at the origin facing \(\theta=0\).

\[v = \frac{0.05(10+8)}{2} = 0.45 \text{ m/s}, \qquad \omega = \frac{0.05(10-8)}{0.3} = 0.333 \text{ rad/s}.\]

After \(0.5\) s, \(\theta \approx 0.167\) rad. Integrating with the midpoint heading \(\approx 0.083\) rad: \(x \approx 0.45 \cdot 0.5 \cdot \cos(0.083) \approx 0.224\) m, \(y \approx 0.45 \cdot 0.5 \cdot \sin(0.083) \approx 0.0187\) m. Notice the error this accrues: integrating wheel speeds open-loop is odometry, and it drifts without bound — the motivation for §41.4.

41.3 Feedback control: PID, feedback linearization, flatness

Kinematics tells you how commands move the robot; control decides what commands to send so the robot tracks a target despite disturbances. The governing idea is feedback: measure the error between where you are and where you want to be, and drive it to zero.

PID — the workhorse

PID forms a command from three views of the error \(e(t) = \text{target} - \text{measured}\):

\[u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d\, \frac{de(t)}{dt}.\]

In words: push in proportion to how far off you are now, plus a correction for error that has piled up over time, plus a brake that responds to how fast the error is changing. Also written: in discrete time, \(u_k = K_p e_k + K_i \sum_{j\le k} e_j\,\Delta t + K_d\,(e_k - e_{k-1})/\Delta t\) — exactly the code below.

Proportional (\(K_p\)): push harder the farther off you are. Alone, it leaves a steady-state offset and can oscillate.
Integral (\(K_i\)): accumulate past error to erase steady offset (e.g., gravity sag on an arm). Too much causes windup and overshoot.
Derivative (\(K_d\)): react to how fast error is changing — damping that anticipates and reduces overshoot. Sensitive to noisy measurements.

A picture of the three roles: \(P\) is a spring pulling toward target, \(D\) is a shock absorber damping the swing, \(I\) slowly removes any leftover gap.

# from-scratch PID controller, one step
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.int = 0.0; self.prev = 0.0
    def step(self, target, measured):
        e = target - measured
        self.int += e * self.dt                 # accumulate
        deriv = (e - self.prev) / self.dt        # rate of change
        self.prev = e
        return self.kp*e + self.ki*self.int + self.kd*deriv

pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
pos = 0.0
for _ in range(3):                               # drive position toward 1.0
    u = pid.step(1.0, pos)
    pos += 0.1 * u                               # crude plant: integrates command
    print(round(pos, 3))                         # 0.2, 0.46, 0.74... climbing to target

Warning

Integral windup is the classic PID failure: while the robot is saturated (motor maxed, arm blocked), the integral term keeps growing, then overshoots wildly when the obstruction clears. Always clamp the integral term, and consider disabling it during saturation.

A production controller adds the guards the from-scratch loop omits — output saturation, anti-windup clamping, and derivative-on-measurement to avoid kicks when the target jumps:

# idiomatic discrete PID with the guards real controllers need
import numpy as np

class PIDController:
    def __init__(self, kp, ki, kd, dt, u_min=-1.0, u_max=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_meas = None

    def step(self, target, measured):
        e = target - measured
        # derivative on measurement (not error) -> no spike on setpoint change
        d = 0.0 if self.prev_meas is None else -(measured - self.prev_meas) / self.dt
        self.prev_meas = measured

        u_unsat = self.kp * e + self.ki * self.integral + self.kd * d
        u = float(np.clip(u_unsat, self.u_min, self.u_max))
        # anti-windup: only integrate when not saturated (back-calculation lite)
        if u == u_unsat:
            self.integral += e * self.dt
        return u

ctrl = PIDController(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
print(round(ctrl.step(1.0, 0.0), 3))   # first command toward target

When PID isn’t enough: feedback linearization and flatness

PID treats the system as roughly linear near the setpoint. Real robots are nonlinear — the unicycle’s \(\cos\theta\), an arm’s coupled inertias. Two ideas tame this:

Feedback linearization cancels the nonlinearity with the control law itself. If the dynamics are \(\dot{x} = f(x) + g(x)u\), choose \(u = g(x)^{-1}(-f(x) + v)\), leaving the clean linear system \(\dot{x} = v\), which you then control with PID or LQR — itself a small optimization over feedback gains. It is exact only if your model \(f, g\) is exact — model error leaks straight through.

In words: measure the messy nonlinear part of the dynamics and deliberately push back exactly enough to erase it, so what’s left behaves like a simple straight-line system you already know how to control. Also written: substituting the chosen \(u\) into \(\dot x = f(x)+g(x)u\) gives \(\dot x = f(x) - f(x) + v = v\).

Differential flatness is a structural gift some systems have: all states and inputs can be written as functions of a few flat outputs and their derivatives. A quadrotor is flat in \((x,y,z,\text{yaw})\) — pick a smooth trajectory for those four and every motor command is determined algebraically, no integration of dynamics needed. This is why aggressive drone-trajectory planners work: plan a smooth curve in flat space, recover controls in closed form.

For the optimal-control treatment of the linearized system — LQR, which picks the feedback gains that minimize a quadratic cost on error and effort — see reinforcement learning & control. PID is the hand-tuned cousin; LQR is its principled, model-based sibling.

Model-predictive control (MPC): planning a few steps ahead

PID reacts to the present; MPC looks ahead. The everyday analogy is driving: you don’t steer based only on where the car is now, you glance down the road, imagine the next few seconds, and pick the steering that keeps that whole imagined path on track — then you do it again a moment later with a fresh glance.

Formally, every control cycle MPC solves a short-horizon optimization, applies only the first command, then re-solves at the next step (receding horizon):

\[\min_{u_0,\dots,u_{N-1}} \;\sum_{k=0}^{N-1}\Big[ (x_k - x_k^{\text{ref}})^\top Q (x_k - x_k^{\text{ref}}) + u_k^\top R\, u_k \Big] \quad \text{s.t. } x_{k+1}=f(x_k,u_k),\; u_k\in\mathcal{U}.\]

In words: over the next \(N\) steps, pick the commands that keep the predicted state close to the reference while spending little control effort, respecting the physics and the actuator limits — then commit only to the first command and repeat. Also written: the per-step penalty \(\|x_k-x_k^{\text{ref}}\|_Q^2 + \|u_k\|_R^2\) uses the weighted-norm shorthand \(\|z\|_M^2 = z^\top M z\).

The animation makes the receding horizon idea concrete: at each tick the planner imagines a short path ahead (the faint horizon), commits only the first step, then slides the whole window forward and re-imagines.

MPC’s selling point is that it handles constraints natively — speed caps, lane boundaries, joint limits, obstacle clearance go straight into the optimization, which PID cannot do. Its cost is computational: an optimization problem solved every cycle. It is the workhorse of modern self-driving steering, legged-robot balance, and process control. It is also the fast replanning loop referenced in §41.7.

41.4 State estimation and the recursive Bayes filter

Odometry drifts (§41.2) and every sensor is noisy. State estimation answers “what is the state, given everything I’ve sensed?” not as a point but as a probability distribution — the belief \(bel(x_t)\). Maintaining a distribution, not a guess, is what lets a robot know how sure it is and fuse a fresh measurement with the right weight.

The recursive Bayes filter is the master algorithm behind Kalman and particle filters alike. Each cycle has two steps:

Predict (motion update): push the belief through the motion model. The robot acted, so its uncertainty grows.

\[\overline{bel}(x_t) = \int p(x_t \mid u_t, x_{t-1})\, bel(x_{t-1})\, dx_{t-1}\]

In words: to predict where you are now, take every place you might have been, ask where each would land given the action you just took, and add up those possibilities weighted by how likely each starting place was. Also written: as an expectation, \(\overline{bel}(x_t) = \mathbb{E}_{x_{t-1}\sim bel}\!\left[\,p(x_t \mid u_t, x_{t-1})\,\right]\).

Update (measurement correction): multiply by the likelihood of the observation, then renormalize. A measurement arrived, so uncertainty shrinks.

\[bel(x_t) = \eta\, p(z_t \mid x_t)\, \overline{bel}(x_t)\]

In words: reweight each candidate state by how well it explains the measurement you just got, then rescale so the weights sum to one. Also written: \(bel(x_t) \propto p(z_t \mid x_t)\,\overline{bel}(x_t)\), with \(\eta = 1/\!\int p(z_t\mid x_t)\,\overline{bel}(x_t)\,dx_t\).

The whole filter is one breathing rhythm: acting widens the belief, sensing narrows it. Watch the bell curve below pulse wide (predict) then snap sharp (update), forever.

flowchart LR
    B["bel(x_{t-1})"] -->|"predict: apply motion u_t"| BP["bel-bar(x_t)<br/>wider, less certain"]
    BP -->|"update: weigh by measurement z_t"| BN["bel(x_t)<br/>narrower, more certain"]
    BN -.->|next step| B

Everything downstream is this loop with a specific representation of the belief: a Gaussian (Kalman), a Gaussian after local linearization (EKF/UKF), or a cloud of samples (particle filter). The two-step rhythm — act widens, sense narrows — never changes.

41.5 Kalman, EKF/UKF, and particle filters

The Kalman filter

When motion and measurement are linear and noise is Gaussian, the Bayes filter has a closed form: the belief stays Gaussian forever, summarized by a mean \(\hat{x}\) and covariance \(P\). The Kalman filter is the optimal estimator in that setting, and it is just the two Bayes steps written for Gaussians.

The magic number is the Kalman gain \(K\) — how much to trust the new measurement versus the prediction. High measurement noise → small \(K\) → mostly believe the prediction. High process noise → large \(K\) → mostly believe the sensor.

\[K = \frac{P}{P+R}, \qquad \hat{x} \leftarrow \hat{x} + K(z - \hat{x}), \qquad P \leftarrow (1-K)P.\]

In words: the gain is the prediction’s share of the total uncertainty; nudge the estimate toward the measurement by that fraction of the surprise, and shrink the uncertainty accordingly. Also written: in full matrix form \(K = P H^\top (H P H^\top + R)^{-1}\), with \(\hat{x} \leftarrow \hat{x} + K(z - H\hat{x})\) and \(P \leftarrow (I - KH)P\); the scalar case above is \(H=1\).

The picture below is the whole filter in one frame: a fuzzy prediction (blue) and a fuzzy measurement (amber) blend into a sharper fused belief (green) sitting between them — narrower than either input.

Worked 1-D example. Estimate a robot’s position. We believe \(\hat{x}=0\) m with variance \(P=4\) (std 2 m). The robot does not move (\(\hat{x}\) prediction stays 0), but motion adds process noise \(Q=1\), so after predict \(P = 4+1 = 5\). A GPS reads \(z=3\) m with measurement variance \(R=2\).

\[K = \frac{P}{P+R} = \frac{5}{5+2} = 0.714\] \[\hat{x} \leftarrow \hat{x} + K(z - \hat{x}) = 0 + 0.714(3-0) = 2.14 \text{ m}\] \[P \leftarrow (1-K)P = (1-0.714)\cdot 5 = 1.43\]

The estimate moved 71% of the way to the GPS reading, and variance dropped from 5 to 1.43 — fusing prior and measurement left us more certain than either alone. That variance reduction is the whole point.

# 1-D Kalman update, matches the worked numbers
def kalman_1d(x, P, z, Q, R, dx=0.0):
    x = x + dx;  P = P + Q          # predict
    K = P / (P + R)                 # gain
    x = x + K * (z - x)             # update mean
    P = (1 - K) * P                 # update covariance
    return x, P

x, P = kalman_1d(0.0, 4.0, 3.0, Q=1.0, R=2.0)
assert abs(x - 2.142) < 1e-2 and abs(P - 1.428) < 1e-2   # check
print(round(x, 2), round(P, 2))     # 2.14 1.43

The multivariate filter is the same two steps in matrix form. In practice you reach for a library rather than re-deriving the algebra:

# multivariate Kalman filter with FilterPy: constant-velocity tracker
import numpy as np
from filterpy.kalman import KalmanFilter

kf = KalmanFilter(dim_x=2, dim_z=1)        # state = [position, velocity]
dt = 1.0
kf.F = np.array([[1, dt], [0, 1]])         # motion: x += v*dt
kf.H = np.array([[1.0, 0.0]])              # we measure position only
kf.P *= 4.0                                # initial covariance
kf.R = np.array([[2.0]])                   # measurement noise
kf.Q = np.array([[1.0, 0.0], [0.0, 1.0]])  # process noise

for z in [3.0, 3.9, 5.2, 6.1]:             # noisy position stream
    kf.predict()
    kf.update(z)
    print(np.round(kf.x.ravel(), 2))       # fused position + inferred velocity

EKF and UKF — coping with nonlinearity

Real motion (\(\cos\theta\)) and sensors (range = \(\sqrt{\Delta x^2+\Delta y^2}\)) are nonlinear, so the belief is no longer exactly Gaussian. Two pragmatic fixes keep the Kalman machinery:

Extended Kalman Filter (EKF): linearize the models with a first-order Taylor expansion (Jacobians) around the current estimate, then apply the linear Kalman equations. Cheap and dominant in practice, but the linearization can diverge if the function curves sharply or uncertainty is large.
Unscented Kalman Filter (UKF): skip Jacobians. Push a small set of deterministically chosen sigma points through the true nonlinear function and refit a Gaussian to where they land. More accurate for strong nonlinearity, no derivatives to derive, similar cost.

The particle filter

When the belief is genuinely non-Gaussian — multi-modal (“I’m at one of three identical doorways”), or arbitrarily shaped — represent it with a swarm of weighted samples (particles), each a hypothesis about the state. The loop: move every particle by the motion model (with noise), weight each by how well it explains the measurement, then resample — duplicate heavy particles, kill light ones. Survivors cluster on the truth.

flowchart LR
    S[particles<br/>spread out] -->|move + noise| M[scattered<br/>hypotheses]
    M -->|weight by p of z| Wt[some heavy,<br/>some light]
    Wt -->|resample| C[cluster on<br/>likely states]
    C -.->|next step| S

This is Monte Carlo localization, the standard way a robot finds itself on a known map from a uniform prior. It handles arbitrary distributions and nonlinearity, at the cost of needing many particles in high dimensions (the curse of dimensionality).

A minimal NumPy particle filter makes the three steps concrete:

# 1-D particle filter: localize from a noisy distance-to-landmark sensor
import numpy as np
rng = np.random.default_rng(0)

N, landmark = 2000, 10.0
px = rng.uniform(0, 20, N)                 # particles: spread over the corridor
w  = np.ones(N) / N

def step(px, w, move, z, mn=0.3, sn=1.0):
    px = px + move + rng.normal(0, mn, len(px))   # predict (move + noise)
    pred = np.abs(landmark - px)                  # expected measurement
    w = w * np.exp(-0.5 * ((z - pred) / sn) ** 2) # weight by likelihood
    w += 1e-300; w /= w.sum()                     # normalize
    idx = rng.choice(len(px), len(px), p=w)       # resample
    return px[idx], np.ones(len(px)) / len(px)

for z in [6.0, 5.1, 4.0, 3.2]:             # robot approaching the landmark
    px, w = step(px, w, move=1.0, z=z)
print(round(px.mean(), 2))                 # converges near true position ~6

Filter	Belief shape	Nonlinear?	Cost	Use when
Kalman	single Gaussian	linear only	cheap	linear-Gaussian system
EKF	single Gaussian	yes (Jacobian)	cheap	mild nonlinearity, fast loop
UKF	single Gaussian	yes (sigma points)	moderate	stronger nonlinearity, no derivatives
Particle	arbitrary samples	yes	expensive	multi-modal, global localization

Tip

Rule of thumb for picking a filter: reach for the plain Kalman filter first; only climb to EKF/UKF when a model is genuinely nonlinear, and only to a particle filter when the belief is genuinely multi-modal (you cannot summarize it with one mean and covariance). Particle filters are powerful but pay an exponential price in dimension — a global-localization tool, not a default.

41.6 SLAM: simultaneous localization and mapping

There is a chicken-and-egg trap in §41.5: to know where you are you need a map, but to build a map you need to know where you are. SLAM solves both at once — estimate the robot’s trajectory and the map from the same stream of sensor data.

The intuition is loop closure. A robot drives a long loop; odometry drift means the map’s start and end don’t line up even though the robot recognizes it has returned to a known place. SLAM detects “I’ve been here before” and snaps the whole trajectory taut to make the ends meet, redistributing the accumulated error backward across the path.

Modern SLAM is posed as a graph optimization (pose-graph / factor-graph): nodes are robot poses and landmarks, edges are noisy constraints (odometry between consecutive poses, observations of landmarks, loop closures). Solving means finding the configuration of all nodes that best satisfies all constraints — a big nonlinear least-squares problem, optimized with the techniques of optimization. Front-ends differ by sensor (visual SLAM from cameras, LiDAR SLAM from point clouds), but the back-end is this same graph squeezed to consistency.

The objective the back-end minimizes makes “best satisfies all constraints” precise:

\[\hat{X} = \arg\min_{X} \sum_{(i,j)\in\mathcal{E}} \big\lVert z_{ij} - h(x_i, x_j) \big\rVert^2_{\Omega_{ij}}\]

In words: choose the poses and landmarks \(X\) that make every predicted measurement match its observed value as closely as possible, weighting each constraint by how trustworthy that sensor reading is. Also written: with \(e_{ij} = z_{ij} - h(x_i,x_j)\), the weighted norm expands to \(\sum_{ij} e_{ij}^\top\,\Omega_{ij}\,e_{ij}\), where \(\Omega_{ij}\) is the information (inverse-covariance) matrix of edge \((i,j)\).

Tip

The reason loop closure matters so much: without it, error grows roughly with distance traveled. With a single good loop closure, error is bounded by the loop — one recognized landmark can correct hundreds of meters of drift. Recognizing places (place recognition) is therefore as critical to SLAM as the geometry.

41.7 Motion and path planning

Given a map and a goal, planning finds a collision-free path through C-space (§41.2). Two families dominate, split by how they handle continuous space.

Search on a graph: Dijkstra and A*

Discretize C-space into a grid (or roadmap) and run a graph search on it. Dijkstra expands outward from the start by cheapest-cost-so-far \(g(n)\), guaranteeing the shortest path but exploring blindly in all directions. A* adds a heuristic \(h(n)\) — an optimistic estimate of remaining cost (e.g., straight-line distance to goal) — and expands by \(f(n) = g(n) + h(n)\), steering the search toward the goal.

\[f(n) = g(n) + h(n).\]

In words: rank each candidate node by the cost already paid to reach it plus an optimistic guess of the cost still to go, and always expand the most promising one next. Also written: Dijkstra is the special case \(h(n)=0\) (no guess, expand by cost-so-far alone).

A* returns the optimal path whenever \(h\) is admissible (never overestimates true remaining cost). The straight-line (Euclidean) distance is the canonical admissible heuristic for spatial grids: a real path can only be longer than the crow-flies line. A better (larger but still admissible) heuristic explores fewer nodes.

The contrast below is the whole reason A* exists: Dijkstra (left) fans out in a fat blind disk; A*’s heuristic (right) pulls the explored set into a narrow corridor aimed at the goal — same optimal path, a fraction of the work.

Tiny worked example. Grid from S to G, \(h\) = Manhattan distance, unit step cost:

S . . #          g(S)=0
. # . .          at a node 2 right of S: g=2, h=3  → f=5
. # . G          a detour node: g=4, h=5          → f=9  (expanded later)

A* pops the lowest-\(f\) node each step. The node with \(f=5\) is explored before the \(f=9\) detour, so A* drives toward G instead of fanning out like Dijkstra — same optimal path, far fewer expansions.

Worked example — heuristic saves work. On an open \(100 \times 100\) grid, a goal \(60\) cells straight ahead, no obstacles. Dijkstra has no sense of direction, so it expands cells in a growing disk around the start: reaching distance \(60\) touches on the order of \(\pi \cdot 60^2 \approx 11{,}000\) cells. A* with the (admissible) straight-line heuristic only expands cells along the corridor toward the goal — on an obstacle-free grid that is essentially the \(\sim\!60\) cells of the path itself plus a thin fringe, a few hundred at most. Same optimal length, roughly 20–50× fewer expansions. Add obstacles and the gap shrinks (the heuristic gets fooled by walls), but the lesson holds: a good heuristic is what makes search tractable at scale.

Sampling-based planning: PRM, RRT, RRT*

Grids explode in high dimensions — a 7-DOF arm has no usable grid. Sampling-based planners dodge this by randomly sampling configurations and connecting only the valid ones, never enumerating the whole space.

PRM (Probabilistic Roadmap): sample many random configurations, connect nearby collision-free pairs into a graph, then run A* on it. Built once, queried many times — ideal for a fixed environment.
RRT (Rapidly-exploring Random Tree): grow a tree from the start. Repeatedly sample a random point, find the nearest tree node, and extend a short step toward the sample. The tree rapidly fills space, biased toward unexplored regions. Single-query and fast, but the path is jagged and not optimal.
RRT*: RRT with rewiring — when adding a node, reconnect nearby nodes through it if that lowers their cost. As samples → ∞, the path converges to optimal (asymptotically optimal), at extra computation per node.

The SVG shows an RRT tree exploring from the start (green), reaching toward the goal (red) while avoiding an obstacle:

Planner	Space	Optimal?	Query	Best for
Dijkstra	discrete grid	yes	single	small grids, no goal heuristic
A*	discrete grid	yes (admissible \(h\))	single	grids with a goal direction
PRM	continuous	asymptotically	multi	fixed maps, repeated queries
RRT	continuous	no	single	fast feasible path, high-DOF
RRT*	continuous	asymptotically	single	high-DOF when path quality matters

A planned geometric path is rarely the final word. Two refinements usually follow. Trajectory optimization (e.g., CHOMP, TrajOpt) takes a rough path and bends it into a smooth, dynamically feasible trajectory by minimizing a cost that trades off smoothness against obstacle clearance — useful when the sampled path is jagged. And because the world moves, planning is wrapped in a fast replanning loop: a local planner (or a model-predictive controller, §41.3’s cousin) re-solves a short-horizon plan every control cycle as new obstacles appear, while the global plan updates less often. Plan globally, react locally.

Here is A* end-to-end on a small grid, the algorithm behind the worked example:

# A* on a 4-connected grid; returns the optimal path or None
import heapq

def astar(grid, start, goal):                 # grid: set of blocked (r,c) cells
    def h(n): return abs(n[0]-goal[0]) + abs(n[1]-goal[1])   # Manhattan, admissible
    open_pq = [(h(start), 0, start, None)]    # (f, g, node, parent)
    came, g_best = {}, {start: 0}
    while open_pq:
        f, g, n, parent = heapq.heappop(open_pq)
        if n in came: continue
        came[n] = parent
        if n == goal:
            path = [n]
            while came[path[-1]] is not None: path.append(came[path[-1]])
            return path[::-1]
        for dr, dc in [(1,0),(-1,0),(0,1),(0,-1)]:
            m = (n[0]+dr, n[1]+dc)
            if m in grid: continue            # blocked
            ng = g + 1
            if ng < g_best.get(m, 1e9):
                g_best[m] = ng
                heapq.heappush(open_pq, (ng + h(m), ng, m, n))
    return None

blocked = {(1,1),(2,1),(0,3)}                  # matches the worked-example walls
print(astar(blocked, (0,0), (2,3)))            # optimal path around the wall

41.8 Grasping, manipulation, and the contact problem

Mobile robots avoid contact; manipulators depend on it. Grasping is where geometry meets physics: the question is not just “can the fingers touch the object” but “will the object stay put when I lift, twist, and the world pushes back.”

The central concept is force closure: a grasp achieves force closure if the contacts can resist any external wrench (force + torque) by applying suitable contact forces. Intuitively, the fingers can squeeze in enough independent directions that no push or twist can break the hold. A frictionless point contact resists only along its normal; add friction and each contact resists within a friction cone (forces angled up to \(\arctan\mu\) from the normal). Enough cones spanning all directions → force closure. A practical rule of thumb: in the plane, two hard-finger contacts with friction on opposite faces can suffice; in 3-D you generally need at least three or four well-placed contacts.

The friction-cone condition that each contact must obey is worth stating crisply:

\[\lVert f_t \rVert \le \mu\, f_n, \qquad f_n \ge 0.\]

In words: a contact can only push (never pull), and the sideways friction it provides can be at most \(\mu\) times how hard it presses — exceed that and the finger slips. Also written: equivalently the contact force vector must lie inside the cone \(\arctan(\lVert f_t\rVert / f_n) \le \arctan\mu\), i.e. within angle \(\arctan\mu\) of the surface normal.

A second axis is prehensile vs. non-prehensile manipulation. Prehensile means grasping — the hand fully constrains the object (a pick-and-place gripper). Non-prehensile means controlling an object without caging it: pushing, sliding, throwing, tilting, toppling. Sliding a heavy box across the floor, or a hockey-style flick, are non-prehensile — cheaper and sometimes the only option for objects too big or oddly shaped to grasp, but harder to control because the object can slip away. Much modern manipulation research blends the two: nudge an object to a graspable pose (non-prehensile), then pick it up (prehensile).

Worked example — will it slip? A two-finger gripper holds a \(0.5\) kg part (\(mg \approx 4.9\) N) by pinching it, each finger pressing with normal force \(f_n = 8\) N against a rubber-on-plastic contact, \(\mu = 0.4\). Each finger can supply up to \(\mu f_n = 0.4 \times 8 = 3.2\) N of vertical friction, and two fingers give \(6.4\) N — comfortably above the \(4.9\) N weight, so the part holds with a safety factor of about \(1.3\). Drop \(\mu\) to \(0.25\) (a greasy part) and the budget falls to \(2 \times 0.25 \times 8 = 4.0\) N \(< 4.9\) N: the part slides. This is why grasp planners squeeze harder than theory demands and prefer high-friction contacts.

Warning

Force closure is a geometric guarantee that assumes your friction coefficient, contact locations, and object model are correct. Real grasps fail because \(\mu\) is lower than assumed (a greasy part), the object is heavier or its mass off-center, or the fingers land a millimeter off. Plan grasps with margin, and prefer grasps that are robust to small pose errors over grasps that are theoretically optimal but brittle.

41.9 Learning for robotics: imitation, sim-to-real, RL

Hand-engineering every behavior does not scale to dexterous, contact-rich, or visually complex tasks. Learning fills the gap, in three complementary modes.

Imitation learning (IL) teaches from demonstrations. The simplest form, behavioral cloning, is plain supervised learning: record an expert’s (observation → action) pairs and fit a policy. Its notorious failure is covariate shift / compounding error — the policy makes a small mistake, drifts to states the expert never visited, and has no idea what to do, so errors snowball. Fixes include DAgger (query the expert on the policy’s own visited states) and richer demonstration coverage. The deeper RL/IL treatment lives in reinforcement learning; here the point is that demonstrations are often the cheapest way to specify a hard-to-reward behavior.

A behavioral-cloning policy is just a supervised regressor from observations to actions — a few lines in PyTorch:

# behavioral cloning: fit a policy net to (observation -> action) demos
import torch, torch.nn as nn

policy = nn.Sequential(                    # obs_dim=8 (e.g. joint state), act_dim=2
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                     # regress continuous actions

# obs, act: tensors of expert demonstrations
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(policy(obs), act)       # match expert action at each state
    loss.backward(); opt.step()
# at deploy time: action = policy(current_obs) -- but beware covariate shift

Reinforcement learning for control learns a policy by trial and error against a reward (Chapter 25’s algorithms). In robotics, raw RL on hardware is usually impractical — millions of trials would destroy the robot — so it is run in simulation, which is fast, safe, and resettable.

Sim-to-real is then the central challenge: a policy trained in a perfect simulator meets the reality gap — unmodeled friction, sensor latency, slightly wrong masses — and fails on the real robot. The dominant remedy is domain randomization: during training, randomize physics and rendering parameters (friction, masses, lighting, textures, latencies) so widely that the real world looks like just another sample. A policy robust across thousands of simulated variations tends to transfer, because reality falls inside the training distribution.

flowchart LR
    subgraph Sim["Simulation (cheap, fast)"]
        DR[Domain randomization:<br/>vary mass, friction,<br/>lighting, latency] --> RL[Train policy by RL]
    end
    RL -->|deploy weights| Real[Real robot]
    Real -.->|reality gap:<br/>collect failures| DR
    style Sim fill:#1a2a3a,stroke:#4a90d9,color:#fff

A newer current worth naming: end-to-end and foundation-model policies. Instead of the hand-built perceive→estimate→plan→act stack, a single large network maps pixels (and language instructions) straight to actions — vision-language-action models — a multimodal policy trained on huge multi-robot demonstration datasets, aiming for one policy that generalizes across tasks and embodiments. They trade the interpretability and verifiability of the classical stack for breadth; in practice the two coexist, with a safety layer (§41.10) underneath either. Frontier directions cover this in depth.

The unifying theme: learning shifts effort from coding behaviors to specifying objectives and providing data/experience — powerful, but it inherits all of ML’s data and distribution-shift pitfalls, now with a robot that can physically break.

41.10 Safety, human interaction, and calibration realities

A robot exerts force in a world containing people. Three practical realities separate a demo from a deployable system.

Safety is layered, not a single check. A common pattern: a fast, simple, certifiable safety layer runs underneath the smart-but-fallible autonomy stack — emergency-stop on collision-imminent, speed and force limits, a watchdog that halts if the control loop misses its deadline. The principle is that the simple layer is the one you can verify and trust; the learned planner proposes, the safety monitor disposes. Formal tools like control-barrier functions and reachability analysis make “the robot cannot enter this unsafe set” a provable property rather than a hope.

The mental model is a referee sitting under the planner: the clever stack suggests a move, and the simple verified monitor either passes it through or vetoes it down to a safe fallback.

Human-robot interaction (HRI) adds that humans are part of the environment and they are reactive and hard to predict. A robot sharing a space must estimate human intent, signal its own (a self-driving car nudging forward to indicate it will go), and stay legible — moving so a person can read what it will do next. Predictability often beats raw efficiency: a slightly slower motion a human can anticipate is safer than a fast optimal one that surprises them.

Calibration is the unglamorous foundation everything rests on. Models assume the camera is exactly where the CAD says, the wheels are exactly \(r\), the IMU reads zero at rest — and reality disagrees. Sensors need intrinsic calibration (a camera’s lens distortion and focal length) and extrinsic calibration (the rigid transform between the camera and the arm base — the “hand-eye” calibration). Wheels wear, gears have backlash, an IMU has a bias that drifts with temperature, and a PWM servo driver often runs a few percent off its nominal frequency. The best estimator and planner in the world produce garbage if fed an uncalibrated transform.

Warning

The most common “the algorithm is broken” bug in real robotics is actually a calibration or units error — a transform with the wrong sign, degrees fed where radians were expected, a frame mounted upside-down, or a sensor timestamp that is 50 ms stale. Before blaming the filter or planner, verify frames, units, signs, and time synchronization. Leave a calibration knob exposed for every physical constant; the real world will need it tuned.

41.11 The ROS ecosystem and middleware

Almost no real robot is one monolithic program. The everyday analogy is a newsroom: many specialists (a camera reporter, a planning editor, a motor desk) work in parallel, and a wire service routes each story to whoever subscribes to that topic. Robotics middleware — overwhelmingly ROS 2 (Robot Operating System) — is that wire service.

The core abstraction is a graph of independent nodes that communicate without knowing about each other:

Topics — anonymous publish/subscribe streams for continuous data (a camera node publishes images on /camera/image; a perception node subscribes). Many-to-many, fire-and-forget.
Services — request/response for occasional queries (“what is the current map?”).
Actions — long-running goals with feedback and the ability to cancel (“navigate to the kitchen,” streaming progress).
Transforms (tf2) — the live tree of coordinate frames from §41.1, so any node can ask “where is the gripper in the map frame, right now?”

flowchart LR
    CAM[camera node] -->|/image| PER[perception node]
    LID[lidar node] -->|/scan| SLAM[slam node]
    PER -->|/obstacles| NAV[planner node]
    SLAM -->|/map + tf| NAV
    NAV -->|/cmd_vel| BASE[motor driver node]
    style NAV fill:#6366f1,stroke:#6366f1,color:#fff

This decoupling is why robotics is composable: swap the LiDAR SLAM node for a visual one without touching the planner, record every topic to a log (rosbag) and replay it to debug offline, or run nodes across several machines. ROS 2 rides on DDS for real-time-capable, configurable-reliability transport. A minimal publisher shows the shape of node code:

# ROS 2 node publishing a velocity command (rclpy)
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist

class Driver(Node):
    def __init__(self):
        super().__init__("driver")
        self.pub = self.create_publisher(Twist, "/cmd_vel", 10)
        self.create_timer(0.1, self.tick)        # 10 Hz control loop

    def tick(self):
        msg = Twist()
        msg.linear.x = 0.2                       # v = 0.2 m/s forward
        msg.angular.z = 0.5                      # omega = 0.5 rad/s turn
        self.pub.publish(msg)

rclpy.init(); rclpy.spin(Driver())

The takeaway: the algorithms in this chapter rarely ship as standalone scripts — they ship as nodes wired into a graph, and “which node publishes what, at what rate, in which frame” is the architecture you actually debug.

Tip

When a ROS system “freezes” or behaves erratically, suspect the plumbing before the algorithm: a topic nobody publishes (a subscriber waiting forever), a queue depth so small that messages are dropped under load, a node publishing in the wrong frame, or two nodes fighting over /cmd_vel. Tools like ros2 topic hz, ros2 topic echo, and rqt_graph tell you what is actually flowing — check that the graph matches the one in your head before touching any math.

41.1 — Quick reference

Term / formula	Meaning in one line	When / why it matters
Perceive → estimate → plan → act	The four-stage autonomy loop, run continuously	The master pattern of every autonomous system (§41.1)
Rigid transform \(T=\begin{bmatrix}R & t\\0&1\end{bmatrix}\)	Pose of one frame in another; chains by matrix product	Relate camera ↔︎ body ↔︎ world; a wrong hop is the classic bug
Configuration \(q\) / C-space	Minimal coordinates of the whole robot; obstacles inflated by footprint	Plan as a point among grown obstacles, not a shape (§41.2)
Jacobian \(\dot x = J(q)\dot q\)	Maps joint speeds to tip speeds; columns = per-joint tip motion	Iterative IK via \(\Delta q = J^{+}\Delta x\); watch singularities
Unicycle \(\dot x=v\cos\theta,\ \dot y=v\sin\theta,\ \dot\theta=\omega\)	Wheeled-robot kinematics; can drive/turn, not slide	Nonholonomic: fewer controls than DOF, so not every path works
PID \(u=K_p e+K_i\!\int e+K_d\dot e\)	Feedback from present, accumulated, and changing error	The workhorse tracker; clamp integral to avoid windup (§41.3)
MPC (receding horizon)	Optimize \(N\) steps, apply the first, re-solve	Handles constraints (limits, clearance) PID cannot
Bayes filter (predict / update)	Belief widens on motion, narrows on measurement	The recursion behind every filter below (§41.4)
Kalman gain \(K=P/(P+R)\)	Fraction of the surprise to trust the measurement	Optimal fusion in the linear-Gaussian case
EKF / UKF	Linearize (Jacobian) or sigma-points for nonlinear models	Keeps Kalman machinery when motion/sensors curve
Particle filter	Belief as weighted samples: move, weight, resample	Multi-modal / global localization; costly in high dimensions
SLAM + loop closure	Estimate trajectory and map jointly; snap returns taut	One loop closure bounds otherwise-unbounded drift (§41.6)
A* \(f(n)=g(n)+h(n)\)	Cost-so-far plus admissible guess-to-go	Optimal grid path with far fewer expansions than Dijkstra
RRT / RRT*	Sample-and-extend tree; RRT* rewires toward optimal	High-DOF planning where grids explode (§41.7)
Force closure / friction cone \(\lVert f_t\rVert\le\mu f_n\)	Contacts span all disturbance directions; push-only, bounded friction	Whether a grasp holds under twist and load (§41.8)
Domain randomization	Vary sim physics/rendering so reality is “just another sample”	The standard sim-to-real transfer remedy (§41.9)
Safety layer	Simple, verified monitor vetoes the smart stack	Trust the verifiable layer; planner proposes, monitor disposes (§41.10)
ROS 2 node / topic / tf2	Pub-sub graph of processes sharing a live frame tree	The architecture you actually debug (§41.11)

41.2 — Key takeaways

Autonomy is a closed loop — perceive → estimate → plan → act — run continuously and in nested loops at different rates; latency and loop-closure rate matter as much as algorithmic quality.
Sensors are complementary (camera, LiDAR, radar, IMU, GPS), and everything is expressed in coordinate frames chained by rigid transforms; a wrong transform is the classic robotics bug.
Plan in configuration space, where the robot is a point and obstacles are inflated by its footprint. Wheeled robots are nonholonomic: fewer controls than DOF, so not every path is achievable. The Jacobian links joint speeds to tip speeds and drives iterative IK.
Kinematics maps commands to motion; control (PID, feedback linearization, flatness, MPC, and LQR from Ch 25) drives tracking error to zero despite disturbances. Watch for integral windup; MPC adds look-ahead and native constraints.
State estimation maintains a belief, not a guess, via the recursive Bayes filter (predict widens, update narrows). The Kalman filter is its linear-Gaussian closed form; EKF/UKF handle nonlinearity; particle filters handle arbitrary, multi-modal beliefs.
SLAM estimates trajectory and map together; loop closure bounds otherwise-unbounded odometry drift, and the back-end is a weighted nonlinear least-squares graph optimization.
Search planners (Dijkstra, A* with an admissible heuristic) suit low-dimensional grids; sampling planners (PRM, RRT, RRT) scale to high-DOF arms, with RRT asymptotically optimal; trajectory optimization smooths the result and a fast replanning loop reacts to a changing world.
Grasping hinges on force closure within friction cones (a contact can only push, with friction \(\le \mu f_n\)); manipulation spans prehensile (grasping) and non-prehensile (pushing, sliding) contact.
Learning (imitation, RL in sim, sim-to-real with domain randomization, and emerging end-to-end vision-language-action policies) trades behavior-coding for objective-and-data specification, inheriting distribution-shift risks — now with hardware that can break.
Safety is a simple verifiable layer under the smart stack; HRI demands legible, predictable motion; and calibration/units/frames/time-sync errors masquerade as algorithm bugs more often than not.
Real systems are built as a graph of ROS 2 nodes communicating over topics, services, and actions — the algorithms ship as composable nodes, not standalone scripts.

41.3 — See also

Chapter 06 — Optimization: nonlinear least-squares behind SLAM back-ends and bundle adjustment.
Chapters 16–18 — Computer vision: the perception front-end (detection, depth, visual features) feeding estimation.
Chapter 25 — Reinforcement learning & control: LQR, policy learning, and the full imitation/RL treatment pointed to here.
Chapter 12 — Probabilistic models: Bayes filtering, Gaussians, and graphical-model foundations under Kalman and particle filters.
Chapter 39 — Frontier & embodied AI: large-scale robot learning, foundation models for control, and embodied agents.

↪ The thread continues → Chapter 42 · 📐 Learning Theory

We’ve built and shipped a great deal on intuition; now theory asks why learning works at all — the generalization guarantees beneath the magic.

📖 All chapters | ← 40 · 🔗 Graph Machine Learning | 42 · 📐 Learning Theory →