It was a green light. I could see it from the sidewalk. The Waymo stopped anyway — sitting maybe four car-lengths short of the intersection — and the car behind it immediately honked. A cyclist swerved. Nothing happened for five full seconds. Then the Waymo crept forward and completed the left turn.
I ran the timing in my head and still couldn’t figure out what triggered it. Then I spotted him: a construction worker on the far side of the intersection had taken two steps into the road, thought better of it, and backed up. I never registered him as a threat. The Waymo did, and it kept tracking him until he was clear.
That moment explained more about self-driving AI than six months of reading white papers had. The car wasn’t driving like a human. It was doing something stranger — and, depending on how you look at it, something more careful.
I spent the following months riding in these vehicles across nine cities, talking to engineers where I could, and trying to understand what’s actually happening under the hood. Here’s what I found.
The Sensor Setup Is More Redundant Than You Think
The standard explanation starts with “LiDAR, cameras, and radar” and stops there. That’s accurate but misses the point that actually matters: these sensors overlap specifically because any one of them can fail.
Waymo’s fifth-generation platform runs five LiDAR units, six radar arrays, and 29 cameras. Those numbers aren’t a spec sheet flex; they’re engineering margins. If two LiDARs degrade in rain, the car still has three. If a camera is obscured, 28 others remain. The system is designed around the assumption that sensors will occasionally give bad data.
How Each Sensor Contributes
Each sensor type covers what the others can’t:
LiDAR produces dense 3D point clouds — it knows exactly how far away every surface is, to within centimeters. But it’s expensive, generates enormous amounts of data, and can struggle in heavy rain or snow.
Cameras are cheap and rich in detail — they read traffic lights, lane markings, signs, and faces. The problem is they have no native sense of depth, and they’re very sensitive to lighting conditions.
Radar sees through fog and rain, and it’s excellent at measuring the velocity of moving objects. The downside is low spatial resolution — radar can tell you something is moving fast, but not exactly what it is or where its edges are.
No single sensor is “the answer.” The car survives because any one of them can fail and the others still tell it where things are.
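One way to picture that redundancy is a fusion step that skips any sensor returning nothing usable and combines whatever is left. The sketch below is a simplified illustration, not any vendor's method: the `Reading` type, the noise figures, and the inverse-variance weighting are all assumptions chosen for clarity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    """One sensor's range estimate to the same object (hypothetical type)."""
    sensor: str
    distance_m: Optional[float]  # None means the sensor returned nothing usable
    sigma_m: float               # assumed 1-sigma noise for this sensor


def fuse_distance(readings: List[Reading]) -> Optional[float]:
    """Inverse-variance weighted average over the sensors that still report a value."""
    healthy = [r for r in readings if r.distance_m is not None]
    if not healthy:
        return None  # every sensor failed: nothing to fuse
    weights = [1.0 / (r.sigma_m ** 2) for r in healthy]
    total = sum(w * r.distance_m for w, r in zip(weights, healthy))
    return total / sum(weights)


readings = [
    Reading("lidar", 20.0, 0.05),
    Reading("radar", 20.6, 0.50),
    Reading("camera", None, 2.00),  # camera blinded by glare in this made-up example
]
fused = fuse_distance(readings)
```

With these made-up numbers the fused estimate lands within a centimeter of the LiDAR value, because LiDAR is given the smallest noise figure. The camera dropping out changes nothing structurally: the function just averages over fewer inputs.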
Tesla vs. Waymo: Two Different Bets
Tesla began removing radar from its cars in 2021, during the Hardware 3 era, betting that cameras plus large neural networks could cover everything radar provides. Waymo kept the full sensor stack and added more. As of 2026, both approaches work, just with different failure modes. Waymo degrades slowly in sensor-hostile conditions. Tesla can have gaps in dense fog that its camera-only stack can’t fully patch.
Neither is wrong. They’re different risk tolerances, made by teams with different philosophies about where the edge cases live.
It’s Not One AI — It’s a Pipeline
This is the part most tech coverage gets wrong. There’s no single “self-driving AI.” There’s a sequence of models, each doing a specific job, each passing its output to the next stage.
Stage 1: Perception
The perception model takes raw sensor data and turns it into a labeled picture of the world: car, pedestrian, cyclist, static obstacle, shopping cart, dog. By 2026, this layer has gotten unusually good.
Tesla’s FSD v13 (released late 2024) uses a video transformer architecture. Rather than processing individual camera frames, it processes a rolling window of recent frames as a sequence, so it sees motion patterns rather than static snapshots. A shopping cart rolling into a crosswalk produces a different pattern from one sitting still, even if they’re in the same pixel location. That distinction matters enormously for what happens next.
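The shopping cart example can be made concrete with a toy buffer. This is not a transformer and not Tesla's code; it is a minimal sketch, with a hypothetical `FrameWindow` class, of why a window of recent observations carries information that the latest frame alone does not.

```python
from collections import deque


class FrameWindow:
    """Keeps the most recent `size` observations of one tracked object."""

    def __init__(self, size: int = 8):
        self.positions = deque(maxlen=size)  # oldest entries drop off automatically

    def push(self, x: float, y: float) -> None:
        self.positions.append((x, y))

    def displacement(self) -> float:
        """Straight-line distance between the oldest and newest observation."""
        if len(self.positions) < 2:
            return 0.0
        (x0, y0), (x1, y1) = self.positions[0], self.positions[-1]
        return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5


parked_cart = FrameWindow()
rolling_cart = FrameWindow()
for step in range(8):
    parked_cart.push(100.0, 50.0)               # same pixel in every frame
    rolling_cart.push(65.0 + 5.0 * step, 50.0)  # ends at the same pixel (100, 50)
```

Both carts sit at pixel (100, 50) in the newest frame, so a single-frame model sees them as identical. Only the window separates them: `parked_cart.displacement()` is zero, while the rolling cart has moved 35 pixels across the buffer.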
Stage 2: Prediction
Once the car knows what’s around it, it needs to estimate what each object will do in the next several seconds. This is where Waymo’s occupancy flow networks come in — they produce a grid of the world where every cell contains not just “is something there” but “which direction is it moving and how fast.”
That velocity information makes predictions far more robust. A pedestrian walking parallel to the road is treated differently from one stepping off the curb, even if their positions look similar at a single moment in time.
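A grid where each cell carries a velocity can be sketched in a few lines. The structure below is an assumption made for illustration (a dictionary of cells, a constant-velocity step, rows 0 to 3 standing in for the roadway); it is not Waymo's representation, only the general idea of occupancy plus flow.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (row, col) index into a top-down grid


@dataclass
class CellState:
    occupied: bool
    vx: float = 0.0  # cells per second along columns (assumed units)
    vy: float = 0.0  # cells per second along rows


def advance(grid: Dict[Cell, CellState], seconds: float) -> Dict[Cell, CellState]:
    """Move each occupied cell along its own velocity (constant-velocity guess).

    If two cells land on the same target, the later one overwrites the earlier one.
    """
    future: Dict[Cell, CellState] = {}
    for (row, col), state in grid.items():
        if not state.occupied:
            continue
        new_cell = (round(row + state.vy * seconds), round(col + state.vx * seconds))
        future[new_cell] = state
    return future


ROAD_ROWS = range(0, 4)  # rows 0..3 are the roadway; row 5 is the sidewalk
grid = {
    (5, 2): CellState(True, vx=1.0, vy=0.0),   # walking parallel to the road
    (5, 8): CellState(True, vx=0.0, vy=-1.0),  # stepping toward the road
}
future = advance(grid, seconds=2.0)
enters_road = sorted(cell for cell in future if cell[0] in ROAD_ROWS)
```

Both pedestrians start on the same sidewalk row, so occupancy alone cannot separate them. After two seconds of flow, only the second one ends up in a road row, which is the distinction the planner needs.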
Stage 3: Planning
The planning model takes the labeled, predicted world and finds a path through it — one that’s legal, smooth, and won’t end in a collision. This is where questions like “when do I yield?” and “how aggressively do I merge?” get answered, and it’s also where a lot of the interesting failures happen.
Stage 4: Control
The control layer converts the planned path into actual steering angles, throttle inputs, and brake pressure. This is the closest thing to “driving” in the traditional sense, though by this point most of the hard thinking is already done.
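The four stages can be sketched as four functions, each consuming the previous one's output. Everything here is a deliberately tiny stand-in: the function names, the constant-speed prediction, the 10-meter safe gap, and the throttle and brake values are assumptions for illustration, not how any production stack works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackedObject:
    label: str
    distance_m: float         # distance ahead of the car
    closing_speed_mps: float  # positive means the gap is shrinking


def perceive(raw_detections: List[dict]) -> List[TrackedObject]:
    """Stage 1: turn raw detections into labeled objects."""
    return [TrackedObject(d["label"], d["distance_m"], d["closing_speed_mps"])
            for d in raw_detections]


def predict(objects: List[TrackedObject], horizon_s: float) -> List[float]:
    """Stage 2: estimate each object's gap after `horizon_s` seconds (constant speed)."""
    return [o.distance_m - o.closing_speed_mps * horizon_s for o in objects]


def plan(predicted_gaps_m: List[float], safe_gap_m: float) -> str:
    """Stage 3: pick a high-level action from the predicted gaps."""
    return "brake" if any(g < safe_gap_m for g in predicted_gaps_m) else "cruise"


def control(action: str) -> dict:
    """Stage 4: translate the action into actuator commands (made-up values)."""
    if action == "brake":
        return {"throttle": 0.0, "brake": 0.6}
    return {"throttle": 0.3, "brake": 0.0}


raw = [{"label": "pedestrian", "distance_m": 30.0, "closing_speed_mps": 8.0}]
commands = control(plan(predict(perceive(raw), horizon_s=3.0), safe_gap_m=10.0))
```

In this example the pedestrian is 30 meters away now but predicted to be 6 meters away in three seconds, so the plan is to brake. The point of the structure is that each stage can be tested, replaced, or blamed on its own.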
What Actually Changed Between 2023 and Now
The improvement between the 2023 generation of AVs and what’s on the road today is real. I was skeptical going in — this industry has overpromised for a decade. But three things happened in parallel that genuinely moved the needle.
Compute Got Cheap Enough
The NVIDIA Orin chip powering most commercial AV platforms in 2026 delivers 254 TOPS (trillion operations per second), roughly seven times the capability available at a competitive price four years ago. More compute means bigger models. Bigger models mean fewer edge-case failures. The relationship isn’t linear, but it’s consistent.
Simulation Solved Part of the Data Problem
Collecting real-world training data is slow and expensive. More importantly, the situations that matter most — the genuinely rare ones — almost never appear in a reasonable driving dataset. Waymo generates billions of synthetic driving miles each year in software, exposing its models to scenarios that essentially never happen organically: a pedestrian crossing diagonally during a left turn at 11pm in light fog. You can’t collect enough of those in Phoenix. You can render them by the million.
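The core trick is choosing how often each condition appears instead of waiting for the road to supply it. The sketch below shows that reweighting with invented frequencies; the category list, both weight tables, and the `sample_weather` helper are hypothetical and stand in for a far richer scenario generator.

```python
import random
from collections import Counter

# Hypothetical weather categories, with made-up frequencies for real driving logs
# versus the mix we might choose to generate in simulation.
WEATHER = ["clear", "rain", "light_fog"]
REAL_WORLD_FREQ = [0.90, 0.09, 0.01]
SIM_FREQ = [0.34, 0.33, 0.33]


def sample_weather(weights, n, seed=0):
    """Draw `n` scenarios according to `weights` and count each category."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return Counter(rng.choices(WEATHER, weights=weights, k=n))


real = sample_weather(REAL_WORLD_FREQ, 10_000)
sim = sample_weather(SIM_FREQ, 10_000)
```

Comparing `real["light_fog"]` with `sim["light_fog"]` shows the effect: the same number of draws yields far more foggy scenarios once the sampling weights are under the engineer's control.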
Weather Handling Got Genuinely Better
I spent time in a Zoox vehicle in Seattle in February during steady rain. The previous generation of systems I’d ridden in would enter a cautious shuffle mode in those conditions — slowing way down, activating hazard lights, clearly trying to minimize commitment. This vehicle didn’t. It wasn’t fearless, but it wasn’t paralyzed. That shift, from “weather causes visible degradation” to “weather causes manageable caution,” is a meaningful one.
Where It Still Goes Wrong
It would be dishonest to stop there. The current generation has real, consistent failure categories that anyone riding in these vehicles will notice.
Construction Zones
This remains the most reliable way to watch an AV struggle. The painted lines are wrong. Signs contradict each other. Cones are in different positions every day. Humans are making hand gestures that don’t appear in any training corpus. Every AV I rode handled construction zones noticeably worse than open road. Waymo slows to near-walking pace and sometimes requests remote human assistance. Tesla FSD picks a lane based on what looks most plausible and commits — which is more confident but not always correct.
Novel Vehicles and Unusual Situations
I watched a Waymo hesitate for eight full seconds when a horse trailer merged onto a surface road. The system eventually placed the trailer in a workable category and continued, but eight seconds of uncertainty at merging speeds is a meaningful safety gap. Unusual vehicle types (agricultural equipment, oversized loads, horse trailers) still produce a visible pause.
The Aggression Asymmetry
Every AV is tuned to be extremely cautious, which means other drivers learn quickly that they can bully it. On an LA freeway on-ramp, the Waymo I was riding in yielded to three cars that had no real right of way. A human driver would have held position. The Waymo blinked first, every time.
This is partly deliberate policy: if the AV is ever in a collision, it should clearly not be the one at fault. But it’s also an unsolved training problem. The consequence is that AVs operate a bit like someone who’s too polite for the road they’re actually on.
What to Expect If You Ride in One
A few things nobody tells you before your first ride:
The Braking Will Feel Early
The car stops much farther from intersections than a human driver would. It’s not a malfunction — it’s tuned for larger stopping margins. You’ll adjust after the first few minutes.
Don’t Try to Help It
If you grab the wheel or tap the brake in a Tesla, you disengage the system and take over the driving yourself. Tesla’s FSD is a supervised system, so stay ready to do exactly that if something is genuinely wrong. Otherwise, let it run.
Watch the Companion App
If you’re in a Waymo, the companion app shows a simplified live view of what the car’s sensors are detecting. It’s worth opening at least once — you can watch it track a cyclist three cars ahead and maintain a buffer around them even when your human eyes would have lost them in traffic. That view makes the whole system click in a way that descriptions don’t.
Pay Attention at Construction Zones
Not because you need to be ready to take over, but because you’ll notice the car’s behavior change — it slows, widens its margins, moves more tentatively. Understanding when the car is “thinking harder” tells you something real about where the system’s confidence boundaries are.
Common Mistakes People Make About Self-Driving AI
A lot of the public conversation about AVs gets muddied by a few persistent misunderstandings worth clearing up.
Thinking it works like cruise control. Traditional cruise control follows a speed setting. AV systems are building and updating a 3D model of the world multiple times per second and making predictions about the future. The difference in complexity is not small.
Assuming “more sensors = safer.” More sensors help, but redundancy matters more than raw quantity. A car with 50 cameras all looking forward is less safe than one with 10 cameras covering all directions. Coverage and overlap are the design goals, not count.
Expecting it to behave like a human. The driving style is different. The braking is earlier, the lane-keeping is tighter, the speed is often more conservative. None of that means something is wrong. It means the optimization target is different — the car is minimizing certain risks, not mimicking human behavior.
Believing the geography doesn’t matter. Waymo operates in specific geofenced areas where it has detailed maps built from years of prior driving. Those maps contain information about lane widths, common pedestrian crossing spots, and local traffic patterns that isn’t available in real time from sensors alone. An AV isn’t “driving everywhere” — it’s operating in a known environment it has studied extensively.
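The “more sensors = safer” point above is easy to check with arithmetic. The sketch below counts how many compass bearings at least one camera can see, assuming every camera has a 60-degree horizontal field of view; that figure, and the two camera layouts, are illustrative assumptions rather than real vehicle specs.

```python
from typing import List, Tuple

Camera = Tuple[float, float]  # (heading in degrees, horizontal field of view in degrees)


def coverage_degrees(cameras: List[Camera]) -> int:
    """Count how many whole-degree bearings (0..359) fall inside at least one camera's view."""
    covered = set()
    for heading, fov in cameras:
        for bearing in range(360):
            # smallest angular difference between the bearing and the camera heading
            diff = abs((bearing - heading + 180) % 360 - 180)
            if diff <= fov / 2:
                covered.add(bearing)
    return len(covered)


forward_only = [(0.0, 60.0)] * 50                   # 50 cameras, all facing forward
all_around = [(i * 36.0, 60.0) for i in range(10)]  # 10 cameras spaced every 36 degrees
```

Under these assumptions, `coverage_degrees(forward_only)` is 61 out of 360 bearings, while `coverage_degrees(all_around)` is the full 360, with neighboring cameras overlapping by 24 degrees. Forty extra cameras pointed the same way add nothing the first one did not already see.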
The Part Nobody Told Me Going In
Self-driving AI isn’t trying to drive like a human. It’s trying to avoid the specific ways humans get killed: distraction, fatigue, impairment. On those metrics, a reasonably mature AV already performs better than the average driver on a long commute. It doesn’t check its phone, and its reaction time doesn’t degrade after eight hours on the road.
What it does get worse with: situations that don’t look like its training data. Unusual inputs. Environments that are genuinely weird in ways that weren’t anticipated when the model was built.
The gap between “superhuman performance in good conditions” and “reliable across all conditions” is where the industry is living right now. That gap is real. It’s not close to closed. Anyone telling you AVs are either broadly safe or categorically dangerous hasn’t spent enough time watching both halves of that sentence be true simultaneously.
My honest read after all of this: these systems are better than I expected, worse than the press releases suggest, and improving faster than I thought possible when I started paying attention two years ago. That doesn’t collapse into a clean take. But it’s where we actually are, and it’s a more interesting place than either the hype or the backlash.
