Amar Bhattarai: How to Actually Read a Blue Screen: BSOD Troubleshooting

A blue screen is not Windows failing randomly — it's Windows refusing to continue because something running in kernel mode (a driver, usually, or hardware misbehaving underneath one) did something that would corrupt data if execution continued. Crucially, the crash writes a detailed confession to disk every time. Most people reboot past it. This guide is about reading it.

01 — Capture the Two Facts on the Screen

Before the automatic reboot, the screen gives you two leads:

The stop code — e.g., IRQL_NOT_LESS_OR_EQUAL. This is the category of crime.
"What failed:" — sometimes a filename like nvlddmkm.sys. When present, this is the prime suspect (that example is NVIDIA's display driver; .sys files are drivers, and a web search for any of them identifies the owner in seconds).

Missed it? Everything is recorded. The stop code lands in Event Viewer (System log, source BugCheck, Event ID 1001), where it is recorded as a hex bugcheck number such as 0x0000000a rather than by name. Reliability Monitor (type reliability in Start) shows every crash on a timeline, which is itself diagnostic: crashes that started the week a new driver or app arrived tell you where to look.
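
If you'd rather pull the code out of the event text than read it by eye, here is a minimal Python sketch. It assumes the Event ID 1001 message has the usual English shape ("The bugcheck was: 0x... (four parameters). A dump was saved in: ..."); the wording can vary by Windows version and language, and the function name is made up for illustration.

```python
import re

# Assumed shape of the BugCheck (Event ID 1001) message text. Treat the
# pattern as a sketch: localized or future builds may word it differently.
BUGCHECK_RE = re.compile(
    r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)"
    r"(?:.*?dump was saved in:\s*(\S+\.dmp))?",
    re.IGNORECASE | re.DOTALL,
)

def parse_bugcheck_message(message):
    """Return (code, parameters, dump_path), or None if nothing matches."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)  # 0x0000000a -> 10
    params = [p.strip() for p in match.group(2).split(",") if p.strip()]
    # dump_path is None when the message does not mention a saved dump.
    # Paths containing spaces are not handled by this simple pattern.
    return code, params, match.group(3)
```

One way to get the message text in the first place is wevtutil qe System with an EventID=1001 query and /f:text, run from an elevated prompt; paste the message into the function above.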

02 — What the Common Stop Codes Mean

Stop code	Usual meaning	First suspects
`IRQL_NOT_LESS_OR_EQUAL`	Driver touched memory it shouldn't	Drivers, then RAM
`PAGE_FAULT_IN_NONPAGED_AREA`	Reference to memory that isn't there	RAM, drivers
`SYSTEM_SERVICE_EXCEPTION`	Exception in kernel code	Drivers (often graphics/AV)
`DPC_WATCHDOG_VIOLATION`	A driver hung too long	Storage drivers, SSD firmware
`WHEA_UNCORRECTABLE_ERROR`	The CPU reported a hardware fault	Overclocking/XMP, power, CPU, heat
`CRITICAL_PROCESS_DIED`	A core Windows process terminated	System file corruption, disk
`VIDEO_TDR_FAILURE`	GPU stopped responding	Graphics driver, GPU itself
`KERNEL_SECURITY_CHECK_FAILURE`	Kernel structure corruption detected	Drivers, RAM
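
Event Viewer and dump tools often show the bugcheck as a number instead of a name, so it helps to have the table in lookup form. The sketch below pairs each name with its bugcheck number as listed in Microsoft's bug check reference; the numbers are my addition, not part of the table, and the function name is illustrative.

```python
# Bugcheck numbers for the stop codes in the table above, paired with the
# table's "first suspects" column. The hex values are not in the table
# itself; they come from Microsoft's published bug check list.
STOP_CODES = {
    0x0A: ("IRQL_NOT_LESS_OR_EQUAL", "drivers, then RAM"),
    0x50: ("PAGE_FAULT_IN_NONPAGED_AREA", "RAM, drivers"),
    0x3B: ("SYSTEM_SERVICE_EXCEPTION", "drivers (often graphics/AV)"),
    0x133: ("DPC_WATCHDOG_VIOLATION", "storage drivers, SSD firmware"),
    0x124: ("WHEA_UNCORRECTABLE_ERROR", "overclocking/XMP, power, CPU, heat"),
    0xEF: ("CRITICAL_PROCESS_DIED", "system file corruption, disk"),
    0x116: ("VIDEO_TDR_FAILURE", "graphics driver, GPU itself"),
    0x139: ("KERNEL_SECURITY_CHECK_FAILURE", "drivers, RAM"),
}

def describe_stop_code(code):
    """Turn a numeric bugcheck code into a one-line hint."""
    name, suspects = STOP_CODES.get(code, ("UNKNOWN", "not in this table"))
    return f"0x{code:X} {name}: check {suspects}"

print(describe_stop_code(0x133))
# prints: 0x133 DPC_WATCHDOG_VIOLATION: check storage drivers, SSD firmware
```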

💡 Plain English: read the table and notice the pattern — almost every road leads to drivers or memory. Windows itself is rarely the culprit; the third-party code it's forced to trust usually is. That's why the diagnosis below revolves around those two.

03 — Make Sure Crash Dumps Are Being Saved

Confirm once: run sysdm.cpl to open System Properties, then go to Advanced → Startup and Recovery → Settings and, under Write debugging information, choose Small memory dump. Dumps land in C:\Windows\Minidump\, one file per crash, typically a few hundred KB each. Note: a small dump needs a page file on the boot volume (normally C:). If you've disabled it (a "performance tweak" that should die), you've also disabled crash forensics.
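
A quick way to confirm dumps are actually being written is to list the folder, newest first. This is a small Python sketch; the function name is made up, and reading C:\Windows\Minidump usually needs an elevated prompt.

```python
from datetime import datetime
from pathlib import Path

def list_minidumps(folder=r"C:\Windows\Minidump"):
    """Return (file name, size in KB, modified time) per .dmp, newest first."""
    dumps = sorted(
        Path(folder).glob("*.dmp"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [
        (p.name, p.stat().st_size // 1024, datetime.fromtimestamp(p.stat().st_mtime))
        for p in dumps
    ]

if __name__ == "__main__":
    found = list_minidumps()
    if not found:
        print("No minidumps found: check the dump setting and the page file.")
    for name, size_kb, modified in found:
        print(f"{modified:%Y-%m-%d %H:%M}  {size_kb:>6} KB  {name}")
```

An empty list after a known crash is the signal to revisit the setting above.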

04 — Read the Dump: Easy Mode and Proper Mode

Easy mode — BlueScreenView (free, NirSoft): open it and every dump appears as a row, with the drivers involved in the crash highlighted in red. If three different crashes all highlight the same .sys file, you're done diagnosing — go deal with that driver.
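
The "same file in three crashes" rule is just a tally, and it can be sketched in a few lines of Python. The input format (one list of highlighted file names per dump) and the function name are assumptions for illustration; BlueScreenView does not produce this structure by itself. Kernel images are skipped by default because the kernel shows up in most crashes as a bystander.

```python
from collections import Counter

def repeat_offenders(crashes, threshold=3, ignore=("ntoskrnl.exe",)):
    """crashes: one list of highlighted driver file names per dump."""
    skipped = {name.lower() for name in ignore}
    counts = Counter()
    for drivers in crashes:
        # Count each file at most once per crash, ignoring case.
        counts.update({d.lower() for d in drivers} - skipped)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]

crashes = [
    ["nvlddmkm.sys", "ntoskrnl.exe"],
    ["NVLDDMKM.SYS"],
    ["nvlddmkm.sys", "dxgkrnl.sys"],
    ["ntoskrnl.exe"],
]
print(repeat_offenders(crashes))
# prints: [('nvlddmkm.sys', 3)]
```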

Proper mode — WinDbg (Microsoft Store, free): File → Open dump file → pick the latest minidump, then run:

!analyze -v

The output is verbose, but you only need three lines: the bugcheck name at the top, MODULE_NAME / IMAGE_NAME (the suspect file), and PROCESS_NAME (what was running). When IMAGE_NAME says ntoskrnl.exe — the Windows kernel itself — that usually means the true culprit corrupted memory and ran, leaving the kernel holding the evidence. That pattern is a strong hint toward RAM problems or a hit-and-run driver, which the next two sections are built to catch.
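
If you save the !analyze -v output to a text file, the fields worth reading can be pulled out mechanically. This Python sketch assumes each field sits on its own line as NAME: value and that the bugcheck name appears as NAME (hex) near the top; exact layout can differ between WinDbg versions. The function name is hypothetical, and ntkrnlmp.exe is included because WinDbg often reports the kernel under that name.

```python
import re

# File names WinDbg commonly uses for the Windows kernel image.
KERNEL_IMAGES = {"ntoskrnl.exe", "ntkrnlmp.exe"}

def summarize_analysis(text):
    """Pull the bugcheck name and the key fields out of !analyze -v text."""
    summary = {}
    # Bugcheck header line, e.g. "SOME_STOP_CODE (a)".
    head = re.search(r"^([A-Z][A-Z0-9_]+) \(([0-9a-fA-F]+)\)\s*$", text, re.MULTILINE)
    summary["BUGCHECK"] = head.group(1) if head else None
    for field in ("MODULE_NAME", "IMAGE_NAME", "PROCESS_NAME"):
        match = re.search(rf"^{field}:\s*(\S+)", text, re.MULTILINE)
        summary[field] = match.group(1) if match else None
    # True means the kernel is holding the evidence: think RAM or a
    # hit-and-run driver rather than a bug in Windows itself.
    image = (summary["IMAGE_NAME"] or "").lower()
    summary["kernel_blamed"] = image in KERNEL_IMAGES
    return summary
```

A result with kernel_blamed set to True is the cue to move on to the memory test and Driver Verifier sections rather than to reinstall Windows.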

05 — Test the RAM Properly

Varied stop codes, different drivers blamed each time, crashes under load: classic bad memory. Two tools:

Windows Memory Diagnostic (mdsched.exe): convenient, but its standard pass can miss subtle faults.
MemTest86 (free, bootable USB): the real test. Boot it and let it run all four passes; that takes hours, so ideally run it overnight. Any red error line means the memory is not stable as currently configured. There is no acceptable error count.

Two practical notes from the trenches: if errors appear, test one stick at a time to find the guilty module. And before condemning hardware, disable XMP/EXPO memory overclocking in UEFI and retest — memory running beyond its stable speed produces identical symptoms, and "it crashed until I turned off XMP" closes a lot of WHEA cases.
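
The symptom pattern that opens this section (varied stop codes, different files blamed each time) can be turned into a rough check on your crash history. This is a heuristic sketch in Python, not a diagnosis: the function name, the input format, and the three-crash minimum are all my assumptions, and only a MemTest86 run actually settles the question.

```python
def looks_like_bad_memory(crashes, min_crashes=3):
    """crashes: list of (stop_code, blamed_file) pairs, one per dump."""
    if len(crashes) < min_crashes:
        return False  # too little history to call it a pattern
    codes = {code for code, _ in crashes}
    files = {name.lower() for _, name in crashes}
    # Several different stop codes AND several different blamed files.
    return len(codes) > 1 and len(files) > 1

history = [
    ("IRQL_NOT_LESS_OR_EQUAL", "ntoskrnl.exe"),
    ("PAGE_FAULT_IN_NONPAGED_AREA", "win32kfull.sys"),
    ("KERNEL_SECURITY_CHECK_FAILURE", "nvlddmkm.sys"),
]
print(looks_like_bad_memory(history))
# prints: True
```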

06 — Flush Out a Hiding Driver: Driver Verifier

When dumps keep blaming the kernel and RAM tests come back clean, Driver Verifier is the trap: it puts chosen drivers under strict supervision so the guilty one crashes red-handed, named in the next dump, instead of corrupting memory and escaping.

verifier
# Choose: Create standard settings
# Then:  Select driver names from a list →
#        select all NON-Microsoft drivers → Finish → reboot

⚠️ Read before running: Verifier makes crashes more likely on purpose, including possibly at boot. Know the exit before you enter: boot to Safe Mode (or WinRE → Command Prompt) and run verifier /reset. Set a System Restore point first. Then use the machine normally until it crashes and read the new dump, which should now name the verified driver directly. Update, roll back, or remove that driver, then run verifier /reset and reboot so the extra checks stop slowing the machine down.
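
The same setup can be done from an elevated command line with verifier /standard /driver followed by the driver file names. The Python sketch below only builds that argument list from a driver inventory you supply; the (file name, provider) input format and the "provider starts with Microsoft" filter are assumptions for illustration, and nothing here runs verifier for you.

```python
def verifier_command(drivers, microsoft_prefixes=("Microsoft",)):
    """drivers: (file_name, provider) pairs -> argument list for verifier.exe."""
    targets = sorted(
        name
        for name, provider in drivers
        # Keep only third-party drivers, mirroring the wizard advice above.
        if not any(provider.startswith(prefix) for prefix in microsoft_prefixes)
    )
    if not targets:
        raise ValueError("no third-party drivers to verify")
    return ["verifier", "/standard", "/driver", *targets]

# The exit, for reference: clears all Verifier settings after a reboot.
RESET_COMMAND = ["verifier", "/reset"]

inventory = [
    ("nvlddmkm.sys", "NVIDIA Corporation"),
    ("ntfs.sys", "Microsoft Corporation"),
]
print(" ".join(verifier_command(inventory)))
# prints: verifier /standard /driver nvlddmkm.sys
```

After rebooting, verifier /querysettings shows which drivers are currently under verification, which is a useful sanity check before you start waiting for the crash.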

07 — The Non-Driver Causes Worth Checking

Heat: crashes only under gaming/rendering load → check temperatures with HWiNFO; a CPU brushing 100 °C or a GPU with dead fans produces "random" crashes that aren't.
Storage: CRITICAL_PROCESS_DIED and friends justify a SMART check — covered in my 100% disk usage article [link to article #6], section 02.
Power: crashes under load with WHEA codes on a desktop can be a failing PSU — the hardest component to diagnose without swapping it.
System files: from an elevated Command Prompt, run DISM /Online /Cleanup-Image /RestoreHealth, then sfc /scannow. It costs little and rules out corruption in Windows' own files.

[Personal note placeholder: a short story of a BSOD you diagnosed — what the dump showed and what the fix was.]

🔒 Bottom line: every blue screen files a report; the skill is reading it instead of rebooting past it. Stop code → dump analysis → the driver-or-RAM fork: BlueScreenView or !analyze -v for the named suspect, MemTest86 overnight for memory, Driver Verifier when the culprit hides. Guessing is optional.