DOS Development in the Early Days: What It Was Actually Like to Ship Software on 640KB

Before Stack Overflow: Shipping Code When the Machine Was the Debugger

The thing that surprises most developers when they first dig into DOS 3.x/4.x code is how total the control was. Your program didn’t compete for CPU time. It didn’t wait on a scheduler. When your code ran, it owned everything — RAM, interrupts, the keyboard buffer, the video hardware. No kernel standing between you and the metal, no MMU throwing a segfault when you walked off the end of an array. You just corrupted memory silently and wondered why the screen turned green twenty instructions later.

I got into this space by inheriting a codebase for an industrial controller that was still shipping on DOS 4.01 in 2018. Before you laugh — go count how many point-of-sale terminals, medical devices, and embedded HMIs are running some variant of a real-mode x86 environment right now. The constraints from 1988 didn’t disappear; they got frozen into production systems that nobody wants to rewrite because they haven’t crashed in fifteen years. Understanding early DOS development isn’t nostalgia, it’s practical archaeology that pays actual money.

What this piece covers is the real texture of that environment: the toolchain (Turbo C 2.0, MASM 5.x, DEBUG.COM as your last-resort disassembler), the memory model nightmare that burned everyone at least once, and the interrupt-driven I/O patterns that modern async programming quietly reinvented. I’m also going to talk about what debugging felt like when your only feedback loop was a POST card, a hex dump, and the sound of a speaker beep you coded yourself.

The toolchain itself is worth understanding concretely. A typical mid-era DOS project compiled with Borland’s Turbo C 2.0 or Microsoft C 5.1, linked with their respective linkers, and produced either a .COM file (flat 64KB image, origin at 0x100) or an .EXE with a relocation table. The .COM format was brutally simple — the entire program fit in a single segment. The moment you needed more than 64KB of code plus data plus stack, you graduated to .EXE and immediately had to choose a memory model:

  • tiny: everything in one segment, .COM output only
  • small: one code segment, one data segment, the sweet spot for most utilities
  • medium: far code, near data, for programs with lots of code but little data
  • compact: near code, far data pointers, the model that confused everyone
  • large: far pointers everywhere, the one you reached for when your data exceeded 64KB
  • huge: like large but with pointer normalization for arrays crossing segment boundaries, and a significant performance cost

Picking the wrong model was a silent killer. You’d compile in small model, pass a near pointer to a function that expected a far pointer, and the program would work perfectly on your machine with its specific memory layout — then corrupt a customer’s BIOS data area on different hardware because the segment assumption was wrong. No warning, no crash, just wrong behavior. This is exactly the class of bug you still see in microcontroller code where pointer width assumptions get baked into function signatures.

; Calling DEBUG.COM to inspect a .COM binary — old-school but it works
C:\> debug myprog.com
-u 100 120        ; unassemble offsets 0x100 through 0x120
-d ds:0 ff        ; dump data segment
-g =100           ; run from entry point
-q                ; quit

The debugging story is where the real character of the era shows up. DEBUG.COM shipped with every DOS installation and was your interactive disassembler, memory inspector, and step-through debugger all in one 20KB binary. Turbo Debugger was a luxury — and a genuinely good one, probably still the best pure-text-mode debugger I’ve ever used for step-time clarity. But in the field, on a customer machine where nothing extra was installed, you were dropping back to DEBUG, reading hex dumps, and cross-referencing your linker map file by hand. The skill that era built — being able to read a register dump and mentally reconstruct what the stack frame looked like — is something I still rely on when I’m debugging a firmware panic on a Cortex-M4 and the JTAG adapter is three time zones away.

The Hardware Constraints That Shaped Every Decision

The thing that surprises modern developers most about DOS hardware constraints isn’t that they existed — it’s how directly those constraints mapped into code. There was no OS layer padding you from the metal. Every decision about memory layout, I/O, and timing had immediate, visible consequences in your binary.

The 640KB Wall Was Structural, Not Configurable

The 8086’s 20-bit address bus gave you 1MB of addressable space, but IBM partitioned that map at the hardware level. The top 384KB was reserved for ROM BIOS, video memory, and adapter cards. That left 640KB of conventional memory for everything: your program, the DOS kernel, any TSRs (terminate-and-stay-resident programs), and whatever data you needed at runtime. If you loaded a mouse driver, a network stack, and DOS itself, you might have 400KB of usable space for your actual application before you wrote a single line of logic. Developers tracked memory maps obsessively. Here’s the canonical breakdown burned into every DOS programmer’s memory:

0x00000 - 0x003FF   Interrupt Vector Table (1KB)
0x00400 - 0x004FF   BIOS Data Area (256 bytes)
0x00500 - 0x9FFFF   Conventional Memory (DOS + programs)
0xA0000 - 0xBFFFF   Video Memory (EGA/VGA mapped here)
0xC0000 - 0xFFFFF   ROM, BIOS, adapter firmware

The cruel part was that video memory sat right in the middle of what could have been a contiguous space. Writing directly to 0xA000:0000 in a far pointer was how you drew to the screen — no framebuffer abstraction, no driver call, just a memory write. Fast and dangerous in equal measure.

Real Mode Pointer Math Will Break Your Brain

Almost every DOS application ran in Real Mode, which meant the CPU addressed memory using a segmented model: a 16-bit segment register shifted left by 4 bits, plus a 16-bit offset. The physical address formula was (segment × 16) + offset. This meant multiple segment:offset combinations mapped to the same physical byte, and pointer comparisons could lie to you. 0x1000:0x0010 and 0x1001:0x0000 both resolved to physical address 0x10010, but a naive equality check on those pointers returns false. I’ve seen junior DOS code that relied on pointer comparisons and worked perfectly until someone changed a compilation flag that shifted the segment values. Protected Mode offered a way out: the 286 widened the address space to 24 bits (still segmented), and the 386 added 32-bit flat addressing. But switching into it meant leaving DOS services behind. DOS extenders like DOS/4GW (which shipped with early Doom) solved this by bootstrapping into Protected Mode while maintaining a compatibility shim back to real-mode INT calls. That was the pragmatic workaround; most applications didn’t bother.
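
To make the aliasing concrete, here is a minimal sketch in portable C (so it runs on a modern compiler, not just under DOS). The function names are mine, not from any DOS toolchain, and the sketch ignores the wrap at the 1MB boundary.

```c
#include <stdint.h>

/* Physical address of a real-mode segment:offset pair: (segment * 16) + offset.
   The result needs more than 16 bits; the 1MB wrap (A20) is ignored here. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Naive comparison: are the raw segment:offset pairs identical? */
int same_pair(uint16_t s1, uint16_t o1, uint16_t s2, uint16_t o2)
{
    return s1 == s2 && o1 == o2;
}

/* The question you usually mean: do both pairs name the same byte? */
int same_byte(uint16_t s1, uint16_t o1, uint16_t s2, uint16_t o2)
{
    return phys_addr(s1, o1) == phys_addr(s2, o2);
}
```

Feeding it the two pairs from the paragraph above, same_pair() says they differ while same_byte() says they match, which is the whole bug in two function calls.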

No Memory Protection Means Bugs Are Ambushes, Not Crashes

A wild pointer in a modern process triggers a segfault at the exact line of bad code. In Real Mode DOS, that same pointer overwrites whatever memory happens to live at that address. If you’re lucky, it’s your own program’s data and it crashes immediately. If you’re unlucky, it’s the interrupt vector table at the bottom of memory, and your program keeps running until you call INT 21h and the handler address is now garbage. I remember spending two days debugging a DOS program where a buffer overrun was silently corrupting the far pointer to the keyboard handler — everything worked until you pressed a key. Tools like Borland’s Turbo Debugger helped, but you were fundamentally debugging in a system that had no safety net and no concept of process isolation. Every running program shared one flat address space with the OS.

The INT API Was the Entire Platform

The interrupt table wasn’t just for exceptions — it was the function call ABI for the entire platform. You loaded registers with arguments and fired a software interrupt. That was the interface contract. Three interrupts covered almost everything you’d need:

  • INT 21h (DOS services): file I/O, memory allocation, process control. The AH register held the function number.
  • INT 10h (BIOS video services): set video mode, write characters, pixel operations in graphics modes.
  • INT 13h (direct disk access): read/write sectors by CHS (cylinder/head/sector) addressing. Bypassed the filesystem entirely.

; DOS INT 21h example: write string to stdout
; AH=09h, DS:DX points to '$'-terminated string
MOV AH, 09h
MOV DX, OFFSET myString
INT 21h

; Data must sit outside the execution path (for example after the program's exit call)
myString DB 'Hello, DOS', 0Dh, 0Ah, '$'

The $ terminator for strings was an INT 21h quirk — it had nothing to do with C’s null terminator, which caused constant friction when mixing DOS service calls with C string functions. You also had the option to hook interrupts yourself. Need a custom keyboard handler? Overwrite the INT 09h vector with your function’s address and chain to the original. This was how TSRs worked, how anti-virus programs worked, and how a lot of malware worked too — same mechanism, different intent.
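
The vector table itself is simple enough to model in a few lines. This is a sketch in portable C against a 1KB byte array standing in for the bottom of memory; ivt_entry, ivt_read, and ivt_hook are names I made up for illustration. On a real machine you would go through INT 21h AH=35h/25h or at least disable interrupts around the write.

```c
#include <stdint.h>

/* One IVT entry: 4 bytes at physical address (int_no * 4),
   offset word first, then segment word, both little-endian. */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} ivt_entry;

/* ivt points at a copy of the first 1KB of memory (256 entries * 4 bytes). */
ivt_entry ivt_read(const uint8_t *ivt, uint8_t int_no)
{
    const uint8_t *p = ivt + (unsigned)int_no * 4u;
    ivt_entry e;
    e.offset  = (uint16_t)(p[0] | (p[1] << 8));
    e.segment = (uint16_t)(p[2] | (p[3] << 8));
    return e;
}

/* Hooking is just storing a new handler address. The old entry is
   returned so the new handler can chain to it. */
ivt_entry ivt_hook(uint8_t *ivt, uint8_t int_no, ivt_entry handler)
{
    ivt_entry old = ivt_read(ivt, int_no);
    uint8_t *p = ivt + (unsigned)int_no * 4u;
    p[0] = (uint8_t)(handler.offset & 0xFF);
    p[1] = (uint8_t)(handler.offset >> 8);
    p[2] = (uint8_t)(handler.segment & 0xFF);
    p[3] = (uint8_t)(handler.segment >> 8);
    return old;
}
```

INT 09h lives at byte offset 0x24 (9 × 4), which is why a stray write anywhere in the first kilobyte can redirect the keyboard, the timer, or DOS itself.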

Timing Loops Were Genuinely Broken by Faster CPUs

The original IBM PC ran at 4.77 MHz, and a lot of early games and demos used delay loops calibrated to that speed: count down a register, do nothing, repeat. When the 286 and 386 arrived running at 8–16 MHz and needing fewer clock cycles per instruction, those loops finished several times faster. Animations played at double speed or worse. Sound effects changed pitch. Games became unplayable. The correct fix was to use a hardware timer: the 8253/8254 PIT chip fired INT 08h 18.2 times per second by default, and you could reprogram it for higher resolution. But the dirty shortcut, the one you see in a lot of old code, was detecting CPU speed at startup and scaling the loop counter. Neither solution was clean. I’ve disassembled DOS games from the mid-80s where the speed detection routine is literally “time a loop against the PIT, store a multiplier, use it everywhere.” Modern code has the inverse problem: you assume the CPU is fast and worry about doing too much work. DOS developers had to worry about doing too little work, and about doing it at a consistent rate across hardware they couldn’t predict.
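
The 18.2 figure falls out of simple arithmetic: the PIT is clocked at roughly 1.193182 MHz and channel 0 counts down a 16-bit divisor, where 0 means 65536. A rough sketch of the divisor math in portable C (helper names are mine; actually loading the divisor means writing to I/O ports 43h and 40h, which this sketch leaves out):

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182UL   /* PIT input clock, about 1.193182 MHz */

/* Divisor to load into PIT channel 0 for a target tick rate.
   The counter is 16 bits and a programmed value of 0 means 65536,
   which is what produces the default 18.2 Hz tick. */
uint32_t pit_divisor(uint32_t target_hz)
{
    uint32_t d;
    if (target_hz == 0) return 65536UL;
    d = (uint32_t)((PIT_INPUT_HZ + target_hz / 2) / target_hz); /* round to nearest */
    if (d < 1) d = 1;
    if (d > 65536UL) d = 65536UL;   /* slower than 18.2 Hz is not reachable */
    return d;
}

/* Actual tick rate in millihertz for a given divisor. */
uint32_t pit_rate_millihz(uint32_t divisor)
{
    if (divisor == 0) divisor = 65536UL;
    return (uint32_t)(((uint64_t)PIT_INPUT_HZ * 1000u) / divisor);
}
```

If you do speed the timer up, the usual courtesy is to chain to the original INT 08h handler at the old 18.2 Hz rate so the DOS clock doesn't run fast.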

The Actual Toolchain People Used

The thing that surprises modern developers about the DOS toolchain isn’t how primitive it was; it’s how fast the edit-compile-run cycle ran. Turbo Pascal 5.5 and 6.0 compiled to native 8086 code fast enough that the compile step barely registered. Sub-second compiles were normal, not exceptional. The entire IDE fit in 80 columns, ran in conventional memory, and you went from editing to running with a single Ctrl+F9 (plain F9 was Make). Borland understood that developer feedback loops matter before anyone had that phrase in their vocabulary. I’ve talked to devs who used TP6 daily and they describe the experience the way people describe Vim: once it’s in your fingers, everything else feels sluggish.

Borland C++ 3.1 was the serious choice if you were writing anything that needed to ship commercially. The compiler itself was tight, but the flags were something you memorized because IDE projects were for amateurs and everyone had a MAKE file. The ones burned into my brain from reading old documentation:

BCC -O2 -G -Z -ml -emygame.exe mygame.c graphics.c
# -O2: full optimization
# -G: favor speed over size
# -Z: suppress redundant loads (actually meaningful on 286/386)
# -ml: large memory model (far code and data pointers; each segment is still capped at 64K)
# -e: name of the output executable (-o names the object file instead)

You picked your memory model at compile time and lived with that decision. Small, Medium, Compact, Large, Huge — each one changed pointer sizes and how the linker laid out segments. Getting this wrong meant subtle data corruption that didn’t crash immediately, just corrupted the heap three seconds later.

Microsoft C 6.0 and the early MSVC builds had noticeably slower compile times than Borland — everyone knew it, nobody pretended otherwise. What MSC had was CodeView, and CodeView was genuinely ahead of its time for debugging native code. You could step through assembly interleaved with source, inspect segment registers, watch memory addresses update in real time. If you were tracking down a stack corruption bug or a bad far pointer dereference, CodeView made it survivable. The build command with debug info looked like:

cl /Zi /Od /AL mainloop.c io.c /link /CO
# /Zi: embed CodeView debug info
# /Od: disable optimization (required for meaningful debugging)
# /AL: large model
# /CO: pass /CODEVIEW to the linker

For hot paths (blitters, audio mixing loops, anything touching hardware directly) you dropped to assembly. The two ways to do it were inline _asm blocks in your C file for short sequences, or separate .ASM files you assembled with MASM 5.x or 6.x and linked in. Inline was convenient but the Borland and Microsoft compilers handled register clobbering rules differently, which bit people who tried to port code between the two. The separate-file approach was cleaner for anything longer than 20 instructions:

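; memfill.asm: fills a far buffer with a word value, fast
; C side (large model): void FastFill(void far *dest, unsigned count, unsigned value);
.MODEL LARGE
.CODE
PUBLIC _FastFill
_FastFill PROC FAR
    push bp
    mov bp, sp
    push di           ; the C compilers expect DI (and SI) to be preserved
    les di, [bp+6]    ; far pointer to destination
    mov cx, [bp+10]   ; count in words
    mov ax, [bp+12]   ; fill value
    cld               ; make sure STOSW walks upward
    rep stosw
    pop di
    pop bp
    ret
_FastFill ENDP
END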

Then in your MAKE file — and yes, MAKE files from 1991 look weird but they’re doing exactly what cmake and ninja do, just without the abstraction layers:

mygame.exe: mainloop.obj memfill.obj graphics.obj
    tlink /m /l c0l+mainloop+memfill+graphics, mygame, mygame.map, emu+mathl+cl

mainloop.obj: mainloop.c defs.h
    bcc -c -ml -O2 mainloop.c

graphics.obj: graphics.c
    bcc -c -ml -O2 graphics.c

memfill.obj: memfill.asm
    masm /MX memfill.asm, memfill.obj, memfill.lst, memfill.crf

Version control was essentially nonexistent for most DOS shops. The workflow was: before touching a file, your editor made a .BAK copy automatically. Before a milestone, you’d do PKZIP -r PRJ0315.ZIP *.c *.h *.asm *.mak (the date had to be squeezed into an 8.3 filename) and copy it to a second hard drive or a set of 3.5″ disks. Some teams kept a logbook, an actual paper notebook, with dates and what changed. RCS existed, SCCS existed, but running them on DOS took real effort and most people didn’t bother. The honest trade-off: you lost granular history, but the zip archive approach meant your backup was also your distribution artifact, which mattered when you were mailing source to contractors on physical media.

Setting Up a DOS Dev Environment Today (DOSBox-X and Real Hardware)

The thing that surprised me most when I first fired up vanilla DOSBox for DOS development was how many subtle hardware behaviors it gets wrong. For playing games, that doesn’t matter. For writing code that’s supposed to run on real DOS hardware, it absolutely does. DOSBox-X is the fork you want — it exposes accurate INT 13h disk interrupt behavior, handles the memory model quirks that Borland’s compilers actually care about, and doesn’t paper over hardware details that vanish in the sanitized DOSBox experience. I switched about two hours into trying to get a Borland C++ project linking correctly.

The config options that matter are minimal but non-obvious. Create or edit dosbox-x.conf and get these right first:

[dosbox]
machine=svga_s3
memsize=16

[cpu]
cycles=max
cputype=pentium

[dos]
hard drive data rate limit=0

machine=svga_s3 gives you the S3 Trio64 chipset that most real 486-era machines shipped with — the VESA modes behave correctly and the BIOS extensions match what period code expects. memsize=16 is 16MB of RAM, which is the comfortable upper bound for what DOS extended memory managers like HIMEM.SYS actually dealt with in practice. cycles=max removes the artificial cycle cap so compilation doesn’t take thirty seconds for a 500-line file.

The killer feature for a modern workflow is mounting your host filesystem directly:

REM Inside the DOSBox-X console
mount C ~/dos_projects
C:
cd myproject
REM Turbo Pascal command-line compiler
tpc main.pas

You edit source files in VS Code or whatever you like on the host side, flip to your DOSBox-X window, and compile. No floppy image juggling, no copying files around. The files live on your real filesystem and DOSBox-X just sees them as drive C. This alone makes DOS development feel tolerable rather than nostalgic-painful.

For the compilers: Borland released several of its older tools as free downloads years ago, and Embarcadero (who acquired Borland’s assets) has kept some of them available. As far as I can tell, those official releases covered the older versions (Turbo Pascal 5.5, Turbo C 2.01, Turbo C++ 1.01), so check the license text that comes with whatever you download. Turbo Pascal 7.0 and Turbo C++ 3.0 are the ones worth having if you can get them legitimately; verify you’re pulling them from cc.embarcadero.com/museum or the Vetusware mirror rather than random abandonware sites where the zips may be modified. Borland C++ 3.1 is a step up from TC++ 3.0 and has better IDE integration, but its legal status is grayer: it was never officially declared freeware, so you’re in abandonware territory. For serious work I use Turbo Pascal 7.0 because the compiler is fast. Once you have the zip extracted to ~/dos_projects/tp7, the setup inside DOSBox-X is just:

mount C ~/dos_projects
C:
cd tp7\bin
tpc hello.pas

Real hardware is genuinely different, and I don’t mean that romantically. I mean the timing behaviors, the memory bus contention, the way a real ISA Sound Blaster responds to port I/O — none of it is fully emulatable. A 486DX2-66 or a Pentium 75 machine is still findable on eBay for under $100 most weeks, and a machine with a working ISA slot matters if you want to deal with period hardware (ISA slots disappeared around the late Pentium II era). The experience of writing a TSR that hooks INT 9h and actually watching it work on real hardware, where timing bugs will manifest that DOSBox-X silently tolerates, teaches you things that emulation simply won’t. That said, real hardware is where you go after you’ve got a working workflow in emulation — the iteration cycle of edit-compile-test is too slow on period hardware to use it as your primary environment.

Writing Your First .COM vs .EXE Program — and Why the Difference Matters

The thing that surprised me most when I first cracked open a DOS .COM file in a hex editor was how naked it was. No header, no magic bytes at the start — just raw x86 instructions beginning at offset 0x00. The OS loads it at segment:offset CS:0100h and immediately jumps. That 256-byte gap before 0100h is the Program Segment Prefix (PSP), which DOS uses to pass command-line args and environment info. Your code never owns those bytes, but it can read them. The whole model is almost absurdly simple: one segment, max 64KB including code, data, and stack, no relocation needed because there’s nothing to relocate. That simplicity is exactly why a .COM is so easy to fully understand — you can read the entire thing in a disassembler in an afternoon.
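
As a concrete example of reading the PSP: the command tail sits at offset 80h (a length byte) followed by the text at 81h, terminated by a carriage return. Here is a sketch in portable C that pulls it out of a 256-byte PSP image; psp_command_tail is a name I made up, and on real DOS you would read the bytes through a far pointer to your PSP segment rather than from a buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define PSP_CMD_LEN  0x80   /* byte: length of the command tail */
#define PSP_CMD_TEXT 0x81   /* up to 127 bytes, terminated by 0x0D */

/* Copies the command tail out of a 256-byte PSP image into out,
   NUL-terminating it. Returns the number of characters copied. */
size_t psp_command_tail(const uint8_t psp[256], char *out, size_t out_size)
{
    size_t len = psp[PSP_CMD_LEN];
    size_t i;
    if (out_size == 0) return 0;
    if (len > 127) len = 127;                  /* never read past the PSP */
    if (len > out_size - 1) len = out_size - 1;
    for (i = 0; i < len; i++) {
        if (psp[PSP_CMD_TEXT + i] == 0x0D) break;  /* defensive: stop at CR */
        out[i] = (char)psp[PSP_CMD_TEXT + i];
    }
    out[i] = '\0';
    return i;
}
```

Note that the tail usually starts with the space that separated the program name from its arguments, so expect " /v" rather than "/v".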

A minimal MASM .COM that prints a string and exits cleanly looks like this:

; hello.asm: build with MASM, LINK, and EXE2BIN:
;   masm hello.asm;
;   link hello.obj;            (the "no stack segment" warning is expected)
;   exe2bin hello.exe hello.com
; NASM can emit the .COM directly (nasm -f bin -o hello.com hello.asm), but it
; needs NASM syntax: no segment/assume/end lines, and "mov dx, msg" without "offset".

code segment
    assume cs:code, ds:code
    org 100h            ; tell assembler code starts at offset 100h

start:
    mov  dx, offset msg ; DS:DX must point to '$'-terminated string
    mov  ah, 09h        ; INT 21h function 09h: print string
    int  21h

    mov  ax, 4C00h      ; AH=4Ch terminate, AL=exit code (0)
    int  21h

msg db 'Hello, DOS!', 0Dh, 0Ah, '$'  ; CR+LF, '$' is the terminator

code ends
    end start

The org 100h directive is load-bearing: without it, every offset the assembler computes is wrong by exactly 256 bytes, and you’ll spend an hour debugging a program that looks correct. The $ string terminator for INT 21h/09h is one of those DOS-isms that trips people up; it has nothing to do with null-termination. Mixing up the two termination styles will print garbage until it hits a dollar sign somewhere in memory.
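
If you are bridging C strings and AH=09h, it helps to make the conversion explicit. A small sketch in portable C; both helper names are hypothetical, and the "reject text containing a dollar sign" rule is my choice, made because AH=09h has no escape mechanism.

```c
#include <stddef.h>

/* Length of a '$'-terminated string as INT 21h AH=09h would see it,
   scanning at most max bytes. Returns max if no '$' was found. */
size_t dollar_strlen(const char *s, size_t max)
{
    size_t i;
    for (i = 0; i < max; i++) {
        if (s[i] == '$') return i;
    }
    return max;
}

/* Copies a NUL-terminated C string into out as a '$'-terminated string.
   Returns 0 on success, -1 if it does not fit or the text contains '$'. */
int to_dollar_string(const char *c_str, char *out, size_t out_size)
{
    size_t i;
    for (i = 0; c_str[i] != '\0'; i++) {
        if (c_str[i] == '$') return -1;
        if (i + 1 >= out_size) return -1;   /* need room for the '$' too */
        out[i] = c_str[i];
    }
    if (i >= out_size) return -1;
    out[i] = '$';
    return 0;
}
```

For text that can contain a dollar sign, the practical answer is INT 21h AH=40h, which takes an explicit byte count instead of a terminator.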

.EXE files are a different world. The MZ header (named after Mark Zbikowski, whose initials are the first two bytes: 4D 5A) contains a relocation table that lets the loader fix up segment references at load time. This is what allows multiple code and data segments to coexist. When you link with Microsoft’s LINK.EXE from the MASM 5.x or 6.x era, the /MAP flag is genuinely essential during development — it produces a .MAP file that lists every segment, its size, and its relative address. Without it, you’re guessing why your 20KB program somehow allocates 48KB:

LINK hello.obj, hello.exe, hello.map /MAP /NOE
; /NOE = no extended dictionary, avoids duplicate symbol errors with older libs
; The .MAP file will show you segment order, sizes, and public symbols

A .MAP excerpt looks like 0000:0000 00019H _TEXT — that tells you the text segment starts at offset 0 and is 25 bytes. I’ve fixed more mysterious crashes by reading a map file than by attaching a debugger.
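
The MZ header is also easy to sanity-check by hand. Below is a rough sketch in portable C that reads a few of the header fields from a byte buffer and derives the file and load-image sizes. The struct and function names are mine, and only the fields needed for the size math are decoded.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t last_page_bytes;   /* 0x02: bytes used in the last 512-byte page (0 = full) */
    uint16_t pages;             /* 0x04: number of 512-byte pages in the file */
    uint16_t reloc_count;       /* 0x06: entries in the relocation table */
    uint16_t header_paragraphs; /* 0x08: header size in 16-byte paragraphs */
    uint16_t reloc_table_off;   /* 0x18: file offset of the relocation table */
} mz_info;

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));   /* little-endian word */
}

/* Returns 1 and fills info if buf starts with an MZ header, else 0. */
int mz_parse(const uint8_t *buf, size_t len, mz_info *info)
{
    if (len < 0x1C || buf[0] != 0x4D || buf[1] != 0x5A) return 0;
    info->last_page_bytes   = rd16(buf + 0x02);
    info->pages             = rd16(buf + 0x04);
    info->reloc_count       = rd16(buf + 0x06);
    info->header_paragraphs = rd16(buf + 0x08);
    info->reloc_table_off   = rd16(buf + 0x18);
    return 1;
}

/* Total file size implied by the header. */
uint32_t mz_file_bytes(const mz_info *m)
{
    uint32_t total;
    if (m->pages == 0) return 0;
    total = (uint32_t)m->pages * 512u;
    if (m->last_page_bytes != 0) total -= 512u - m->last_page_bytes;
    return total;
}

/* Size of the load image: the file minus the header. */
uint32_t mz_image_bytes(const mz_info *m)
{
    uint32_t file = mz_file_bytes(m);
    uint32_t hdr  = (uint32_t)m->header_paragraphs * 16u;
    return file > hdr ? file - hdr : 0;
}
```

A header with 2 pages, 25 bytes in the last page, and a 32-paragraph header describes a 537-byte file carrying a 25-byte image, which would line up with the 00019H segment length in the map excerpt.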

The memory model question in Borland C++ and Microsoft C 6.0 is where things get genuinely dangerous at scale. The six models (TINY, SMALL, MEDIUM, COMPACT, LARGE, HUGE) control whether code and data pointers are near (16-bit, same segment) or far (32-bit, segment:offset). SMALL gives you one 64KB code segment and one 64KB data segment, which is fine for most utilities. LARGE gives you multiple segments for both, with far pointers everywhere. HUGE adds special runtime support for individual data items larger than 64KB. Staying in SMALL when your data grows past 64KB gives you silent offset wrap-around: an index runs past 0xFFFF, the near pointer wraps to the start of the segment, and your write clobbers something else. The 2am version of this bug is realizing your char * and a char far * are pointing to the same physical memory only by accident, because you mixed near and far pointers across a module boundary. Borland’s far and huge keywords and Microsoft’s _far (later __far) let you force far semantics per-variable without switching the whole model, which is how you patch this without recompiling everything.

  • TINY: everything in one 64KB segment — this is literally what produces a .COM file via EXE2BIN
  • SMALL: one code segment, one data segment — default for most small tools, near pointers everywhere
  • MEDIUM: multiple code segments, one data segment — right choice for programs with lots of functions but small data
  • COMPACT: one code segment, multiple data segments — unusual; fits data-heavy but logic-light programs
  • LARGE: multiple code and data segments, far pointers default — what you use when you need the space and accept the overhead
  • HUGE: like LARGE but sizeof(array) > 64KB is legal — pointer arithmetic crosses segment boundaries via runtime normalization, which is measurably slower
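
The list above boils down to two numbers per model: the default size of a code pointer and the default size of a data pointer. Here it is as a lookup table in portable C, purely as a memory aid; the struct and function names are mine and nothing here comes from a compiler header.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int code_ptr_bytes;  /* size of a default function pointer */
    int data_ptr_bytes;  /* size of a default data pointer */
} mem_model;

/* 2 = near (offset only), 4 = far (segment:offset) */
static const mem_model MODELS[] = {
    { "tiny",    2, 2 },
    { "small",   2, 2 },
    { "medium",  4, 2 },
    { "compact", 2, 4 },
    { "large",   4, 4 },
    { "huge",    4, 4 },
};

/* Returns the table entry for a model name, or NULL if unknown. */
const mem_model *find_model(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof MODELS / sizeof MODELS[0]; i++) {
        if (strcmp(MODELS[i].name, name) == 0) return &MODELS[i];
    }
    return NULL;
}
```

The table makes the cross-module hazard obvious: link a MEDIUM object against a COMPACT one and both the function pointers and the data pointers disagree about their own width.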

Debugging Without a Debugger (and With DEBUG.COM)

The thing that broke me early on wasn’t writing bad code — it was not knowing where the bad code was. You’d assemble your .COM file, run it, the screen would go black or the machine would lock, and that was it. No stack trace. No error message. Just silence. That’s when DEBUG.COM became the most important tool in the box.

DEBUG.COM ships with every version of DOS, lives in your PATH, and requires zero setup. You launch it with your .COM file as an argument and it drops you at a hyphen prompt with the file loaded at offset 0x100 (right after the 256-byte PSP, where every .COM program’s code starts). The five commands you had to internalize were non-negotiable:

  • d — dump memory as hex + ASCII. d DS:0100 shows you what’s actually at your program start.
  • u — unassemble. u CS:0100 disassembles from that address forward. Vital for understanding what the assembler actually emitted vs. what you thought you wrote.
  • r — show/set registers. Bare r dumps AX, BX, CX, DX, SP, BP, SI, DI, DS, ES, SS, CS, IP, and the flags word in one shot.
  • t — single-step one instruction, showing updated registers after each one.
  • g — go/run. g =100 1A3 starts execution from offset 100h and sets a breakpoint at 1A3h. When it hits, you’re back at the hyphen prompt with full register state.

The actual workflow looked like this: crash, hard reboot (or soft reboot via Ctrl+Alt+Del if you were lucky), boot back to DOS, then:

C:\> DEBUG MYPROG.COM
-r                          ; check initial register state, IP should be 0100
-u 100 140                  ; disassemble first chunk to find your suspect code
-g =100 1A3                 ; run until the address just before the bad branch
-r                          ; examine AX, BX — did the comparison set flags right?
-t                          ; step one instruction
-t                          ; step again
-d DS:0200                  ; dump the data segment if you're chasing a memory issue

Borland’s Turbo Debugger (TD.EXE) was a genuine revelation after that workflow. Source-level debugging. Watch windows. You could split the screen and see your C source code in one pane and the generated x86 assembly directly below it, stepping through both simultaneously. The thing that caught me off guard the first time I used it: you had to compile with -v in Turbo C to embed debug info, and TD.EXE had to be able to find the .C source files at the paths recorded at compile time — move your project folder and it’d silently fall back to assembly-only mode. Microsoft’s CodeView (CV.EXE) had the same gotcha but different flags: compile with /Zi and link with /CO, otherwise CodeView loads fine but shows you nothing useful.

; Turbo C debug build
tcc -v -N myprog.c          ; -v = debug symbols, -N = stack overflow check
td myprog.exe               ; launch Turbo Debugger

; Microsoft C / CodeView
cl /Zi myprog.c /link /CO   ; /Zi embeds symbols, /CO passes debug flag to linker
cv myprog.exe               ; launch CodeView

Both debuggers had a class of bugs they couldn’t reliably catch: anything timing-sensitive. Hardware interrupt handlers, code that polled the 8253 timer, anything where the debugger’s own INT hooks perturbed execution. For those, I fell back to printf-style debugging via INT 21h Function 02h — single character output directly through DOS, no library overhead:

; Drop this inline wherever you need a breadcrumb
; Outputs 'A' to stdout without touching any library code
mov ah, 02h
mov dl, 'A'     ; change the letter at each checkpoint
int 21h

For the cases where you didn’t trust even INT 21h (deep inside an ISR, for example), you read the BIOS Data Area directly. The BDA starts at segment:offset 0040:0000 (physical address 0x00400) and holds the machine’s low-level state: keyboard buffer head/tail pointers at 0040:001A/001C, equipment flags at 0040:0010, video mode at 0040:0049. Reading it directly told you what the hardware thought was happening, independent of whatever DOS or your own code believed. The move was to d 40:00 in DEBUG and just read the dump manually against the BIOS reference chart you’d photocopied from the IBM Technical Reference manual. No fancy tooling, just knowing what the bytes meant.
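
These days I would rather let code read the dump. A minimal sketch in portable C that decodes the fields named above from a 256-byte copy of the BDA (for instance, bytes typed in from a d 40:00 session). The names are mine, and the pending-key count assumes the default 32-byte keyboard buffer at 0040:001E through 0040:003D.

```c
#include <stdint.h>

typedef struct {
    uint16_t equipment;   /* 0040:0010 equipment flags word */
    uint16_t kbd_head;    /* 0040:001A next key to read (offset within segment 40h) */
    uint16_t kbd_tail;    /* 0040:001C next free slot */
    uint8_t  video_mode;  /* 0040:0049 current video mode */
} bda_state;

static uint16_t bda_word(const uint8_t *bda, unsigned off)
{
    return (uint16_t)(bda[off] | (bda[off + 1] << 8));   /* little-endian */
}

/* Decodes a 256-byte copy of the BIOS Data Area. */
bda_state bda_decode(const uint8_t bda[256])
{
    bda_state s;
    s.equipment  = bda_word(bda, 0x10);
    s.kbd_head   = bda_word(bda, 0x1A);
    s.kbd_tail   = bda_word(bda, 0x1C);
    s.video_mode = bda[0x49];
    return s;
}

/* Keys waiting in the buffer. Each key takes 2 bytes (ASCII + scan code)
   and the default ring buffer is 32 bytes long. */
int bda_pending_keys(const bda_state *s)
{
    unsigned diff = (unsigned)(s->kbd_tail - s->kbd_head) & 0x1Fu;
    return (int)(diff / 2u);
}
```

Head equal to tail means an empty buffer, so a keyboard handler that has gone off the rails often shows up here as a tail pointer that never moves.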

Memory Management: The Part That Will Break You

The thing that gets everyone first isn’t the 640KB ceiling itself — it’s that the ceiling is actually lower than that before your program even starts. DOS loads, your CONFIG.SYS drivers pile in, your AUTOEXEC.BAT TSRs grab chunks, and by the time your application gets control, you might have 560KB or less of conventional memory. I remember shipping a program that worked fine on my dev machine and crashed silently on a customer’s box because their CD-ROM driver ate another 18KB. That’s the DOS development experience in a nutshell.

The memory map looked like this, and you had to hold all of it in your head simultaneously:

0x00000 - 0x9FFFF  : Conventional memory (640KB) — your arena
0xA0000 - 0xBFFFF  : Video memory (EGA/VGA buffers live here)
0xC0000 - 0xEFFFF  : Upper Memory Blocks (UMBs) — ROM, option ROMs, mappable space
0xF0000 - 0xFFFFF  : System BIOS ROM

; Above 1MB (only reachable via protected mode or EMS/XMS trampolines)
0x100000+           : Extended memory (XMS) — HMA starts at 0x100000

UMBs were the hack that let you reclaim real estate by shoving drivers into the 384KB between 640KB and 1MB. The config that made this work required exact load order in CONFIG.SYS:

REM HIMEM.SYS must be first: it installs the A20 handler
DEVICE=C:\DOS\HIMEM.SYS
REM EMM386 enables UMB access; swap NOEMS for RAM if you need EMS
DEVICE=C:\DOS\EMM386.EXE NOEMS
REM Move the DOS kernel into the HMA (the first 64KB above 1MB)
DOS=HIGH,UMB
REM From here on, DEVICEHIGH loads into a UMB, not conventional memory
DEVICEHIGH=C:\DOS\SETVER.EXE

Get that order wrong and EMM386 fails silently or, worse, loads but reports no UMBs available. The common mistake was putting a driver that needed EMS before EMM386.EXE finished initializing. Your game would launch, call INT 67h to detect the EMS driver, get a zero back, and bail with a cryptic “Expanded Memory Manager not found” message that had nothing to do with the actual problem.

EMS (via the LIM 4.0 spec) and XMS were two completely different interfaces solving the same problem in incompatible ways. EMS mapped 16KB “pages” into a 64KB page frame in the UMB area, four physical pages at a time. You had to explicitly map pages in and out through INT 67h calls, which meant your code looked like this:

; Map EMS logical page 3 into physical page 0 of the page frame
mov ax, 4400h      ; AH=44h: map handle page; AL=physical page (0-3), here page 0
mov bx, 3          ; BX=logical page number
mov dx, ems_handle ; DX=handle from the earlier allocate call
int 67h
or ah, ah
jnz ems_error      ; AH != 0 means failure, so check it every single time
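
The bookkeeping around that call is mostly division. Here is a sketch in portable C of how a linear offset into an EMS allocation splits into a logical page plus an offset, and where that byte lands inside the 64KB page frame once mapped. The names are mine, and the sketch assumes allocations small enough that the page number fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

#define EMS_PAGE_SIZE   16384u   /* one EMS page is 16KB */
#define EMS_FRAME_PAGES 4u       /* the page frame holds four physical pages */

typedef struct {
    uint16_t logical_page;    /* value for BX in the INT 67h AH=44h call */
    uint16_t offset_in_page;  /* 0..16383 */
} ems_loc;

/* Splits a linear offset within an EMS allocation into page + offset. */
ems_loc ems_locate(uint32_t linear_offset)
{
    ems_loc loc;
    loc.logical_page   = (uint16_t)(linear_offset / EMS_PAGE_SIZE);
    loc.offset_in_page = (uint16_t)(linear_offset % EMS_PAGE_SIZE);
    return loc;
}

/* Offset inside the page frame segment after mapping the logical page
   into the given physical page (0-3). */
uint16_t ems_frame_offset(uint8_t physical_page, uint16_t offset_in_page)
{
    assert(physical_page < EMS_FRAME_PAGES);
    assert(offset_in_page < EMS_PAGE_SIZE);
    return (uint16_t)(physical_page * EMS_PAGE_SIZE + offset_in_page);
}
```

Byte 50000 of an allocation, for example, lives in logical page 3 at offset 848, which is the page the assembly above maps into physical page 0.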

XMS was cleaner — you moved blocks above 1MB using a far call through the XMS driver, not an interrupt. The driver address came from INT 2Fh AX=4310h. If you mixed EMS and XMS calls in the same program without careful state tracking, you’d corrupt memory in ways that wouldn’t manifest until three function calls later, making the bug almost impossible to trace with the tools available at the time.

TSRs deserve their own horror story. INT 27h was the old way to go resident — it was simple but limited you to 64KB and, critically, it didn’t close open file handles. INT 21h AH=31h was the right approach: you set DX to the number of paragraphs to keep, and DOS marked that memory as owned. The real trap was interrupt vector cleanup. If your TSR hooked INT 9h (keyboard) or INT 1Ch (timer tick) and the user unloaded it out of order, your vectors now pointed at freed memory:

; On TSR install, save the old vector before replacing it
mov ax, 3509h      ; Get Interrupt Vector for INT 9h
int 21h
mov [old_int9_seg], es
mov [old_int9_off], bx

; On unload, check that your handler is still the current one FIRST
; If another TSR loaded after you and also hooked INT 9h, you cannot safely remove
; yourself — you'd orphan their handler. Most TSRs just didn't bother with unload.
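
Two pieces of that logic are plain arithmetic and worth writing down. This sketch in portable C computes the paragraph count for DX and expresses the unload check from the comment above; both function names are hypothetical, and the size helper assumes a .COM-style TSR where offsets are measured from the start of the PSP.

```c
#include <stdint.h>

/* Paragraphs (16-byte units) to pass in DX to INT 21h AH=31h.
   end_offset is the offset, measured from the start of the PSP, of the
   first byte that does NOT need to stay resident, so it already
   includes the 100h bytes of the PSP itself. Rounds up. */
uint16_t tsr_paragraphs(uint16_t end_offset)
{
    return (uint16_t)(((uint32_t)end_offset + 15u) >> 4);
}

/* Unload check: it is only safe to restore the saved vector if the
   current vector still points at our own handler. */
int tsr_can_unhook(uint16_t cur_seg, uint16_t cur_off,
                   uint16_t my_seg, uint16_t my_off)
{
    return cur_seg == my_seg && cur_off == my_off;
}
```

Forgetting the round-up is the classic off-by-one here: keep one paragraph too few and the tail of your handler becomes free memory for the next program to overwrite.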

The heap fragmentation problem in real mode is subtle and I watched it bite experienced C programmers. malloc() under Borland C++ called DOS INT 21h AH=48h, which allocated paragraphs (16-byte blocks). DOS used a simple first-fit or best-fit strategy depending on AH=58h settings. If you allocated and freed blocks of varying sizes — say, a 200-byte struct, then a 1000-byte buffer, then a 50-byte string — you’d end up with holes that couldn’t satisfy a 4KB allocation even if the total free bytes exceeded 4KB. There was no compaction. The solution was to think in terms of pools: allocate large blocks once, subdivide them yourself.
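
A pool in its simplest form looks something like the sketch below: one big array of fixed-size blocks with a free list threaded through the unused ones. It is written in portable C, the block size and count are arbitrary values I picked for illustration, and the names are mine rather than anything from the Borland runtime.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE  64u   /* hypothetical; must be at least sizeof(void *) */
#define POOL_BLOCK_COUNT 32u   /* hypothetical */

typedef union pool_block {
    union pool_block *next;                  /* valid only while the block is free */
    unsigned char payload[POOL_BLOCK_SIZE];  /* valid while the block is in use */
} pool_block;

typedef struct {
    pool_block blocks[POOL_BLOCK_COUNT];     /* the one big allocation */
    pool_block *free_list;
} pool;

void pool_init(pool *p)
{
    size_t i;
    p->free_list = NULL;
    /* Push in reverse so block 0 is handed out first. */
    for (i = POOL_BLOCK_COUNT; i > 0; i--) {
        p->blocks[i - 1].next = p->free_list;
        p->free_list = &p->blocks[i - 1];
    }
}

/* O(1) allocation; returns NULL when the pool is exhausted. */
void *pool_alloc(pool *p)
{
    pool_block *b = p->free_list;
    if (b == NULL) return NULL;
    p->free_list = b->next;
    return b->payload;
}

/* O(1) free; ptr must have come from pool_alloc on the same pool. */
void pool_free(pool *p, void *ptr)
{
    pool_block *b = (pool_block *)ptr;
    if (b == NULL) return;
    b->next = p->free_list;
    p->free_list = b;
}
```

Because every block is the same size, a freed block can always satisfy the next request, so the pool cannot fragment. The cost is wasted space when objects are smaller than the block size, which is why real programs usually ran a few pools with different block sizes.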

Far pointers are where Borland C developers lost entire weekends. In the large or compact memory model, void far *ptr stored a segment and an offset as a 32-bit value — 16 bits each. The bug pattern looked innocent:

char far *p = (char far *)0x50000010L; // segment 0x5000, offset 0x0010
char far *q = p + 0xFFF5;              // offset wraps around! 0x0010 + 0xFFF5 = 0x0005
                                        // segment is STILL 0x5000
                                        // q now points BELOW p in physical memory

// Physical address = segment * 16 + offset
// p: 0x5000 * 16 + 0x0010 = 0x50010
// q: 0x5000 * 16 + 0x0005 = 0x50005, eleven bytes BELOW p instead of 0xFFF5 above it
// Normalized form of p (same physical byte, offset folded into the segment): 5001:0000

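Borland’s huge pointers were normalized automatically after every arithmetic operation, at a real speed cost, and a far pointer could be normalized by hand with the FP_SEG, FP_OFF, and MK_FP macros from dos.h, but most developers forgot either option existed. The real lesson was: never do pointer arithmetic across a 16-bit offset boundary without normalizing first. Two far pointers to the same physical byte could compare unequal with == if they weren’t normalized, because == compared the raw segment and offset, while < and > compared only the offsets. Turbo Debugger could show you the raw segment:offset, which was the only way to diagnose this, and even then it required you to already suspect the pointer was the problem.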
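
To make the segment:offset arithmetic concrete, here is a small model in portable C. It is a sketch, not Borland code: farptr_t, far_add, far_to_linear, and far_normalize are hypothetical helpers that mimic how a 16-bit compiler treated far pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a real-mode far pointer. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} farptr_t;

/* Real-mode address translation: segment * 16 + offset. */
uint32_t far_to_linear(farptr_t p) {
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Far-pointer arithmetic as the compiler did it:
 * only the offset changes, and it wraps at 64KB. */
farptr_t far_add(farptr_t p, uint16_t n) {
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Normalize: fold the offset into the segment so the offset ends up in 0..15.
 * Every physical address below 1MB then has exactly one representation. */
farptr_t far_normalize(farptr_t p) {
    uint32_t lin = far_to_linear(p);
    farptr_t r;
    /* Addresses past 1MB cannot be expressed with a 16-bit segment and offset < 16. */
    assert(lin <= 0xFFFFFu);
    r.seg = (uint16_t)(lin >> 4);
    r.off = (uint16_t)(lin & 0xFu);
    return r;
}
```

With p = 5000:0010, far_add(p, 0xFFF5) gives 5000:0005, which is the wraparound from the listing above. 5000:0010 and 5001:0000 both translate to 0x50010, yet they differ field by field until both pass through far_normalize, which returns 5001:0000 for each.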

The 3 Things That Still Surprise Developers Who Dig Into This Era

The thing that hits hardest when you actually sit down with a DOS-era codebase is how much INT 21h could do on its own. One software interrupt, and you get file I/O, console input/output, process termination, environment variable access, and memory allocation — all dispatched by whatever value you loaded into AH before triggering it. Function 0x3C creates a file, 0x3F reads from a handle, 0x40 writes. No libc wrapper, no syscall table abstraction — just:

; Write "Hello" to stdout (handle 1) using INT 21h AH=40h
mov  ah, 40h        ; function: write to file/device
mov  bx, 1          ; handle 1 = stdout
mov  cx, 5          ; byte count
mov  dx, offset msg ; pointer to buffer
int  21h            ; DOS dispatcher picks it up from AH
; On return: AX = bytes written, CF set on error

What surprises modern devs isn’t the simplicity, it’s the completeness. INT 21h exposes somewhere north of 80 services, depending on the DOS version. Nearly the entire API surface of a 1980s operating system fit in a single interrupt handler. I spent time cross-referencing against Ralf Brown’s interrupt list and kept expecting to find some large parallel mechanism for core features. Apart from a few side doors (INT 25h/26h for absolute disk reads and writes, the INT 2Fh multiplex interrupt), there isn’t one. It’s all in there. That’s philosophically different from how we design systems now, where surface area sprawl is treated as normal.
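
The "one interrupt, dispatch on AH" design is easy to model. The sketch below is a toy in portable C, not how DOS is actually implemented: regs_t, the two stub services, and the canned return values are assumptions made for illustration. Only the function numbers (30h for get version, 40h for write) and error code 1 for an invalid function come from the documented interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Just enough register state for the sketch. AH is the high byte of AX. */
typedef struct {
    uint16_t ax, bx, cx, dx;
    int carry;              /* CF: set on error */
} regs_t;

typedef void (*service_fn)(regs_t *r);

/* AH=30h, get DOS version: AL = major, AH = minor. Pretend to be DOS 3.30. */
static void svc_get_version(regs_t *r) {
    r->ax = (uint16_t)((30u << 8) | 3u);
    r->carry = 0;
}

/* AH=40h, write to handle: stub that reports all CX bytes as written. */
static void svc_write(regs_t *r) {
    r->ax = r->cx;
    r->carry = 0;
}

/* One table, indexed by AH. Unlisted entries stay NULL. */
static const service_fn services[256] = {
    [0x30] = svc_get_version,
    [0x40] = svc_write,
};

/* The "interrupt handler": look at AH, jump to the matching service. */
void int21_dispatch(regs_t *r) {
    service_fn fn;
    assert(r != NULL);
    fn = services[(r->ax >> 8) & 0xFFu];
    if (fn == NULL) {
        r->ax = 0x0001;     /* DOS error 1: invalid function number */
        r->carry = 1;
        return;
    }
    fn(r);
}
```

Everything a program could ask of the OS goes through that one lookup, which is why a single reference table indexed by AH was enough to document the whole system.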

The BIOS documentation thing genuinely caught me off guard. The IBM PC Technical Reference Manual — the original 1981 edition and its successors — doesn’t just describe the hardware. It includes the actual BIOS source code, printed in the appendix, in 8086 assembly, with comments. Every INT vector (INT 10h for video, INT 13h for disk, INT 16h for keyboard) is documented with entry conditions, return values, and register preservation guarantees. The memory map starting from segment 0000h is spelled out with what lives at every significant address: 0040:0000 through 0040:00FF is the BIOS Data Area, and you knew exactly what offset held the cursor position for each video page, what held the keyboard buffer head pointer, what held the equipment flags. This wasn’t reverse-engineered after the fact — IBM handed you the map. Hardware transparency at that level simply doesn’t exist anymore. Intel’s Architecture Software Developer’s Manual is thorough, but it’s 5,000 pages and describes a chip you can’t fully observe at runtime.
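
A few of those BIOS Data Area offsets, written out as a sketch in portable C. On a real machine the buffer would be the 256 bytes at 0040:0000; here it is an ordinary array so the code runs anywhere. The macro and function names are my own, and only a handful of well-known offsets are included.

```c
#include <assert.h>
#include <stdint.h>

#define BDA_SIZE            0x100
#define BDA_EQUIPMENT_WORD  0x10   /* equipment flags */
#define BDA_KBD_HEAD        0x1A   /* keyboard buffer head pointer */
#define BDA_KBD_TAIL        0x1C   /* keyboard buffer tail pointer */
#define BDA_VIDEO_MODE      0x49   /* current video mode */
#define BDA_CURSOR_POS      0x50   /* 8 pages x 2 bytes: column, then row */

/* Read a little-endian 16-bit field from a copy of the BDA. */
uint16_t bda_read16(const uint8_t *bda, unsigned off) {
    assert(off + 1 < BDA_SIZE);
    return (uint16_t)(bda[off] | (bda[off + 1] << 8));
}

/* Cursor position for one of the eight video pages. */
void bda_cursor(const uint8_t *bda, unsigned page, uint8_t *row, uint8_t *col) {
    assert(page < 8);
    *col = bda[BDA_CURSOR_POS + page * 2];
    *row = bda[BDA_CURSOR_POS + page * 2 + 1];
}

/* The keyboard ring buffer is empty exactly when head equals tail. */
int bda_key_waiting(const uint8_t *bda) {
    return bda_read16(bda, BDA_KBD_HEAD) != bda_read16(bda, BDA_KBD_TAIL);
}
```

Under DOS you would point a far pointer at 0040:0000 and read the same offsets directly, with no API call in between.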

The practical consequence of that transparency was that shipping software required, and produced, developers who understood the whole stack. Not aspirationally, not as a career goal, but because you had no choice. A game developer in 1990 writing a sound driver for the OPL2 chip on an Ad Lib card was reading the Yamaha YM3812 register map, directly poking I/O ports at 0x388 and 0x389, and timing the writes manually, because the chip needed about 3.3 microseconds (12 chip clock cycles) between register select and data write and about 23 microseconds (84 cycles) after the data write, or it would silently corrupt state:

; Write to OPL2 register: timing matters, no driver abstracts this for you
mov  dx, 388h    ; OPL2 status/address port
mov  al, reg_num
out  dx, al
; Burn ~3.3 microseconds (12 chip clocks); six port reads is the classic recipe
in   al, dx
in   al, dx
in   al, dx
in   al, dx
in   al, dx
in   al, dx
inc  dx          ; 389h = data port
mov  al, value
out  dx, al
; Now burn ~23 microseconds (84 chip clocks) before the next register write;
; the classic recipe is 35 more port reads

There was no HAL, no kernel driver model, no audio API. You either knew the chip or your audio was broken. That constraint produced a specific kind of developer competence that’s genuinely rare now — not better or worse, just different. When something didn’t work, the answer was always in a document you could actually read, not a closed firmware blob or a kernel subsystem with 400,000 lines of history. The debugging workflow was: read the manual, check your register setup, verify your timing. The entire observable universe of the problem fit in your head. That’s the thing modern developers who dig into this era find most disorienting — not the constraint, but the legibility.

When You Should NOT Try to Write DOS Code (Honest Assessment)

If your goal is shipping something to real users in 2025, I’ll be blunt: close this tab and go back to whatever framework you were ignoring. DOS development is archaeology. The toolchain is fragile, the documentation is scattered across abandonware sites and 30-year-old PDFs, and the skills don’t transfer to your next sprint. I spent a weekend getting a simple text-mode menu rendering correctly under DOSBox and the main thing I shipped was a headache. Fun archaeology, zero career ROI for most of us.

The one situation where I’d argue this pays off immediately is legacy maintenance — and I mean real legacy, not “we still use jQuery.” There are CNC machines, patient monitoring systems, and industrial control panels running MS-DOS 6.22 on actual hardware in hospitals and factory floors right now. If you’re the person who gets the call when one of those breaks, knowing how INT 21h file I/O works or how to read the BIOS parameter block off a FAT12 floppy image is not academic. It’s the difference between a 2-hour fix and a $40,000 equipment replacement conversation with management.
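
Reading the BIOS parameter block is a good example of how small that knowledge really is. The sketch below parses the classic DOS BPB fields out of the first sector of a floppy image, in portable C. The struct and function names are hypothetical; the byte offsets are the standard FAT12/FAT16 boot sector layout, and only the fields needed to locate the data area are decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t bytes_per_sector;     /* offset 11 */
    uint8_t  sectors_per_cluster;  /* offset 13 */
    uint16_t reserved_sectors;     /* offset 14 */
    uint8_t  fat_count;            /* offset 16 */
    uint16_t root_entries;         /* offset 17 */
    uint16_t total_sectors;        /* offset 19 */
    uint8_t  media_descriptor;     /* offset 21 */
    uint16_t sectors_per_fat;      /* offset 22 */
} bpb_t;

/* Fields are little-endian and unaligned, so assemble them byte by byte. */
static uint16_t le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or the BPB is implausible. */
int bpb_parse(const uint8_t *sector, size_t len, bpb_t *out) {
    assert(sector != NULL && out != NULL);
    if (len < 512) {
        return -1;
    }
    out->bytes_per_sector    = le16(sector + 11);
    out->sectors_per_cluster = sector[13];
    out->reserved_sectors    = le16(sector + 14);
    out->fat_count           = sector[16];
    out->root_entries        = le16(sector + 17);
    out->total_sectors       = le16(sector + 19);
    out->media_descriptor    = sector[21];
    out->sectors_per_fat     = le16(sector + 22);
    if (out->bytes_per_sector == 0 || out->sectors_per_cluster == 0 ||
        out->fat_count == 0) {
        return -1;
    }
    return 0;
}

/* First sector of the data area: reserved sectors + all FATs + root directory. */
uint32_t bpb_first_data_sector(const bpb_t *b) {
    uint32_t root_dir_sectors =
        ((uint32_t)b->root_entries * 32u + b->bytes_per_sector - 1u) /
        b->bytes_per_sector;
    return b->reserved_sectors +
           (uint32_t)b->fat_count * b->sectors_per_fat +
           root_dir_sectors;
}
```

For a standard 1.44MB floppy (512-byte sectors, 1 reserved sector, 2 FATs of 9 sectors each, 224 root entries), bpb_first_data_sector works out to 33, which is the number you need when pulling a file off a disk image by hand.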

As a learning tool for x86 internals, DOS is genuinely useful — but only after you’ve hit a ceiling with modern abstractions. The moment I actually understood what a GDT entry does in protected mode Linux was after I’d manually set up a segment descriptor in real mode DOS. Segmentation makes zero sense when you first read Intel’s Vol. 3 manual cold. It makes complete sense after you’ve written code where CS:IP is a thing you track manually and far pointers exist because your address space is 1MB. Same with interrupt dispatch — writing a TSR that hooks INT 9h to intercept keystrokes demystifies what your kernel’s interrupt controller abstraction is actually doing underneath.

The crossover to embedded work is more direct than most people expect. If you’re writing bare-metal firmware for an STM32, an ESP32, or anything RISC-V without an RTOS underneath, the mental model is almost identical to DOS: no MMU protecting you from yourself, no scheduler handing off the CPU, no libc you can trust blindly. You are the OS. The habit of thinking “what memory does this pointer actually point to, and who owns it right now” that DOS forces on you transfers directly to fighting a HardFault on Cortex-M4 at 3am. The tooling is totally different — you’re in GCC, OpenOCD, and gdb with a J-Link — but the reasoning is the same.

  • Ship a product to real users? Don’t. Use something with a package manager and a Stack Overflow presence.
  • Maintain actual DOS-era industrial or medical equipment? This knowledge pays immediately — find a copy of Ralf Brown’s Interrupt List and bookmark it now.
  • Learn x86 internals or OS concepts from scratch? Valid, but go in knowing it’s a ladder you kick away once you’ve climbed it. OSDev wiki + MIT 6.828 will take you further once DOS has given you the intuition.
  • Bare-metal embedded without an OS? The mental model maps directly. The specifics don’t, but the discipline of owning every byte of memory does.

The honest filter is: are you trying to understand something, or build something? DOS development is one of the best tools I know for understanding — the layer cake of PC hardware, how an OS actually bootstraps itself, why protected mode exists. As a building platform in 2025, it’s a dead end. Most developers reading this should treat it the way you treat reading K&R C — illuminating, worth doing once, not your daily driver.

Resources That Are Actually Worth Your Time

Ralf Brown’s Interrupt List is the one resource I keep coming back to no matter what. Every INT call, every register expected on entry, every possible return value — it’s all there. The original is a massive text dump (RBIL in zip form), but searchable HTML versions exist at sites like ctyme.com that make it much faster to use in practice. The thing that surprised me: it covers not just DOS interrupts but BIOS, EMS, XMS, DPMI, network adapters, CD-ROM extensions — stuff that’s genuinely hard to find documented anywhere else. If you’re trying to understand why some program calls INT 21h/AH=4Ch or what INT 10h/AH=0Eh actually does to the cursor state, this is the first place to look, not Stack Overflow.

The IBM PC Technical Reference Manual is a primary source, and that distinction matters. A lot of secondary write-ups about PC architecture get details slightly wrong — register widths, timing, which behavior is undefined vs. guaranteed. Archive.org has scanned PDFs of the original IBM manuals, including the technical reference for the 5150, 5160 (XT), and AT. Reading the actual schematics and BIOS listing for the original 5150 is a different experience from reading someone’s blog post about it. The BIOS source listing alone explains design decisions that still echo in modern x86 firmware.

Borland’s old compilers, Turbo Pascal 7.0 and Borland C++ 3.1 specifically, are the ones a large share of DOS-era code was actually written with. Embarcadero (who acquired Borland’s assets) has made some of these available through their museum/legacy pages, but availability and licensing have shifted over time, so verify the current status directly at museum.embarcadero.com before assuming anything is freely redistributable. The reason these matter: if you’re reading source from that era or trying to reproduce a build environment, GCC isn’t a drop-in substitute. The memory model assumptions, inline assembly syntax, and interrupt handler keywords are compiler-specific. Turbo Pascal’s {$F+} far call directive and Borland C’s interrupt keyword are not things you replicate trivially.

DOSBox-X on GitHub is the fork to use for development work. The original DOSBox targets game compatibility; DOSBox-X targets accuracy and covers things like PC-98 hardware, different machine types, more complete EMS/XMS implementations, and better debugger integration. The built-in debugger alone is worth it — you can set breakpoints, inspect segment registers, and step through real-mode code. The wiki is solid, and more importantly, the issue tracker is actually useful: if something behaves unexpectedly, there’s a good chance someone already filed it with reproduction steps. Running it looks like this:

# Clone and build on Linux (needs SDL2, libfluidsynth, etc.)
git clone https://github.com/joncampbell123/dosbox-x.git
cd dosbox-x
./build-debug-sdl2   # build script names have changed between releases; check BUILD.md in the repo

# Or grab a release binary and point it at your DOS directory
dosbox-x -conf my_dos.conf -c "mount c /home/user/dos" -c "c:"

For quick experiments where you don’t want to configure a local emulator, pcjs.org runs actual DOS in the browser with cycle-accurate emulation. The cycle accuracy is what makes it genuinely useful rather than just a curiosity — you can observe real timing behavior, not an approximation. It ships pre-loaded with various IBM PC configurations including the original 5150 with PC DOS 1.0. I’ve used it to quickly test how a program behaves on a CGA-only system without reconfiguring my local DOSBox-X setup. The source is on GitHub too if you want to understand how the emulation works, which is itself an education in x86 real-mode behavior.




Written by Eric Woo

Lead AI Engineer & SaaS Strategist

Eric is a seasoned software architect specializing in LLM orchestration and autonomous agent systems. With over 15 years in Silicon Valley, he now focuses on scaling AI-first applications.
