Number Systems & Data Representation
Understanding binary, hexadecimal, signed integers, floating-point numbers, and character encodings — the foundational language of computers.
Introduction
Computers run on electricity — on or off, 1 or 0. Everything else (images, text, video, sound) boils down to patterns of these bits. Understanding how numbers and data get represented in memory matters if you want to write code that doesn’t break in production.
The choices made decades ago still affect how we write software today. Floating-point math that made sense in the 1960s still produces head-scratching results. Representing negative numbers led to several competing schemes, each with their own quirks.
When to Use This Knowledge
Apply your understanding of data representation when:
- Debugging numerical precision issues in financial or scientific calculations
- Writing code that manipulates binary protocols or file formats
- Interpreting crash dumps and core files
- Working with bitwise operations for flags and masks
- Understanding why 0.1 + 0.2 != 0.3 in floating-point
- Ensuring cross-platform compatibility for integer sizes
Skip the deep dive if:
- You’re exclusively using high-level languages with arbitrary precision libraries
- You never touch network protocols, binary files, or hardware registers
Architecture Diagram
The journey of data through a computer’s representation layers:
flowchart LR
A[User Input: -123.456] --> B[ASCII Characters]
B --> C[Binary Encoding]
C --> D[IEEE 754 Single Precision]
D --> E[Hexadecimal: 0xC2F6E979]
E --> F[Memory Layout]
F --> G[Registers]
G --> H[ALU Operations]
Core Concepts
Binary: The Foundation
Binary is a base-2 number system using only digits 0 and 1. Each position represents a power of 2, reading right to left: 1, 2, 4, 8, 16, 32, 64, 128, 256, and so forth.
The binary number 1101 equals: 1×8 + 1×4 + 0×2 + 1×1 = 13
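The positional expansion above can be sketched in a few lines of Python. The helper name `binary_to_decimal` is hypothetical, chosen for illustration; Python's built-in `int(text, 2)` does the same job.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum digit * 2**position, with position 0 at the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * (2 ** position)
    return total


print(binary_to_decimal("1101"))  # 13
print(int("1101", 2))             # 13, the built-in parser agrees
```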
Computers use binary because physical circuits naturally have two states: electricity flowing (on) or not flowing (off). It’s easier to build reliable hardware that distinguishes between two clear states than between ten different voltage levels.
Hexadecimal: The Human-Friendly Binary
Hexadecimal (base-16) provides a more compact notation for binary data. Each hex digit represents exactly 4 binary bits: the digits 0-9 keep their decimal values, and A-F stand for decimal 10-15.
Binary: 1111 0101 1010 0011
Hex: F 5 A 3
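As a rough sketch of that nibble-by-nibble mapping, the following Python groups a bit string into 4-bit chunks and translates each one. The function name `binary_to_hex` is an illustrative assumption, not a standard library call.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hex by translating 4 bits at a time."""
    bits = bits.replace(" ", "")
    # Left-pad with zeros so the length is a multiple of 4
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[int(nibble, 2)] for nibble in nibbles)


print(binary_to_hex("1111 0101 1010 0011"))  # F5A3
```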
Hex is preferred for:
- Memory addresses (which are often aligned to 4 or 8 bytes)
- Machine code opcodes
- Bit masks and flags
- Color codes in web development (#RRGGBB packs the red, green, and blue channels as one byte each)
Signed Integers: Three Competing Visions
Computers need to represent negative numbers. Three schemes exist:
Sign-and-Magnitude — The leftmost bit represents the sign (0 = positive, 1 = negative), and the remaining bits represent the magnitude. In 8 bits, 10000001 would be -1 and 00000001 would be +1. This is intuitive but has two representations of zero (+0 and -0) and makes arithmetic circuits complex.
Two’s Complement — The dominant scheme today. Negative numbers are represented by inverting all bits and adding 1. In 8-bit two’s complement, the range is -128 to +127. This scheme has exactly one zero, and arithmetic circuits can add positive and negative numbers using the same circuitry.
// Two's complement demonstration
#include <stdio.h>
#include <stdint.h>

// Print the 8 bits of n, most significant bit first
void print_binary(uint8_t n) {
    for (int i = 7; i >= 0; i--) {
        printf("%d", (n >> i) & 1);
        if (i == 4) printf(" "); // Separate nibbles
    }
}

int main(void) {
    uint8_t positive = 42;            // 00101010
    uint8_t negative = (uint8_t)-42;  // Same bits as -42 in two's complement: 11010110
    printf("+42 in binary: "); print_binary(positive); printf("\n");
    printf("-42 in binary: "); print_binary(negative); printf("\n");
    // The sum is computed as int (256) and truncated back to 8 bits
    printf("+42 + (-42) = "); print_binary((uint8_t)(positive + negative)); printf(" (should be 0)\n");
    // Key insight: -42 = ~42 + 1
    printf("~42 + 1 = "); print_binary((uint8_t)(~positive + 1)); printf("\n");
    return 0;
}
Biased Representation — Used for exponents in floating-point numbers. A bias value is subtracted to get the actual exponent. The IEEE 754 single-precision uses a bias of 127 for exponents.
Floating-Point: Approximating the Reals
Real numbers (like pi or the square root of 2) can’t be exactly represented in binary with a finite number of bits. Floating-point provides a practical approximation using scientific notation.
IEEE 754 single-precision (32-bit) format:
- 1 bit: Sign (0 = positive, 1 = negative)
- 8 bits: Exponent (biased by 127)
- 23 bits: Mantissa (fractional part, normalized)
Sign (1) | Exponent (8) | Mantissa (23)
S | EEEEEEEE | MMMM...MMM
The value represented is: (-1)^S × 1.M × 2^(E-bias)
Special values exist:
- Zero: all exponent and mantissa bits are 0 (sign bit determines +0 or -0)
- Infinity: all exponent bits are 1, mantissa is 0
- NaN (Not a Number): all exponent bits are 1, mantissa is non-zero
- Denormals: all exponent bits are 0 but mantissa is non-zero — these are “subnormal” numbers with reduced precision
#!/usr/bin/env python3
"""
Exploring IEEE 754 floating-point behavior
"""
import struct


def float_to_hex(f: float) -> str:
    """Convert float to its hexadecimal representation"""
    return hex(struct.unpack('<I', struct.pack('<f', f))[0])


def hex_to_float(h: str) -> float:
    """Convert hex string to float"""
    return struct.unpack('<f', struct.pack('<I', int(h, 16)))[0]


def analyze_float(f: float):
    """Break down a float into its IEEE 754 components"""
    packed = struct.pack('<f', f)
    bits = struct.unpack('<I', packed)[0]
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    actual_exponent = exponent - 127
    implicit_one = 1.0 + (mantissa / (2**23))
    print(f"Float: {f}")
    print(f"Hex: {float_to_hex(f)}")
    print(f"Sign: {sign}")
    print(f"Exponent: {exponent} (actual: {actual_exponent})")
    print(f"Mantissa: {mantissa:06x}")
    print(f"Value: (-1)^{sign} × {implicit_one:.10f} × 2^{actual_exponent}")
    print()


if __name__ == "__main__":
    # Classic floating-point gotcha
    print("=== Classic Gotcha ===")
    a, b = 0.1, 0.2
    c = 0.3
    print(f"0.1 + 0.2 = {a + b}")
    print(f"0.3 = {c}")
    print(f"(0.1 + 0.2) == 0.3: {a + b == c}")  # False!
    print()
    print("=== Analyzing Special Values ===")
    analyze_float(0.0)
    analyze_float(-0.0)
    analyze_float(float('inf'))
    analyze_float(-float('inf'))
    analyze_float(float('nan'))
    print("=== Powers of 2 ===")
    for exp in range(-5, 5):
        f = 2.0 ** exp
        analyze_float(f)
Character Encodings: Beyond Numbers
Text requires its own encoding scheme. The history is complicated:
ASCII (American Standard Code for Information Interchange) — 7 bits, 128 characters. Uppercase A-Z is 65-90, lowercase a-z is 97-122, digits 0-9 are 48-57. Control characters (0-31) handle things like newline and carriage return.
ISO-8859-1 (Latin-1) — 8 bits, 256 characters. Adds characters needed for Western European languages. First 128 match ASCII.
Unicode — Aims to represent every character in every language. Code points are written as U+XXXX, ranging from U+0000 to U+10FFFF (over 1.1 million possible characters).
UTF-8 — Variable-width encoding for Unicode. ASCII characters (U+0000 to U+007F) use 1 byte. Other characters use 2-4 bytes. UTF-8 is the dominant encoding on the web and in Linux/Unix systems.
Character | Unicode | UTF-8 bytes
----------|---------|-------------
A | U+0041 | 41
€ | U+20AC | E2 82 AC
😀 | U+1F600 | F0 9F 98 80
// Demonstrating character encoding handling in C
#include <stdio.h>
#include <string.h>

// UTF-8 aware string length: count characters (code points), not bytes.
// Continuation bytes have the form 10xxxxxx, so every other byte starts a
// new character. Advancing one byte at a time means malformed input can
// neither stall the loop nor skip past the terminating NUL.
size_t utf8_strlen(const char *s) {
    size_t count = 0;
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        if ((*p & 0xC0) != 0x80) {
            count++; // ASCII byte or lead byte: start of a character
        }
        p++;
    }
    return count;
}

int main(void) {
    const char *text = "Hello, world! 😀";
    printf("String: %s\n", text);
    printf("Bytes: %zu\n", strlen(text));            // 18
    printf("Characters: %zu\n", utf8_strlen(text));  // 15
    return 0;
}
Production Failure Scenarios
Scenario 1: The Integer Overflow in Space
What happened: On June 4, 1996, the Ariane 5 rocket flight 501 exploded 37 seconds after liftoff. The inertial reference system attempted to convert a 64-bit floating-point number to a 16-bit signed integer. The value was larger than 32,767 (the maximum for a 16-bit signed integer), causing an overflow.
Root cause: The Ariane 5 used the same software from Ariane 4, but the horizontal velocity of the larger rocket exceeded the range representable in 16 bits.
Mitigation: Always validate integer conversions. Use language features that trap overflow (like -ftrapv in GCC), or use saturation arithmetic for values that shouldn’t overflow. For critical systems, formal methods can prove overflow absence.
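// Safe integer conversion with overflow checking
#include <inttypes.h>  // PRId64: portable printf format for int64_t
#include <stdbool.h>
#include <stdint.h>    // int32_t, int64_t, INT32_MIN, INT32_MAX
#include <stdio.h>

// Store val in *out and return true if it fits in int32_t; otherwise
// report the problem on stderr and return false, leaving *out untouched.
bool safe_int32_from_int64(int64_t val, int32_t *out) {
    if (val > INT32_MAX) {
        fprintf(stderr, "Overflow: %" PRId64 " > INT32_MAX\n", val);
        return false;
    }
    if (val < INT32_MIN) {
        fprintf(stderr, "Underflow: %" PRId64 " < INT32_MIN\n", val);
        return false;
    }
    *out = (int32_t)val;
    return true;
}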
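The mitigation mentions both trapping and saturation. Here is a minimal Python sketch of the two policies for a float-to-16-bit conversion like the one in the Ariane 5 failure. The function names are hypothetical, and NaN or infinite inputs are not handled (Python's `int()` raises on them).

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    truncated = int(value)  # truncates toward zero, like a C cast
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert to a 16-bit signed integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


print(to_int16_saturating(40000.0))  # 32767
print(to_int16_checked(123.9))       # 123
```

Which policy is right depends on the system: trapping makes the failure visible, while saturation keeps a control loop running with a bounded value.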
Scenario 2: Floating-Point Precision in Financial Calculations
What happened: A penny-counting stock trading system accumulated rounding errors over millions of transactions, eventually causing a $500,000 loss.
Root cause: Using IEEE 754 double precision for currency. Because 0.1 can’t be exactly represented in binary, each operation introduced a tiny error, and those errors accumulated over millions of operations.
Mitigation: Use integer arithmetic for monetary values (store cents, not dollars), or use a decimal floating-point type such as IEEE 754 decimal128 where appropriate.
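A small Python sketch of the difference, using ten payments of 0.10 as an arbitrary illustration. The variable names are assumptions made for this example.

```python
from decimal import Decimal

# Binary floating-point: each 0.10 is already slightly off, and the errors add up
float_total = sum(0.10 for _ in range(10))
print(float_total == 1.0)                # False

# Integer cents: exact
cents_total = sum(10 for _ in range(10))
print(cents_total == 100)                # True

# Decimal built from strings: exact
decimal_total = sum(Decimal("0.10") for _ in range(10))
print(decimal_total == Decimal("1.00"))  # True
```

Note that `Decimal` must be constructed from a string or an integer; `Decimal(0.10)` would inherit the binary rounding error.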
Scenario 3: The Year 2038 Problem
What happened: 32-bit Unix systems use signed 32-bit integers for time_t (seconds since January 1, 1970). On January 19, 2038, this value overflows to negative, representing dates in 1901.
Root cause: The choice of 32-bit signed integer for time_t made sense in 1971 but will break in 2038.
Mitigation: Migrate to 64-bit systems where time_t is 64 bits. On 32-bit Linux with glibc 2.34 or later, compiling with -D_TIME_BITS=64 (together with -D_FILE_OFFSET_BITS=64) selects a 64-bit time_t. Some systems use a dual-date approach where pre-2038 and post-2038 routines coexist.
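To see where the rollover lands, this Python sketch simulates a signed 32-bit counter of seconds since the epoch. `wrap_int32` is a hypothetical helper that mimics two's complement wraparound; Python's own integers never overflow.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's complement value."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


last_good = EPOCH + timedelta(seconds=INT32_MAX)
after_wrap = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))
print(last_good)   # 2038-01-19 03:14:07+00:00
print(after_wrap)  # 1901-12-13 20:45:52+00:00
```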
Trade-off Table
| Representation | Range | Precision | Storage | Use Case |
|---|---|---|---|---|
| uint8_t | 0 to 255 | Exact | 1 byte | Flags, small counters |
| int32_t | -2.1×10^9 to +2.1×10^9 | Exact | 4 bytes | General integers |
| float (IEEE 754) | ±3.4×10^38 | ~7 decimal digits | 4 bytes | Graphics, approximate math |
| double (IEEE 754) | ±1.8×10^308 | ~15 decimal digits | 8 bytes | Scientific computing |
| int64_t | -9.2×10^18 to +9.2×10^18 | Exact | 8 bytes | Large counts, timestamps |
Implementation Snippets
Bit Manipulation for Flags
#!/usr/bin/env python3
"""
Bit manipulation utilities for flags and masks
"""


def set_bit(value: int, bit: int) -> int:
    """Set a specific bit to 1"""
    return value | (1 << bit)


def clear_bit(value: int, bit: int) -> int:
    """Set a specific bit to 0"""
    return value & ~(1 << bit)


def toggle_bit(value: int, bit: int) -> int:
    """Toggle a specific bit"""
    return value ^ (1 << bit)


def get_bit(value: int, bit: int) -> int:
    """Get the value of a specific bit (0 or 1)"""
    return (value >> bit) & 1


def is_power_of_two(n: int) -> bool:
    """Check if n is a power of two using bit trick"""
    return n > 0 and (n & (n - 1)) == 0


def count_set_bits(n: int) -> int:
    """Count the number of set bits (population count); n must be non-negative"""
    count = 0
    while n:
        n &= n - 1  # Clear lowest set bit
        count += 1
    return count


# Example: File permission flags
READ = 0b100     # 4
WRITE = 0b010    # 2
EXECUTE = 0b001  # 1


def add_permission(perms: int, new_perm: int) -> int:
    """Add a permission using bitwise OR"""
    return perms | new_perm


def remove_permission(perms: int, rem_perm: int) -> int:
    """Remove a permission using bitwise AND with complement"""
    return perms & ~rem_perm


def has_permission(perms: int, check_perm: int) -> bool:
    """Check if a permission is set"""
    return (perms & check_perm) != 0


# Demonstration
perms = READ | WRITE  # 0b110 = 6
print(f"Initial permissions: {perms:#o} (octal), {perms:#06b} (binary)")
perms = add_permission(perms, EXECUTE)
print(f"After adding EXECUTE: {perms:#o}")
perms = remove_permission(perms, WRITE)
print(f"After removing WRITE: {perms:#o}")
print(f"Has READ? {has_permission(perms, READ)}")
print(f"Has WRITE? {has_permission(perms, WRITE)}")
Converting Between Bases
#!/usr/bin/env python3
"""
Number base conversion utilities (non-negative integers only)
"""
import struct


def int_to_hex(n: int, width: int = 0) -> str:
    """Convert integer to hexadecimal string with optional zero-padding"""
    if n < 0:
        raise ValueError("negative values are not supported")
    hex_chars = "0123456789ABCDEF"
    if n == 0:
        result = "0"
    else:
        result = ""
        while n > 0:
            result = hex_chars[n % 16] + result
            n //= 16
    return "0x" + result.zfill(width) if width > 0 else "0x" + result


def int_to_binary(n: int, width: int = 0) -> str:
    """Convert integer to binary string with optional zero-padding"""
    if n < 0:
        raise ValueError("negative values are not supported")
    if n == 0:
        result = "0"
    else:
        result = ""
        while n > 0:
            result = str(n % 2) + result
            n //= 2
    return "0b" + result.zfill(width) if width > 0 else "0b" + result


def hex_to_int(hex_str: str) -> int:
    """Convert hexadecimal string to integer"""
    hex_str = hex_str.removeprefix("0x").removeprefix("0X")
    return int(hex_str, 16)


def binary_to_int(bin_str: str) -> int:
    """Convert binary string to integer"""
    bin_str = bin_str.removeprefix("0b").removeprefix("0B")
    return int(bin_str, 2)


def float_to_ieee754(value: float) -> tuple[int, int, int]:
    """Break down float into IEEE 754 single-precision components"""
    packed = struct.pack('>f', value)
    bits = int.from_bytes(packed, 'big')
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


# Demonstration
print("=== Base Conversions ===")
print(f"255 in hex: {int_to_hex(255)}")
print(f"255 in binary: {int_to_binary(255)}")
print(f"0xFF in decimal: {hex_to_int('0xFF')}")
print(f"0b1010 in decimal: {binary_to_int('0b1010')}")
print("\n=== IEEE 754 Analysis ===")
sign, exp, mant = float_to_ieee754(3.14159)
print(f"π (3.14159): sign={sign}, exponent={exp}, mantissa={mant:06x}")
print(f"Hex representation: {int_to_hex(sign << 31 | exp << 23 | mant, 8)}")
Observability Checklist
When debugging data representation issues:
- Integer overflow detection: Enable compiler overflow checks (-ftrapv, -fsanitize=signed-integer-overflow, or Clang's -fsanitize=integer)
- Floating-point exception flags: Check FE_DIVBYZERO, FE_INVALID, FE_OVERFLOW after calculations
- NaN propagation: Use isnan() and isinf() to detect special values
- Character encoding validation: Verify strings are valid UTF-8 using iconv or similar
- Endianness checks: Use union or struct tricks to detect little vs big endian
- Precision loss tracking: Log when double is coerced to float
Useful diagnostic commands:
# Check for NaN and infinity in a running process
# Attach with GDB and call: isnan(), isinf()
# Hexdump to see raw memory representation
hexdump -C your_file.bin | head -20
# Check floating-point environment
python3 -c "import sys; print(sys.float_info)"
Common Pitfalls / Anti-Patterns
Integer Overflow to Buffer Overflow — Attackers exploit integer overflows that cause buffer lengths to wrap, leading to memory corruption. Always validate integer inputs before using them for allocation or indexing.
Format String Vulnerabilities — Using printf with a user-controlled format string allows reading arbitrary memory (%x, %s) or writing arbitrary values (%n). Always use literal format strings: printf("%s", user_input) not printf(user_input).
Endianness and Network Protocols — Network byte order is big-endian. Different architectures may use different byte orders. Always use htons(), htonl(), ntohs(), ntohl() for network protocol handling.
Unicode Normalization Attacks — Different Unicode sequences can represent the same string. Attackers exploit this for phishing (é vs e combined with acute accent) or to bypass validation. Normalize using NFC or NFD before comparison.
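A brief Python illustration of the normalization point, using the standard `unicodedata` module. The two spellings of é below are the ones named in the paragraph; the variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by a combining acute accent

raw_equal = composed == decomposed
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

print(raw_equal)  # False: different code point sequences
print(nfc_equal)  # True: identical after normalization
```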
Pitfall: Assuming int is 32 bits
In C, the width of int is implementation-defined (the standard guarantees only at least 16 bits). Use int32_t or int64_t from <stdint.h> when you need a specific size; some embedded targets have a 16-bit int.
Pitfall: Comparing floats for equality
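0.1 + 0.2 == 0.3 is false. Use fabs(a - b) < epsilon for approximate equality, where epsilon is a tolerance chosen for the magnitude of your values (like 1e-9 for doubles near 1.0). For values of widely varying magnitude, scale the tolerance by the operands (a relative tolerance).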
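The same idea in Python, as a sketch: `math.isclose` from the standard library applies a relative tolerance, which matters once the values are large. The tolerances and the value 1e16 are arbitrary choices for illustration.

```python
import math

a = 0.1 + 0.2
exact_equal = (a == 0.3)
abs_close = abs(a - 0.3) < 1e-9
rel_close = math.isclose(a, 0.3, rel_tol=1e-9)
print(exact_equal, abs_close, rel_close)  # False True True

# A fixed absolute epsilon breaks down for large magnitudes:
# neighbouring doubles near 1e16 are 2.0 apart.
big = 1e16
abs_close_big = abs((big + 2.0) - big) < 1e-9
rel_close_big = math.isclose(big + 2.0, big, rel_tol=1e-9)
print(abs_close_big, rel_close_big)       # False True
```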
Pitfall: Assuming char is signed
Whether char is signed or unsigned is implementation-defined. Cast to unsigned char when you need to ensure the range 0-255.
Pitfall: Forgetting byte order in serialization
Writing int directly to a file and reading on a different architecture will corrupt data. Use explicit serialization with defined byte order.
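Pitfall: Assuming char is 8 bits
sizeof(char) is always 1 by definition, but a char may be wider than 8 bits on unusual architectures (CHAR_BIT can be 9, 16, or 32). When you need exactly 8 bits, use uint8_t, which is only defined on platforms that have an 8-bit type, so the code fails to compile rather than misbehaving.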
Quick Recap Checklist
- Binary (base-2) uses powers of 2; each bit represents an increasing power of 2
- Hexadecimal (base-16) compactly represents 4 binary bits per hex digit
- Two’s complement is the standard for signed integers; it has one zero and an asymmetric range (one more negative value than positive)
- IEEE 754 floating-point approximates real numbers with sign, exponent, and mantissa
- Special floating-point values: zero, infinity (positive/negative), NaN, denormals
- ASCII defines 128 characters; Unicode has room for over 1.1 million code points; UTF-8 is variable-width
- Unsigned integer overflow wraps silently in C, while signed overflow is undefined behavior; use overflow-checking modes or explicit checks
- Floating-point precision errors accumulate; use decimal types for financial calculations
- Always validate integer conversions and array bounds before use
- Bit manipulation is powerful for flags, masks, and optimized code
Interview Questions
0.1 cannot be represented exactly in binary floating-point because it is a rational number whose binary expansion is infinite. In decimal, 1/3 = 0.333... repeats forever. Similarly, in binary (base-2), only fractions whose denominators are powers of 2 terminate. Since 0.1 = 1/10 and 10 = 2 × 5, the denominator contains the factor 5, which is not a power of 2, so the representation repeats infinitely. IEEE 754 must round this infinite expansion to a finite number of bits, introducing a rounding error. This is why (0.1 + 0.2) == 0.3 evaluates to false in most languages.
Endianness describes the byte order of multi-byte values in memory. In big-endian, the most significant byte comes first (at the lowest address). In little-endian, the least significant byte comes first. For the 32-bit value 0x12345678 stored at address 0x00:
- Big-endian: 0x00=0x12, 0x01=0x34, 0x02=0x56, 0x03=0x78
- Little-endian: 0x00=0x78, 0x01=0x56, 0x02=0x34, 0x03=0x12
Network protocols use big-endian (also called "network byte order"). Intel and AMD x86 processors use little-endian. ARM supports both. This matters when reading binary protocols, debugging memory dumps, or transferring data between different systems.
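The byte layouts for 0x12345678 can be checked with a short Python sketch. `int.to_bytes` and `struct.pack` are standard library calls; the variable names are illustrative.

```python
import struct

value = 0x12345678
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")
print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# struct's "!" prefix means network byte order, which is big-endian
network = struct.pack("!I", value)
print(network == big)   # True

# Reading little-endian bytes as if they were big-endian gives a different number
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))     # 0x78563412
```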
Infinity: All exponent bits are 1, mantissa is 0. Positive infinity is 0x7F800000, negative is 0xFF800000. Operations like 1.0/0.0 produce infinity.
NaN (Not a Number): All exponent bits are 1, mantissa is non-zero. Quiet NaNs propagate silently through calculations; signaling NaNs raise the invalid-operation exception when used. 0.0/0.0, sqrt(-1), and inf - inf all produce NaN. Ordered and equality comparisons involving NaN are always false (even NaN == NaN), so NaN != NaN is true.
Denormals (subnormal numbers): All exponent bits are 0 but mantissa is non-zero. These are numbers between the smallest normal number and zero. They have reduced precision but prevent underflow — instead of jumping from minimum positive to zero, there's a smooth transition. Denormals can be up to 1000× slower on some hardware.
For unsigned integers, overflow is defined: the value wraps around using modulo arithmetic. For uint8_t, 255 + 1 = 0 and 0 - 1 = 255.
For signed integers, overflow is undefined behavior in C/C++. The C standard doesn't specify what happens, allowing compilers to assume it never occurs and optimize accordingly. A common result is wrapping (same as unsigned), but compilers may also eliminate code paths that would overflow, leading to surprising results. In practice:
- INT_MAX + 1 often wraps to INT_MIN
- The compiler may assume x+1 > x always holds, breaking loops
- Attackers can exploit overflow to bypass security checks
Solutions: Use -ftrapv or -fsanitize=undefined in GCC/Clang, or manually check before operations.
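Python integers never overflow, so the sketch below only models the two behaviors: a mask reproduces unsigned 8-bit wraparound, and a range check stands in for the "manually check before operations" approach for 32-bit signed addition. Both helper names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_u8(n: int) -> int:
    """Model uint8_t arithmetic: keep only the low 8 bits (modulo 256)."""
    return n & 0xFF


def checked_add_i32(a: int, b: int) -> int:
    """Add two 32-bit signed values, raising instead of overflowing."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in int32")
    return result


print(wrap_u8(255 + 1))       # 0
print(wrap_u8(0 - 1))         # 255
print(checked_add_i32(1, 2))  # 3
```

In C the check has to happen before the addition (for example, comparing `a` against `INT_MAX - b`), because the overflowing addition itself is the undefined operation.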
UTF-8 is a variable-width encoding for Unicode using 1 to 4 bytes:
- 1 byte: 0xxxxxxx (U+0000 to U+007F, the ASCII range). This means ASCII text is valid UTF-8.
- 2 bytes: 110xxxxx 10xxxxxx (U+0080 to U+07FF). The lead byte starts with 110, and continuation bytes start with 10.
- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (U+0800 to U+FFFF).
- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (U+10000 to U+10FFFF).
Advantages: ASCII compatibility, self-synchronizing (you can find character boundaries), no null bytes (allowing null-terminated strings), and efficient for Western text. Disadvantage: common CJK characters need 3 bytes each.
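The bit layouts above can be turned into a small hand-written encoder. This is a sketch for illustration only (the name `utf8_encode` is made up here); real code should use `str.encode("utf-8")`, which the loop at the end uses as a cross-check.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 using the 1-4 byte layouts."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0x7F:                        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:                      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),      # 11110xxx + 3 continuation bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "A€😀":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex(' ').upper()}")
```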
Sign-magnitude uses the leftmost bit as a sign flag (0 for positive, 1 for negative) and the remaining bits for magnitude. In 8-bit: +1 is 00000001, -1 is 10000001. This creates two zeros (+0 and -0), which complicates arithmetic.
Two's complement encodes negative numbers by inverting all bits and adding 1. In 8-bit: -1 is 11111111. This has exactly one zero and a slightly asymmetric range (-128 to +127 for 8-bit), and it makes arithmetic circuits simpler since addition and subtraction use the same circuitry.
Most modern systems use two's complement because the same adder circuit handles both positive and negative addition, whereas sign-magnitude requires different logic for each case.
A buffer overflow occurs when a program writes data beyond the boundaries of allocated memory. If an attacker controls the overflowed data, they can overwrite adjacent memory including return addresses or function pointers, potentially hijacking execution.
Integer overflow can cause buffer overflows when integers are used for buffer sizes. For example, if size_t large_count = 0xFFFFFFFF (4294967295) is converted to int, it typically becomes -1, which slips past a signed size check and is then converted back to a huge unsigned length:
char buf[BUF_SIZE];
int len = (int)large_count;     // Typically -1 after the narrowing conversion
if (len > BUF_SIZE) return -1;  // Check passes: -1 is not greater than BUF_SIZE
memcpy(buf, data, len);         // len converts to SIZE_MAX: overflow!
Mitigations include bounds checking, language-level overflow detection, and stack canaries.
Denormal numbers (also called subnormal numbers) have all exponent bits set to 0 but a non-zero mantissa. They represent values between the smallest normal number and zero, using the formula (-1)^S × 0.M × 2^(1-bias).
Without denormals, there would be a sharp gap between the smallest positive normal number and zero. Denormals provide gradual underflow, enabling a smooth transition and preserving mathematical properties such as x - y == 0 only when x == y, even for very small x and y.
However, denormals can be 100-1000x slower on some hardware because they require special handling in the FPU. Some systems and applications disable denormal support for performance reasons, accepting the precision loss.
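As a sketch of the subnormal formula for single precision, the Python below decodes a subnormal bit pattern by hand and compares it with what `struct` produces. The helper names are hypothetical; the bias of 127 and the 23-bit mantissa come from the single-precision format described earlier.

```python
import struct


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def decode_subnormal(bits: int) -> float:
    """Apply (-1)^S × 0.M × 2^(1-bias) to a subnormal bit pattern."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent != 0 or mantissa == 0:
        raise ValueError("not a subnormal bit pattern")
    return (-1) ** sign * (mantissa / 2**23) * 2.0 ** (1 - 127)


smallest_subnormal = 0x00000001
smallest_normal = 0x00800000
print(decode_subnormal(smallest_subnormal) == float32_from_bits(smallest_subnormal))  # True
print(float32_from_bits(smallest_subnormal) == 2.0 ** -149)  # True
print(float32_from_bits(smallest_normal) == 2.0 ** -126)     # True
```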
Gray code is a binary numeral system where two consecutive values differ in only one bit. For example, binary sequences: 000, 001, 010, 011, 100, 101, 110, 111. Gray code for same sequence: 000, 001, 011, 010, 110, 111, 101, 100.
The property of changing only one bit at a time eliminates the problem of binary counters where multiple bits can change simultaneously (e.g., 011 to 100 changes all three bits). If you sample during the transition, you might read spurious values like 001 or 010.
Uses:
- Rotary encoders: Position sensors where mechanical contacts may make/break at different times — Gray code ensures any sampled value during transition is valid
- K-maps: Gray code ordering groups minterms by adjacency, simplifying Boolean simplification
- Error correction: In some communication systems, Gray code reduces bit error impact
To convert binary to Gray code: G = B ^ (B >> 1) (XOR with right-shifted version). To convert back: binary bit N is XOR of Gray bit N with all higher Gray bits.
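Both conversions fit in a few lines of Python. The function names are illustrative; the decode loop XORs together the Gray value and all of its right shifts, which is the "XOR with all higher Gray bits" rule applied to every bit at once.

```python
def binary_to_gray(b: int) -> int:
    """G = B ^ (B >> 1)."""
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    """Each binary bit is the XOR of the Gray bit at that position and all higher ones."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


for n in range(8):
    print(f"{n:03b} -> {binary_to_gray(n):03b}")
```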
Normalization ensures the leading bit of the mantissa is always 1 (for normal numbers), maximizing the precision available in the given bit width. In IEEE 754, normalized numbers have an implicit leading 1 before the binary point: 1.M × 2^E.
For example, the number 5.0 in binary is 101.0, which normalizes to 1.01 × 2^2. The mantissa stores only the significant digits (01), while the exponent stores 2 (biased). This format achieves maximum precision—every bit of the mantissa is meaningful.
Without normalization, the same number could be represented as 0.101 × 2^3 or 10.1 × 2^1, wasting precision bits. Denormal numbers (where exponent bits are all zero but mantissa is non-zero) sacrifice this property to provide gradual underflow, but they have reduced precision compared to normalized numbers of the same size.
IEEE 754 defines several rounding modes accessible via the floating-point environment:
- Round to nearest (even): Default. Ties round to the nearest even last significant digit—this prevents systematic bias in averaging
- Round toward zero: Truncates the result, also called "chop"; for negative numbers this rounds up, toward positive infinity
- Round toward +infinity: Always rounds up (toward positive infinity)
- Round toward -infinity: Always rounds down (toward negative infinity)
The rounding mode affects every floating-point operation. For example, 2.5 rounded to nearest integer becomes 2 (even), while 3.5 becomes 4 (even). This "round to even" rule is crucial for statistical accuracy over millions of operations.
In C/C++, the fesetround() function from <fenv.h> controls the rounding mode. In Python, Decimal objects allow explicit specification of rounding.
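Python's `decimal` module makes the four directions easy to compare. This sketch rounds decimal values to integers, so it illustrates the rounding rules themselves, not the binary hardware modes that `fesetround()` controls. The helper `round_to_int` is a made-up name.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR

MODES = (ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR)


def round_to_int(value: str, mode: str) -> int:
    """Round a decimal string to an integer under the given rounding mode."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=mode))


for v in ("2.5", "3.5", "-2.5"):
    print(v, [round_to_int(v, mode) for mode in MODES])
# 2.5 [2, 2, 3, 2]
# 3.5 [4, 3, 4, 3]
# -2.5 [-2, -2, -2, -3]
```

The columns are nearest-even, toward zero, toward +infinity, and toward -infinity, in that order.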
BCD (Binary-Coded Decimal) encodes each decimal digit (0-9) in its own 4-bit binary representation. For example, the decimal number 59 is stored as 0101 1001 (0x59), not the binary 00111011 (0x3B).
BCD advantages:
- Exact representation of decimal fractions (no binary rounding)
- Easy conversion to human-readable text
- Used in financial systems where exact decimal arithmetic is required
BCD disadvantages:
- Wasteful: 4 bits can represent 0-15, but BCD only uses 0-9
- Arithmetic operations are more complex (need adjustment after each digit addition)
- Less efficient storage than binary
Many IBM mainframes and embedded systems still use BCD for financial calculations. The packed BCD format stores two digits per byte, while unpacked BCD uses one byte per digit, with the upper 4 bits usually zero (zoned decimal on EBCDIC systems sets them to 0xF).
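A minimal sketch of unsigned packed BCD in Python, with hypothetical helper names and no sign nibble:

```python
def to_packed_bcd(n: int) -> bytes:
    """Encode a non-negative integer as packed BCD, two decimal digits per byte."""
    if n < 0:
        raise ValueError("this sketch handles unsigned values only")
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_packed_bcd(data: bytes) -> int:
    """Decode packed BCD, rejecting nibbles outside 0-9."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError(f"invalid BCD byte 0x{byte:02X}")
        value = value * 100 + high * 10 + low
    return value


print(to_packed_bcd(59).hex())     # 59  (plain binary would be 0x3B)
print(to_packed_bcd(12345).hex())  # 012345
```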
Guard bits and sticky bits are extra precision bits used during intermediate floating-point calculations to maintain accuracy before final rounding.
The x87 FPU computes with 80-bit extended precision internally, even when the result is stored as 64-bit; SSE and most other modern units compute directly in the target precision. In either case, each operation keeps a few extra bits so the result can be rounded correctly. When adding two floating-point numbers with different exponents, the smaller number's significand is shifted right to align with the larger. The bits shifted past the precision boundary go into:
- Guard bits: The first bits beyond the precision boundary (typically a guard bit followed by a round bit)
- Sticky bit: OR of all bits shifted past the guard bits; if any of them is 1, the sticky bit is 1
These bits determine rounding behavior. For "round to nearest," the guard bit says whether the discarded part is at least half a unit in the last place, and the remaining bits say whether it is more than half: if guard=1 and any lower bit (including sticky) is 1, round up; if guard=1 and all lower bits are 0, the value is an exact tie and rounds to even; if guard=0, round down. This prevents systematic rounding errors in chains of operations.
Gray code (named after Frank Gray of Bell Labs) is a binary numbering system where only one bit changes between consecutive values. For example:
Decimal | Binary | Gray code
--------|--------|----------
0 | 000 | 000
1 | 001 | 001
2 | 010 | 011
3 | 011 | 010
4 | 100 | 110
5 | 101 | 111
6 | 110 | 101
7 | 111 | 100
The key property: adjacent values differ in exactly one bit. This eliminates the "glitch" problem in binary counting where multiple bits can change simultaneously (e.g., 011 to 100 changes all three bits). In binary, different signal paths have different propagation delays, so you might see 111 (invalid state) briefly between 011 and 100.
Uses:
- Rotary encoders: Position sensors where only one track changes at a time
- Analog-to-digital converters: Minimizes conversion errors from timing skew
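- Digital modulation: Gray-coded symbol mappings in schemes such as PSK and QAM make adjacent symbols differ by one bit, so a small demodulation error corrupts only a single bit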
You can determine endianness from the command line, without writing a test program:
CPU information (Linux):
lscpu | grep "Byte Order" # e.g. "Byte Order: Little Endian"
Using the file command:
file /bin/ls # ELF binaries report "LSB" (little-endian) or "MSB" (big-endian)
Packing a value in native byte order:
python3 -c "import struct; print('Big' if struct.pack('H', 1) == b'\\x00\\x01' else 'Little')"
System configuration:
getconf LONG_BIT # Returns 32 or 64 (word size only; says nothing about endianness)
# On macOS and BSD, check via sysctl (1234 = little-endian, 4321 = big-endian)
sysctl -n hw.byteorder
Common architectures: x86 (Intel/AMD) is little-endian. ARM is bi-endian, but virtually all consumer devices, including ARM phones, run it little-endian. Network protocols use big-endian.
The critical gotcha: bit pattern is preserved when casting between signed and unsigned of the same width, but interpretation changes.
Consider uint8_t u = 200; (binary: 11001000). When cast to int8_t, it becomes -56 because the sign bit (128) is now interpreted as negative. The bit pattern 11001000 represents 200 in unsigned and -56 in signed two's complement.
This causes surprising behavior:
uint8_t u = 255; // Binary: 11111111
int8_t s = (int8_t)u; // s = -1
// But:
printf("%d", s); // prints -1
// And:
printf("%u", (uint8_t)s); // prints 255
When performing arithmetic, integer promotion followed by truncation back to 8 bits can cause unexpected behavior:
int8_t x = -1;
int8_t y = 2;
int result = x + y; // result = 1 (correct)
// But:
uint8_t ux = (uint8_t)x; // 255
uint8_t uy = (uint8_t)y; // 2
uint8_t ur = ux + uy; // ur = 1 (255 + 2 = 257, truncated to uint8_t = 1)
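The reinterpretation can be mimicked in Python, where integers have no fixed width. The helpers below are hypothetical names that model an 8-bit cast; `struct` is used as an independent cross-check.

```python
import struct


def as_int8(u: int) -> int:
    """Reinterpret the low 8 bits of u as a signed two's complement value."""
    u &= 0xFF
    return u - 256 if u >= 128 else u


def as_uint8(s: int) -> int:
    """Reinterpret (or truncate) s as an unsigned 8-bit value."""
    return s & 0xFF


print(as_int8(200))       # -56
print(as_int8(255))       # -1
print(as_uint8(-1))       # 255
print(as_uint8(255 + 2))  # 1, matching the truncated sum above

# Same bit pattern, two interpretations
print(struct.unpack("b", struct.pack("B", 200))[0])  # -56
```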
Excess-K encoding (also called bias notation) represents numbers by adding a fixed bias (K) to the true value, then encoding as unsigned. The exponent field in IEEE 754 uses excess-127: a stored exponent of 127 means actual exponent of 0.
For an N-bit excess-K field:
- Stored values range from 0 to 2^N-1
- True values range from -K to 2^N-1-K
- The bias K is typically 2^(N-1) - 1 (so 127 for 8-bit)
The advantage: unsigned integer comparison of the encoded values gives the correct order for signed exponent comparison. If exponent A > exponent B (as signed integers), then the excess-K encoded A is also greater than encoded B. This allows hardware to compare floating-point magnitudes using the same circuitry as integer comparison.
The bias is also designed so the midpoint (127 for 8-bit) represents approximately zero, allowing both positive and negative exponents symmetrically.
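The ordering property is easy to check in Python. This sketch uses an 8-bit field with a bias of 127; the function names are illustrative, and it ignores the fact that IEEE 754 reserves stored values 0 and 255 for special cases.

```python
BIAS = 127


def encode_excess(true_exponent: int, bits: int = 8, bias: int = BIAS) -> int:
    """Store a signed exponent as an unsigned field by adding the bias."""
    stored = true_exponent + bias
    if not 0 <= stored <= 2**bits - 1:
        raise ValueError("exponent out of range for this field")
    return stored


def decode_excess(stored: int, bias: int = BIAS) -> int:
    """Recover the true exponent by subtracting the bias."""
    return stored - bias


exponents = [-126, -1, 0, 1, 127]
encoded = [encode_excess(e) for e in exponents]
print(encoded)                     # [1, 126, 127, 128, 254]
print(encoded == sorted(encoded))  # True: unsigned order matches signed order
```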
BCD (Binary-Coded Decimal): Each decimal digit 0-9 is encoded in 4 bits (nibble). One byte holds two digits.
Packed BCD: Same as BCD—two digits per byte. For example, 12345 in packed BCD is 01 23 45. The sign can be stored in the rightmost nibble: C for positive, D for negative.
Zoned decimal: Each digit is stored in one full byte, with zone bits (usually 0xF or 0x3) in the upper nibble. IBM EBCDIC uses this: 123 becomes F1 F2 F3. For signed zoned decimal, the sign is in the zone of the last digit.
Key differences:
- Packed BCD: Compact (2 digits/byte), efficient for arithmetic
- Zoned decimal: Human-readable hex dumps (easy debugging), used in legacy IBM systems
COBOL programs typically use zoned decimal for compatibility with IBM mainframes. The conversion between formats is common in enterprise integration.
Left shifting a signed integer by 1 bit can cause undefined behavior in C/C++ if the result overflows. Shifting left is equivalent to multiplication by 2, but overflow of signed integers is undefined.
Consider int x = 1073741824; (0x40000000, 2^30). If we compute x << 1, the result is 0x80000000, which is INT_MIN in two's complement. This is technically undefined behavior even though it has a well-defined bit pattern.
For unsigned integers, left shift is well-defined: bits shift left, zeros fill from the right, and overflow bits are discarded.
For negative signed integers, left shift is undefined behavior in the C standard regardless of the shift amount, so the compiler may assume it never happens when optimizing.
Safe shifting practices:
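#include <limits.h>  // INT_MAX, CHAR_BIT
#include <stdint.h>

// Safe: use unsigned types for bit operations
uint32_t safe_shift(uint32_t x, int bits) {
    return x << bits; // Well-defined for unsigned when 0 <= bits < 32
}

// Check before shifting a signed value
int safe_left_shift(int x, int bits) {
    if (x < 0 || bits < 0 || bits >= (int)(sizeof(int) * CHAR_BIT)) {
        return -1; // Negative operand or out-of-range shift count: handle error
    }
    if (x > (INT_MAX >> bits)) {
        return -1; // Overflow would occur: handle error
    }
    return x << bits;
}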
UTF-16 uses 16-bit code units, but Unicode has over 1 million possible characters—more than 65535. To represent characters beyond the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), UTF-16 uses surrogate pairs.
The range U+D800 to U+DFFF (2,048 code points) is reserved for surrogates and is never used for actual characters:
- High surrogate: U+D800 to U+DBFF (1024 values)
- Low surrogate: U+DC00 to U+DFFF (1024 values)
To encode U+10000 and above:
- Subtract 0x10000, giving a 20-bit value (0 to 0xFFFFF)
- Add 0xD800 to (value >> 10) for high surrogate (range: 0xD800-0xDBFF)
- Add 0xDC00 to (value & 0x3FF) for low surrogate (range: 0xDC00-0xDFFF)
Example: The emoji U+1F600 (Grinning Face): 0x1F600 - 0x10000 = 0xF600, so the high surrogate is 0xD800 + (0xF600 >> 10) = 0xD83D and the low surrogate is 0xDC00 + (0xF600 & 0x3FF) = 0xDE00.
UTF-16 is the native string encoding in Windows (where wchar_t is 16 bits), Java, and JavaScript. It requires surrogate pair handling for characters outside the BMP, making string indexing O(n) rather than O(1).
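The encoding steps for code points above U+FFFF can be sketched in Python and checked against the built-in UTF-16 codec. `to_surrogate_pair` is a hypothetical helper name.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    value = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (value >> 10)     # top 10 bits
    low = 0xDC00 + (value & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00

# Cross-check against Python's UTF-16 big-endian encoder
expected = "😀".encode("utf-16-be")
print(expected == high.to_bytes(2, "big") + low.to_bytes(2, "big"))  # True
```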
Further Reading
- IEEE 754 Floating-Point Standard — The official IEEE 754-2019 standard
- What Every Computer Scientist Should Know About Floating-Point — David Goldberg’s seminal paper
- Two’s Complement arithmetic — Cornell lecture notes on two’s complement
- Unicode Consortium — Official Unicode standard and character database
- The Art of Assembly Language — Randall Hyde’s comprehensive x86 assembly text
Conclusion
Number systems and data representation form the bedrock of all digital computation. From the binary nature of electronic circuits to the complex IEEE 754 floating-point standard, these representation choices ripple through every layer of the system stack. The decisions made decades ago—two’s complement for signed integers, IEEE 754 for floating-point, UTF-8 for text—continue to shape how we build software today.
Understanding these representations isn’t just academic: it directly impacts how you debug numerical issues, design protocols, and write robust code. The integer overflow that destroyed the Ariane 5 rocket, the floating-point precision that causes unexpected test failures, and the endianness bugs that corrupt network packets all stem from overlooking how data is represented.
Continue your exploration of low-level concepts by studying boolean logic and gates to understand how these representations are manipulated by hardware, or move on to instruction set architecture to see how these data types are consumed by a processor.