Advanced 14 min

Memory Alignment

Understand memory alignment for performance and correctness

Learn how CPUs access memory and how proper alignment makes your code faster and prevents crashes on some architectures.

A Simple Example

#include <iostream>

struct Unoptimized {
    char a;      // 1 byte
    // 3 bytes padding
    int b;       // 4 bytes
    char c;      // 1 byte
    // 7 bytes padding
    double d;    // 8 bytes
};

struct Optimized {
    double d;    // 8 bytes
    int b;       // 4 bytes
    char a;      // 1 byte
    char c;      // 1 byte
    // 2 bytes padding
};

int main() {
    std::cout << "Unoptimized size: " << sizeof(Unoptimized) << " bytes\n";
    std::cout << "Optimized size: " << sizeof(Optimized) << " bytes\n";

    std::cout << "\nAlignment requirements:\n";
    std::cout << "char: " << alignof(char) << "\n";
    std::cout << "int: " << alignof(int) << "\n";
    std::cout << "double: " << alignof(double) << "\n";

    return 0;
}

Breaking It Down

What is Memory Alignment?

  • What it means: Every type has an alignment requirement, and an object's address must be a multiple of it. For fundamental types this usually equals the size (int on a 4-byte boundary, double on an 8-byte boundary)
  • Why: CPUs fetch memory in fixed-size chunks (typically 4 or 8 bytes) - aligned data fits cleanly inside one chunk
  • Consequence: Misaligned data can straddle two chunks, so the hardware may need two reads plus shifting and merging, which is slower
  • Remember: A 4-byte int at address 0x1000 is aligned; at 0x1002 it is misaligned

Struct Padding

  • What it does: Compiler inserts unused bytes so that each member starts at an aligned offset
  • Rule: Each member is aligned according to its type (typically char=1, int=4, double=8)
  • Total size: Struct size is rounded up to a multiple of the largest member alignment, so every element of an array of the struct stays aligned
  • Remember: Order matters! Group larger types first to minimize padding

Performance Impact

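  • Cache efficiency: Naturally aligned data never straddles two cache lines (typically 64 bytes each)
  • Speed difference: The cost of misaligned access depends on the CPU. Modern x86 cores often show little penalty unless the access crosses a cache-line or page boundary; on other cores it can be several times slower
  • Architecture differences: x86 generally tolerates misaligned access (possibly slower), while some ARM cores and configurations raise an alignment fault. In C++, reading through a misaligned typed pointer is undefined behavior on any hardware
  • Remember: For large arrays of structs, padding savings multiply across every element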

Controlling Alignment

  • sizeof(T): Returns total size including padding
  • alignof(T): Returns alignment requirement for type T
  • alignas(N): Forces specific alignment (e.g., alignas(64) for cache line)
  • Remember: Use these tools to inspect and control memory layout

Why This Matters

  • CPUs access aligned memory at least as fast as misaligned memory, and on some hardware several times faster.
  • Misaligned access can be slower or even crash on ARM and other architectures.
  • Compilers add padding to structs to satisfy alignment requirements - a poorly-ordered struct can waste significant memory.
  • Understanding alignment helps you write cache-friendly, performance-critical code for games, databases, and systems programming.

Critical Insight

Think of memory alignment like parking vehicles in marked spots. If each spot is 8 feet wide and your car is 6 feet, you still take up a full spot - the extra 2 feet is "padding".

Now imagine you have a bus (8 feet), a car (4 feet), and two motorcycles (1 foot each). Parked in that order, they sit back to back with only a small gap at the end. But if you park motorcycle-bus-car-motorcycle, you waste lots of space! The bus has to start on an 8-foot mark, so the first motorcycle is followed by 7 feet of empty space.

This is how struct members work. Put large members first (double, then int, then char) to minimize padding and save memory.

Best Practices

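Order struct members by alignment: Put the most strictly aligned types first (typically double, then int, short, char) to minimize padding. The size of long varies: it is 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows, so check it rather than assuming.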

Use alignof() to check requirements: Understand alignment needs with alignof(T) before optimizing.

Pad explicitly for clarity: Use char padding[3]; to document intentional padding instead of relying on implicit padding.

Consider cache lines: For high-performance code, align structs to 64-byte cache line boundaries with alignas(64).

Common Mistakes

Ignoring struct order: A careless member order can inflate a struct by 50% or more, as in the opening example where the same four members take 24 bytes instead of 16.

Assuming packed layout: Without #pragma pack or __attribute__((packed)), compilers add padding automatically.

Assuming portability: Alignment rules vary by architecture (x86 vs ARM), operating system, and compiler. Query sizeof and alignof instead of hard-coding sizes and offsets.

Array implications: Padding is paid once per element. A struct carrying 10 bytes of padding wastes 100,000 bytes (about 100 KB) in an array of 10,000 elements.

Debug Challenge

This struct wastes memory due to poor ordering. Reorder its members to reduce the size:

struct Data {
    char a;      // 1 byte + 7 padding
    double d;    // 8 bytes
    char b;      // 1 byte + 3 padding
    int i;       // 4 bytes
};
// Size: 24 bytes (lots of padding!)

Quick Quiz

  1. Why do compilers add padding to structs?
     • To ensure each member is aligned according to its type requirements
     • To waste memory
     • To make structs bigger
  2. How can you minimize padding in a struct?
     • Order members from largest to smallest
     • Order members alphabetically
     • Use only char types
  3. What happens on ARM processors if you access misaligned data?
     • Program may crash with alignment fault
     • It works but is slightly slower
     • Nothing different from x86
