Coming Soon

This lesson is currently being developed

Timing your code

Measure and optimize program performance.

Iterators and Algorithms (under construction)
Chapter
Beginner
Difficulty
30min
Estimated Time

What to Expect

Comprehensive explanations with practical examples

Interactive coding exercises to practice concepts

Knowledge quiz to test your understanding

Step-by-step guidance for beginners

Development Status

In Progress

Content is being carefully crafted to provide the best learning experience

Preview

Early Preview Content

This content is still being developed and may change before publication.

18.4 — Timing your code

In this lesson, you'll learn how to measure the performance of your C++ code, analyze timing results, and make informed decisions about optimization. Understanding performance measurement is crucial for writing efficient programs.

Why timing matters

Performance measurement helps you:

  1. Identify bottlenecks: Find the slowest parts of your program
  2. Compare alternatives: Choose between different algorithms or implementations
  3. Validate optimizations: Confirm that changes actually improve performance
  4. Meet requirements: Ensure your code runs fast enough for real-world use
  5. Understand complexity: Verify theoretical time complexity in practice

Think of timing like a stopwatch for your code:

  • Measure race times: How long does each function take?
  • Compare runners: Which algorithm is faster?
  • Track improvement: Did practice make you faster?
  • Set goals: Can you meet the performance target?

The basics of code timing

Using std::chrono

C++11 introduced <chrono>, a powerful library for time measurement:

#include <iostream>
#include <chrono>
#include <vector>
#include <algorithm>

int main()
{
    std::vector<int> data = {5, 2, 8, 1, 9, 3};
    
    // Record start time
    auto start = std::chrono::high_resolution_clock::now();
    
    // Code to time
    std::sort(data.begin(), data.end());
    
    // Record end time
    auto end = std::chrono::high_resolution_clock::now();
    
    // Calculate duration
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    
    std::cout << "Sorting took: " << duration.count() << " microseconds" << std::endl;
    
    // Print sorted data to verify it worked
    std::cout << "Sorted: ";
    for (int n : data)
        std::cout << n << " ";
    std::cout << std::endl;
    
    return 0;
}

Output:

Sorting took: 5 microseconds
Sorted: 1 2 3 5 8 9 

Understanding time units

Chrono provides various time units for different precision needs:

#include <iostream>
#include <chrono>
#include <thread>

void demonstrateTimeUnits()
{
    auto start = std::chrono::high_resolution_clock::now();
    
    // Simulate some work with a small delay
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    
    auto end = std::chrono::high_resolution_clock::now();
    
    // Same duration, different units
    auto nanoseconds = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    auto microseconds = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    auto milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    auto seconds = std::chrono::duration_cast<std::chrono::seconds>(end - start);
    
    std::cout << "Time taken:" << std::endl;
    std::cout << "  " << nanoseconds.count() << " nanoseconds" << std::endl;
    std::cout << "  " << microseconds.count() << " microseconds" << std::endl;
    std::cout << "  " << milliseconds.count() << " milliseconds" << std::endl;
    std::cout << "  " << seconds.count() << " seconds" << std::endl;
}

int main()
{
    demonstrateTimeUnits();
    return 0;
}

Output:

Time taken:
  100254000 nanoseconds
  100254 microseconds
  100 milliseconds
  0 seconds

Creating a reusable timer class

For repeated measurements, a timer class makes timing easier:

#include <iostream>
#include <chrono>
#include <string>

class Timer
{
private:
    std::chrono::high_resolution_clock::time_point m_start;
    std::string m_name;
    
public:
    Timer(const std::string& name = "Operation") 
        : m_name(name)
    {
        m_start = std::chrono::high_resolution_clock::now();
    }
    
    ~Timer()
    {
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - m_start);
        std::cout << m_name << " took: " << duration.count() << " microseconds" << std::endl;
    }
    
    // Manual measurement (doesn't rely on destructor)
    void stop()
    {
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - m_start);
        std::cout << m_name << " took: " << duration.count() << " microseconds" << std::endl;
        
        // Reset timer for potential reuse
        m_start = std::chrono::high_resolution_clock::now();
    }
};

// Helper function for timing any callable
template<typename Func>
void timeFunction(Func func, const std::string& description = "Function")
{
    Timer timer(description);
    func();
} // Timer destructor automatically prints timing

int main()
{
    std::vector<int> data(1000);
    
    // Fill with some data
    for (size_t i = 0; i < data.size(); ++i)
    {
        data[i] = data.size() - i;
    }
    
    // Time using RAII (automatic timing)
    {
        Timer timer("Vector sorting");
        std::sort(data.begin(), data.end());
    } // Timer automatically prints result here
    
    // Time using template function
    timeFunction([&data]() {
        std::reverse(data.begin(), data.end());
    }, "Vector reversal");
    
    return 0;
}

Output:

Vector sorting took: 67 microseconds
Vector reversal took: 12 microseconds

Comparing algorithm performance

Let's compare different sorting algorithms to see timing in action:

#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>

// Selection sort implementation for comparison
void selectionSort(std::vector<int>& arr)
{
    size_t n = arr.size();
    for (size_t i = 0; i < n - 1; ++i)
    {
        size_t minIndex = i;
        for (size_t j = i + 1; j < n; ++j)
        {
            if (arr[j] < arr[minIndex])
            {
                minIndex = j;
            }
        }
        if (minIndex != i)
        {
            std::swap(arr[i], arr[minIndex]);
        }
    }
}

// Bubble sort implementation for comparison
void bubbleSort(std::vector<int>& arr)
{
    size_t n = arr.size();
    for (size_t i = 0; i < n - 1; ++i)
    {
        bool swapped = false;
        for (size_t j = 0; j < n - i - 1; ++j)
        {
            if (arr[j] > arr[j + 1])
            {
                std::swap(arr[j], arr[j + 1]);
                swapped = true;
            }
        }
        if (!swapped) break; // Array is sorted
    }
}

// Timer function template
template<typename Func>
long long timeFunction(Func func)
{
    auto start = std::chrono::high_resolution_clock::now();
    func();
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

std::vector<int> generateRandomData(size_t size)
{
    std::vector<int> data(size);
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 1000);
    
    for (size_t i = 0; i < size; ++i)
    {
        data[i] = dis(gen);
    }
    return data;
}

int main()
{
    const size_t SIZE = 5000;
    
    std::cout << "Comparing sorting algorithms with " << SIZE << " elements:" << std::endl;
    std::cout << std::string(50, '-') << std::endl;
    
    // Test each algorithm
    {
        auto data = generateRandomData(SIZE);
        long long time = timeFunction([&data]() {
            std::sort(data.begin(), data.end());
        });
        std::cout << "std::sort:      " << time << " microseconds" << std::endl;
    }
    
    {
        auto data = generateRandomData(SIZE);
        long long time = timeFunction([&data]() {
            selectionSort(data);
        });
        std::cout << "Selection sort: " << time << " microseconds" << std::endl;
    }
    
    {
        auto data = generateRandomData(SIZE);
        long long time = timeFunction([&data]() {
            bubbleSort(data);
        });
        std::cout << "Bubble sort:    " << time << " microseconds" << std::endl;
    }
    
    return 0;
}

Sample Output:

Comparing sorting algorithms with 5000 elements:
--------------------------------------------------
std::sort:      1234 microseconds
Selection sort: 45678 microseconds
Bubble sort:    67890 microseconds

Measuring with different data sizes

To understand algorithm complexity, measure performance across different input sizes:

#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>
#include <iomanip>

template<typename Func>
long long timeFunction(Func func)
{
    auto start = std::chrono::high_resolution_clock::now();
    func();
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

std::vector<int> generateRandomData(size_t size)
{
    std::vector<int> data(size);
    std::random_device rd;
    std::mt19937 gen(42); // Fixed seed for consistent results
    std::uniform_int_distribution<> dis(1, 10000);
    
    for (size_t i = 0; i < size; ++i)
    {
        data[i] = dis(gen);
    }
    return data;
}

void selectionSort(std::vector<int>& arr)
{
    size_t n = arr.size();
    for (size_t i = 0; i < n - 1; ++i)
    {
        size_t minIndex = i;
        for (size_t j = i + 1; j < n; ++j)
        {
            if (arr[j] < arr[minIndex])
            {
                minIndex = j;
            }
        }
        if (minIndex != i)
        {
            std::swap(arr[i], arr[minIndex]);
        }
    }
}

int main()
{
    std::vector<size_t> sizes = {100, 500, 1000, 2000, 4000};
    
    std::cout << std::setw(6) << "Size" 
              << std::setw(15) << "std::sort (μs)" 
              << std::setw(20) << "Selection sort (μs)"
              << std::setw(10) << "Ratio" << std::endl;
    std::cout << std::string(55, '-') << std::endl;
    
    for (size_t size : sizes)
    {
        // Time std::sort
        auto data1 = generateRandomData(size);
        long long stdTime = timeFunction([&data1]() {
            std::sort(data1.begin(), data1.end());
        });
        
        // Time selection sort
        auto data2 = generateRandomData(size);
        long long selTime = timeFunction([&data2]() {
            selectionSort(data2);
        });
        
        double ratio = static_cast<double>(selTime) / stdTime;
        
        std::cout << std::setw(6) << size
                  << std::setw(15) << stdTime
                  << std::setw(20) << selTime
                  << std::setw(9) << std::fixed << std::setprecision(1) << ratio << "x" << std::endl;
    }
    
    return 0;
}

Sample Output:

  Size   std::sort (μs)   Selection sort (μs)     Ratio
-------------------------------------------------------
   100              12                  567      47.3x
   500              45                13245     294.3x
  1000              92                52890     574.9x
  2000             198               211234    1066.8x
  4000             423               845678    1999.7x

Timing best, average, and worst cases

Different input patterns can significantly affect algorithm performance:

#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>

template<typename Func>
long long timeFunction(Func func)
{
    auto start = std::chrono::high_resolution_clock::now();
    func();
    auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

void insertionSort(std::vector<int>& arr)
{
    for (size_t i = 1; i < arr.size(); ++i)
    {
        int key = arr[i];
        int j = static_cast<int>(i) - 1;
        
        while (j >= 0 && arr[j] > key)
        {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}

int main()
{
    const size_t SIZE = 5000;
    
    // Best case: Already sorted
    std::vector<int> bestCase(SIZE);
    for (size_t i = 0; i < SIZE; ++i)
    {
        bestCase[i] = static_cast<int>(i);
    }
    
    // Worst case: Reverse sorted
    std::vector<int> worstCase(SIZE);
    for (size_t i = 0; i < SIZE; ++i)
    {
        worstCase[i] = static_cast<int>(SIZE - i);
    }
    
    // Average case: Random
    std::vector<int> averageCase(SIZE);
    for (size_t i = 0; i < SIZE; ++i)
    {
        averageCase[i] = rand() % 1000;
    }
    
    std::cout << "Insertion Sort Performance Analysis (" << SIZE << " elements):" << std::endl;
    std::cout << std::string(55, '-') << std::endl;
    
    // Time best case
    auto bestCaseCopy = bestCase;
    long long bestTime = timeFunction([&bestCaseCopy]() {
        insertionSort(bestCaseCopy);
    });
    
    // Time worst case
    auto worstCaseCopy = worstCase;
    long long worstTime = timeFunction([&worstCaseCopy]() {
        insertionSort(worstCaseCopy);
    });
    
    // Time average case
    auto averageCaseCopy = averageCase;
    long long averageTime = timeFunction([&averageCaseCopy]() {
        insertionSort(averageCaseCopy);
    });
    
    std::cout << "Best case (sorted):     " << std::setw(8) << bestTime << " microseconds" << std::endl;
    std::cout << "Average case (random):  " << std::setw(8) << averageTime << " microseconds" << std::endl;
    std::cout << "Worst case (reversed):  " << std::setw(8) << worstTime << " microseconds" << std::endl;
    
    std::cout << "\nWorst/Best ratio: " << static_cast<double>(worstTime) / bestTime << "x" << std::endl;
    
    return 0;
}

Sample Output:

Insertion Sort Performance Analysis (5000 elements):
-------------------------------------------------------
Best case (sorted):           45 microseconds
Average case (random):     25678 microseconds  
Worst case (reversed):     51234 microseconds

Worst/Best ratio: 1138.5x

Microbenchmarking considerations

When timing small operations, several factors can affect accuracy:

1. Compiler optimizations

#include <iostream>
#include <chrono>
#include <vector>

// Function that might be optimized away
int simpleCalculation(int x)
{
    return x * x + 2 * x + 1;
}

int main()
{
    const int ITERATIONS = 1000000;
    
    auto start = std::chrono::high_resolution_clock::now();
    
    int result = 0;
    for (int i = 0; i < ITERATIONS; ++i)
    {
        result += simpleCalculation(i);  // Use result to prevent optimization
    }
    
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    
    std::cout << "Total time: " << duration.count() << " nanoseconds" << std::endl;
    std::cout << "Time per operation: " << duration.count() / ITERATIONS << " nanoseconds" << std::endl;
    std::cout << "Result (to prevent optimization): " << result << std::endl;
    
    return 0;
}

Output:

Total time: 2456789 nanoseconds
Time per operation: 2 nanoseconds
Result (to prevent optimization): 333332833333

2. Cache effects and warm-up

#include <iostream>
#include <vector>
#include <chrono>
#include <algorithm>

template<typename Func>
long long timeWithWarmup(Func func, int warmupRuns = 3)
{
    // Warm up (don't time these runs)
    for (int i = 0; i < warmupRuns; ++i)
    {
        func();
    }
    
    // Now time the actual run
    auto start = std::chrono::high_resolution_clock::now();
    func();
    auto end = std::chrono::high_resolution_clock::now();
    
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

int main()
{
    const size_t SIZE = 100000;
    std::vector<int> data(SIZE);
    
    // Fill with random data
    for (size_t i = 0; i < SIZE; ++i)
    {
        data[i] = rand() % 1000;
    }
    
    std::cout << "Comparing cold vs warm runs:" << std::endl;
    
    // Cold run (first time)
    auto dataCold = data;
    auto start = std::chrono::high_resolution_clock::now();
    std::sort(dataCold.begin(), dataCold.end());
    auto end = std::chrono::high_resolution_clock::now();
    long long coldTime = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    
    // Warm runs (with warm-up)
    auto dataWarm = data;
    long long warmTime = timeWithWarmup([&dataWarm, &data]() {
        dataWarm = data; // Reset data
        std::sort(dataWarm.begin(), dataWarm.end());
    });
    
    std::cout << "Cold run:  " << coldTime << " microseconds" << std::endl;
    std::cout << "Warm run:  " << warmTime << " microseconds" << std::endl;
    
    return 0;
}

3. Multiple runs for statistical significance

#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <numeric>
#include <cmath>

class PerformanceTester
{
private:
    std::vector<long long> m_times;
    
public:
    template<typename Func>
    void measureMultipleRuns(Func func, int runs, const std::string& description)
    {
        m_times.clear();
        m_times.reserve(runs);
        
        std::cout << "Running " << description << " " << runs << " times..." << std::endl;
        
        for (int i = 0; i < runs; ++i)
        {
            auto start = std::chrono::high_resolution_clock::now();
            func();
            auto end = std::chrono::high_resolution_clock::now();
            
            long long time = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
            m_times.push_back(time);
        }
        
        printStatistics();
    }
    
private:
    void printStatistics()
    {
        if (m_times.empty()) return;
        
        std::sort(m_times.begin(), m_times.end());
        
        long long min = m_times.front();
        long long max = m_times.back();
        long long median = m_times[m_times.size() / 2];
        double mean = std::accumulate(m_times.begin(), m_times.end(), 0.0) / m_times.size();
        
        // Calculate standard deviation
        double variance = 0.0;
        for (long long time : m_times)
        {
            variance += (time - mean) * (time - mean);
        }
        variance /= m_times.size();
        double stddev = std::sqrt(variance);
        
        std::cout << "Statistics (microseconds):" << std::endl;
        std::cout << "  Min:    " << min << std::endl;
        std::cout << "  Max:    " << max << std::endl;
        std::cout << "  Median: " << median << std::endl;
        std::cout << "  Mean:   " << static_cast<long long>(mean) << std::endl;
        std::cout << "  StdDev: " << static_cast<long long>(stddev) << std::endl;
        std::cout << std::endl;
    }
};

int main()
{
    const size_t SIZE = 10000;
    std::vector<int> originalData(SIZE);
    
    // Fill with random data
    for (size_t i = 0; i < SIZE; ++i)
    {
        originalData[i] = rand() % 1000;
    }
    
    PerformanceTester tester;
    
    // Test std::sort with multiple runs
    tester.measureMultipleRuns([&originalData]() {
        auto data = originalData; // Reset data each time
        std::sort(data.begin(), data.end());
    }, 10, "std::sort");
    
    return 0;
}

Sample Output:

Running std::sort 10 times...
Statistics (microseconds):
  Min:    1234
  Max:    1456
  Median: 1345
  Mean:   1356
  StdDev: 67

Profiling tools and techniques

Using built-in profiling hints

#include <iostream>
#include <chrono>
#include <vector>
#include <algorithm>

// Function to simulate CPU-intensive work
void cpuIntensiveFunction(int n)
{
    volatile double result = 0.0;
    for (int i = 0; i < n; ++i)
    {
        result += std::sin(i) * std::cos(i);
    }
}

// Function to simulate memory-intensive work  
void memoryIntensiveFunction(size_t size)
{
    std::vector<int> data(size);
    std::iota(data.begin(), data.end(), 1);
    std::random_shuffle(data.begin(), data.end());
    std::sort(data.begin(), data.end());
}

template<typename Func>
void profileFunction(Func func, const std::string& name)
{
    const int RUNS = 5;
    long long totalTime = 0;
    
    for (int i = 0; i < RUNS; ++i)
    {
        auto start = std::chrono::high_resolution_clock::now();
        func();
        auto end = std::chrono::high_resolution_clock::now();
        
        totalTime += std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    }
    
    std::cout << name << ": " << (totalTime / RUNS) << "ms average" << std::endl;
}

int main()
{
    std::cout << "Performance Profile:" << std::endl;
    std::cout << std::string(30, '-') << std::endl;
    
    profileFunction([]() {
        cpuIntensiveFunction(100000);
    }, "CPU-intensive task");
    
    profileFunction([]() {
        memoryIntensiveFunction(100000);
    }, "Memory-intensive task");
    
    return 0;
}

Best practices for performance measurement

1. Measure what matters

// Don't time trivial operations that will be optimized away
// Instead, focus on the operations that actually impact your application

// Good - timing a meaningful operation
auto data = generateLargeDataset();
Timer timer("Data processing");
auto result = processData(data);

2. Use appropriate time units

// Fast operations: nanoseconds or microseconds
auto fastResult = std::chrono::duration_cast<std::chrono::nanoseconds>(duration);

// Medium operations: microseconds or milliseconds  
auto mediumResult = std::chrono::duration_cast<std::chrono::microseconds>(duration);

// Slow operations: milliseconds or seconds
auto slowResult = std::chrono::duration_cast<std::chrono::milliseconds>(duration);

3. Account for measurement overhead

// Measure the timer overhead itself
auto start = std::chrono::high_resolution_clock::now();
auto end = std::chrono::high_resolution_clock::now();
auto overhead = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
std::cout << "Timer overhead: " << overhead.count() << " nanoseconds" << std::endl;

4. Consider compiler optimization levels

When timing code, compile with appropriate optimization flags:

  • -O0: No optimization (good for debugging, not for timing)
  • -O1: Basic optimization
  • -O2: Standard optimization (good for realistic timing)
  • -O3: Aggressive optimization (may change algorithm characteristics)

Summary

Code timing is essential for performance optimization:

  • Use std::chrono: Modern, precise timing with appropriate time units
  • Create reusable tools: Timer classes and helper functions reduce repetition
  • Compare fairly: Same data, multiple runs, statistical analysis
  • Consider context: Best/average/worst cases, cache effects, compiler optimizations
  • Measure meaningful work: Focus on operations that matter to your application
  • Use proper methodology: Warm-up runs, multiple measurements, statistical significance

Key takeaways:

  • Performance matters: Understanding timing helps you write efficient code
  • Measurement technique matters: Poor methodology gives misleading results
  • Context is crucial: Real-world performance may differ from microbenchmarks
  • Optimization is iterative: Measure, optimize, measure again

This knowledge helps you make informed decisions about:

  • Which algorithms to choose
  • Where to focus optimization efforts
  • Whether optimizations actually help
  • How your code will perform in production

Quiz

  1. What C++ header provides the modern timing functionality?
  2. Why should you run multiple timing iterations instead of just one?
  3. What is the difference between measuring best case, average case, and worst case performance?
  4. Why might the first run of a function be slower than subsequent runs?
  5. What time unit would be most appropriate for timing a function that takes about 5 milliseconds to execute?

Practice exercises

Try these timing exercises to master performance measurement:

  1. Algorithm comparison: Write a program that compares the performance of different search algorithms (linear search vs binary search) on sorted arrays of various sizes.

  2. Data structure analysis: Time the performance of different operations (insertion, deletion, search) on std::vector, std::list, and std::set with different data sizes.

  3. Cache effects investigation: Create a program that demonstrates cache effects by comparing the performance of accessing array elements in row-major vs column-major order.

  4. Optimization validation: Take a simple function (like calculating factorial), implement multiple versions (iterative, recursive, memoized), and use timing to determine which is fastest for different input sizes.

Continue Learning

Explore other available lessons while this one is being prepared.

View Course

Explore More Courses

Discover other available courses while this lesson is being prepared.

Browse Courses

Lesson Discussion

Share your thoughts and questions

💬

No comments yet. Be the first to share your thoughts!

Sign in to join the discussion