Coming Soon
This lesson is currently being developed
Timing your code
Measure and optimize program performance.
What to Expect
Comprehensive explanations with practical examples
Interactive coding exercises to practice concepts
Knowledge quiz to test your understanding
Step-by-step guidance for beginners
Development Status
Content is being carefully crafted to provide the best learning experience
Preview
Early Preview Content
This content is still being developed and may change before publication.
18.4 — Timing your code
In this lesson, you'll learn how to measure the performance of your C++ code, analyze timing results, and make informed decisions about optimization. Understanding performance measurement is crucial for writing efficient programs.
Why timing matters
Performance measurement helps you:
- Identify bottlenecks: Find the slowest parts of your program
- Compare alternatives: Choose between different algorithms or implementations
- Validate optimizations: Confirm that changes actually improve performance
- Meet requirements: Ensure your code runs fast enough for real-world use
- Understand complexity: Verify theoretical time complexity in practice
Think of timing like a stopwatch for your code:
- Measure race times: How long does each function take?
- Compare runners: Which algorithm is faster?
- Track improvement: Did practice make you faster?
- Set goals: Can you meet the performance target?
The basics of code timing
Using std::chrono
C++11 introduced <chrono>
, a powerful library for time measurement:
#include <iostream>
#include <chrono>
#include <vector>
#include <algorithm>
int main()
{
std::vector<int> data = {5, 2, 8, 1, 9, 3};
// Record start time
auto start = std::chrono::high_resolution_clock::now();
// Code to time
std::sort(data.begin(), data.end());
// Record end time
auto end = std::chrono::high_resolution_clock::now();
// Calculate duration
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "Sorting took: " << duration.count() << " microseconds" << std::endl;
// Print sorted data to verify it worked
std::cout << "Sorted: ";
for (int n : data)
std::cout << n << " ";
std::cout << std::endl;
return 0;
}
Output:
Sorting took: 5 microseconds
Sorted: 1 2 3 5 8 9
Understanding time units
Chrono provides various time units for different precision needs:
#include <iostream>
#include <chrono>
#include <thread>
void demonstrateTimeUnits()
{
auto start = std::chrono::high_resolution_clock::now();
// Simulate some work with a small delay
std::this_thread::sleep_for(std::chrono::milliseconds(100));
auto end = std::chrono::high_resolution_clock::now();
// Same duration, different units
auto nanoseconds = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
auto microseconds = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
auto milliseconds = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
auto seconds = std::chrono::duration_cast<std::chrono::seconds>(end - start);
std::cout << "Time taken:" << std::endl;
std::cout << " " << nanoseconds.count() << " nanoseconds" << std::endl;
std::cout << " " << microseconds.count() << " microseconds" << std::endl;
std::cout << " " << milliseconds.count() << " milliseconds" << std::endl;
std::cout << " " << seconds.count() << " seconds" << std::endl;
}
int main()
{
demonstrateTimeUnits();
return 0;
}
Output:
Time taken:
100254000 nanoseconds
100254 microseconds
100 milliseconds
0 seconds
Creating a reusable timer class
For repeated measurements, a timer class makes timing easier:
#include <iostream>
#include <chrono>
#include <string>
class Timer
{
private:
std::chrono::high_resolution_clock::time_point m_start;
std::string m_name;
public:
Timer(const std::string& name = "Operation")
: m_name(name)
{
m_start = std::chrono::high_resolution_clock::now();
}
~Timer()
{
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - m_start);
std::cout << m_name << " took: " << duration.count() << " microseconds" << std::endl;
}
// Manual measurement (doesn't rely on destructor)
void stop()
{
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - m_start);
std::cout << m_name << " took: " << duration.count() << " microseconds" << std::endl;
// Reset timer for potential reuse
m_start = std::chrono::high_resolution_clock::now();
}
};
// Helper function for timing any callable
template<typename Func>
void timeFunction(Func func, const std::string& description = "Function")
{
Timer timer(description);
func();
} // Timer destructor automatically prints timing
int main()
{
std::vector<int> data(1000);
// Fill with some data
for (size_t i = 0; i < data.size(); ++i)
{
data[i] = data.size() - i;
}
// Time using RAII (automatic timing)
{
Timer timer("Vector sorting");
std::sort(data.begin(), data.end());
} // Timer automatically prints result here
// Time using template function
timeFunction([&data]() {
std::reverse(data.begin(), data.end());
}, "Vector reversal");
return 0;
}
Output:
Vector sorting took: 67 microseconds
Vector reversal took: 12 microseconds
Comparing algorithm performance
Let's compare different sorting algorithms to see timing in action:
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>
// Selection sort implementation for comparison
void selectionSort(std::vector<int>& arr)
{
size_t n = arr.size();
for (size_t i = 0; i < n - 1; ++i)
{
size_t minIndex = i;
for (size_t j = i + 1; j < n; ++j)
{
if (arr[j] < arr[minIndex])
{
minIndex = j;
}
}
if (minIndex != i)
{
std::swap(arr[i], arr[minIndex]);
}
}
}
// Bubble sort implementation for comparison
void bubbleSort(std::vector<int>& arr)
{
size_t n = arr.size();
for (size_t i = 0; i < n - 1; ++i)
{
bool swapped = false;
for (size_t j = 0; j < n - i - 1; ++j)
{
if (arr[j] > arr[j + 1])
{
std::swap(arr[j], arr[j + 1]);
swapped = true;
}
}
if (!swapped) break; // Array is sorted
}
}
// Timer function template
template<typename Func>
long long timeFunction(Func func)
{
auto start = std::chrono::high_resolution_clock::now();
func();
auto end = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
std::vector<int> generateRandomData(size_t size)
{
std::vector<int> data(size);
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(1, 1000);
for (size_t i = 0; i < size; ++i)
{
data[i] = dis(gen);
}
return data;
}
int main()
{
const size_t SIZE = 5000;
std::cout << "Comparing sorting algorithms with " << SIZE << " elements:" << std::endl;
std::cout << std::string(50, '-') << std::endl;
// Test each algorithm
{
auto data = generateRandomData(SIZE);
long long time = timeFunction([&data]() {
std::sort(data.begin(), data.end());
});
std::cout << "std::sort: " << time << " microseconds" << std::endl;
}
{
auto data = generateRandomData(SIZE);
long long time = timeFunction([&data]() {
selectionSort(data);
});
std::cout << "Selection sort: " << time << " microseconds" << std::endl;
}
{
auto data = generateRandomData(SIZE);
long long time = timeFunction([&data]() {
bubbleSort(data);
});
std::cout << "Bubble sort: " << time << " microseconds" << std::endl;
}
return 0;
}
Sample Output:
Comparing sorting algorithms with 5000 elements:
--------------------------------------------------
std::sort: 1234 microseconds
Selection sort: 45678 microseconds
Bubble sort: 67890 microseconds
Measuring with different data sizes
To understand algorithm complexity, measure performance across different input sizes:
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>
#include <iomanip>
template<typename Func>
long long timeFunction(Func func)
{
auto start = std::chrono::high_resolution_clock::now();
func();
auto end = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
std::vector<int> generateRandomData(size_t size)
{
std::vector<int> data(size);
std::random_device rd;
std::mt19937 gen(42); // Fixed seed for consistent results
std::uniform_int_distribution<> dis(1, 10000);
for (size_t i = 0; i < size; ++i)
{
data[i] = dis(gen);
}
return data;
}
void selectionSort(std::vector<int>& arr)
{
size_t n = arr.size();
for (size_t i = 0; i < n - 1; ++i)
{
size_t minIndex = i;
for (size_t j = i + 1; j < n; ++j)
{
if (arr[j] < arr[minIndex])
{
minIndex = j;
}
}
if (minIndex != i)
{
std::swap(arr[i], arr[minIndex]);
}
}
}
int main()
{
std::vector<size_t> sizes = {100, 500, 1000, 2000, 4000};
std::cout << std::setw(6) << "Size"
<< std::setw(15) << "std::sort (μs)"
<< std::setw(20) << "Selection sort (μs)"
<< std::setw(10) << "Ratio" << std::endl;
std::cout << std::string(55, '-') << std::endl;
for (size_t size : sizes)
{
// Time std::sort
auto data1 = generateRandomData(size);
long long stdTime = timeFunction([&data1]() {
std::sort(data1.begin(), data1.end());
});
// Time selection sort
auto data2 = generateRandomData(size);
long long selTime = timeFunction([&data2]() {
selectionSort(data2);
});
double ratio = static_cast<double>(selTime) / stdTime;
std::cout << std::setw(6) << size
<< std::setw(15) << stdTime
<< std::setw(20) << selTime
<< std::setw(9) << std::fixed << std::setprecision(1) << ratio << "x" << std::endl;
}
return 0;
}
Sample Output:
Size std::sort (μs) Selection sort (μs) Ratio
-------------------------------------------------------
100 12 567 47.3x
500 45 13245 294.3x
1000 92 52890 574.9x
2000 198 211234 1066.8x
4000 423 845678 1999.7x
Timing best, average, and worst cases
Different input patterns can significantly affect algorithm performance:
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
template<typename Func>
long long timeFunction(Func func)
{
auto start = std::chrono::high_resolution_clock::now();
func();
auto end = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
void insertionSort(std::vector<int>& arr)
{
for (size_t i = 1; i < arr.size(); ++i)
{
int key = arr[i];
int j = static_cast<int>(i) - 1;
while (j >= 0 && arr[j] > key)
{
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = key;
}
}
int main()
{
const size_t SIZE = 5000;
// Best case: Already sorted
std::vector<int> bestCase(SIZE);
for (size_t i = 0; i < SIZE; ++i)
{
bestCase[i] = static_cast<int>(i);
}
// Worst case: Reverse sorted
std::vector<int> worstCase(SIZE);
for (size_t i = 0; i < SIZE; ++i)
{
worstCase[i] = static_cast<int>(SIZE - i);
}
// Average case: Random
std::vector<int> averageCase(SIZE);
for (size_t i = 0; i < SIZE; ++i)
{
averageCase[i] = rand() % 1000;
}
std::cout << "Insertion Sort Performance Analysis (" << SIZE << " elements):" << std::endl;
std::cout << std::string(55, '-') << std::endl;
// Time best case
auto bestCaseCopy = bestCase;
long long bestTime = timeFunction([&bestCaseCopy]() {
insertionSort(bestCaseCopy);
});
// Time worst case
auto worstCaseCopy = worstCase;
long long worstTime = timeFunction([&worstCaseCopy]() {
insertionSort(worstCaseCopy);
});
// Time average case
auto averageCaseCopy = averageCase;
long long averageTime = timeFunction([&averageCaseCopy]() {
insertionSort(averageCaseCopy);
});
std::cout << "Best case (sorted): " << std::setw(8) << bestTime << " microseconds" << std::endl;
std::cout << "Average case (random): " << std::setw(8) << averageTime << " microseconds" << std::endl;
std::cout << "Worst case (reversed): " << std::setw(8) << worstTime << " microseconds" << std::endl;
std::cout << "\nWorst/Best ratio: " << static_cast<double>(worstTime) / bestTime << "x" << std::endl;
return 0;
}
Sample Output:
Insertion Sort Performance Analysis (5000 elements):
-------------------------------------------------------
Best case (sorted): 45 microseconds
Average case (random): 25678 microseconds
Worst case (reversed): 51234 microseconds
Worst/Best ratio: 1138.5x
Microbenchmarking considerations
When timing small operations, several factors can affect accuracy:
1. Compiler optimizations
#include <iostream>
#include <chrono>
#include <vector>
// Function that might be optimized away
int simpleCalculation(int x)
{
return x * x + 2 * x + 1;
}
int main()
{
const int ITERATIONS = 1000000;
auto start = std::chrono::high_resolution_clock::now();
int result = 0;
for (int i = 0; i < ITERATIONS; ++i)
{
result += simpleCalculation(i); // Use result to prevent optimization
}
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
std::cout << "Total time: " << duration.count() << " nanoseconds" << std::endl;
std::cout << "Time per operation: " << duration.count() / ITERATIONS << " nanoseconds" << std::endl;
std::cout << "Result (to prevent optimization): " << result << std::endl;
return 0;
}
Output:
Total time: 2456789 nanoseconds
Time per operation: 2 nanoseconds
Result (to prevent optimization): 333332833333
2. Cache effects and warm-up
#include <iostream>
#include <vector>
#include <chrono>
#include <algorithm>
template<typename Func>
long long timeWithWarmup(Func func, int warmupRuns = 3)
{
// Warm up (don't time these runs)
for (int i = 0; i < warmupRuns; ++i)
{
func();
}
// Now time the actual run
auto start = std::chrono::high_resolution_clock::now();
func();
auto end = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
int main()
{
const size_t SIZE = 100000;
std::vector<int> data(SIZE);
// Fill with random data
for (size_t i = 0; i < SIZE; ++i)
{
data[i] = rand() % 1000;
}
std::cout << "Comparing cold vs warm runs:" << std::endl;
// Cold run (first time)
auto dataCold = data;
auto start = std::chrono::high_resolution_clock::now();
std::sort(dataCold.begin(), dataCold.end());
auto end = std::chrono::high_resolution_clock::now();
long long coldTime = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
// Warm runs (with warm-up)
auto dataWarm = data;
long long warmTime = timeWithWarmup([&dataWarm, &data]() {
dataWarm = data; // Reset data
std::sort(dataWarm.begin(), dataWarm.end());
});
std::cout << "Cold run: " << coldTime << " microseconds" << std::endl;
std::cout << "Warm run: " << warmTime << " microseconds" << std::endl;
return 0;
}
3. Multiple runs for statistical significance
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <numeric>
#include <cmath>
class PerformanceTester
{
private:
std::vector<long long> m_times;
public:
template<typename Func>
void measureMultipleRuns(Func func, int runs, const std::string& description)
{
m_times.clear();
m_times.reserve(runs);
std::cout << "Running " << description << " " << runs << " times..." << std::endl;
for (int i = 0; i < runs; ++i)
{
auto start = std::chrono::high_resolution_clock::now();
func();
auto end = std::chrono::high_resolution_clock::now();
long long time = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
m_times.push_back(time);
}
printStatistics();
}
private:
void printStatistics()
{
if (m_times.empty()) return;
std::sort(m_times.begin(), m_times.end());
long long min = m_times.front();
long long max = m_times.back();
long long median = m_times[m_times.size() / 2];
double mean = std::accumulate(m_times.begin(), m_times.end(), 0.0) / m_times.size();
// Calculate standard deviation
double variance = 0.0;
for (long long time : m_times)
{
variance += (time - mean) * (time - mean);
}
variance /= m_times.size();
double stddev = std::sqrt(variance);
std::cout << "Statistics (microseconds):" << std::endl;
std::cout << " Min: " << min << std::endl;
std::cout << " Max: " << max << std::endl;
std::cout << " Median: " << median << std::endl;
std::cout << " Mean: " << static_cast<long long>(mean) << std::endl;
std::cout << " StdDev: " << static_cast<long long>(stddev) << std::endl;
std::cout << std::endl;
}
};
int main()
{
const size_t SIZE = 10000;
std::vector<int> originalData(SIZE);
// Fill with random data
for (size_t i = 0; i < SIZE; ++i)
{
originalData[i] = rand() % 1000;
}
PerformanceTester tester;
// Test std::sort with multiple runs
tester.measureMultipleRuns([&originalData]() {
auto data = originalData; // Reset data each time
std::sort(data.begin(), data.end());
}, 10, "std::sort");
return 0;
}
Sample Output:
Running std::sort 10 times...
Statistics (microseconds):
Min: 1234
Max: 1456
Median: 1345
Mean: 1356
StdDev: 67
Profiling tools and techniques
Using built-in profiling hints
#include <iostream>
#include <chrono>
#include <vector>
#include <algorithm>
// Function to simulate CPU-intensive work
void cpuIntensiveFunction(int n)
{
volatile double result = 0.0;
for (int i = 0; i < n; ++i)
{
result += std::sin(i) * std::cos(i);
}
}
// Function to simulate memory-intensive work
void memoryIntensiveFunction(size_t size)
{
std::vector<int> data(size);
std::iota(data.begin(), data.end(), 1);
std::random_shuffle(data.begin(), data.end());
std::sort(data.begin(), data.end());
}
template<typename Func>
void profileFunction(Func func, const std::string& name)
{
const int RUNS = 5;
long long totalTime = 0;
for (int i = 0; i < RUNS; ++i)
{
auto start = std::chrono::high_resolution_clock::now();
func();
auto end = std::chrono::high_resolution_clock::now();
totalTime += std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
std::cout << name << ": " << (totalTime / RUNS) << "ms average" << std::endl;
}
int main()
{
std::cout << "Performance Profile:" << std::endl;
std::cout << std::string(30, '-') << std::endl;
profileFunction([]() {
cpuIntensiveFunction(100000);
}, "CPU-intensive task");
profileFunction([]() {
memoryIntensiveFunction(100000);
}, "Memory-intensive task");
return 0;
}
Best practices for performance measurement
1. Measure what matters
// Don't time trivial operations that will be optimized away
// Instead, focus on the operations that actually impact your application
// Good - timing a meaningful operation
auto data = generateLargeDataset();
Timer timer("Data processing");
auto result = processData(data);
2. Use appropriate time units
// Fast operations: nanoseconds or microseconds
auto fastResult = std::chrono::duration_cast<std::chrono::nanoseconds>(duration);
// Medium operations: microseconds or milliseconds
auto mediumResult = std::chrono::duration_cast<std::chrono::microseconds>(duration);
// Slow operations: milliseconds or seconds
auto slowResult = std::chrono::duration_cast<std::chrono::milliseconds>(duration);
3. Account for measurement overhead
// Measure the timer overhead itself
auto start = std::chrono::high_resolution_clock::now();
auto end = std::chrono::high_resolution_clock::now();
auto overhead = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
std::cout << "Timer overhead: " << overhead.count() << " nanoseconds" << std::endl;
4. Consider compiler optimization levels
When timing code, compile with appropriate optimization flags:
-O0
: No optimization (good for debugging, not for timing)-O1
: Basic optimization-O2
: Standard optimization (good for realistic timing)-O3
: Aggressive optimization (may change algorithm characteristics)
Summary
Code timing is essential for performance optimization:
- Use std::chrono: Modern, precise timing with appropriate time units
- Create reusable tools: Timer classes and helper functions reduce repetition
- Compare fairly: Same data, multiple runs, statistical analysis
- Consider context: Best/average/worst cases, cache effects, compiler optimizations
- Measure meaningful work: Focus on operations that matter to your application
- Use proper methodology: Warm-up runs, multiple measurements, statistical significance
Key takeaways:
- Performance matters: Understanding timing helps you write efficient code
- Measurement technique matters: Poor methodology gives misleading results
- Context is crucial: Real-world performance may differ from microbenchmarks
- Optimization is iterative: Measure, optimize, measure again
This knowledge helps you make informed decisions about:
- Which algorithms to choose
- Where to focus optimization efforts
- Whether optimizations actually help
- How your code will perform in production
Quiz
- What C++ header provides the modern timing functionality?
- Why should you run multiple timing iterations instead of just one?
- What is the difference between measuring best case, average case, and worst case performance?
- Why might the first run of a function be slower than subsequent runs?
- What time unit would be most appropriate for timing a function that takes about 5 milliseconds to execute?
Practice exercises
Try these timing exercises to master performance measurement:
-
Algorithm comparison: Write a program that compares the performance of different search algorithms (linear search vs binary search) on sorted arrays of various sizes.
-
Data structure analysis: Time the performance of different operations (insertion, deletion, search) on
std::vector
,std::list
, andstd::set
with different data sizes. -
Cache effects investigation: Create a program that demonstrates cache effects by comparing the performance of accessing array elements in row-major vs column-major order.
-
Optimization validation: Take a simple function (like calculating factorial), implement multiple versions (iterative, recursive, memoized), and use timing to determine which is fastest for different input sizes.
Explore More Courses
Discover other available courses while this lesson is being prepared.
Browse CoursesLesson Discussion
Share your thoughts and questions