Floating point numbers

Understand floating-point representation and precision.

Chapter: Fundamental Data Types
Difficulty: Beginner
Estimated Time: 40min

4.8 — Floating point numbers

In this lesson, you'll learn about floating-point numbers, how they represent decimal values, their precision limitations, and best practices for using them in C++.

What are floating-point numbers?

Floating-point types are data types that can represent real numbers with fractional parts. Unlike integers, which can only store whole numbers, floating-point types can store values such as 3.14159, 0.5, and 1.23e-4 (scientific notation for 0.000123).

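The name "floating-point" comes from the fact that the decimal point can "float" to different positions relative to the significant digits, with an exponent recording where it sits. This contrasts with a fixed-point representation, where the number of digits after the point never changes.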

Think of floating-point numbers like a scientific calculator display: they can show very large numbers, very small numbers, and numbers with decimal places, but there's a limit to how many digits they can display precisely.
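To make the "floating" point concrete, the sketch below splits a value into a fraction and a power-of-two exponent using std::frexp, and rebuilds it with std::ldexp (both from <cmath>). It assumes the usual binary floating-point format; the Parts struct and the decompose and recompose functions are names made up for this illustration.

```cpp
#include <cmath>

// Hypothetical helper type for this illustration.
struct Parts
{
    double fraction; // for nonzero finite input, magnitude is in [0.5, 1)
    int exponent;    // power of two such that value == fraction * 2^exponent
};

// Split a value into a fraction and a binary exponent using std::frexp.
Parts decompose(double value)
{
    int exponent = 0;
    const double fraction = std::frexp(value, &exponent);
    return Parts{fraction, exponent};
}

// Rebuild the original value from its parts using std::ldexp.
double recompose(const Parts& parts)
{
    return std::ldexp(parts.fraction, parts.exponent);
}
```

For example, decompose(8.0) yields a fraction of 0.5 and an exponent of 4, because 8 = 0.5 × 2^4. The same fraction with a different exponent would be a different number: the exponent is what moves the point.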

C++ floating-point types

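C++ provides three floating-point types: float, double, and long double. The program below prints their sizes, precision, and ranges on your platform: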

#include <iostream>
#include <cfloat>  // For floating-point limits
#include <iomanip> // For output formatting

int main()
{
    std::cout << "Floating-point types in C++:\n\n";
    
    // Display sizes
    std::cout << "Type sizes:\n";
    std::cout << "float: " << sizeof(float) << " bytes\n";
    std::cout << "double: " << sizeof(double) << " bytes\n";
    std::cout << "long double: " << sizeof(long double) << " bytes\n\n";
    
    // Display precision (approximate decimal digits)
    std::cout << "Precision (decimal digits):\n";
    std::cout << "float: ~" << FLT_DIG << " digits\n";
    std::cout << "double: ~" << DBL_DIG << " digits\n";
    std::cout << "long double: ~" << LDBL_DIG << " digits\n\n";
    
    // Display ranges (the *_MIN macros are the smallest positive normalized values, not the most negative ones)
    std::cout << std::scientific;
    std::cout << "Ranges:\n";
    std::cout << "float: " << FLT_MIN << " to " << FLT_MAX << "\n";
    std::cout << "double: " << DBL_MIN << " to " << DBL_MAX << "\n";
    std::cout << "long double: " << LDBL_MIN << " to " << LDBL_MAX << "\n";
    
    return 0;
}

Typical output (x86-64 Linux with GCC; the long double figures in particular vary by platform):

Floating-point types in C++:

Type sizes:
float: 4 bytes
double: 8 bytes
long double: 16 bytes

Precision (decimal digits):
float: ~6 digits
double: ~15 digits
long double: ~18 digits

Ranges:
float: 1.175494e-38 to 3.402823e+38
double: 2.225074e-308 to 1.797693e+308
long double: 3.362103e-4932 to 1.189731e+4932
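The <cfloat> macros have counterparts in std::numeric_limits from <limits>. That header also offers lowest(), the most negative finite value, which is easy to confuse with min(). The helpers below are a minimal sketch with hypothetical names; the figures mentioned afterwards assume IEEE 754 float and double.

```cpp
#include <limits>

// Decimal digits guaranteed to survive a text -> floating-point -> text round trip.
template <typename T>
constexpr int guaranteedDigits()
{
    return std::numeric_limits<T>::digits10;
}

// Decimal digits needed so that floating-point -> text -> floating-point is lossless.
template <typename T>
constexpr int roundTripDigits()
{
    return std::numeric_limits<T>::max_digits10;
}

// True if min() is positive, i.e. it is NOT the most negative value.
template <typename T>
constexpr bool minIsPositive()
{
    return std::numeric_limits<T>::min() > T(0);
}

// The most negative finite value of the type.
template <typename T>
constexpr T mostNegative()
{
    return std::numeric_limits<T>::lowest();
}
```

Under those assumptions, guaranteedDigits<float>() is 6 and guaranteedDigits<double>() is 15 (matching FLT_DIG and DBL_DIG), while roundTripDigits gives 9 and 17. That is why the precision of these types is often quoted as a range.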

Declaring and initializing floating-point numbers

#include <iostream>
#include <iomanip> // For std::setprecision

int main()
{
    // Different ways to initialize floating-point numbers
    
    // float (single precision) - use 'f' suffix
    float temperature = 98.6f;
    float pi_float = 3.14159f;
    float scientific_float = 1.23e6f;    // 1,230,000
    
    // double (double precision) - default for decimal literals
    double precise_pi = 3.141592653589793;
    double avogadro = 6.022e23;          // No suffix needed for double
    double small_number = 1.5e-10;
    
    // long double (extended precision) - use 'L' suffix
    long double high_precision = 3.141592653589793238462643L;
    long double planck = 6.62607015e-34L;
    
    std::cout << std::fixed << std::setprecision(10);
    
    std::cout << "float values:\n";
    std::cout << "Temperature: " << temperature << "°F\n";
    std::cout << "Pi (float): " << pi_float << "\n";
    std::cout << "Scientific: " << scientific_float << "\n\n";
    
    std::cout << "double values:\n";
    std::cout << "Pi (double): " << precise_pi << "\n";
    std::cout << "Small number: " << small_number << "\n\n";
    
    std::cout << "long double values:\n";
    std::cout << "High precision pi: " << high_precision << "\n";
    
    return 0;
}
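The suffix decides the type of the literal itself, not the variable it is assigned to. The sketch below checks this at compile time with static_assert and std::is_same_v (C++17), and shows one consequence: a float initialized from 0.1f does not compare equal to the double literal 0.1. The function name is made up for this illustration.

```cpp
#include <type_traits>

// Compile-time checks: the suffix, not the target variable, sets the literal's type.
static_assert(std::is_same_v<decltype(3.14f), float>, "f suffix gives float");
static_assert(std::is_same_v<decltype(3.14), double>, "no suffix gives double");
static_assert(std::is_same_v<decltype(3.14L), long double>, "L suffix gives long double");

// Hypothetical helper: returns true if 0.1f, widened to double,
// differs from the double literal 0.1.
bool floatDiffersFromDoubleLiteral()
{
    const float f = 0.1f;
    return static_cast<double>(f) != 0.1;
}
```

With IEEE 754 types the function returns true, because 0.1f and 0.1 are two different approximations of one tenth. Mixing suffixed and unsuffixed literals in comparisons is therefore a common source of surprises.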

Understanding precision limitations

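Floating-point numbers have limited precision, and most decimal fractions (such as 0.1) have no exact binary representation, so the stored value is often only a close approximation. This can lead to unexpected results: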

#include <iostream>
#include <iomanip>

int main()
{
    std::cout << "Floating-point precision limitations:\n\n";
    
    // Precision differences between types
    float f = 1.23456789012345f;
    double d = 1.23456789012345;
    long double ld = 1.23456789012345L;
    
    std::cout << std::fixed << std::setprecision(15);
    std::cout << "Same number in different types:\n";
    std::cout << "float:       " << f << "\n";
    std::cout << "double:      " << d << "\n";
    std::cout << "long double: " << ld << "\n\n";
    
    // Demonstration of precision loss
    std::cout << "Precision loss examples:\n";
    
    float sum = 0.1f + 0.2f;
    std::cout << "0.1f + 0.2f = " << sum << " (should be 0.3)\n";
    
    double sum_double = 0.1 + 0.2;
    std::cout << "0.1 + 0.2 = " << sum_double << " (should be 0.3)\n";
    
    // Large number + small number (near 1e8, adjacent float values are 8 apart)
    float large = 100000000.0f;
    float small = 0.1f;
    float result = large + small;
    
    std::cout << "\nLarge + small number precision loss:\n";
    std::cout << large << " + " << small << " = " << result << "\n";
    std::cout << "Lost precision? " << (result == large ? "Yes" : "No") << "\n";
    
    return 0;
}

Output:

Floating-point precision limitations:

Same number in different types:
float:       1.234567880630493
double:      1.234567890123450
long double: 1.234567890123450

Precision loss examples:
0.1f + 0.2f = 0.300000011920929 (should be 0.3)
0.1 + 0.2 = 0.300000000000000 (should be 0.3)

Large + small number precision loss:
100000000.000000000000000 + 0.100000001490116 = 100000000.000000000000000
Lost precision? Yes
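Whether a small addend survives depends on the gap between adjacent representable values, which grows with the magnitude of the number. The sketch below measures that gap with std::nextafter from <cmath>; gapAbove and isAbsorbed are hypothetical helper names, and the figures mentioned afterwards assume IEEE 754 single precision.

```cpp
#include <cmath>
#include <limits>

// Distance from value to the next representable float above it.
float gapAbove(float value)
{
    const float next = std::nextafter(value, std::numeric_limits<float>::infinity());
    return next - value;
}

// True if adding small to large leaves large unchanged (small is absorbed).
bool isAbsorbed(float large, float small)
{
    const float sum = large + small;
    return sum == large;
}
```

Under that assumption, gapAbove(1.0f) is about 1.19e-7, gapAbove(1000000.0f) is 0.0625, and gapAbove(100000000.0f) is 8. An addend of 0.1f is well under half of that last gap, so isAbsorbed(100000000.0f, 0.1f) is true.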

Comparing floating-point numbers

Due to precision limitations, comparing floating-point numbers for equality can be problematic:

#include <iostream>
#include <cmath>
#include <iomanip> // For std::setprecision

bool isEqual(double a, double b, double epsilon = 1e-9)
{
    return std::abs(a - b) < epsilon;
}

int main()
{
    std::cout << "Floating-point comparison issues:\n\n";
    
    // Problematic direct comparison
    double a = 0.1 + 0.2;
    double b = 0.3;
    
    std::cout << std::fixed << std::setprecision(17);
    std::cout << "a = 0.1 + 0.2 = " << a << "\n";
    std::cout << "b = 0.3       = " << b << "\n";
    std::cout << "a == b? " << (a == b ? "true" : "false") << " (surprising!)\n\n";
    
    // Safe comparison using epsilon
    std::cout << "Safe floating-point comparison:\n";
    std::cout << "isEqual(a, b) = " << (isEqual(a, b) ? "true" : "false") << "\n\n";
    
    // Another example: adding 0.1 ten times
    double y = 0.0;
    for (int i = 0; i < 10; ++i)
    {
        y += 0.1;
    }
    
    std::cout << "y = 0.1 added 10 times = " << y << "\n";
    std::cout << "y == 1.0? " << (y == 1.0 ? "true" : "false") << "\n";
    std::cout << "isEqual(y, 1.0)? " << (isEqual(y, 1.0) ? "true" : "false") << "\n";
    
    return 0;
}

Output:

Floating-point comparison issues:

a = 0.1 + 0.2 = 0.30000000000000004
b = 0.3       = 0.29999999999999999
a == b? false (surprising!)

Safe floating-point comparison:
isEqual(a, b) = true

y = 0.1 added 10 times = 0.99999999999999989
y == 1.0? false
isEqual(y, 1.0)? true
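A fixed absolute epsilon such as 1e-9 suits values near 1, but it is too strict for very large values and too loose for very small ones. One common refinement, sketched below, also accepts a difference that is small relative to the larger operand. The function name and both default tolerances are assumptions chosen for illustration; sensible values depend on the application.

```cpp
#include <algorithm>
#include <cmath>

// Sketch: true if a and b are within absEpsilon of each other, or within
// relEpsilon of the larger magnitude. The default tolerances are illustrative only.
bool approximatelyEqual(double a, double b, double absEpsilon = 1e-12, double relEpsilon = 1e-9)
{
    const double diff = std::abs(a - b);
    if (diff <= absEpsilon)
    {
        return true; // handles values at or near zero
    }
    const double largest = std::max(std::abs(a), std::abs(b));
    return diff <= largest * relEpsilon;
}
```

With these defaults, approximatelyEqual(0.1 + 0.2, 0.3) is true, and so is approximatelyEqual(1e20, 1e20 + 1e6), where the absolute difference is huge but the relative difference is tiny. If either argument is NaN the function returns false, since every comparison involving NaN is false.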

Special floating-point values

On platforms that follow IEEE 754 (nearly all modern ones), floating-point types can also represent special values such as infinity and NaN:

#include <iostream>
#include <cmath>
#include <limits>

int main()
{
    std::cout << "Special floating-point values:\n\n";
    
    // Infinity
    double positive_inf = std::numeric_limits<double>::infinity();
    double negative_inf = -std::numeric_limits<double>::infinity();
    double division_by_zero = 1.0 / 0.0;  // Infinity under IEEE 754 (compilers may warn)
    
    std::cout << "Infinity values:\n";
    std::cout << "Positive infinity: " << positive_inf << "\n";
    std::cout << "Negative infinity: " << negative_inf << "\n";
    std::cout << "1.0 / 0.0 = " << division_by_zero << "\n";
    std::cout << "Is infinite? " << (std::isinf(positive_inf) ? "yes" : "no") << "\n\n";
    
    // NaN (Not a Number)
    double nan_value = std::numeric_limits<double>::quiet_NaN();
    double invalid_operation = 0.0 / 0.0;  // NaN under IEEE 754 (compilers may warn)
    double sqrt_negative = std::sqrt(-1.0); // Results in NaN
    
    std::cout << "NaN (Not a Number) values:\n";
    std::cout << "Quiet NaN: " << nan_value << "\n";
    std::cout << "0.0 / 0.0 = " << invalid_operation << "\n";
    std::cout << "sqrt(-1) = " << sqrt_negative << "\n";
    std::cout << "Is NaN? " << (std::isnan(nan_value) ? "yes" : "no") << "\n\n";
    
    // NaN comparison behavior
    std::cout << "NaN comparison behavior:\n";
    std::cout << "NaN == NaN? " << (nan_value == nan_value ? "true" : "false") << "\n";
    std::cout << "NaN != NaN? " << (nan_value != nan_value ? "true" : "false") << "\n";
    
    return 0;
}

Output (the exact text printed for NaN, such as nan or -nan, varies by platform):

Special floating-point values:

Infinity values:
Positive infinity: inf
Negative infinity: -inf
1.0 / 0.0 = inf
Is infinite? yes

NaN (Not a Number) values:
Quiet NaN: nan
0.0 / 0.0 = nan
sqrt(-1) = nan
Is NaN? yes

NaN comparison behavior:
NaN == NaN? false
NaN != NaN? true
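Besides std::isnan and std::isinf, <cmath> provides std::fpclassify, which sorts any value into one of five categories, including zero and subnormal (very small values stored with reduced precision). The sketch below maps the result to a label; the classify function and its label strings are made up for this illustration.

```cpp
#include <cmath>
#include <string>

// Map the category reported by std::fpclassify to a readable label.
std::string classify(double value)
{
    switch (std::fpclassify(value))
    {
    case FP_NAN:       return "nan";
    case FP_INFINITE:  return "infinite";
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    default:           return "unknown"; // implementation-defined extra categories
    }
}
```

For instance, classify(1.5) returns "normal", while both 0.0 and -0.0 are reported as "zero".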

Practical applications

Mathematical calculations

#include <iostream>
#include <cmath>
#include <iomanip>

int main()
{
    std::cout << "Mathematical calculations with floating-point:\n\n";
    
    // Geometry calculations
    double radius = 5.5;
    double pi = 3.141592653589793;
    
    double circumference = 2 * pi * radius;
    double area = pi * radius * radius;
    double volume = (4.0 / 3.0) * pi * radius * radius * radius;
    
    std::cout << std::fixed << std::setprecision(4);
    std::cout << "Circle/Sphere with radius " << radius << ":\n";
    std::cout << "Circumference: " << circumference << "\n";
    std::cout << "Area: " << area << "\n";
    std::cout << "Volume: " << volume << "\n\n";
    
    // Trigonometric functions
    double angle_degrees = 45.0;
    double angle_radians = angle_degrees * pi / 180.0;
    
    std::cout << "Trigonometry for " << angle_degrees << " degrees:\n";
    std::cout << "sin(" << angle_degrees << "°) = " << std::sin(angle_radians) << "\n";
    std::cout << "cos(" << angle_degrees << "°) = " << std::cos(angle_radians) << "\n";
    std::cout << "tan(" << angle_degrees << "°) = " << std::tan(angle_radians) << "\n\n";
    
    // Exponential and logarithmic functions
    double x = 2.0;
    std::cout << "Exponential and logarithmic functions for x = " << x << ":\n";
    std::cout << "e^x = " << std::exp(x) << "\n";
    std::cout << "ln(x) = " << std::log(x) << "\n";
    std::cout << "log₁₀(x) = " << std::log10(x) << "\n";
    std::cout << "2^x = " << std::pow(2, x) << "\n";
    
    return 0;
}

Financial calculations

#include <iostream>
#include <iomanip>
#include <cmath>

double calculateCompoundInterest(double principal, double rate, int years, int compounds_per_year = 1)
{
    return principal * std::pow(1 + rate / compounds_per_year, compounds_per_year * years);
}

double calculateMonthlyPayment(double principal, double annual_rate, int years)
{
    double monthly_rate = annual_rate / 12.0;
    int num_payments = years * 12;
    
    if (monthly_rate == 0) return principal / num_payments;
    
    return principal * (monthly_rate * std::pow(1 + monthly_rate, num_payments)) / 
           (std::pow(1 + monthly_rate, num_payments) - 1);
}

int main()
{
    std::cout << "Financial calculations:\n\n";
    std::cout << std::fixed << std::setprecision(2);
    
    // Compound interest
    double principal = 10000.0;
    double annual_rate = 0.05;  // 5%
    int years = 10;
    
    double annual_compound = calculateCompoundInterest(principal, annual_rate, years, 1);
    double daily_compound = calculateCompoundInterest(principal, annual_rate, years, 365);
    
    std::cout << "Investment of $" << principal << " at " << (annual_rate * 100) << "% for " << years << " years:\n";
    std::cout << "Annual compounding: $" << annual_compound << "\n";
    std::cout << "Daily compounding: $" << daily_compound << "\n";
    std::cout << "Difference: $" << (daily_compound - annual_compound) << "\n\n";
    
    // Loan payment calculation
    double loan_amount = 250000.0;  // $250,000 mortgage
    double mortgage_rate = 0.035;   // 3.5%
    int loan_years = 30;
    
    double monthly_payment = calculateMonthlyPayment(loan_amount, mortgage_rate, loan_years);
    double total_paid = monthly_payment * loan_years * 12;
    double total_interest = total_paid - loan_amount;
    
    std::cout << "Mortgage calculation:\n";
    std::cout << "Loan amount: $" << loan_amount << "\n";
    std::cout << "Interest rate: " << (mortgage_rate * 100) << "%\n";
    std::cout << "Term: " << loan_years << " years\n";
    std::cout << "Monthly payment: $" << monthly_payment << "\n";
    std::cout << "Total paid: $" << total_paid << "\n";
    std::cout << "Total interest: $" << total_interest << "\n";
    
    return 0;
}
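The calculations above use double, which is fine for estimates. Where totals must be exact to the cent, one common alternative is to store amounts as whole cents in an integer and convert only at the edges. The sketch below is one possible way to do that; toCents and formatCents are hypothetical helpers, and rounding to the nearest cent is an assumption, not a rule every application shares.

```cpp
#include <cmath>
#include <string>

// Convert a dollar amount to whole cents, rounding to the nearest cent.
long long toCents(double dollars)
{
    return std::llround(dollars * 100.0);
}

// Format a cent amount as dollars with exactly two decimal places.
std::string formatCents(long long cents)
{
    const bool negative = cents < 0;
    const long long magnitude = negative ? -cents : cents;
    std::string text = negative ? "-" : "";
    text += std::to_string(magnitude / 100);
    text += '.';
    const long long fraction = magnitude % 100;
    if (fraction < 10)
    {
        text += '0'; // pad single-digit cents, e.g. 5 -> "05"
    }
    text += std::to_string(fraction);
    return text;
}
```

Integer addition is exact, so toCents(0.1) + toCents(0.2) equals toCents(0.3), and formatCents(12345) produces "123.45".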

Best practices for floating-point numbers

✅ Choose the right precision

#include <iostream>

int main()
{
    // Use float for memory-constrained applications or when precision isn't critical
    float coordinates[1000][3];  // 3D coordinates for many points
    
    // Use double for most calculations (default choice)
    double temperature = 98.6;
    double scientific_calculation = 2.5e-15 * 1.7e23;
    
    // Use long double for high-precision requirements
    long double pi_high_precision = 3.141592653589793238462643383279502884L;
    
    std::cout << "Precision recommendations:\n";
    std::cout << "float: Graphics, games, large arrays\n";
    std::cout << "double: Most calculations, scientific computing\n";
    std::cout << "long double: High-precision mathematics\n";
    
    return 0;
}

✅ Use proper comparison techniques

#include <iostream>
#include <cmath>

bool isNearlyEqual(double a, double b, double epsilon = 1e-9)
{
    return std::abs(a - b) <= epsilon;
}

bool isNearlyZero(double value, double epsilon = 1e-9)
{
    return std::abs(value) <= epsilon;
}

int main()
{
    double a = 0.1 + 0.2;
    double b = 0.3;
    
    // Good: Use epsilon comparison
    if (isNearlyEqual(a, b))
    {
        std::cout << "Values are nearly equal\n";
    }
    
    // Good: Check for near-zero
    double difference = a - b;
    if (isNearlyZero(difference))
    {
        std::cout << "Difference is nearly zero\n";
    }
    
    return 0;
}

✅ Handle special values

#include <iostream>
#include <cmath>
#include <limits>

void safeOperation(double x, double y)
{
    if (std::isnan(x) || std::isnan(y))
    {
        std::cout << "Error: NaN input detected\n";
        return;
    }
    
    if (std::isinf(x) || std::isinf(y))
    {
        std::cout << "Warning: Infinite input detected\n";
    }
    
    double result = x / y;
    
    if (std::isnan(result))
    {
        std::cout << "Result is NaN (possibly 0/0)\n";
    }
    else if (std::isinf(result))
    {
        std::cout << "Result is infinite (possibly division by zero)\n";
    }
    else
    {
        std::cout << "Result: " << result << "\n";
    }
}

int main()
{
    safeOperation(10.0, 2.0);   // Normal case
    safeOperation(10.0, 0.0);   // Division by zero
    safeOperation(0.0, 0.0);    // 0/0 = NaN
    
    return 0;
}

❌ Common mistakes to avoid

// Bad: Direct equality comparison
if (0.1 + 0.2 == 0.3) { /* ... */ }  // Condition is false with IEEE 754 doubles!

// Bad: Using float for financial calculations
float money = 123.45f;  // Limited precision for money!

// Bad: Assuming exact representation
double x = 0.1;
for (int i = 0; i < 10; ++i)
{
    x += 0.1;  // Accumulating errors!
}
// x might not equal 1.1 exactly

// Bad: Not handling special values
double result = someCalculation();
int converted = static_cast<int>(result);  // Undefined behavior if result is NaN, infinite, or out of int's range
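For the last mistake, one way to guard the conversion is to check the value before casting. The sketch below returns an empty std::optional (C++17) when the value cannot be converted safely. The name toIntChecked is hypothetical, and the range check assumes every int is exactly representable as a double, which holds for 32-bit int and IEEE 754 double.

```cpp
#include <cmath>
#include <limits>
#include <optional>

// Sketch: convert to int only when the value is finite and within int's range.
// Returns std::nullopt otherwise. Truncates toward zero, like static_cast<int>.
std::optional<int> toIntChecked(double value)
{
    if (!std::isfinite(value))
    {
        return std::nullopt; // NaN or +/- infinity
    }

    // Assumption: int's limits convert to double without rounding.
    constexpr double lowest = static_cast<double>(std::numeric_limits<int>::min());
    constexpr double highest = static_cast<double>(std::numeric_limits<int>::max());
    if (value < lowest || value > highest)
    {
        return std::nullopt; // the result would not fit in an int
    }

    return static_cast<int>(value);
}
```

A caller can then test the result before using it, for example with has_value(), instead of relying on whatever an unchecked cast happens to produce.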

Summary

Floating-point numbers are essential for representing real numbers with fractional parts:

Types:

  • float: typically 32-bit, about 6 to 7 significant decimal digits
  • double: typically 64-bit, about 15 to 17 significant decimal digits (preferred)
  • long double: at least as precise as double; size and precision are platform-dependent

Key concepts:

  • Limited precision can cause rounding errors
  • Avoid using == on computed floating-point values; rounding errors make exact equality unreliable
  • Special values: infinity and NaN (Not a Number)
  • Use appropriate suffixes: f for float, L for long double

Best practices:

  • Use double as default for most calculations
  • Use epsilon-based comparison for equality testing
  • Handle special values (NaN, infinity) appropriately
  • Be aware of precision limitations in critical applications

Quiz

  1. What's the difference between float and double?
  2. Why shouldn't you use == to compare floating-point numbers?
  3. What is NaN and when might it occur?
  4. How many decimal digits of precision does a typical double provide?
  5. What suffix should you use for float literals?

Practice exercises

Try these floating-point exercises:

  1. Write a function to calculate the distance between two points using floating-point coordinates
  2. Create a safe floating-point comparison function that works with different epsilon values
  3. Implement a compound interest calculator that handles edge cases (zero rates, etc.)
  4. Write a program that demonstrates floating-point precision loss with repeated operations
