std::string_view (part 2)

In prior lessons, we introduced two string types: std::string and std::string_view.

Because std::string_view is our first encounter with a view type, we're going to spend some additional time on it. We will focus on how to use std::string_view safely, and provide examples illustrating how it can be used incorrectly. We'll conclude with guidelines on when to use std::string vs std::string_view.

An introduction to owners and viewers

Let's sidebar into an analogy. Say you've decided to paint a picture of a legendary sword. But you don't have a sword! What are you to do?

Well, you could go to the local weapon shop and buy one. You would own that sword. This has some benefits: you now have a sword that you can wield. You can guarantee the sword will always be available when you want it. You can modify it, repair it, or display it. There are also downsides. Swords are expensive. And if you buy one, you are now responsible for it. You have to periodically maintain it. And when you eventually decide you don't want it anymore, you have to properly dispose of it.

Ownership can be expensive. As an owner, it is your responsibility to acquire, manage, and properly dispose of the objects you own.

On your way out of the shop, you glance through a window. You notice that the guard has leaned their sword against the wall across from your window. You could just paint a picture of the guard's sword (as seen from your window) instead. There are lots of benefits to this choice. You save the expense of having to acquire your own sword. You don't have to maintain it. Nor are you responsible for disposing of it. When you are done viewing, you can just close your curtains and move on with your life. This ends your view of the object, but the object itself is not affected by this. There are also potential downsides. You can't modify or customize the guard's sword. And while you are viewing the sword, the guard may decide to modify it, or move it out of your view altogether. You may end up with a view of something unexpected instead.

Viewing is inexpensive. As a viewer, you have no responsibility for the objects you are viewing, but you also have no control over those objects.

std::string is a (sole) owner

You might be wondering why std::string makes an expensive copy of its initializer. When an object is instantiated, memory is allocated for that object to store whatever data it needs throughout its lifetime. This memory is reserved for the object and guaranteed to exist for as long as the object does. It is a safe space. std::string (and most other objects) copy the initialization value into this memory so they can have their own independent value to access and manipulate later. Once the initialization value has been copied, the object is no longer reliant on the initializer in any way.

And that's a good thing, because the initializer generally can't be trusted after initialization is complete. If you imagine the initialization process as a function call that initializes the object, who is passing in the initializer? The caller. When initialization is done, control returns back to the caller. At this point, the initialization statement is complete, and one of two things will typically happen:

  • If the initializer was a temporary value or object, that temporary will be destroyed immediately
  • If the initializer was a variable, the caller still has access to that object. The caller can then do whatever it wants with the object, including modify or destroy it

Key Concept
An initialized object has no control over what happens to the initialization value after initialization is finished.

Because std::string makes its own copy of the initializer, it doesn't have to worry about what happens to the initializer after initialization is finished. The initializer can be destroyed or modified, and it doesn't affect the std::string. The downside is that this independence comes with the cost of making an expensive copy.

In the context of our analogy, std::string is an owner - it is responsible for acquiring its string data from the initializer, managing access to the string data, and properly disposing of the string data when the std::string object is destroyed.

Key Concept
In programming, when we call an object an owner, we generally mean that it is the sole owner (unless otherwise specified). Sole ownership (also called single ownership) ensures it is clear who has responsibility for that data.

We don't always need a copy

Let's revisit this example from the prior lesson:

#include <iostream>
#include <string>

void displayWeapon(std::string weapon) // weapon makes a copy of its initializer
{
    std::cout << weapon << '\n';
}

int main()
{
    std::string primaryWeapon { "Flaming Sword" };
    displayWeapon(primaryWeapon);

    return 0;
}

When displayWeapon(primaryWeapon) is called, weapon makes an expensive copy of primaryWeapon. The function prints the copied string and then destroys it.

Note that primaryWeapon is already holding the string we want to print. Could we just use the string that primaryWeapon is holding instead of making a copy? The answer is possibly - there are three criteria we need to assess:

  • Could primaryWeapon be destroyed while weapon is still using it? No, weapon dies at the end of the function, and primaryWeapon exists in the scope of the caller and can't be destroyed before the function returns.
  • Could primaryWeapon be modified while weapon is still using it? No, weapon dies at the end of the function, and the caller has no opportunity to modify primaryWeapon before the function returns.
  • Does weapon modify the string in some way that the caller would not expect? No, the function does not modify the string at all.

Since the answer to all three of these questions is no, there is no risk in using the string that primaryWeapon is holding instead of making a copy. And since string copies are expensive, why pay for one that we don't need?

std::string_view is a viewer

std::string_view takes a different approach to initialization. Instead of making an expensive copy of the initialization string, std::string_view creates an inexpensive view of the initialization string. The std::string_view can then be used whenever access to the string is required.

In the context of our analogy, std::string_view is a viewer. It views an object that already exists elsewhere, and cannot modify that object. When the view is destroyed, the object being viewed is not affected. Having multiple viewers viewing an object simultaneously is fine.

It is important to note that a std::string_view remains dependent on the initializer throughout its lifetime. If the string being viewed is modified or destroyed while the view is still being used, unexpected or undefined behavior will result.

Whenever we use a view, it is up to us to ensure these possibilities do not occur.

Warning
A view is dependent on the object being viewed. If the object being viewed is modified or destroyed while the view is still being used, unexpected or undefined behavior will result.

A std::string_view that is viewing a string that has been destroyed is sometimes called a dangling view.

std::string_view is best used as a read-only function parameter

The best use for std::string_view is as a read-only function parameter. This allows us to pass in a C-style string, std::string, or std::string_view argument without making a copy, as the std::string_view will create a view to the argument.

#include <iostream>
#include <string>
#include <string_view>

void displayWeapon(std::string_view weapon) // now a std::string_view, creates a view of the argument
{
    std::cout << weapon << '\n';
}

int main()
{
    displayWeapon("Flaming Sword"); // call with C-style string literal

    std::string primaryWeapon { "Battle Axe" };
    displayWeapon(primaryWeapon); // call with std::string

    std::string_view secondaryWeapon { primaryWeapon };
    displayWeapon(secondaryWeapon); // call with std::string_view

    return 0;
}

Because the weapon function parameter is created, initialized, used, and destroyed before control returns to the caller, there is no risk that the string being viewed (the function argument) will be modified or destroyed while our weapon parameter is still in use.

Should I prefer std::string_view or const std::string& function parameters?

Prefer std::string_view in most cases. We cover this topic further in the Pass by const lvalue reference lesson.

Improperly using std::string_view

Let's examine a few cases where misusing std::string_view gets us into trouble.

Here's our first example:

#include <iostream>
#include <string>
#include <string_view>

int main()
{
    std::string_view weaponView {};

    { // create a nested block
        std::string weapon { "Mystic Staff" }; // create a std::string local to this nested block
        weaponView = weapon; // weaponView is now viewing weapon
    } // weapon is destroyed here, so weaponView is now viewing an invalid string

    std::cout << weaponView << '\n'; // undefined behavior

    return 0;
}

In this example, we're creating std::string weapon inside a nested block. Then we set weaponView to view weapon. weapon is then destroyed at the end of the nested block. weaponView doesn't know that weapon has been destroyed. When we then use weaponView, we are accessing an invalid object, and undefined behavior results.

Here's another variant of the same issue, where we initialize a std::string_view with the std::string return value of a function:

#include <iostream>
#include <string>
#include <string_view>

std::string generateWeapon()
{
    std::string weapon { "Lightning Spear" };
    return weapon;
}

int main()
{
    std::string_view weaponView { generateWeapon() }; // weaponView initialized with return value of function
    std::cout << weaponView << '\n'; // undefined behavior

    return 0;
}

This behaves similarly to the prior example. The generateWeapon() function is returning a std::string containing the string "Lightning Spear". Return values are temporary objects that are destroyed at the end of the full expression containing the function call. We must either use this return value immediately, or copy it to use later.

But std::string_view doesn't make copies. Instead, it creates a view to the temporary return value, which is then destroyed. That leaves our std::string_view dangling (viewing an invalid object), and printing the view results in undefined behavior.

The following is a less-obvious variant of the above:

#include <iostream>
#include <string>
#include <string_view>

int main()
{
    using namespace std::string_literals;
    std::string_view weaponView { "Frost Blade"s }; // "Frost Blade"s creates a temporary std::string
    std::cout << weaponView << '\n'; // undefined behavior

    return 0;
}

A std::string literal (created via the s literal suffix) creates a temporary std::string object. So in this case, "Frost Blade"s creates a temporary std::string, which we then use as the initializer for weaponView. At this point, weaponView is viewing the temporary std::string. Then the temporary std::string is destroyed, leaving weaponView dangling. We get undefined behavior when we then use weaponView.

Warning
Do not initialize a std::string_view with a std::string literal, as this will leave the std::string_view dangling.

It is okay to initialize a std::string_view with a C-style string literal or a std::string_view literal. It's also okay to initialize a std::string_view with a C-style string object, a std::string object, or a std::string_view object, as long as that string object outlives the view.

We can also get undefined behavior when the underlying string is modified:

#include <iostream>
#include <string>
#include <string_view>

int main()
{
    std::string weapon { "Silver Dagger" };
    std::string_view weaponView { weapon }; // weaponView is now viewing weapon

    weapon = "Golden Axe";         // modifies weapon, which invalidates weaponView (weapon is still valid)
    std::cout << weaponView << '\n'; // undefined behavior

    return 0;
}

In this example, weaponView is again set to view weapon. weapon is then modified. When a std::string is modified, any views into that std::string are likely to be invalidated, meaning those views are now invalid or incorrect. Using an invalidated view will result in undefined behavior.

Advanced note: If the std::string has to reallocate memory to accommodate the new string data, it frees the memory that held the old string data, leaving the std::string_view dangling (viewing memory that has been freed). If the std::string doesn't reallocate, it overwrites the old data in place at the same memory address. The std::string_view will then view the new data, but it still uses the old length. It may see only the first part of the new string (if the new string is longer), or the new string followed by leftover characters from the old one (if the new string is shorter).

Key Concept
Modifying a std::string is likely to invalidate all views into that std::string.

Revalidating an invalid std::string_view

Invalidated objects can often be revalidated (made valid again) by setting them back to a known good state. For an invalidated std::string_view, we can do this by assigning the invalidated std::string_view object a valid string to view.

Here's the same example as prior, but we'll revalidate weaponView:

#include <iostream>
#include <string>
#include <string_view>

int main()
{
    std::string weapon { "Silver Dagger" };
    std::string_view weaponView { weapon }; // weaponView is now viewing weapon

    weapon = "Golden Axe";         // modifies weapon, which invalidates weaponView (weapon is still valid)
    std::cout << weaponView << '\n'; // undefined behavior

    weaponView = weapon;           // revalidate weaponView: weaponView is now viewing weapon again
    std::cout << weaponView << '\n'; // prints "Golden Axe"

    return 0;
}

After weaponView is invalidated by the modification of weapon, we revalidate weaponView via the statement weaponView = weapon, which causes weaponView to become a valid view of weapon again. When we print weaponView the second time, it prints "Golden Axe".

Be careful returning a std::string_view

std::string_view can be used as the return value of a function. However, this is often dangerous.

Because local variables are destroyed at the end of the function, returning a std::string_view that is viewing a local variable will result in the returned std::string_view being invalid, and further use of that std::string_view will result in undefined behavior. For example:

#include <iostream>
#include <string>
#include <string_view>

std::string_view getStatusText(bool hasWon)
{
    std::string victory { "Victory!" };  // local variable
    std::string defeat { "Defeat!" }; // local variable

    if (hasWon)
        return victory;  // return a std::string_view viewing victory

    return defeat; // return a std::string_view viewing defeat
} // victory and defeat are destroyed at the end of the function

int main()
{
    std::cout << getStatusText(true) << ' ' << getStatusText(false) << '\n'; // undefined behavior

    return 0;
}

In the above example, when getStatusText(true) is called, the function returns a std::string_view that is viewing victory. However, victory is destroyed at the end of the function. This means the returned std::string_view is viewing an object that has been destroyed. So when the returned std::string_view is printed, undefined behavior results.

Your compiler may or may not warn you about such cases.

There are two main cases where a std::string_view can be returned safely. First, because C-style string literals exist for the entire program, it's fine (and useful) to return C-style string literals from a function that has a return type of std::string_view.

#include <iostream>
#include <string_view>

std::string_view getStatusText(bool hasWon)
{
    if (hasWon)
        return "Victory!";  // return a std::string_view viewing "Victory!"

    return "Defeat!"; // return a std::string_view viewing "Defeat!"
} // "Victory!" and "Defeat!" are not destroyed at the end of the function

int main()
{
    std::cout << getStatusText(true) << ' ' << getStatusText(false) << '\n'; // ok

    return 0;
}

This prints:

Victory! Defeat!

When getStatusText(true) is called, the function will return a std::string_view viewing the C-style string "Victory!". Because "Victory!" exists for the entire program, there's no problem when we use the returned std::string_view to print "Victory!" within main().

Second, it is generally okay to return a function parameter of type std::string_view:

#include <iostream>
#include <string>
#include <string_view>

std::string_view selectBetterWeapon(std::string_view weapon1, std::string_view weapon2)
{
    if (weapon1 < weapon2)
        return weapon1;
    return weapon2;
}

int main()
{
    std::string primary { "Sword" };
    std::string secondary { "Axe" };

    std::cout << selectBetterWeapon(primary, secondary) << '\n'; // prints "Axe"

    return 0;
}

It may be less obvious why this is okay. First, note that arguments primary and secondary exist in the scope of the caller. When the function is called, function parameter weapon1 is a view into primary, and function parameter weapon2 is a view into secondary. When the function returns either weapon1 or weapon2, it is returning a view into primary or secondary back to the caller. Since primary and secondary still exist at this point, it's fine to use the returned std::string_view into primary or secondary.

There is one important subtlety here. If the argument is a temporary object (that will be destroyed at the end of the full expression containing the function call), the std::string_view return value must be used in the same expression. After that point, the temporary is destroyed and the std::string_view is left dangling.

Warning
If an argument is a temporary that is destroyed at the end of the full expression containing the function call, the returned std::string_view must be used immediately, as it will be left dangling after the temporary is destroyed.

View modification functions

Consider a window in your fortress, looking at a treasure chest sitting in the courtyard. You can look through the window and see the chest, but you can't touch or move the chest. Your window just provides a view to the chest, which is a completely separate object.

Many windows have curtains, which allow us to modify our view. We can close either the left or right curtain to reduce what we can see. We don't change what's outside, we just reduce the visible area.

Because std::string_view is a view, it contains functions that let us modify our view by "closing the curtains". This does not modify the string being viewed in any way, just the view itself.

  • The remove_prefix() member function removes characters from the left side of the view
  • The remove_suffix() member function removes characters from the right side of the view

#include <iostream>
#include <string_view>

int main()
{
    std::string_view weapon { "Flaming" };
    std::cout << weapon << '\n';

    // Remove 1 character from the left side of the view
    weapon.remove_prefix(1);
    std::cout << weapon << '\n';

    // Remove 2 characters from the right side of the view
    weapon.remove_suffix(2);
    std::cout << weapon << '\n';

    weapon = "Flaming"; // reset the view
    std::cout << weapon << '\n';

    return 0;
}

This program produces the following output:

Flaming
laming
lami
Flaming

Unlike real curtains, once remove_prefix() and remove_suffix() have been called, the only way to reset the view is by reassigning the source string to it again.

std::string_view can view a substring

This brings up an important use of std::string_view. While std::string_view can be used to view an entire string without making a copy, it is also useful when we want to view a substring without making a copy. A substring is a contiguous sequence of characters within an existing string. For example, given the string "thunderbolt", some substrings are "thunder", "bolt", and "under". "thorn" is not a substring of "thunderbolt" because these characters do not appear contiguously in "thunderbolt".

std::string_view may or may not be null-terminated

The ability to view just a substring of a larger string comes with one consequence of note: a std::string_view may or may not be null-terminated.

Consider the string "thunderbolt", which is null-terminated (because it is a C-style string literal, which are always null-terminated). If a std::string_view views the whole string, then it is viewing a null-terminated string. However, if std::string_view is only viewing the substring "under", then that substring is not null-terminated (the next character is a 'b').

Key Concept
A C-style string literal and a std::string are always null-terminated. A std::string_view may or may not be null-terminated.

In almost all cases, this doesn't matter - a std::string_view keeps track of the length of the string or substring it is viewing, so it doesn't need the null-terminator. Converting a std::string_view to a std::string will work regardless of whether or not the std::string_view is null-terminated.

Warning
Take care not to write any code that assumes a std::string_view is null-terminated.

Tip: If you have a non-null-terminated std::string_view and you need a null-terminated string for some reason, convert the std::string_view into a std::string.

A quick guide on when to use std::string vs std::string_view

This guide is not meant to be comprehensive, but is intended to highlight the most common cases:

Variables

Use a std::string variable when:

  • You need a string that you can modify
  • You need to store user-inputted text
  • You need to store the return value of a function that returns a std::string

Use a std::string_view variable when:

  • You need read-only access to part or all of a string that already exists elsewhere and will not be modified or destroyed before use of the std::string_view is complete
  • You need a symbolic constant for a C-style string
  • You need to continue viewing the return value of a function that returns a C-style string or a non-dangling std::string_view

Function parameters

Use a std::string function parameter when:

  • The function needs to modify the string passed in as an argument without affecting the caller. This is rare.
  • You are using language standard C++14 or older and aren't comfortable using references yet

Use a std::string_view function parameter when:

  • The function needs a read-only string
  • The function needs to work with non-null-terminated strings

Use a const std::string& function parameter when:

  • You are using language standard C++14 or older, and the function needs a read-only string to work with (as std::string_view is not available until C++17)
  • You are calling other functions that require a const std::string, const std::string&, or const C-style string (as std::string_view may not be null-terminated)

Use a std::string& function parameter when:

  • You are using a std::string as an out-parameter
  • You are calling other functions that require a std::string&, or non-const C-style string

Return types

Use a std::string return type when:

  • The return value is a std::string local variable or function parameter
  • The return value is a function call or operator that returns a std::string by value

Use a std::string_view return type when:

  • The function returns a C-style string literal or local std::string_view that has been initialized with a C-style string literal
  • The function returns a std::string_view parameter
  • Writing an accessor for a std::string_view member

Use a std::string& return type when:

  • The function returns a std::string& parameter

Use a const std::string& return type when:

  • The function returns a const std::string& parameter
  • Writing an accessor for a std::string or const std::string member
  • The function returns a static (local or global) const std::string

Insights

Things to remember about std::string:

  • Initializing and copying std::string is expensive, so avoid this as much as possible
  • Avoid passing std::string by value, as this makes a copy
  • If possible, avoid creating short-lived std::string objects
  • Modifying a std::string is likely to invalidate any views to that string
  • It is okay to return a local std::string by value

Things to remember about std::string_view:

  • std::string_view is typically used for passing string function parameters and returning string literals
  • Because C-style string literals exist for the entire program, it is always okay to set a std::string_view to a C-style string literal
  • When a string is destroyed, all views to that string are invalidated
  • Using an invalidated view (other than using assignment to revalidate the view) will cause undefined behavior
  • A std::string_view may or may not be null-terminated

Summary

  • Ownership vs viewing: std::string is an owner that makes expensive copies, while std::string_view is a viewer that creates inexpensive views
  • Sole ownership: An owner is responsible for acquiring, managing, and disposing of its data
  • View dependency: A std::string_view remains dependent on the object being viewed throughout its lifetime
  • Dangling views: A std::string_view viewing a destroyed string is called a dangling view, and using it causes undefined behavior
  • Best use case: Use std::string_view as a read-only function parameter to avoid expensive copies
  • Invalidation: Destroying a string invalidates all views to that string, and modifying a string is likely to invalidate them
  • Common mistakes: Initializing std::string_view with temporary std::string objects (including std::string literals with s suffix), viewing local variables that go out of scope, or using views after the underlying string is modified
  • Revalidation: Invalidated views can be revalidated by assigning them a valid string to view
  • Return safety: Safe to return C-style string literals or function parameters as std::string_view, but dangerous to return local variables
  • View modification: remove_prefix() and remove_suffix() modify the view (not the underlying string) to show a substring
  • Null-termination: std::string_view may or may not be null-terminated, unlike std::string which is always null-terminated
  • Guideline: Use std::string when you need ownership or modification, use std::string_view for read-only access to existing strings

Understanding the ownership model and view lifetime is crucial for using std::string_view safely. When used correctly as a function parameter, std::string_view eliminates expensive string copies while maintaining flexibility.