Notes on C++

By Andres Jaimes

June 16, 2023 - 13 minutes read - 2559 words

All the examples use C++17

Sort vector by property in class

In this example, we define a class called open_order that has a long long open_time field and a constructor.

To sort the vector of open_order objects based on the open_time field, we define a comparison function compare_by_open_time that compares two open_order objects based on their open_time values. The function returns true if the open_time of the first object is less than the open_time of the second object, which results in sorting the vector in ascending order of open_time.

#include <iostream>
#include <vector>
#include <algorithm>

class open_order {
public:
    long long open_time;
    // Other fields and methods of the class

    open_order(long long time) : open_time(time) {}
};

bool compare_by_open_time(const open_order& order1, const open_order& order2) {
    return order1.open_time < order2.open_time;
}

int main() {
    // Create a vector of open_order objects
    std::vector<open_order> orders;
    
    // Add open_order objects with different open_time values
    orders.push_back(open_order(1623715800));
    orders.push_back(open_order(1623715700));
    orders.push_back(open_order(1623715900));
    orders.push_back(open_order(1623715600));
    orders.push_back(open_order(1623716000));

    // Sort the vector based on open_time
    std::sort(orders.begin(), orders.end(), compare_by_open_time);

    // Print the sorted vector
    std::cout << "Sorted orders by open_time:" << std::endl;
    for (const auto& order : orders) {
        std::cout << "Open Time: " << order.open_time << std::endl;
    }

    return 0;
}

Sorted orders by open_time:
Open Time: 1623715600
Open Time: 1623715700
Open Time: 1623715800
Open Time: 1623715900
Open Time: 1623716000
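
The comparison can also be written inline as a lambda instead of a named function. The following is a minimal sketch of that variant; the reduced open_order struct and the helper name sort_by_open_time are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Reduced stand-in for the open_order class above: only the field we sort on.
struct open_order {
    long long open_time;
};

// Sorts the orders in ascending open_time, using a lambda as the comparator.
void sort_by_open_time(std::vector<open_order>& orders) {
    std::sort(orders.begin(), orders.end(),
              [](const open_order& a, const open_order& b) {
                  return a.open_time < b.open_time;
              });
}
```

Flipping the comparison to a.open_time > b.open_time would sort in descending order instead.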

Smart pointers

unique_ptr

A std::unique_ptr owns its object exclusively and ensures the object is deleted when the std::unique_ptr goes out of scope or is explicitly reset. It provides a lightweight alternative to raw pointers when you need exclusive ownership.

#include <cassert>
#include <iostream>
#include <memory>
#include <string>

using namespace std;

void print(std::unique_ptr<std::string>& p) {
    cout << "ptr, from print: " << *p << std::endl;
}

void print_str(std::string& p) {
    cout << "ptr, from print_str: " << p << std::endl;
}

void modify(std::unique_ptr<std::string>& p) {
    *p += ", world";
}

void move_ownership(std::unique_ptr<std::string> p) {
    cout << "ptr, from move_ownership: " << *p << std::endl;
}

int main()
{
    std::unique_ptr<std::string> ptr = std::make_unique<std::string>("hi");
    cout << "ptr: " << *ptr << std::endl;
    // update value and pass as reference
    *ptr = "hello";
    print(ptr);
    // pass a reference to the string
    print_str(*ptr);
    // we still have access here, because we passed as reference
    cout << "ptr, after calling print: " << *ptr << std::endl;
    // pass as reference, and update the value
    modify(ptr);
    cout << "ptr, after calling modify: " << *ptr << std::endl;
    // transfer ownership
    move_ownership(move(ptr));
    // function main does not have ownership any more
    assert(ptr == nullptr);

    return 0;
}

ptr: hi
ptr, from print: hello
ptr, from print_str: hello
ptr, after calling print: hello
ptr, after calling modify: hello, world
ptr, from move_ownership: hello, world

shared_ptr

std::shared_ptr pointers provide shared ownership of an object. They keep track of the number of std::shared_ptr instances that point to the same object. The object is deleted only when the last std::shared_ptr pointing to it is destroyed or reset.

#include <cassert>
#include <iostream>
#include <memory>
#include <string>

using namespace std;

void print(shared_ptr<string>& p) {
    cout << "ptr, from print: " << *p << ", ref count: " << p.use_count() << endl;
}

void print_str(string& s) {
    cout << "ptr, from print_str: " << s << endl;
}

void modify(shared_ptr<string>& p) {
    *p += ", world";
}

void share_ownership(shared_ptr<string> p) {
    cout << "ptr, from share_ownership: " << *p << ", ref count: " << p.use_count() << endl;
}

int main() {
    
    shared_ptr<string> ptr = make_shared<string>("hi");
    cout << "ptr: " << *ptr << ", ref count: " << ptr.use_count() << endl;
    // update value and pass as reference
    *ptr = "hello";
    print(ptr);
    // pass a reference to the string
    print_str(*ptr);
    // pass as reference, and update the value
    modify(ptr);
    cout << "ptr, after calling modify: " << *ptr << ", ref count: " << ptr.use_count() << endl;
    // share ownership
    share_ownership(ptr);
    // ref count after share_ownership's p argument has gone out of scope
    cout << "ptr, after calling share_ownership: " << *ptr << ", ref count: " << ptr.use_count() << endl;
    // transfer ownership
    share_ownership(move(ptr));
    assert(ptr == nullptr);
    
    return 0;
}

ptr: hi, ref count: 1
ptr, from print: hello, ref count: 1
ptr, from print_str: hello
ptr, after calling modify: hello, world, ref count: 1
ptr, from share_ownership: hello, world, ref count: 2
ptr, after calling share_ownership: hello, world, ref count: 1
ptr, from share_ownership: hello, world, ref count: 1

weak_ptr

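std::weak_ptr pointers provide a non-owning, weak reference to an object managed by a std::shared_ptr. They do not contribute to the object’s reference count, so to access the object you must first call lock(), which returns a std::shared_ptr that is empty if the object has already been deleted. They are used to break reference cycles between std::shared_ptr instances, which would otherwise keep the objects alive and leak memory.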
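
As a minimal sketch of how a weak reference is typically read, the helper below (read_or_expired is a name made up for this illustration) locks the std::weak_ptr and falls back to a placeholder when the object is gone.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Returns the string behind `weak`, or "expired" if the object was already deleted.
// lock() returns a shared_ptr that is empty when no owner is left.
std::string read_or_expired(const std::weak_ptr<std::string>& weak) {
    if (std::shared_ptr<std::string> locked = weak.lock()) {
        return *locked;  // the object stays alive while `locked` exists
    }
    return "expired";
}
```

Creating a std::weak_ptr from a std::shared_ptr leaves use_count() at 1. Once the owning std::shared_ptr is reset, expired() returns true and the helper returns the placeholder.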

Create an object in function and return it

Question

What’s the preferred/recommended way to create an object in a function/method and return it to be used outside the creating function’s scope?

Option 1 (return unique_ptr)

pros: function is pure and does not change input params
cons: is this an unnecessarily complicated solution?

std::unique_ptr<SomeClass> createSometing(){
    auto s = std::make_unique<SomeClass>();
    return s;
}

Option 2 (pass result as a reference parameter)

pros: simple and does not involve pointers
cons: input parameter is changed (makes function less pure and more unpredictable - the result reference param could be changed anywhere within the function and it could get hard/messy to track in larger functions).

void createSometing(SomeClass& result){
    SomeClass s;
    result = s;
}

Option 3 (return by value - involves copying)

pros: simple and clear
cons: involves copying an object, which could be expensive. But is this ok? Note: this is usually not true anymore, see Copy elision below in this article.

SomeClass createSometing(){
    SomeClass s;
    return s;
}

Answer

There isn’t a single right answer and it depends on the situation and personal preference to some extent. Here are pros and cons of different approaches.

Just declare it

SomeClass foo(arg1, arg2);

Factory functions should be relatively uncommon and only needed if the code creating the object doesn’t have all the necessary information to create it (or shouldn’t, due to encapsulation reasons). Perhaps it’s more common in other languages to have factory functions for everything, but instantiating objects directly should be the first pick.

Return by value

SomeClass createSomeClass();

The first question is whether you want the resulting object to live on the stack or the heap. The default for small objects is the stack, since it’s more efficient as you skip the call to malloc(). With Return Value Optimization usually there’s no copy.

Return by pointer

std::unique_ptr<SomeClass> createSomeClass();

SomeClass* createSomeClass();

Reasons you might pick this include being a large object that you want to be heap allocated; the object is created out of some data store and the caller won’t own the memory; you want a nullable return type to signal errors.

Out parameter

bool createSomeClass(SomeClass&);

The main benefit of out parameters shows up when you have multiple return values. For example, you might want to return true/false for whether the object creation succeeded (e.g. if your object doesn’t have a valid “unset” state, like an integer). You might also have a factory function that returns multiple things, e.g.

void createUserAndToken(User& user, Token& token);

In summary, I’d say by default, go with return by value. Do you need to signal failure? Out parameter or pointer. Is it a large object that lives on the heap, or some other data structure and you’re giving out a handle? Return by pointer. If you don’t strictly need a factory function, just declare it.
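
The three factory styles from the summary can be put side by side. This is only a sketch: the user type, the function names, and the "an empty name is an error" rule are all assumptions invented for the illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type used only for this illustration.
struct user {
    std::string name;
};

// Return by value: the default choice when creation cannot fail.
user create_user(const std::string& name) {
    return user{name};
}

// Return by pointer: nullptr signals failure (here, an empty name).
std::unique_ptr<user> try_create_user(const std::string& name) {
    if (name.empty()) {
        return nullptr;
    }
    return std::make_unique<user>(user{name});
}

// Out parameter: the bool reports success; `result` is written only on success.
bool fill_user(const std::string& name, user& result) {
    if (name.empty()) {
        return false;
    }
    result = user{name};
    return true;
}
```

The caller of try_create_user has to check for nullptr before dereferencing, and the caller of fill_user has to check the returned bool before trusting result.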

References

Ari. (2022, January 4). “Function returning - unique_ptr vs passing result as parameter vs returning by value.” Stack Overflow. https://stackoverflow.com/a/70581632/2079513

Copy elision

In C++17, a feature called “guaranteed copy elision” was introduced. It requires the compiler to omit the copy or move when an object is initialized from a temporary (a prvalue) of the same type, for example in return some_class{};.

some_class createSometing(){
    some_class s;
    return s;
}

Here s is a named local variable, so the guarantee does not apply. Eliding this copy is called named return value optimization (NRVO), and it is permitted but still optional in C++17. Most compilers perform it and construct s directly in the caller’s storage; when they do not, s is moved if some_class has a move constructor and copied otherwise. Prior to C++17, eliding the copy of a returned temporary was likewise only an optional optimization.

Elision applies not only to return statements but also to other situations where an object is initialized from a temporary, such as passing a temporary to a function by value or initializing a variable, for example:

#include <iostream>

struct some_class {
    some_class() { std::cout << "constructor called" << std::endl; }
    some_class(const some_class&) { std::cout << "copy constructor called" << std::endl; }
};

void foo(some_class sc) {
    // do something
}

int main() {
    // Pass a temporary: it is constructed directly in foo's parameter
    foo(some_class{});
    return 0;
}

Prior to C++17, the compiler was allowed, but not required, to skip the copy constructor here. With guaranteed copy elision, the temporary is constructed directly in foo’s parameter. Note that passing a named object (some_class s; foo(s);) still calls the copy constructor in every version of C++, because s must remain usable after the call.

When you compile this code as C++17 or later, only the constructor is called:

constructor called

It’s important to note that the guarantee is specific to C++17 and later versions; earlier standards only permitted the optimization.
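
To observe which initializations involve a copy or move, one option is to count constructor calls. The sketch below assumes a made-up tracked type and two global counters; none of these names come from the text above.

```cpp
#include <cassert>

// Counters for this illustration only.
int copy_count = 0;
int move_count = 0;

// Hypothetical type that records how often it is copied or moved.
struct tracked {
    tracked() = default;
    tracked(const tracked&) { ++copy_count; }
    tracked(tracked&&) noexcept { ++move_count; }
};

// Returns a temporary (prvalue): C++17 guarantees no copy or move here.
tracked make_tracked() {
    return tracked{};
}

// Takes its parameter by value.
void consume(tracked) {}

// Initializing from temporaries and passing temporaries: everything is elided.
int copies_and_moves_with_temporaries() {
    copy_count = move_count = 0;
    tracked t = make_tracked();
    (void)t;  // silence the unused-variable warning
    consume(tracked{});
    consume(make_tracked());
    return copy_count + move_count;
}

// Passing a named object by value: the copy constructor has to run.
int copies_with_named_object() {
    copy_count = move_count = 0;
    tracked t;
    consume(t);
    return copy_count;
}
```

Compiled as C++17, the first function reports zero copies and moves, while the second reports exactly one copy.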

Using references in for loops

In the first loop, number is a copy of each element in the vector, so modifying number will not affect the original elements. In the second loop, number is a reference to the elements, so any modifications to number will be reflected in the original vector.

Using references can be beneficial if we want to modify the elements of a container directly or if we are working with large objects where avoiding unnecessary copies is desired. However, if we only need to read the elements or if we are dealing with small objects where copying is cheap, using references may not be necessary.

std::vector<int> numbers {1, 2, 3, 4, 5};

// Without references
for (auto number : numbers) {
    // 'number' is a copy of the element
    // Modifications to 'number' won't affect the original element
}

// With references
for (auto& number : numbers) {
    // 'number' is a reference to the element
    // Modifications to 'number' will affect the original element
}
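
The difference can be checked with a small runnable sketch. The function names are made up for this illustration. The const auto& form in the last function is an addition not shown above: it avoids copies while still preventing modification.

```cpp
#include <cassert>
#include <vector>

// Tries to double each element through a copy: the vector is left unchanged.
void double_through_copy(std::vector<int>& numbers) {
    for (auto number : numbers) {
        number *= 2;  // modifies the local copy only
        (void)number;
    }
}

// Doubles each element through a reference: the vector itself is modified.
void double_in_place(std::vector<int>& numbers) {
    for (auto& number : numbers) {
        number *= 2;
    }
}

// Read-only access without copying each element.
int sum(const std::vector<int>& numbers) {
    int total = 0;
    for (const auto& number : numbers) {
        total += number;
    }
    return total;
}
```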

Futures

std::async accepts two launch policies, which can also be combined:

std::launch::async: The function is executed asynchronously, as if on a new thread, concurrently with the caller.
std::launch::deferred: The function is executed lazily, on the thread that first calls the std::future object’s get() or wait() member function. If neither is ever called, the function never runs.
std::launch::async | std::launch::deferred: The implementation chooses between asynchronous and deferred execution. The decision is typically based on factors such as system load or other implementation-specific considerations.

By default, if no launch policy is specified, std::launch::async | std::launch::deferred is used.
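
One way to see the difference between the two policies is to compare thread ids. The following is a sketch with hypothetical helper names; depending on the toolchain it may need to be built with thread support (for example -pthread).

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Reports the id of the thread that actually runs the task.
std::thread::id current_thread_id() {
    return std::this_thread::get_id();
}

// std::launch::deferred: the task runs inside get(), on the calling thread.
bool deferred_runs_on_calling_thread() {
    std::future<std::thread::id> fut =
        std::async(std::launch::deferred, current_thread_id);
    // Nothing has run yet: a zero-length wait reports the deferred state.
    if (fut.wait_for(std::chrono::seconds(0)) != std::future_status::deferred) {
        return false;
    }
    return fut.get() == std::this_thread::get_id();
}

// std::launch::async: the task runs on a different thread than the caller.
bool async_runs_on_other_thread() {
    std::future<std::thread::id> fut =
        std::async(std::launch::async, current_thread_id);
    return fut.get() != std::this_thread::get_id();
}
```

Both functions are expected to return true. With the default policy, either behavior is possible, so code that relies on real concurrency should request std::launch::async explicitly.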

#include <chrono>
#include <iostream>
#include <thread>
#include <future>

int calculateSum(int a, int b) {
    std::cout << "Calculating sum..." << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(3));  // Simulating a time-consuming task
    return a + b;
}

int main() {
    // Create a future that will hold the result of the asynchronous task
    std::future<int> futureSum = std::async(std::launch::async, calculateSum, 2, 3);

    std::this_thread::sleep_for(std::chrono::seconds(1));  // Simulating some additional processing
    // Do some other work while the calculation is in progress
    std::cout << "Performing other tasks..." << std::endl;

    // Wait for the future to be ready and retrieve the result
    int result = futureSum.get();

    // Print the result
    std::cout << "The sum is: " << result << std::endl;

    return 0;
}

Output:

Calculating sum...
Performing other tasks...
The sum is: 5

Use a future’s result in another future

The same principle applies if we have to use a future’s output in another future. For example,

#include <chrono>
#include <iostream>
#include <future>
#include <thread>

int some_task(int taskId, int result) {
    std::this_thread::sleep_for(std::chrono::seconds(2));
    return taskId * result;
}

int main() {
    // Create the first future
    std::future<int> dependency_future = std::async(std::launch::async, some_task, 2, 3);

    // Wait for the dependency future to complete and retrieve the result
    int dependency_result = dependency_future.get();
    std::cout << "Dependency result: " << dependency_result << std::endl;

    // Create the dependent futures using the dependency result
    std::future<int> dependent_fut_1 = std::async(std::launch::async, some_task, 4, dependency_result);
    std::future<int> dependent_fut_2 = std::async(std::launch::async, some_task, 5, dependency_result);

    // Wait for the dependent futures to complete
    int result1 = dependent_fut_1.get();
    int result2 = dependent_fut_2.get();

    std::cout << "Dependent task 1 result: " << result1 << std::endl;
    std::cout << "Dependent task 2 result: " << result2 << std::endl;

    return 0;
}

Output:

Dependency result: 6
Dependent task 1 result: 24
Dependent task 2 result: 30

Wait for multiple futures

We can use a vector to wait for multiple futures to complete. For example,

#include <chrono>
#include <iostream>
#include <vector>
#include <future>
#include <thread>

int perform_task(int taskId) {
    std::this_thread::sleep_for(std::chrono::seconds(2));
    return taskId * taskId;
}

int main() {
    std::vector<std::future<int>> futures;

    // Create and store multiple futures
    for (int i = 0; i < 5; ++i) {
        futures.push_back(std::async(std::launch::async, perform_task, i));
    }

    // Wait for all the futures to complete
    for (auto& future : futures) {
        int result = future.get();
        std::cout << "Task result: " << result << std::endl;
        // we could collect the results into another vector for further processing
    }

    return 0;
}

Output:

Task result: 0
Task result: 1
Task result: 4
Task result: 9
Task result: 16
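
As the comment in the loop suggests, the results could be collected into another vector. A minimal sketch of that variant follows; run_tasks is a name invented here, and the simulated delay is left out to keep it short.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Same computation as perform_task above, without the simulated delay.
int perform_task(int taskId) {
    return taskId * taskId;
}

// Launches `count` tasks and returns their results in launch order.
std::vector<int> run_tasks(int count) {
    std::vector<std::future<int>> futures;
    for (int i = 0; i < count; ++i) {
        futures.push_back(std::async(std::launch::async, perform_task, i));
    }

    std::vector<int> results;
    results.reserve(futures.size());
    for (auto& future : futures) {
        results.push_back(future.get());  // blocks until this task is done
    }
    return results;
}
```

Because get() is called in the order the futures were stored, the results come back in launch order even if the tasks finish in a different order.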

Directory organization

This is a common directory organization for C++ projects:

project/
├─ include/
│  ├─ module1/
│  │  ├─ header1.h
│  │  ├─ header2.h
│  │  └─ ...
│  ├─ module2/
│  │  ├─ header1.h
│  │  ├─ header2.h
│  │  └─ ...
│  └─ ...
├─ src/
│  ├─ module1/
│  │  ├─ implementation1.cpp
│  │  ├─ implementation2.cpp
│  │  └─ ...
│  ├─ module2/
│  │  ├─ implementation1.cpp
│  │  ├─ implementation2.cpp
│  │  └─ ...
│  └─ ...
├─ lib/
│  ├─ library1/
│  │  ├─ lib1.a
│  │  ├─ lib2.a
│  │  └─ ...
│  ├─ library2/
│  │  ├─ lib1.a
│  │  ├─ lib2.a
│  │  └─ ...
│  └─ ...
├─ bin/
│  ├─ executable1
│  ├─ executable2
│  └─ ...
├─ tests/
│  ├─ module1/
│  │  ├─ test1.cpp
│  │  ├─ test2.cpp
│  │  └─ ...
│  ├─ module2/
│  │  ├─ test1.cpp
│  │  ├─ test2.cpp
│  │  └─ ...
│  └─ ...
├─ docs/
├─ build/ (generated build files)
└─ Makefile

Common directories:

include: Contains header files (.h or .hpp) used by the project. Headers for each module or library are organized into separate subdirectories.
src: Holds the implementation files (.cpp) for the project. Similar to the include directory, each module or library typically has its own subdirectory.
lib: Contains external libraries or pre-compiled libraries (.a files). Subdirectories are used to organize different libraries.
bin: Holds the compiled executable files generated by the project.
tests: Contains test files (.cpp) used for testing the project’s functionality. Organized similarly to the src directory.
docs: Can be used to store documentation related to the project.
build: Generated directory where object files, dependency files, and other build-related files are stored. This directory is often created by build systems like Make or CMake.

Common questions

Should all header files (.h) be placed in the include folder?

Not all of them. Only those that form the public interface. If a class or a function is specific to a module, its header should stay inside the module (src directory). The include folder should contain headers that can be included by any other module.