C++

Code Style

Always remember that code interpretability and maintainability trump everything else.
Please conform to the Google C++ Style Guide.
Avoid class inheritance and friendship in C++ code. Both reduce code traceability.
In general, macros should not be used.
Wherever possible, follow C style instead of C++, as it makes the code more aligned with the driving philosophies of Golang.

Naming

C++ files should end in .cc and header files should end in .h.

Naming guide:

Type	Format	Example
Filename	lowercase.cc	myusefulclass.cc
Namespace Names	lower_with_under	using namespace websearch::index_util
Typedef	CapWords	typedef struct _MyClass MyClass;
Enumerator	CapWords	enum Color {Red, Yellow};
Aliases	CapWords	using Calc = void (*) (int, int);
Constants	kCapWords	const int kDaysInWeek = 7;
Variables	lower_with_under	std::string table_name;
Function Names	CapWords	void MyFunction(){};
Function Parameters	lower_with_under	void MyFunction(int max_count){};
Structures	CapWords	struct UrlTableProperties{};
Struct Data Member	lower_with_under	int num_entries;
Classes	CapWords	class UrlTableTester{};
Class Data Member	lower_with_under_	std::string table_name_;
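
The conventions in the table can be seen together in one small translation unit. This is only an illustrative sketch: the names `table_util`, `kMaxEntries`, `RemainingEntries`, and `IsWithinLimit` are hypothetical and chosen to show the formats, not taken from an existing code base.

```cpp
#include <cassert>
#include <string>

namespace table_util {  // Namespace: lower_with_under

const int kMaxEntries = 100;  // Constant: kCapWords

struct UrlTableProperties {  // Structure: CapWords
  std::string name;          // Struct data member: lower_with_under
  int num_entries;
};

// Function name: CapWords. Function parameter: lower_with_under.
int RemainingEntries(int num_entries) {
  assert(num_entries <= kMaxEntries);
  return kMaxEntries - num_entries;
}

class UrlTableTester {  // Class: CapWords
 public:
  explicit UrlTableTester(const std::string& table_name)
      : table_name_(table_name) {}

  bool IsWithinLimit(const UrlTableProperties& properties) const {
    return properties.num_entries <= kMaxEntries;
  }

  const std::string& TableName() const { return table_name_; }

 private:
  std::string table_name_;  // Class data member: lower_with_under_
};

}  // namespace table_util
```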

Code Documentation - Doxygen

Install Doxygen
```
 $ sudo apt-get install doxygen
```
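
Once Doxygen is installed, it builds documentation from specially formatted comments. The sketch below shows one common comment style using the `@brief`, `@param`, and `@return` commands. The function `WholeWeeks` is a hypothetical example, and the comment layout is one of several that Doxygen accepts.

```cpp
#include <cassert>

const int kDaysInWeek = 7;

/**
 * @brief Converts a number of days into whole weeks.
 *
 * Any remaining days that do not fill a complete week are discarded.
 *
 * @param num_days Number of days; must not be negative.
 * @return Number of complete weeks contained in num_days.
 */
int WholeWeeks(int num_days) {
  assert(num_days >= 0);
  return num_days / kDaysInWeek;
}
```

Running `doxygen -g` in the project directory should generate a default configuration file named Doxyfile, and `doxygen Doxyfile` then builds the documentation from comments like the one above.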

Folder structure

The following serves as a representative project folder structure.

cppprocessing                  # Main project directory 
├── apps                       # Folder for executable main files
|   ├── CMakeLists.txt            
|   └── tutorial.cpp           # Executable main file 
├── assets                     # Assets
|   ├── ...
|   ... 
├── bin                        # Binary files
├── build                      # CMake build system files 
├── include                    # Headers
|   ├── computepi.h
|   └── helloworld.h   
├── libs                       # Local libraries
|   ├── computepi
|   |   ├── CMakeLists.txt
|   |   └── computepi.cpp 
|   └── helloworld    
|       ├── CMakeLists.txt
|       └── helloworld.cpp
├── CMakeLists.txt 
├── docker-compose.yml
├── Dockerfile                            
└── README.md                  # Readme file

The apps folder may contain one or more main files.
The assets folder should contain miscellaneous files. For example, it can contain images used for explanation in README.md.
The bin folder may contain one or more binary files, which correspond to the main files in the apps folder.
The build folder contains the CMake build system files.
The include folder contains the C/C++ headers.
The libs folder contains the local libraries, each in its own folder.
The top-level CMakeLists.txt defines the top-level CMake commands.
The README.md file shall briefly describe the following:
- Explanation of what the project is about
- Instructions to run a sample of the code
- Desired input and output of the code

Boost library

Install Boost library from repository

 $ cd /usr/include/
 $ wget https://dl.bintray.com/boostorg/release/1.72.0/source/boost_1_72_0.tar.bz2
 $ tar --bzip2 -xf ./boost_1_72_0.tar.bz2
 $ rm -r ./boost_1_72_0.tar.bz2

When using Boost libraries that are header-only, we only need to add the Boost include directory in the CMake build file.

 find_package(Boost 1.72 REQUIRED)
 add_executable(example 
     ./example.cpp
     ./example.hpp)
 target_include_directories(example PRIVATE ${Boost_INCLUDE_DIR})

When using Boost libraries that must be built, e.g., chrono, we additionally need to link them with target_link_libraries in the CMake build file.

 find_package(Boost 1.72 COMPONENTS chrono REQUIRED)
 add_executable(example 
     ./example.cpp
     ./example.hpp)
 target_include_directories(example PRIVATE ${Boost_INCLUDE_DIR})
 target_link_libraries(example PRIVATE ${Boost_LIBRARIES})

Header-only libraries and libraries that must be built are listed on the official Boost website.

Profiling

Several different tools for C++ code profiling are listed below.

Perf

Install

  $ sudo apt install linux-tools-common

Profile with call graph

  $ sudo perf record -g <program executable binary> <program arguments>
  # Example
  $ sudo perf record -g ./build/src/myexecutable

Analyse perf.data file generated by profiler
```
  $ perf report
  $ perf script 
```

Gprof

Series of events
- Compile code with -pg option
- Link code with -pg option
- Run program
- Program generates gmon.out file
- Run gprof program

Add profiling flags to CMakeLists.txt file

  SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pg")
  SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -pg")
  SET(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -pg")

Execute the program normally.

  $ <program executable binary> <program arguments>
  # Example
  $ ./build/src/myexecutable

This produces a gmon.out file in the current working directory.

Analyse the profile and redirect the output to a gprof.stats file

  $ gprof <program executable binary> > gprof.stats
  # Example
  $ gprof ./build/src/myexecutable > gprof.stats

Gperftools
- Website: https://github.com/gperftools/gperftools
- Downside: Old and not maintained.

Callgrind

Install

  $ sudo apt install valgrind kcachegrind 

Profile

  $ valgrind --tool=callgrind <program executable binary> <program arguments>
  # Example
  $ valgrind --tool=callgrind ./build/src/myexecutable

Analyse
```
  $ kcachegrind callgrind.out.12345
```
Downside: Increased code execution time.

Pprof
- For visualization and analysis
- Website: https://github.com/google/pprof

Serial program

Mathematically, $\int_{0}^{1}\frac{4}{1+x^2}dx=\pi$. This integral can be approximated as a sum of rectangles:
\[\sum_{i=0}^{N-1}\frac{4}{1+((i+0.5)\Delta x)^2}\Delta x\approx\pi\]
where each rectangle has width $\Delta x$ and height $\frac{4}{1+((i+0.5)\Delta x)^2}$, evaluated at the middle of interval $i$. Here, $N = 1/\Delta x$.
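
The exact value of the integral follows from the antiderivative of the integrand, a standard calculus result:
\[\int_{0}^{1}\frac{4}{1+x^2}dx = 4\left[\arctan x\right]_{0}^{1} = 4\left(\frac{\pi}{4}-0\right) = \pi\]
As a small worked instance of the rectangle sum, take $N=2$, so $\Delta x=0.5$ and the midpoints are $0.25$ and $0.75$:
\[\left(\frac{4}{1+0.25^2}+\frac{4}{1+0.75^2}\right)\cdot 0.5 = (3.7647+2.5600)\cdot 0.5 \approx 3.1624\]
Even two rectangles land within about $0.02$ of $\pi$, and the error shrinks as $N$ grows.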

A serial C++ program to compute $\pi$ is as follows:

 double runPiSerial(int numSteps){
     double x, pi, sum = 0.0;
     double step = 1.0/double(numSteps);

     for (int ii = 0; ii < numSteps; ii++){
         x = (ii + 0.5)*step;
         sum = sum + 4.0/(1.0+x*x);
     }
     pi = step * sum;
        
     printf("computePi.runPiSerial(). Pi = %f.\n", pi);
     return pi;
 }

Parallel programming in C++ may be achieved using OpenMP, MPI, and multithreading.
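
For comparison with the OpenMP versions discussed in this guide, the same computation can be sketched with plain C++11 multithreading. This is a minimal sketch, not a tuned implementation: the function name `RunPiThreads` and the cyclic split of iterations across threads are assumptions made for illustration.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: approximates pi with num_threads std::thread workers.
double RunPiThreads(int num_steps, int num_threads) {
  assert(num_steps > 0 && num_threads > 0);
  const double step = 1.0 / static_cast<double>(num_steps);

  // One slot per thread, so no two threads write to the same element.
  std::vector<double> partial_sums(num_threads, 0.0);
  std::vector<std::thread> workers;
  workers.reserve(num_threads);

  for (int id = 0; id < num_threads; ++id) {
    workers.emplace_back([&partial_sums, id, num_steps, num_threads, step]() {
      double local_sum = 0.0;
      // Cyclic distribution: thread id handles id, id + num_threads, ...
      for (int ii = id; ii < num_steps; ii += num_threads) {
        const double x = (ii + 0.5) * step;
        local_sum += 4.0 / (1.0 + x * x);
      }
      partial_sums[id] = local_sum;
    });
  }

  // Join each worker before reading its partial sum.
  double sum = 0.0;
  for (int id = 0; id < num_threads; ++id) {
    workers[id].join();
    sum += partial_sums[id];
  }
  return step * sum;
}
```

Depending on the toolchain, linking may require the `-pthread` flag.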

OpenMP

OpenMP is a multithreading model with a shared address space. Threads communicate by sharing variables.
To create multiple threads or parallel regions, use the compiler directive #pragma omp parallel.
To control race conditions, use synchronization to protect against data conflicts.
Synchronization
- #pragma omp barrier: Each thread waits until all threads arrive.
- #pragma omp critical: Only one thread at a time can execute the code within the critical region; the other threads wait their turn.
- #pragma omp atomic: Provides mutual exclusion but only applies to the update of a memory location.
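
The difference between atomic and critical can be illustrated with a short sketch. The function `ScanValues` and the struct `ScanResult` are hypothetical names. The counter update is a single memory update, so atomic suffices, while the compare-and-assign for the maximum spans several statements and is placed in a critical region. The code is written so that it gives the same result when compiled without OpenMP support.

```cpp
#include <cassert>
#include <vector>

struct ScanResult {
  int num_even;   // How many values are even
  int max_value;  // Largest value seen
};

// Hypothetical helper: counts even values and finds the maximum in parallel.
ScanResult ScanValues(const std::vector<int>& values) {
  assert(!values.empty());
  int num_even = 0;            // Shared counter, updated atomically
  int max_value = values[0];   // Shared maximum, updated in a critical region
  const int num_values = static_cast<int>(values.size());

  #pragma omp parallel for
  for (int ii = 0; ii < num_values; ++ii) {
    if (values[ii] % 2 == 0) {
      // A single update of one memory location: atomic is enough.
      #pragma omp atomic
      num_even += 1;
    }
    // Read, compare, and write must not interleave: use critical.
    #pragma omp critical
    {
      if (values[ii] > max_value) {
        max_value = values[ii];
      }
    }
  }

  ScanResult result = {num_even, max_value};
  return result;
}
```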

A parallel program with synchronization to compute $\pi$ is as follows:

 #include <cstdio>
 #include <omp.h>

 double runPiParallelSync(int numSteps, int nThreadsInput){
     double step = 1.0/double(numSteps);
     double sumMaster = 0; // Shared accumulator, updated only inside the critical region

     omp_set_num_threads(nThreadsInput);
     #pragma omp parallel
     {
         int id = omp_get_thread_num(); // Thread ID
         int nThreadsSlave = omp_get_num_threads(); // Number of threads actually granted
         double sumSlave = 0; // Private partial sum of this thread
         // Cyclic distribution: thread id handles iterations id, id+nThreadsSlave, ...
         for (int ii = id; ii < numSteps; ii=ii+nThreadsSlave){
             double x = (ii + 0.5)*step; // Declared inside the region so each thread has its own copy
             sumSlave = sumSlave + 4.0/(1.0+x*x);
         }
         #pragma omp critical
         {
             sumMaster += sumSlave;
         }
     }
     double pi = sumMaster*step;
        
     printf("computePi.runPiParallelSync(). Pi = %f.\n", pi);
     return pi;
 }

WorkSharing Loop: #pragma omp for
- The loop worksharing construct splits up loop iterations (i.e., for loops) among the threads in a team. This simplifies parallelization compared with distributing iterations and synchronizing by hand.
- Ensure the loop iterations are independent, with no loop-carried dependencies, so they can safely execute in any order.
- Combining values from different threads into a single accumulation variable is known as reduction. In OpenMP, use the reduction clause: reduction (op : list)
- Inside a work-sharing construct:
  - A local copy of each list variable is made and initialized depending on the op (e.g. 0 for +).
  - Updates occur on the local copy.
  - Local copies are reduced into a single value and combined with the original global value.
  - The variables in list must be shared in the enclosing parallel region.

A parallel program with worksharing loop to compute $\pi$ is as follows:

 double runPiWorkSharing(int numSteps){
     double pi, sum = 0.0;
     double step = 1.0/double(numSteps);

     #pragma omp parallel 
     {
         double x;
         #pragma omp for reduction(+:sum)
         for (int ii = 0; ii < numSteps; ii++){
             x = (ii + 0.5)*step;
             sum = sum + 4.0/(1.0+x*x);
         }
     }
     pi = step * sum;
        
     printf("computePi.runPiWorkSharing(). Pi = %f.\n", pi);
     return pi;
 }

Major OpenMP constructs

To create a team of threads
```
  #pragma omp parallel
```

To share work between threads:

  #pragma omp for
  #pragma omp single
  #pragma omp sections
  #pragma omp section

To prevent conflicts (prevent races)

  #pragma omp critical
  #pragma omp atomic
  #pragma omp barrier
  #pragma omp master

Data environment clauses

  private (variable_list)
  firstprivate (variable_list)
  lastprivate (variable_list)
  reduction(+:variable_list)   
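
The effect of the private, firstprivate, and lastprivate clauses can be sketched on a small loop. The function `LastShiftedSquare` and its variables are hypothetical and exist only to show each clause. The result is the same when the code is compiled without OpenMP support.

```cpp
#include <cassert>

// Hypothetical helper: returns (num_values - 1)^2 + 10.
int LastShiftedSquare(int num_values) {
  assert(num_values > 0);
  int offset = 10;     // firstprivate: each thread's copy starts at 10
  int scratch = 0;     // private: each thread gets an uninitialized copy
  int last_value = 0;  // lastprivate: receives the value from the last iteration

  #pragma omp parallel for private(scratch) firstprivate(offset) lastprivate(last_value)
  for (int ii = 0; ii < num_values; ++ii) {
    scratch = ii * ii;              // Written before it is read
    last_value = scratch + offset;  // Uses the thread's initialized copy of offset
  }

  // After the loop, last_value holds the result of iteration num_values - 1.
  return last_value;
}
```

Because scratch is private, it must be assigned inside the loop before it is read. Its initial value inside the parallel region is not the 0 assigned outside.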

Presence of implicit barriers at the end of OpenMP directives:

Directive	Barrier at the end
#pragma omp parallel	implicit
#pragma omp for	implicit
#pragma omp master	none
#pragma omp single	implicit
#pragma omp task	none
#pragma omp barrier	explicit (the directive is itself a barrier)
#pragma omp sections	implicit
#pragma omp section	none
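
The implicit barrier at the end of a worksharing loop matters when a second loop reads what the first one wrote. In the sketch below, the function `NeighborSums` is a hypothetical example: the second loop reads an element that may have been written by a different thread, which is safe only because all threads wait at the end of the first `omp for`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: sums[i] = i^2 + ((i + 1) mod n)^2.
std::vector<int> NeighborSums(int num_values) {
  assert(num_values > 0);
  std::vector<int> squares(num_values);
  std::vector<int> sums(num_values);

  #pragma omp parallel
  {
    // Phase 1: each thread fills its share of squares.
    #pragma omp for
    for (int ii = 0; ii < num_values; ++ii) {
      squares[ii] = ii * ii;
    }
    // Implicit barrier: every element of squares is written before phase 2.

    // Phase 2: reads a neighbor that another thread may have written.
    #pragma omp for
    for (int ii = 0; ii < num_values; ++ii) {
      sums[ii] = squares[ii] + squares[(ii + 1) % num_values];
    }
  }
  return sums;
}
```

Adding a nowait clause to the first loop would remove that barrier and make phase 2 a data race.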
