Code Style

  1. Always remember that code interpretability and maintainability trump everything else.
  2. Please conform to the Google C++ Style Guide.
  3. Avoid using class inheritance or friendship features in C++ code. Inheritance and friendship reduce code traceability.
  4. In general, macros should not be used.
  5. Wherever possible, follow C style instead of C++, as it makes the code more aligned with the driving philosophies of Golang.


  • C++ files should end in .cc and header files should end in .h.
  • Naming guide:

    Type                  Format               Example
    Filename              lowercase.cc         myusefulclass.cc
    Namespace Names       lower_with_under     using namespace websearch::index_util;
    Typedef               CapWords             typedef struct _MyClass MyClass;
    Enumerator            CapWords             enum Color {red, yellow};
    Aliases               CapWords             using Calc = void (*) (int, int);
    Constants             kCapWords            const int kDaysInWeek = 7;
    Variables             lower_with_under     std::string table_name;
    Function Names        CapWords             void MyFunction(){};
    Function Parameters   lower_with_under     void MyFunction(int max_count){};
    Structures            CapWords             struct UrlTableProperties{};
    Struct Data Member    lower_with_under     int num_entries;
    Classes               CapWords             class UrlTableTester{};
    Class Data Member     lower_with_under_    std::string table_name_;
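
As a minimal sketch of how these conventions look together in one file, the snippet below reuses the example names from the table. The helper CountEntries, the accessor TableName, and the namespace index_util are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <string>

namespace index_util {  // Namespace: lower_with_under

const int kDaysInWeek = 7;  // Constant: kCapWords

struct UrlTableProperties {  // Structure: CapWords
    std::string table_name;  // Struct data member: lower_with_under
    int num_entries;
};

class UrlTableTester {  // Class: CapWords
  public:
    explicit UrlTableTester(const std::string& table_name)
        : table_name_(table_name) {}
    const std::string& TableName() const { return table_name_; }

  private:
    std::string table_name_;  // Class data member: trailing underscore
};

// Function name: CapWords; parameters: lower_with_under.
// Hypothetical helper: returns num_entries, capped at max_count.
int CountEntries(const UrlTableProperties& properties, int max_count) {
    assert(max_count >= 0);
    return properties.num_entries < max_count ? properties.num_entries
                                              : max_count;
}

}  // namespace index_util
```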

Code Documentation - Doxygen

  1. Install Doxygen
     $ apt-get install doxygen
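
Doxygen builds its output from specially formatted comments placed next to the code. The sketch below shows one commonly used comment style (a /** ... */ block with @brief, @param and @return); the function ClampToRange is a hypothetical example and not part of this project.

```cpp
#include <cassert>

/**
 * @brief Clamps a value to the closed interval [low, high].
 *
 * @param value The value to clamp.
 * @param low   Lower bound of the interval.
 * @param high  Upper bound of the interval; must not be less than low.
 * @return value if it lies inside the interval, otherwise the nearest bound.
 */
int ClampToRange(int value, int low, int high) {
    assert(low <= high);
    if (value < low) return low;
    if (value > high) return high;
    return value;
}
```

Running doxygen -g in the project directory writes a default configuration file named Doxyfile, and running doxygen Doxyfile then generates the documentation from comments like the one above.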

Folder structure

The following serves as a representative project folder structure.

cppprocessing                  # Main project directory 
├── apps                       # Folder for executable main files
|   ├── CMakeLists.txt            
|   └── tutorial.cpp           # Executable main file 
├── assets                     # Assets
|   ├── ...
|   ... 
├── bin                        # Binary files
├── build                      # CMake build system files 
├── include                    # Headers
|   ├── computepi.h
|   └── helloworld.h   
├── libs                       # Local libraries
|   ├── computepi
|   |   ├── CMakeLists.txt
|   |   └── computepi.cpp 
|   └── helloworld    
|       ├── CMakeLists.txt
|       └── helloworld.cpp
├── CMakeLists.txt 
├── docker-compose.yml
├── Dockerfile                            
└── README.md                  # Readme file
  1. apps folder may contain one or more main files.
  2. assets folder should contain miscellaneous files. For example, it can contain images used for explanation in README.md.
  3. bin folder may contain one or more binary files, which correspond to the main files in the apps folder.
  4. build folder contains the CMake build system files.
  5. include folder contains the C/C++ headers.
  6. libs contains the local libraries in individual folders.
  7. CMakeLists.txt defines the top-level CMake commands.
  8. README.md file shall contain a brief description of the following:
    • Explanation of what the project is about
    • Instructions to run a sample of the code
    • Desired input and output of the code

Boost library

  1. Install Boost library from repository
     $ cd /usr/include/
     $ wget https://dl.bintray.com/boostorg/release/1.72.0/source/boost_1_72_0.tar.bz2
     $ tar --bzip2 -xf ./boost_1_72_0.tar.bz2
     $ rm -r ./boost_1_72_0.tar.bz2
  2. When using Boost libraries which are header-only, we only need to add the Boost include directory in the CMake build file.
     find_package(Boost 1.72 REQUIRED)
     target_include_directories(example PRIVATE ${Boost_INCLUDE_DIR})

    When using Boost libraries that must be built, e.g., chrono, we additionally need to link them with target_link_libraries in the CMake build file.

     find_package(Boost 1.72 COMPONENTS chrono REQUIRED)
     target_include_directories(example PRIVATE ${Boost_INCLUDE_DIR})
     target_link_libraries(example PRIVATE ${Boost_LIBRARIES})

    Header-only libraries and libraries which must be built are listed on the official Boost website.


Several different tools for C++ code profiling are listed below.

  1. Perf
    • Install
        $ sudo apt install linux-tools-common
    • Profile with call graph
        $ sudo perf record -g <program executable binary> <program arguments>
        # Example
        $ sudo perf record -g ./build/src/myexecutable
    • Analyse perf.data file generated by profiler
        $ perf report
        $ perf script 
  2. Gprof
    • Series of events
      • Compile code with -pg option
      • Link code with -pg option
      • Run program
      • Program generates gmon.out file
      • Run gprof program
    • Add profiling flags to CMakeLists.txt file
    • Execute the program normally.
        $ <program executable binary> <program arguments>
        # Example
        $ ./build/src/myexecutable
    • Produces a gmon.out file
    • Analyse and pipe data to gprof.stats file
        $ gprof <program executable binary> > gprof.stats
        # Example
        $ gprof ./build/src/myexecutable > gprof.stats
  3. Gperftools
    • Website: https://github.com/gperftools/gperftools
    • Downside: Old and not maintained.
  4. Callgrind
    • Install
        $ sudo apt install valgrind kcachegrind 
    • Profile
        $ valgrind --tool=callgrind <program executable binary> <program arguments>
        # Example
        $ valgrind --tool=callgrind ./build/src/myexecutable
    • Analyse
        $ kcachegrind callgrind.out.12345
    • Downside: Increased code execution time.
  5. Pprof
    • For visualization and analysis
    • Website: https://github.com/google/pprof

Serial program

  1. Mathematically \(\int_{0}^{1}\frac{4}{1+x^2}dx=\pi\). This integral can be approximated as a sum of rectangles:

    \[\sum_{i=0}^{N-1}\frac{4}{1+((i+0.5)\Delta x)^2}\Delta x\approx\pi\]

    where each rectangle has width \(\Delta x\) and height \(\frac{4}{1+((i+0.5)\Delta x)^2}\) at the middle of interval \(i\). Here, \(N = 1/\Delta x\).

  2. A serial C++ program to compute \(\pi\) is as follows:

     #include <cstdio>

     // Midpoint-rule approximation of pi using numSteps rectangles.
     double runPiSerial(int numSteps){
         double x, pi, sum = 0.0;
         double step = 1.0/double(numSteps);
         for (int ii = 0; ii < numSteps; ii++){
             x = (ii + 0.5)*step; // Midpoint of interval ii
             sum = sum + 4.0/(1.0+x*x);
         }
         pi = step * sum;
         printf("computePi.runPiSerial(). Pi = %f.\n", pi);
         return pi;
     }
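
The integral equals \(\pi\) exactly because an antiderivative of \(\frac{4}{1+x^2}\) is \(4\arctan x\), which is \(\pi\) at \(x=1\) and \(0\) at \(x=0\); only the rectangle sum is an approximation. As a rough sketch of how good that approximation is, the hypothetical helpers MidpointPi and MidpointPiError below recompute the same sum without printing and measure its distance from 4*atan(1).

```cpp
#include <cassert>
#include <cmath>

// Midpoint-rule estimate of pi, using the same formula as the serial program.
double MidpointPi(int num_steps) {
    assert(num_steps > 0);
    const double step = 1.0 / static_cast<double>(num_steps);
    double sum = 0.0;
    for (int i = 0; i < num_steps; ++i) {
        const double x = (i + 0.5) * step;  // Midpoint of interval i
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

// Absolute error against 4*atan(1), which equals pi.
double MidpointPiError(int num_steps) {
    const double pi_reference = 4.0 * std::atan(1.0);
    return std::fabs(MidpointPi(num_steps) - pi_reference);
}
```

For this integrand the error of the midpoint rule shrinks roughly with the square of the step width, so ten times more steps should give about a hundred times less error until floating-point rounding dominates.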

Parallel programming in C++ may be achieved using OpenMP, MPI, and multithreading.

OpenMP

  1. OpenMP is a multithreading model with a shared address space. Threads communicate by sharing variables.
  2. To create multiple threads or parallel regions, use the compiler directive #pragma omp parallel.
  3. To control race conditions, use synchronization to protect against data conflicts.
  4. Synchronization
    • #pragma omp barrier: Each thread waits until all threads arrive.
    • #pragma omp critical: Only one thread at a time can enter a critical region; the other threads wait their turn.
    • #pragma omp atomic: Provides mutual exclusion but only applies to the update of a memory location.
  5. A parallel program with synchronization to compute \(\pi\) is as follows:
     #include <cstdio>
     #include <omp.h>

     double runPiParallelSync(int numSteps, int nThreadsInput){
         double step = 1.0/double(numSteps);
         double sumMaster = 0;
         int nThreadsMaster;
         // Assumed use of nThreadsInput: request the team size
         omp_set_num_threads(nThreadsInput);
         #pragma omp parallel
         {
             int id = omp_get_thread_num(); //Thread ID
             int nThreadsSlave = omp_get_num_threads();
             if (id == 0){
                 nThreadsMaster = nThreadsSlave; // Team size actually granted
             }
             int ii;
             double x; // Declared inside the region so each thread has its own copy
             double sumSlave = 0;
             for (ii = id; ii < numSteps; ii=ii+nThreadsSlave){
                 x = (ii + 0.5)*step;
                 sumSlave = sumSlave + 4.0/(1.0+x*x);
             }
             #pragma omp critical
                 sumMaster += sumSlave;
         }
         double pi = sumMaster*step;
         printf("computePi.runPiParallelSync(). Pi = %f.\n", pi);
         return pi;
     }
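
For comparison with the critical section, here is a sketch of the same accumulation protected by #pragma omp atomic, which is sufficient because the shared update is a single += on one memory location. RunPiParallelAtomic is a hypothetical name, and the _OPENMP guards are an added assumption so that the code also builds (and runs serially) when OpenMP is not enabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

double RunPiParallelAtomic(int num_steps) {
    assert(num_steps > 0);
    const double step = 1.0 / static_cast<double>(num_steps);
    double sum_shared = 0.0;
    #pragma omp parallel
    {
        // Fall back to a single "thread" when compiled without OpenMP.
        int id = 0;
        int num_threads = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        num_threads = omp_get_num_threads();
#endif
        double sum_local = 0.0;  // Private: declared inside the region
        for (int i = id; i < num_steps; i += num_threads) {
            const double x = (i + 0.5) * step;
            sum_local += 4.0 / (1.0 + x * x);
        }
        // One atomic update per thread instead of a critical section.
        #pragma omp atomic
        sum_shared += sum_local;
    }
    return sum_shared * step;
}
```

Each thread still accumulates into its own local variable first, so the atomic update runs only once per thread rather than once per loop iteration.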
  6. WorkSharing Loop: #pragma omp for
    • The loop worksharing construct splits up loop iterations (i.e., for loops) among the threads in a team. This essentially simplifies parallelization compared to using synchronization.
    • Ensure the loop iterations are independent so they can safely execute in any order without loop-carried dependencies.
    • Combining values from different threads into a single accumulation variable is known as reduction. In OpenMP, use the reduction clause: reduction (op : list)
    • Inside a work-sharing construct:
      • A local copy of each list variable is made and initialized depending on the op (e.g. 0 for +).
      • Updates occur on the local copy.
      • Local copies are reduced into a single value and combined with the original global value.
      • The variables in list must be shared in the enclosing parallel region.
  7. A parallel program with worksharing loop to compute \(\pi\) is as follows:
     double runPiWorkSharing(int numSteps){
         double pi, sum = 0.0;
         double step = 1.0/double(numSteps);
         #pragma omp parallel
         {
             double x; // Private to each thread
             #pragma omp for reduction(+:sum)
             for (int ii = 0; ii < numSteps; ii++){
                 x = (ii + 0.5)*step;
                 sum = sum + 4.0/(1.0+x*x);
             }
         }
         pi = step * sum;
         printf("computePi.runPiWorkSharing(). Pi = %f.\n", pi);
         return pi;
     }
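
OpenMP also allows the parallel and for directives to be merged into a single #pragma omp parallel for. A sketch of the same computation in that form follows; RunPiParallelFor is a hypothetical name, and x is declared inside the loop body so that it is automatically private to each iteration.

```cpp
#include <cassert>

double RunPiParallelFor(int num_steps) {
    assert(num_steps > 0);
    const double step = 1.0 / static_cast<double>(num_steps);
    double sum = 0.0;
    // Creates the team and splits the loop in one directive;
    // each thread's partial sum is combined into sum at the end.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < num_steps; ++i) {
        const double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```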
  8. Major OpenMP constructs
    • To create a team of threads
        #pragma omp parallel
    • To share work between threads:
        #pragma omp for
        #pragma omp single
        #pragma omp sections
        #pragma omp section
    • To prevent conflicts (prevent races)
        #pragma omp critical
        #pragma omp atomic
        #pragma omp barrier
        #pragma omp master
    • Data environment clauses
        private (variable_list)
        firstprivate (variable_list)
        lastprivate (variable_list)
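
To illustrate the three data environment clauses, the sketch below uses all of them on one loop. DataClauseDemo and its variable names are hypothetical; the point is only how each clause initializes its per-thread copy and what happens to the original variable afterwards.

```cpp
#include <cassert>

int DataClauseDemo(int n) {
    assert(n > 0);    // lastprivate needs at least one iteration to copy from
    int scratch = 0;  // private: each thread gets its own uninitialized copy
    int offset = 10;  // firstprivate: each thread's copy starts at 10
    int last = -1;    // lastprivate: copied back from iteration n - 1
    #pragma omp parallel for private(scratch) firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; ++i) {
        scratch = i * 2;          // Must be written before it is read
        last = scratch + offset;  // 2*i + 10
    }
    return last;  // Value from the sequentially last iteration: 2*(n - 1) + 10
}
```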
  9. Implicit barriers at the end of OpenMP directives.
    Directive                Barrier at the end
    #pragma omp parallel     implicit
    #pragma omp for          implicit
    #pragma omp master       none
    #pragma omp single       implicit
    #pragma omp task         none
    #pragma omp barrier      explicit (the directive is itself a barrier)
    #pragma omp sections     implicit
    #pragma omp section      none
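
As a small sketch of why the implicit barrier after #pragma omp for matters, the hypothetical function below fills an array in one work-sharing loop and reads neighbouring elements in a second one. The second loop is only safe because every thread waits at the end of the first.

```cpp
#include <cassert>
#include <vector>

// Fills squares[i] = i*i, then sums each element with its left neighbour.
int SumOfNeighbourSquares(int n) {
    assert(n >= 2);
    std::vector<int> squares(n, 0);
    int total = 0;
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; ++i) {
            squares[i] = i * i;
        }
        // Implicit barrier here: every element of squares has been written
        // before any thread starts the next loop.
        #pragma omp for reduction(+ : total)
        for (int i = 1; i < n; ++i) {
            total += squares[i] + squares[i - 1];
        }
    }
    return total;
}
```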
