Python
Code style and linting
- Please conform to the Google Python Style Guide.
- A good code formatter is `autopep8`. `autopep8` automatically formats Python code to conform to the PEP 8 style guide. Install `autopep8` via pip:
  ```
  $ pip install --upgrade autopep8
  ```
Naming
- Python filenames must have a `.py` extension.
- Avoid dashes (-) in any file/package/module/definition name.
- While Python supports making names private by using a leading double underscore `__` prefix (which triggers name mangling), this is discouraged. Prefer the use of a single leading underscore.
- Naming guide:

  | Type | Public | Internal |
  |---|---|---|
  | Packages | `lower_with_under` | |
  | Modules | `lower_with_under` | `_lower_with_under` |
  | Classes | `CapWords` | `_CapWords` |
  | Exceptions | `CapWords` | |
  | Functions | `lower_with_under()` | `_lower_with_under()` |
  | Global/Class Constants | `CAPS_WITH_UNDER` | `_CAPS_WITH_UNDER` |
  | Global/Class Variables | `lower_with_under` | `_lower_with_under` |
  | Instance Variables | `lower_with_under` | `_lower_with_under` |
  | Method Names | `lower_with_under()` | `_lower_with_under()` |
  | Function/Method Parameters | `lower_with_under` | |
  | Local Variables | `lower_with_under` | |
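As an illustration, the naming guide applied in a short, hypothetical module (all names here are invented for the example):

```python
"""emotion_utils: a hypothetical module (module name in lower_with_under)."""

_DEFAULT_THRESHOLD = 0.5      # internal module constant: _CAPS_WITH_UNDER


class FaceDetector:           # public class: CapWords
    MAX_FACES = 10            # class constant: CAPS_WITH_UNDER

    def __init__(self, min_confidence=0.9):   # parameter: lower_with_under
        self.min_confidence = min_confidence  # instance variable: lower_with_under
        self._cache = {}                      # internal instance variable: _lower_with_under

    def detect_faces(self, image):            # public method: lower_with_under()
        return self._filter(image)

    def _filter(self, image):                 # internal method: _lower_with_under()
        return image


def load_labels(path):                        # public function: lower_with_under()
    return path
```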
Machine learning folder structure
The following serves as a representative project folder structure.
emotion-recognition # Repository name / Project root
├── assets # Assets
| ├── images
| ...
| ...
├── dataset # Dataset
| ├── fer2013.csv
| └── happy-1.jpg
├── doc # Documents
| ├── lib # Library packages
| ...
| ...
├── example # Standalone example code
| └── emoRecStream.py
├── dev # Machine learning model folder
| ├── analysis # Package - Analysis
| | └── predictions.py # Module - Predictions
| ├── checkpoints # Checkpoints
| | ├── emotion_recognition_weights.h5 # Machine learning saved weights
| | └── emotion_recognition_structure.json # Machine learning saved structure
| ├── helper # Package - Helper
| | └── convert.py # Module - Convert
| └── emoRec.py # Code used during model development
├── lib # Libraries
| ├── emonetLabels.json # Labels
| ├── haarcascade_frontalface_default.xml # .xml file for ML models
| └── preprocess.py # Pre-processing functions
├── graph # TensorFlow graph folder
| └── cnn.py # Graph architecture
├── serving # TensorFlow Serving
| ├── cnn
| ├── ...
| ...
├── .env # Environment variables
├── README.md # Readme file
└── requirements.txt # Code dependencies
- [Non-Production Code] `assets` folder should contain miscellaneous files. For example, it can contain images used for explanation in `README.md`.
- [Non-Production Code] `dataset` folder should contain a minimal amount of sample data used in the project for testing and demonstration purposes. The complete dataset is advised to be stored in an external Hadoop cluster.
- [Production Code] `doc` folder contains the documents generated from Python code comments. For details of how the documents are generated, refer to the code documentation steps.
- [Non-Production Code] `example` folder contains complete, runnable, well-abstracted Python code to illustrate the entire machine learning model. For example, `emoRecStream.py` is a complete standalone example to analyse a video stream.
- [Non-Production Code] `dev` folder contains all peripheral code used in developing and testing the machine learning TensorFlow graph.
  - Contains code files (e.g., `emoRec.py`) used during the training, testing, and building phases of the machine learning model.
  - Local code should be abstracted into packages and modules. For example, `analysis` and `helper` are local packages, whereas `predictions.py` and `convert.py` are local modules.
  - Package and module naming should be intuitive and non-repetitive. For example, reading an import statement in Python, such as `import model.analysis.predictions`, should clearly indicate the meaning or functionality of the code being imported.
  - A `checkpoints` sub-folder is desirable to keep track of previously trained architectures and weights.
- [Production Code] `lib` folder is a top-level folder containing well-abstracted local (i.e., self-written) and third-party libraries, which will be ported into production.
- [Production Code] `graph` folder is a top-level folder containing the well-abstracted and commented machine learning TensorFlow graph written in Python. The graph (e.g., `cnn.py`) is converted into `SavedModel` format and saved with an identical name in the `serving` folder (e.g., `cnn`). Although the contents of this folder are not directly used in production, the Python TensorFlow graphs are needed to understand the corresponding `serving` model which is actually deployed.
- [Production Code] `serving` folder contains the exported models served in production. Please see the TensorFlow wiki to learn more about the structure of the `serving` folder. The `serving` model will be deployed using Docker containers in production.
- [Production Code] `.env` file should contain the environment variables, e.g., the `ROOT` variable, which will be ported into production.
- [Non-Production Code] `README.md` file shall contain a brief description of the following:
  - Explanation of what the project is about
  - Instructions to run a sample of the code
  - Desired input and output of the machine learning model
- [Production Code] `requirements.txt` file contains the dependent libraries and their versions required to run this code. For details on how the `requirements.txt` file is generated, refer to the dependency management section.

Note: All Production Code needs to undergo code review before being merged into the master branch, whereas the coding standards of Non-Production Code need not be scrutinised as strictly.
Documentation
- Variables, functions, and methods which are only meant for local use within a library must be made private and non-exportable. Add a single leading underscore `_` to the name to hide it from access outside the class. For example, `_x` and `def _normFace(self, img, face):` represent a hidden variable and a hidden method, respectively.
- Hidden variables, functions, and methods will not be included in the documentation.
- Write docstrings in `numpy` format to document the Python code.
- Docstrings must be written for:
  - Classes, and class methods
  - Packages, and functions
  - Scripts
- To-do notes are written as part of docstrings:
  ```
  """
  TODO
  ----
  Blah blah.
  """
  ```
- Documents are stored in the `./doc` directory at the root of the project repository.
- Sphinx is the desired tool to generate code documentation.
  - Install Sphinx:
    ```
    $ pip install sphinx
    $ pip install sphinx-rtd-theme
    ```
  - Make the doc directory:
    ```
    $ cd /path/to/project/root
    $ mkdir doc
    ```
  - Set up Sphinx:
    ```
    $ sphinx-quickstart [options] path/to/project/root/doc

    [options]
    -q QUIET                              Skips interactive wizard
    -p PROJECT                            Project name
    -a AUTHOR                             Author name
    -v VERSION                            Version of project
    --ext-autodoc                         Enable sphinx.ext.autodoc extension
    --ext-todo                            Enable sphinx.ext.todo extension
    --ext-coverage                        Enable sphinx.ext.coverage extension
    --ext-mathjax                         Enable sphinx.ext.mathjax extension
    --ext-viewcode                        Enable sphinx.ext.viewcode extension
    --extensions='sphinx.ext.napoleon'    Napoleon supports Google and NumPy styled docstrings
    ```
    For example:
    ```
    $ cd /path/to/project/root
    $ sphinx-quickstart \
        -q -p Emotion-Recognition -a Adaickalavan \
        -v v0.0.1 \
        --ext-autodoc \
        --ext-todo \
        --ext-coverage \
        --ext-mathjax \
        --ext-viewcode \
        --extensions='sphinx.ext.napoleon' \
        ./doc
    ```
  - Change the following in the `/doc/conf.py` file
    ```
    ...
    # import os
    # import sys
    # sys.path.insert(0, os.path.abspath('.'))
    ...
    html_theme = 'alabaster'
    ...
    ```
    to
    ```
    ...
    import os
    import sys
    sys.path.insert(0, os.path.abspath('..'))
    ...
    html_theme = 'sphinx_rtd_theme'
    ...
    ```
  - Populate your `/doc/index.rst` file as needed. An example `index.rst` file for the Emotion Recognition template project is as follows.
    ```
    Welcome to Emotion-Recognition's documentation!
    ===============================================

    .. toctree::
       :maxdepth: 2
       :caption: Contents:

    .. automodule:: lib.preprocess
       :members:

    .. automodule:: lib.util
       :members:

    Example
    =======

    .. automodule:: example.emoRecStream

    Indices and tables
    ==================

    * :ref:`genindex`
    * :ref:`modindex`
    * :ref:`search`
    ```
  - Generate documents:
    ```
    $ cd /path/to/project/root
    $ make -C ./doc html
    ```
  - View the Sphinx documentation by browsing `/doc/_build/html/index.html` in your web browser.
- Alternatively, for quick and easy document generation, the pdoc3 tool may be used.
  - Execute the following commands at the project's root directory:
    ```
    $ pip install pdoc3
    $ cd /path/to/project/root
    $ mkdir doc1
    $ pdoc --html --force --output-dir doc1 ./lib
    ```
  - View the pdoc3 documentation by browsing `/doc1/lib/index.html` in your web browser.
Import paths
- Strictly do not perform relative imports of any files or modules in Python.
- Never perform wildcard imports such as `from libraries import *`.
- Always perform absolute imports.
- Assume the following project structure.
  ```
  emotion-recognition                            # Main project directory
  ├── model                                      # Machine learning model folder
  |   ├── analysis                               # Package
  |   |   └── predictions.py                     # Module
  |   ├── checkpoint                             # Checkpoint folder
  |   |   └── emotion_recognition_weights.h5     # Machine learning saved weights
  |   └── emoRec.py                              # Python code
  ├── graph                                      # TensorFlow graph folder
  |   └── cnn.py                                 # Graph architecture
  └── .env                                       # Environment variables
  ```
  - First, install the `dotenv` library.
    ```
    pip install -U python-dotenv
    ```
  - Specify the `ROOT` environment variable, which refers to the directory path containing the project, in the `.env` file as follows.
    ```
    # File: emotion-recognition/.env
    ROOT = /home/admin/src/github.com/emotion-recognition
    ```
  - Place the following piece of code at the top of the Python code file. It will add the project directory given by `ROOT` to `sys.path` and make it searchable by Python.
    ```python
    # File: emotion-recognition/model/emoRec.py

    # Setup
    import os
    import sys
    from dotenv import load_dotenv, find_dotenv

    load_dotenv(find_dotenv())
    ROOT = os.getenv("ROOT")
    sys.path.append(ROOT)
    ```
  - Import local packages or modules as follows.
    ```python
    # File: emotion-recognition/model/emoRec.py

    # Import local packages
    from graph import cnn
    from model.analysis import predictions
    ```
  - To open files in Python, create absolute paths by prepending `ROOT` to the file path within the project directory. An example is as follows.
    ```python
    # File: emotion-recognition/model/emoRec.py

    # Load weights for TensorFlow model
    model.load_weights(ROOT + "/model/checkpoint/emotion_recognition_weights.h5")
    ```
- Only import Python packages and modules. Never import Python functions (i.e., `def`s) directly into another Python file.
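As a sketch of this rule using a standard-library module (the project's own modules, e.g. `lib.preprocess`, would be imported the same way):

```python
# Good: import the module, then qualify the function at the call site,
# so the origin of `loads` is explicit wherever it is used.
import json

data = json.loads('{"emotion": "happy"}')

# Bad (violates this guideline): importing a def directly hides its origin.
# from json import loads
# data = loads('{"emotion": "happy"}')
```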
Dependencies
- To generate the `requirements.txt` file for Python dependencies:
  ```
  pipreqs [options] /path/to/project/root

  [options]
  --force          to overwrite existing file
  --proxy <url>    when using behind a corporate proxy
  ```
  Example command, assuming the project is located at `/home/admin/src/github.com/scalable-deployment/tfsemonet`:
  ```
  $ pipreqs --force /home/admin/src/github.com/scalable-deployment/tfsemonet
  ```
- To install dependencies, issue the following command.
  ```
  $ pip install -r /path/to/requirements.txt
  ```
Profiling
- For profiling, install the following library. (`cProfile` and `pstats` are part of the Python standard library and need no installation.)
  ```
  $ pip install line_profiler
  ```
- Include the following `profiler.py` file in your project.
  ```python
  # Filename: profiler.py

  import atexit
  import cProfile
  import os
  import pstats

  import line_profiler
  from dotenv import load_dotenv, find_dotenv

  load_dotenv(find_dotenv())
  ROOT = os.getenv("ROOT")

  # Line-by-line profiler: instantiated ONCE; stats are dumped at exit
  profile_line = line_profiler.LineProfiler()
  atexit.register(profile_line.dump_stats, ROOT + "/results/profile_line.prof")
  stream = open(ROOT + "/results/profile_line.txt", 'w')
  atexit.register(profile_line.print_stats, stream)

  # Function profiling decorator
  def profile_function(func):
      def _profile_function(*args, **kwargs):
          pr = cProfile.Profile()
          pr.enable()
          result = func(*args, **kwargs)
          pr.disable()
          # Save stats into file
          pr.dump_stats(ROOT + "/results/profile_function.prof")
          stream = open(ROOT + "/results/profile_function.txt", 'w')
          ps = pstats.Stats(ROOT + "/results/profile_function.prof", stream=stream)
          ps.sort_stats('tottime')
          ps.print_stats()
          return result
      return _profile_function
  ```
- For profiling at function-call level, we use `cProfile`: add the decorator `@profile_function` above the function to be profiled. For line-by-line profiling, we use `line_profiler`: add the decorator `@profile_line` above the function to be profiled.
  ```python
  # Filename: main.py

  import profiler

  @profiler.profile_function
  def function_to_be_profiled_at_function_call_level():
      ...

  @profiler.profile_line
  def function_to_be_profiled_at_line_by_line_level():
      ...

  if __name__ == '__main__':
      function_to_be_profiled_at_function_call_level()
      function_to_be_profiled_at_line_by_line_level()
  ```
- Then, execute the Python file normally. For example:
  ```
  $ python main.py
  ```
  The function-level profile results will be written to the `./results/profile_function.txt` and `./results/profile_function.prof` files. The line-level profile results will be written to the `./results/profile_line.txt` and `./results/profile_line.prof` files.
- To view the function-level profile output interactively in the browser, install `cprofilev` and read the `profile_function.prof` file using cprofilev.
  ```
  $ pip install cprofilev
  $ cprofilev -f /path/to/profile_function.prof
  ```
  Navigate to http://localhost:4000 to view the profile output.
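As an aside, profile data can also be inspected programmatically with the standard-library `pstats` module, without leaving Python. A small self-contained sketch (the `workload` function is invented for the example):

```python
import cProfile
import io
import pstats

# Profile a toy workload, mimicking what profile_function does internally
def workload():
    return sum(i * i for i in range(1000))

pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()

# Render the stats into a string instead of a file
stream = io.StringIO()
ps = pstats.Stats(pr, stream=stream)
ps.sort_stats("tottime")  # same sort key used in profiler.py
ps.print_stats(5)         # show the 5 most expensive entries
report = stream.getvalue()
```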
- Note: If you want to profile several functions, instantiate `LineProfiler()` only once and import that single instance in the other files. Otherwise, the profiler output might be inconsistent and report oddly.
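A sketch of the single-instance pattern. The stand-in `SharedProfiler` class below is used only so the sketch runs without `line_profiler` installed; in real code, `profile_line` would be the one `line_profiler.LineProfiler()` created in `profiler.py`, imported by every other file rather than constructed again.

```python
# Stand-in for line_profiler.LineProfiler(), for illustration only
class SharedProfiler:
    def __init__(self):
        self.functions = []

    def __call__(self, func):  # LineProfiler instances act as decorators
        self.functions.append(func.__name__)
        return func

profile_line = SharedProfiler()  # created ONCE; other files import this instance

@profile_line
def func_a():
    return "a"

@profile_line
def func_b():
    return "b"

# Both functions register with the SAME profiler instance,
# so their stats end up in one consolidated report.
```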
Proxy
- Set up proxy configurations for conda/pip when using them behind a corporate proxy, e.g., `http://10.0.0.0:8080/`.
- For conda, run the following commands in a terminal:
  ```
  $ conda config --set proxy_servers.http http://10.0.0.0:8080/
  $ conda config --set proxy_servers.https https://10.0.0.0:8080/
  ```
  A `.condarc` file will be created with the proxy server details and placed at `~/.condarc`.
- For pip, create a `pip.conf` file at `~/.pip/` such that the `~/.pip/pip.conf` file contains the following:
  ```
  [global]
  proxy = http://10.0.0.0:8080/
  ```
Unit test
- For more information, see: https://realpython.com/python-testing/
- Run unit tests in Python:
  ```
  $ cd /path/to/source/code/root
  $ python -m unittest discover -s <test folder>
  ```
  `unittest` will scan the `<test folder>` for all `test*.py` files and execute them.
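A minimal sketch of a discoverable test file (the file name and the function under test are invented for the example; discovery picks up any file matching `test*.py`):

```python
# Filename: test/test_preprocess.py (hypothetical)
import unittest


def grayscale(pixel):
    """Toy function under test: average of an (r, g, b) tuple."""
    return sum(pixel) / 3


class TestPreprocess(unittest.TestCase):
    def test_grayscale(self):
        self.assertEqual(grayscale((3, 6, 9)), 6)

    def test_grayscale_black(self):
        self.assertEqual(grayscale((0, 0, 0)), 0)
```

Running `python -m unittest discover -s test` from the project root would then execute both test methods.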