Efficient Python for Data Scientists #1: Write Python Clean Code Using These 3 Principles
A Brief Guide To Write Python Clean Code
Writing clean code is an essential skill for every programmer, and it’s not as easy as you might think. Even experienced coders struggle to write clean code, and it often feels like a constant battle to keep things tidy and organized. But how do you go about doing that?
Clean code is substantially more than just removing all your commented lines or keeping the length of your functions to a minimum. It’s about making your code readable so that any other coder coming to your project in the future will know exactly what you meant with a given piece of code without having to dig through comments or documentation.
There are lots of principles, techniques, and best practices we can follow to write clean Python code. Below are some tips that will help you get started and make the process easier the next time you write code.
Table of contents:
1. Characteristics of High-Quality Production Code
2. Naming Conventions
2.1. Variables
2.2. Functions
2.3. Classes
3. Using Nice white space
3.1. Indentation
3.2. Maximum Line Length
3.3. Blank Lines
4. Comments & Documentation
4.1. In-line Comments
4.2. Docstrings
4.3. Documentation
5. References
1. Characteristics of High-Quality Production Code
In any software project, the code is one of the most important assets. The final production code must be clean and easy to understand in order to facilitate its maintenance.
Reusing parts of code, modularity, and object orientation are some of the techniques used to produce high-quality code.
In this section, I describe several characteristics that help identify high-quality production code.
These characteristics may not seem important at first glance, but they have a major impact on how efficiently developers can work with your project’s source code. Let’s take a look!
1. Production Code: software running on production servers to handle live users and data of the intended audience. Note this is different from production quality code, which describes code that meets expectations in reliability, efficiency, etc., for production. Ideally, all code in production meets these expectations, but this is not always the case.
2. Clean: readable, simple, and concise. Clean code is a characteristic of production-quality code that is crucial for collaboration and maintainability in software development. Writing clean code leads to:
Focused Code: Each function, class, or module should do one thing and do it well.
Easy to read code: According to Grady Booch, author of Object-Oriented Analysis and Design with Applications, clean code reads like well-written prose.
Easy to debug code: Clean code is easy to read and follow, so its errors are easier to find and fix.
Easy to maintain: That is, it can easily be read and enhanced by other developers.
3. Modular Code: logically broken up into functions and modules. Modularity is another important characteristic of production-quality code because it makes your code more organized, efficient, and reusable. Modules allow code to be reused by encapsulating it into files that can be imported into other files.
4. Refactoring: Restructuring your code to improve its internal structure, without changing its external functionality. This gives you a chance to clean and modularize your program after you’ve got it working. Since it isn’t easy to write your best code while you’re still trying to just get it working, allocating time to do this is essential to producing high-quality code. Despite the initial time and effort required, this pays off by speeding up your development time in the long run.
So it is normal to first write code that works and then refactor it to make it clean. You become a much stronger programmer when you’re constantly looking to improve your code. The more you refactor, the easier it will be to structure and write good code the first time.
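As a small illustration of refactoring, here is a sketch in which a working but cramped function is split into focused helpers without changing what it returns. The function names and the sample data are hypothetical, chosen only for this example.

```python
# Before refactoring: one function filters and averages in a single loop.
def process(data):
    total = 0
    count = 0
    for x in data:
        if x is not None and x >= 0:
            total += x
            count += 1
    return total / count if count else 0.0


# After refactoring: each helper does one thing, and the result is the same.
def keep_valid_readings(readings):
    """Drop missing (None) and negative readings."""
    return [value for value in readings if value is not None and value >= 0]


def mean(values):
    """Return the arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def mean_valid_reading(readings):
    """Average only the valid readings."""
    return mean(keep_valid_readings(readings))


sample_readings = [4, None, -1, 6]
# Both versions agree, so the external behavior is unchanged.
assert process(sample_readings) == mean_valid_reading(sample_readings) == 5.0
```

The final assertion is the important part: a refactor is only safe if the old and new versions give the same answer for the same input.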
2. Naming Conventions
Naming conventions are one of the most useful and important aspects of writing clean code. When naming variables, functions, classes, etc., you should use meaningful names that are descriptive and clear. This means favoring long descriptive names over short ambiguous ones.
First, let’s start with the PEP 8 naming conventions:
Class names should be CamelCase (MyClass).
Variable names should be snake_case and all lowercase (first_name).
Function names should be snake_case and all lowercase (quick_sort()).
Constants should be snake_case and all uppercase (PI = 3.14159).
Modules should have short, snake_case names and all lowercase (numpy).
Single quotes and double quotes are treated the same (just pick one and be consistent).
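The conventions above can be seen together in one short sketch. The names used here (MAX_RETRIES, UserAccount, format_display_name) are made up for illustration.

```python
MAX_RETRIES = 3  # constant: uppercase with underscores


class UserAccount:  # class: CamelCase
    def __init__(self, first_name, last_name):
        self.first_name = first_name  # variables and attributes: snake_case
        self.last_name = last_name


def format_display_name(account):  # function: snake_case
    return f"{account.first_name} {account.last_name}"


current_account = UserAccount("Ada", "Lovelace")
print(format_display_name(current_account))
```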
Here is a more detailed guide on how to give descriptive and good naming conventions:
2.1. Variables
Use long, descriptive names that are easy to read: Names should be descriptive enough to be understood on their own. This makes explanatory comments unnecessary:
# Not recommended
# The au variable is the number of active users
au = 105
# Recommended
total_active_users = 105
Use descriptive, intention-revealing names: Your coworkers and other developers should be able to figure out what type your variable is and what it stores from its name. In a nutshell, your code should be easy to read and reason about.
# Not recommended
c = ["UK", "USA", "UAE"]
for x in c:
    print(x)

# Recommended
countries_list = ["UK", "USA", "UAE"]
for country in countries_list:
    print(country)
Always use the same vocabulary: Be consistent with your naming convention. Maintaining a consistent naming convention is important to eliminate confusion when other developers work on your code. And this applies to naming variables, files, functions, and even directory structures.
# Not recommended
client_first_name = 'John'
customer_last_name = 'Doe'

# Recommended
client_first_name = 'John'
client_last_name = 'Doe'

# Another example:
# Not recommended
def fetch_clients(response, variable):
    # do something
    pass

def fetch_posts(res, var):
    # do something
    pass

# Recommended
def fetch_clients(response, variable):
    # do something
    pass

def fetch_posts(response, variable):
    # do something
    pass
Don’t use magic numbers. Magic numbers are literal values that carry a special meaning but appear in code with no name or explanation. Usually, these numbers appear as literals in more than one location in our code.
import random

# Not recommended
def roll_dice():
    return random.randint(1, 4)  # what is 4 supposed to represent?

# Recommended
DICE_SIDES = 4

def roll_dice():
    # randint includes both endpoints, so this returns 1 through DICE_SIDES
    return random.randint(1, DICE_SIDES)
2.2. Functions
Long names != descriptive names: You should be descriptive, but only with relevant information. For example, good function names describe what they do well without including details about implementation or highly specific uses.
import random

DICE_SIDES = 4

# Not recommended
def roll_dice_using_randint():
    return random.randint(1, DICE_SIDES)

# Recommended
def roll_dice():
    return random.randint(1, DICE_SIDES)
Be consistent with your function naming convention: As seen with the variables above, stick to a naming convention when naming functions. Using different naming conventions would confuse other developers and colleagues.
# Not recommended
def fetch_user(id):
    # do something
    pass

def get_post(id):
    # do something
    pass

# Recommended
def fetch_user(id):
    # do something
    pass

def fetch_post(id):
    # do something
    pass
Do not use Boolean flags. Boolean flags are variables that hold a Boolean value, True or False. These flags are passed to a function and are used by the function to determine its behavior. Splitting such a function in two keeps each one focused on a single task.
text = "Python is a simple and elegant programming language."

# Not recommended
def transform_text(text, uppercase):
    if uppercase:
        return text.upper()
    else:
        return text.lower()

uppercase_text = transform_text(text, True)
lowercase_text = transform_text(text, False)

# Recommended
def transform_to_uppercase(text):
    return text.upper()

def transform_to_lowercase(text):
    return text.lower()

uppercase_text = transform_to_uppercase(text)
lowercase_text = transform_to_lowercase(text)
2.3. Classes
Do not add redundant context. This happens when you repeat information the surrounding scope already provides, for example by prefixing attribute names with the name of the class they belong to.
# Not recommended
class Person:
    def __init__(self, person_username, person_email, person_phone, person_address):
        self.person_username = person_username
        self.person_email = person_email
        self.person_phone = person_phone
        self.person_address = person_address

# Recommended
class Person:
    def __init__(self, username, email, phone, address):
        self.username = username
        self.email = email
        self.phone = phone
        self.address = address
3. Using Nice white space
3.1. Indentation
Organize your code with consistent indentation. The standard is to use 4 spaces for each indent. You can make this a default in your text editor. When using a hanging indent, the following should be considered: there should be no arguments on the first line, and further indentation should be used to clearly distinguish it as a continuation line:
# Correct:

# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Add 4 spaces (an extra level of indentation) to distinguish arguments from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

# Wrong:

# Arguments on first line forbidden when not using vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation required as indentation is not distinguishable.
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)
3.2. Maximum Line Length
Try to limit your lines to around 79 characters, which is the guideline given in the PEP 8 style guide. In many good text editors, there is a setting to display a subtle line that indicates where the 79-character limit is.
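When a statement would run past the limit, PEP 8 prefers wrapping it inside parentheses, brackets, or braces rather than using backslashes. Here is a minimal sketch; the variable names and figures are hypothetical.

```python
first_quarter_revenue = 1200
second_quarter_revenue = 1350
third_quarter_revenue = 1100
fourth_quarter_revenue = 1500

# Parentheses let the expression continue across lines without a backslash.
total_annual_revenue = (
    first_quarter_revenue
    + second_quarter_revenue
    + third_quarter_revenue
    + fourth_quarter_revenue
)

# Adjacent string literals inside parentheses are joined into one string.
summary_message = (
    "Total annual revenue across all four quarters: "
    f"{total_annual_revenue}"
)
print(summary_message)
```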
3.3. Blank Lines
Adding blank lines to your code will make it better, cleaner, and easier to follow. Here is a simple guide on how to add blank lines to your code:
Surround top-level function and class definitions with two blank lines.
Method definitions inside a class are surrounded by a single blank line.
Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g., a set of dummy implementations).
Use blank lines in functions, sparingly, to indicate logical sections.
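The blank-line rules above can be sketched in a few lines. The function and class here are hypothetical and exist only to show the spacing: two blank lines around top-level definitions, one between methods, and a single blank line inside a function to separate logical steps.

```python
import math


def circle_area(radius):
    return math.pi * radius ** 2


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def describe(self):
        area = self.area()

        # A blank line separates computing the value from formatting it.
        return f"{self.width} x {self.height} rectangle with area {area}"
```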
4. Comments & Documentation
No matter how hard we try to write clean code, there are still going to be parts of our program that need additional explanation. Comments allow us to quickly tell other developers (and our future selves) why we wrote it in the manner that we did. However, be careful: too many comments can make your code messier than it would be without them.
4.1. In-line Comments
In-line comments are text following hash symbols throughout your code. They are used to explain parts of your code, and really help future contributors understand your work.
One way comments are used is to document the major steps of complex code to help readers follow. Then, you may not have to understand the code to follow what it does. However, others would argue that this is using comments to justify bad code and that if code requires comments to follow, it is a sign that refactoring is needed.
Comments are valuable for explaining what the code itself cannot: why it was written this way or why certain values were selected. For example, the history behind why a certain method was implemented in a specific way. Sometimes, an unconventional or seemingly arbitrary approach may be applied because of some obscure external variable causing side effects. These things are difficult to explain with code.
Here are some tips to write good comments:
1. Don’t comment on bad code; rewrite it
Commenting on bad code will only help you in the short term. Sooner or later, one of your colleagues will have to work with your code, and they’ll end up rewriting it after spending multiple hours trying to figure out what it does. Therefore, it is better to rewrite the bad code from the beginning instead of just commenting on it.
2. Do not add comments when there is no need to
If your code is readable enough, you don’t need comments. Adding useless comments will only make your code less readable. Here’s a bad example:
# This checks if the user with the given ID doesn't exist.
if not User.objects.filter(id=user_id).exists():
    return Response({
        'detail': 'The user with this ID does not exist.',
    })
As a general rule, if you need to add comments, they should explain why you did something rather than what is happening.
3. Don’t leave commented-out, outdated code
The worst thing you can do is to leave commented-out code in your programs. All debug code and debug messages should be removed before pushing to a version control system; otherwise, your colleagues will be scared of deleting them, and your commented-out code will stay there forever.
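To make the tips above concrete, here is a small sketch of a comment that explains why rather than what. The function and the assumption about what callers expect are invented for illustration.

```python
def normalize_scores(scores):
    """Scale non-negative scores so the highest one becomes 1.0."""
    highest = max(scores)

    # A "what" comment such as "check if highest is zero" would add nothing.
    # This "why" comment records the reason instead: an all-zero batch would
    # otherwise divide by zero, and (assumed here) callers expect zeros back.
    if highest == 0:
        return [0.0 for _ in scores]

    return [score / highest for score in scores]


print(normalize_scores([1, 2, 4]))
print(normalize_scores([0, 0]))
```

The code already says what it does; the comment earns its place by recording a decision that a reader could not recover from the code alone.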
4.2. Docstrings
Docstrings, or documentation strings, are valuable pieces of documentation that explain the functionality of any function or module in your code. Ideally, each of your functions should always have a docstring. Docstrings are surrounded by triple quotes.
The first line of the docstring is a brief explanation of the function’s purpose. The next element of a docstring is an explanation of the function’s arguments. Here you list the arguments, state their purpose, and state what types they should be.
Finally, it is common to provide some description of the output of the function. Every piece of the docstring is optional; however, docstrings are a part of good coding practice.
Below are two examples of docstrings for a function. The first one will use a single-line docstring, and in the second one, we will use multiple-line docstrings:
def population_density(population, land_area):
    """Calculate the population density of an area."""
    return population / land_area


def population_density(population, land_area):
    """Calculate the population density of an area.

    Args:
        population: int. The population of the area
        land_area: int or float. This function is unit-agnostic, if you pass
            in values in terms of square km or square miles the function
            will return a density in those units.

    Returns:
        population_density: population/land_area. The population density of a
            particular area.
    """
    return population / land_area
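Docstrings are more than comments: Python stores them on the object, so tools and colleagues can read them at runtime. The sketch below reuses the function above with a slightly shortened docstring and reads it back through the standard __doc__ attribute and inspect.getdoc.

```python
import inspect


def population_density(population, land_area):
    """Calculate the population density of an area.

    Args:
        population: int. The population of the area.
        land_area: int or float. The area, in any consistent unit.

    Returns:
        float. The population divided by the land area.
    """
    return population / land_area


# The raw docstring is stored on the function object itself.
first_line = population_density.__doc__.splitlines()[0]
print(first_line)

# inspect.getdoc() returns the docstring with its indentation cleaned up;
# help(population_density) shows the same text interactively.
print(inspect.getdoc(population_density))
```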
4.3. Documentation
Project documentation is essential for getting others to understand why and how your code is relevant to them, whether they are potential users of your project or developers who may contribute to your code.
A great first step in project documentation is your README file. It will often be the first interaction most users will have with your project. Whether it’s an application or a package, your project should absolutely come with a README file.
At a minimum, this should explain what it does, list its dependencies, and provide sufficiently detailed instructions on how to use it. You want to make it as simple as possible for others to understand the purpose of your project and quickly get something working.
Translating all your ideas and thoughts formally on paper can be a little difficult, but you’ll get better over time, and it makes a significant difference in helping others realize the value of your project.
Writing this documentation can also help you improve the design of your code, as you’re forced to think through your design decisions more thoroughly. This also allows future contributors to know how to follow your original intentions.
5. References