Refactoring

Widyanto H Nugroho
6 min read · May 10, 2022

Do you want simpler Python code? You always start a project with the best intentions, a clean codebase, and a nice structure. But over time, there are changes to your apps, and things can get a little messy.

If you can write and maintain clean, simple Python code, then it’ll save you lots of time in the long term. You can spend less time testing, finding bugs, and making changes when your code is well laid out and simple to follow.

What is Refactoring?

Refactoring is a disciplined technique for restructuring existing code: you change its internal structure without changing its external behavior.

Refactoring is not synonymous with the following terms:

  • rewriting code
  • fixing bugs
  • improving observable aspects of software, such as its interface
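To make the definition concrete, here is a minimal sketch of a refactoring. The function names are hypothetical and chosen only for illustration. Both versions return the same result for the same input, and only the internal structure differs.

```python
def sum_even_squares_before(numbers):
    # Original version: manual index handling and accumulation.
    result = 0
    i = 0
    while i < len(numbers):
        if numbers[i] % 2 == 0:
            result = result + numbers[i] * numbers[i]
        i = i + 1
    return result


def sum_even_squares_after(numbers):
    # Refactored version: same external behavior, simpler structure.
    return sum(n * n for n in numbers if n % 2 == 0)


print(sum_even_squares_before([1, 2, 3, 4]))  # 20
print(sum_even_squares_after([1, 2, 3, 4]))   # 20
```

If the second version returned a different value for any input, the change would be a rewrite or a bug fix, not a refactoring.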

Code Complexity in Python

The complexity of an application and its codebase is relative to the task it’s performing. If you’re writing code for NASA’s Jet Propulsion Laboratory (literally rocket science), then it’s going to be complicated.

The question isn’t so much, “Is my code complicated?” as, “Is my code more complicated than it needs to be?”

In this article, I would like to review 10 idiomatic cases that you can consider using to refactor your Python code.

Metrics for Measuring Complexity

Much time and research have been put into analyzing the complexity of computer software. Overly complex and unmaintainable applications can have a very real cost.

The complexity of software correlates with its quality. Code that is easy to read and understand is more likely to be updated by developers in the future.

Here are some metrics for programming languages. They apply to many languages, not just Python.

Lines of Code

LOC, or Lines of Code, is the crudest measure of complexity. It is debatable whether there is any direct correlation between the lines of code and the complexity of an application, but the indirect correlation is clear. After all, a program with 5 lines is likely simpler than one with 5 million.

When looking at Python metrics, we try to ignore blank lines and lines containing comments.
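A rough way to apply this rule is sketched below. The function name count_source_lines is hypothetical, and the sketch assumes that only blank lines and comment-only lines are skipped. A line with code followed by a trailing comment still counts, and docstrings are not treated specially.

```python
def count_source_lines(source: str) -> int:
    """Count lines that are neither blank nor comment-only."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        # Skip empty lines and lines that contain nothing but a comment.
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


sample = """\
# set up
x = 5

y = 2  # trailing comment, still a line of code
"""
print(count_source_lines(sample))  # 2
```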

Why are lines of code used to quantify the amount of code in your application? The assumption is that a line of code roughly equates to a statement. Lines are a better measure than characters, which would include whitespace.

In Python, we are encouraged to put a single statement on each line. This example is 9 lines of code:

x = 5
value = input("Enter a number: ")
y = int(value)
if x < y:
    print(f"{x} is less than {y}")
elif x == y:
    print(f"{x} is equal to {y}")
else:
    print(f"{x} is more than {y}")

If you used only lines of code as your measure of complexity, it could encourage the wrong behaviors.

Python code should be easy to read and understand. Taking that last example, you could reduce the number of lines of code to 3:

x = 5; y = int(input("Enter a number:"))
equality = "is equal to" if x == y else "is less than" if x < y else "is more than"
print(f"{x} {equality} {y}")

But the result is hard to read, and PEP 8 has guidelines around maximum line length and line breaking. You can check out How to Write Beautiful Python Code With PEP 8 for more on PEP 8.

Cyclomatic Complexity

Cyclomatic complexity is the measure of how many independent code paths there are through your application. A path is a sequence of statements that the interpreter can follow to get to the end of the application.

Let’s explore this example:

x = 1

There is only 1 way this code can be executed, so it has a cyclomatic complexity of 1.

If we add a decision, or branch, to the code in the form of an if statement, the complexity increases:

x = 1
if x < 2:
    x += 1

Even though there is only 1 way this code can be executed, as x is a constant, this has a cyclomatic complexity of 2. All of the cyclomatic complexity analyzers will treat an if statement as a branch.

This is also an example of overly complex code. The if statement is useless as x has a fixed value. You could simply refactor this example to the following:

x = 2
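The counting rule used in these two snippets (one base path, plus one for each decision point) can be sketched with the standard library's ast module. This is a simplified illustration, not the algorithm of any particular analyzer. The function name and the set of node types counted are assumptions, and real tools apply more rules, for example for boolean operators and exception handlers.

```python
import ast

# Node types treated as decision points in this simplified sketch.
# This set is an assumption made for illustration only.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.With, ast.IfExp)


def approximate_complexity(source: str) -> int:
    """Return 1 (the base path) plus the number of decision-point nodes."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1


print(approximate_complexity("x = 1"))                         # 1
print(approximate_complexity("x = 1\nif x < 2:\n    x += 1"))  # 2
```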

That was a toy example, so let’s explore something a little more real.

main() contains 5 decision points. I’ll comment each one in the code so you can see where they are. Adding 1 for the base path gives the function a cyclomatic complexity of 6:

# cyclomatic_example.py
import sys

def main():
    if len(sys.argv) > 1:  # 1
        filepath = sys.argv[1]
    else:
        print("Provide a file path")
        sys.exit(1)
    if filepath:  # 2
        with open(filepath) as fp:  # 3
            for line in fp.readlines():  # 4
                if line != "\n":  # 5
                    print(line, end="")

if __name__ == "__main__":  # Ignored.
    main()

There are certainly ways that code can be refactored into a far simpler alternative. We’ll get to that later.
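As a preview, one possible simplification might look like the sketch below. It is not the article's own refactoring. The helper non_blank_lines and the choice to pass argv into main() and return an exit code are assumptions made for illustration. The behavior is intended to match the example above: a missing argument prints a message and signals exit code 1, an empty path does nothing, and every line that is not just a newline is printed.

```python
def non_blank_lines(lines):
    """Yield every line that is not just a newline."""
    return (line for line in lines if line != "\n")


def main(argv):
    # Guard clauses replace the nested if/else blocks of the original.
    if len(argv) < 2:
        print("Provide a file path")
        return 1
    filepath = argv[1]
    if not filepath:  # the original silently did nothing for an empty path
        return 0
    with open(filepath) as fp:
        for line in non_blank_lines(fp):
            print(line, end="")
    return 0
```

In a script, you would call it as sys.exit(main(sys.argv)) under the usual if __name__ == "__main__" guard. Moving the filter into its own function flattens the nesting in main() and makes the filtering rule testable on its own.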

In the following examples, we will use the radon library from PyPI to calculate metrics. You can install it now:

$ pip install radon

To calculate cyclomatic complexity using radon, you can save the example into a file called cyclomatic_example.py and use radon from the command line.

The radon command takes 2 main arguments:

  1. The type of analysis (cc for cyclomatic complexity)
  2. A path to the file or folder to analyze

Execute the radon command with the cc analysis against the cyclomatic_example.py file. Adding -s will give the cyclomatic complexity in the output:

$ radon cc cyclomatic_example.py -s
cyclomatic_example.py
F 4:0 main - B (6)

The output is a little cryptic. Here is what each part means:

  • F means function, M means method, and C means class.
  • main is the name of the function.
  • 4 is the line the function starts on.
  • B is the rating from A to F. A is the best grade, meaning the least complexity.
  • The number in parentheses, 6, is the cyclomatic complexity of the code.

Refactoring is also one of the three phases of the test-driven development (TDD) cycle:

  1. The red phase is when you write a test case for the functionality you want to add. The test fails because the code is incomplete, and you have to implement the missing functionality.
  2. The green phase is when you write the minimal amount of code needed to make the test pass. As soon as your test passes, the implementation of this feature is done (at least for now).
  3. The refactoring phase is the final step in the TDD cycle. It’s when you change or improve existing code without changing its external behavior. In this phase, you improve the design without breaking any functionality. You will know that you have broken something if an existing test fails during refactoring. TDD naturally leads to high code coverage.
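The three phases can be sketched with the standard unittest module. The names describe_green, describe, and DescribeTest are hypothetical, and the example reuses the number comparison from earlier in the article. It shows the end state of one cycle, with the earlier phases noted in comments.

```python
import unittest


# Green phase: the minimal code that made the test pass, kept here
# under a separate name only so the two versions can be compared.
def describe_green(x, y):
    if x < y:
        return f"{x} is less than {y}"
    elif x == y:
        return f"{x} is equal to {y}"
    else:
        return f"{x} is more than {y}"


# Refactoring phase: the duplicated formatting is removed,
# while the return values stay exactly the same.
def describe(x, y):
    if x < y:
        relation = "less than"
    elif x == y:
        relation = "equal to"
    else:
        relation = "more than"
    return f"{x} is {relation} {y}"


# Red phase: this test is written first and fails until describe() exists.
class DescribeTest(unittest.TestCase):
    def test_all_three_relations(self):
        self.assertEqual(describe(5, 7), "5 is less than 7")
        self.assertEqual(describe(5, 5), "5 is equal to 5")
        self.assertEqual(describe(5, 3), "5 is more than 3")
```

Running python -m unittest against a file containing this code should report one passing test. If the refactored describe() changed any return value, the same test would fail and flag the regression.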

In terms of productivity and deliverability, refactoring comes at the end of each TDD cycle. We use TDD to maintain the correctness and quality of the code, then refactor while keeping every test from the green phase passing.

Benefits of Refactoring

  • Refactoring improves objective attributes of the code, such as length, duplication, coupling and cohesion, and cyclomatic complexity, all of which affect ease of maintenance.
  • Refactoring helps other people understand the code.
  • Refactoring encourages every developer to think about and understand code design, and helps develop a sense of ownership of the code they have written.
  • Refactoring is especially helpful when there are reusable design elements and modules.
