Comparing Strings using Levenshtein Distance

by Christoph Schiessl on Python

Levenshtein distance is an easy-to-understand algorithm that measures the similarity of a pair of strings. Imagine that you have received one input string, x, and your job is to find the string most similar to x from a list of candidates. To do this, you need a "distance" function to determine if one candidate is more or less similar to x than some other candidate. This is precisely what Levenshtein distance does. It's certainly not the only such algorithm, but it is very popular because it is easy to implement. The idea behind the algorithm can be explained as a four-step process:

  1. If a is blank (i.e., len(a) == 0), then its distance to b is len(b).
  2. If b is blank (i.e., len(b) == 0), then its distance to a is len(a).
  3. If a and b have a common first character (i.e., a[0] == b[0]), then their distance is equal to the distance between a[1:] and b[1:] (ignore first character). Consequently, the distance is 0, if the two strings are equal (i.e., a == b).
  4. Otherwise, their distance is 1 plus the minimum of the following three options:
    • The distance between a[1:] and b (delete the first character of a).
    • The distance between a and b[1:] (delete the first character of b).
    • The distance between a[1:] and b[1:] (the first character is replaced by the first character of the other string).

This was the hard part — the implementation is the easy part because we can simply translate the four steps above 1:1 into Python code:

def levenshtein_distance(a: str, b: str) -> int:
    # (1) Early exit if `a` is blank ...
    if len(a) == 0:
        return len(b)

    # (2) Early exit if `b` is blank ...
    if len(b) == 0:
        return len(a)

    # (3) Ignore the common first character of `a` and `b` ...
    if a[0] == b[0]:
        return levenshtein_distance(a[1:], b[1:])

    # (4) Otherwise, `1` plus the minimum of the following ...
    return 1 + min(
        levenshtein_distance(a[1:], b),  # Delete first char of `a`
        levenshtein_distance(a, b[1:]),  # Delete first char of `b`
        levenshtein_distance(a[1:], b[1:]),  # Replace first character
    )

In case it is not yet clear, Levenshtein distance is counting the number of characters that have to be deleted/inserted/replaced to transform the string a into the string b.


When you implement an algorithm like this, it's always a good idea to first write a few test-cases to ensure that your implementation is actually correct. For the sake of completeness, here are the test-cases that I have created for myself:

assert levenshtein_distance("", "") == 0              # equal
assert levenshtein_distance("abcxyz", "abcxyz") == 0  # equal

assert levenshtein_distance("", "abcxyz") == 6        # nothing
assert levenshtein_distance("abcxyz", "") == 6        # nothing

assert levenshtein_distance("a", "abcxyz") == 5       # beginning
assert levenshtein_distance("ab", "abcxyz") == 4      # beginning

assert levenshtein_distance("cx", "abcxyz") == 4      # middle
assert levenshtein_distance("bcxy", "abcxyz") == 2    # middle

assert levenshtein_distance("yz", "abcxyz") == 4      # ending
assert levenshtein_distance("z", "abcxyz") == 5       # ending

assert levenshtein_distance("az", "abcxyz") == 4      # fractured
assert levenshtein_distance("acxz", "abcxyz") == 2    # fractured
assert levenshtein_distance("bcy", "abcxyz") == 3     # fractured

assert levenshtein_distance("123", "abcxyz") == 6     # different chars
assert levenshtein_distance("ab123", "abcxyz") == 4   # different chars
assert levenshtein_distance("1a2c3z", "abcxyz") == 4  # different chars

As you can see, I decided against using a testing framework like pytest, which would have been overkill, and instead used simple assert statements. Anyway, thank you for reading, and see you soon!

Christoph Schiessl

Hi, I'm Christoph Schiessl.

I help you build robust and fast Web Applications.


I'm available for hire as a freelance web developer, so you can take advantage of my more than a decade of experience working on many projects across several industries. Most of my clients are building web-based SaaS applications in a B2B context and depend on my expertise in various capacities.

More often than not, my involvement includes hands-on development work using technologies like Python, JavaScript, and PostgreSQL. Furthermore, if you already have an established team, I can support you as a technical product manager with a passion for simplifying complex processes. Lastly, I'm an avid writer and educator who takes pride in breaking technical concepts down into the simplest possible terms.

Continue Reading?

Here are a few more Articles for you ...


Calculating Levenshtein Distance on the Word Level

This article explains how to use the Levenshtein algorithm to compare sentences by looking for word differences instead of character differences.

By Christoph Schiessl on Python

Force Index Usage by Manipulating the Query Planner

Learn how to manipulate PostgreSQL's query planner to force it to use your indexes while working on optimizing the performance of your queries.

By Christoph Schiessl on PostgreSQL

How to Recursively Create Directories

Creating directories in Python using the pathlib module, including normal and recursive creation, handling existing directories, and file system permissions.

By Christoph Schiessl on Python

Web App Reverse Checklist

Ready to Build Your Next Web App?

Get my Web App Reverse Checklist first ...


Software Engineering is often driven by fashion, but swimming with the current is rarely the best choice. In addition to knowing what to do, it's equally important to know what not to do. And this is precisely what my free Web App Reverse Checklist will help you with.

Subscribe below to get your free copy of my Reverse Checklist delivered to your inbox. Afterward, you can expect one weekly email on building resilient Web Applications using Python, JavaScript, and PostgreSQL.

By the way, it goes without saying that I'm not sharing your email address with anyone, and you're free to unsubscribe at any time. No spam. No commitments. No questions asked.