Basic fuzzy matching function

Fuzzy matching is comparing the similarity of two strings to determine if they are an approximate match. Code editors like VS Code use this when doing a file or symbol search, where you can type some characters (not necessarily in perfect order) and it gives you a list of suggestions.

In your own apps, you might use fuzzy matching to implement a search feature or likewise a list of suggestions based on user input.

In this post, I’m going to present (and explain) a very simple fuzzy matching function. This is not a robust solution. There are algorithms that are way more advanced and provide better suggestions than what this simple function can do. However, with all that power comes a trade-off in efficiency.

A simple use-case

The fuzzy matching function I’m going to present is good for simple use-cases where you want something a bit more flexible than exact string comparison without going all-in on a heavier solution.

For example, say you have an input field where the user enters their country, and you want to show them a list of suggestions as they type. Do you really need to use a Levenshtein distance or Soundex algorithm for such basic functionality? I’d argue no, and that something that’s fast and can handle very basic misspellings may be a better solution in cases such as these.

The code

Here’s the entirety of the function (in TypeScript), which I’ll cover in more detail:

function fuzzyMatch(searchFor: string, inTarget: string): number {
  const search = (searchFor || "").trim().toLowerCase();
  const target = (inTarget || "").trim().toLowerCase();
  const searchLength = search.length;
  const targetLength = target.length;

  // early return conditions
  if (searchLength > targetLength) return 0; // search can't be longer than target
  if (search === target) return 1.0; // search same as target is an exact match

  // convert search and target to arrays
  const searchChars = search.split("");
  const targetChars = target.split("");

  let hits = 0; // count of matching chars
  let hitIdx = -1; // tracks index of last hit char

  for (let i = 0; i < searchLength; i += 1) {
    const searchChar = searchChars[i];

    // start searching target at next char (previous hit index + 1, or 0)
    for (let j = hitIdx + 1; j < targetLength; j += 1) {
      const targetChar = targetChars[j];

      if (searchChar === targetChar) {
        // match; increment hit, record index and move onto the next search char
        hits += 1;
        hitIdx = j;
        break;
      }

      // if final char of target is not a match, we now know that all chars
      // cannot be matched in the target; abort (no match)
      if (j === targetLength - 1) {
        return 0;
      }
    }
  }

  // weight is the ratio of hits to target length
  return hits / targetLength;
}

How fuzzyMatch() works

This function takes two inputs: a string to search for, a target to search in, and returns a weight between 0 and 1 indicating the overall “strength” of the match (1 being an exact match and 0 being no match).

If all the characters are included in the target string consecutively, then it’s considered a match. If there are characters in the search string that are not found in the target or matches are not consecutive, then it’s not a match (returns 0).

So for example, given the search string “ldybg” and the target “ladybug, here’s how it would play out:

ladybug
l-dyb-g

Since all the characters in “ldybg” are found consecutively within “ladybug”, it’s considered a match with a weight of ~0.7. If you were to swap the positions of “b” and the “g” (ldygb), it would not be a match because while all characters are found in the target, they are not consecutive matches.

The function is case-insensitive and will trim any white-space from the beginning and end of both the search and target strings before any comparisons are made (this functionality can be easily removed if you don’t want it).

As mentioned previously, the algorithm is very basic. However, it’s a lot lighter than other (more robust) solutions and can give better results than exact comparisons via ===, includes(), or indexOf() for a number of use-cases.

Sanitizing and early returns

Starting from the top of the function, we have this block of code:

// A
const search = (searchFor || "").trim().toLowerCase();
const target = (inTarget || "").trim().toLowerCase();
const searchLength = search.length;
const targetLength = target.length;

// B
if (searchLength > targetLength) return 0;
if (search === target) return 1.0;

White-space is removed from the ends and the strings are lower-cased to prevent mismatches due to white-space or casing, and the lengths are stored (A).

Next, some quick checks are made to see if the function can return early (B). If the search string has more characters than the target, it’s an automatic no-match. If both strings are the same, we can go ahead and return a full-match. No use doing extra work if it’s not needed!

Iterating over the characters

// A
const searchChars = search.split("");
const targetChars = target.split("");

// B
let hits = 0;
let hitIdx = -1;

// C
for (let i = 0; i < searchLength; i += 1) {
  const searchChar = searchChars[i];

  // D
  for (let j = hitIdx + 1; j < targetLength; j += 1) {
    const targetChar = targetChars[j];

In the next chunk of code, we do some initial setup by converting the strings to arrays (A) for iteration and initialize a couple of variables to track the number of “hits” (matched characters) and the index of the last hit (B).

The main iteration is going over each character of the search string (C) and then for each of these characters, it iterates over the target string (D) to perform the checks. In the second loop, you’ll notice that instead of starting at the beginning every time, we start it from the character after the last matched index (I’ll explain why in the next section). On the very first run, since hitIdx is initialized to -1, it will start at the first character of the target string (-1 + 1 = 0).

I’m using for-loops here (rather than forEach) so we can break and return out of them early when needed.

Checking for hits

if (searchChar === targetChar) {
  hits += 1;
  hitIdx = j;
  break;
}

Within the second for-loop, the first check is made to see if the current character in the search string (defined in the outer loop) matches the current character of the target string.

If there’s a match, hits is incremented and the current index is stored (hitIdx) so the next comparison will begin from the next character of the target instead of the beginning, which prevents duplicate characters from hitting on the same index more than once. Since this is a match, break is called because we no longer need to keep searching the target for this character anymore.

Handling misses

if (j === targetLength - 1) {
  return 0;
}

When there is a character that does not match, a check is made to see if this is iteration is the last character of the target string and return 0 if so (no-match scenario). The reason for this is because if this is a non-matching character and there are no more characters left to search, we know that not all of the characters will be matched and can safely exit the function at this point (no-match scenario).

Calculating the weight

return hits / targetLength;

If we’ve successfully made it past the loops without returning early, then we have a match. All that’s left is to return the weight, which is the ratio of hits to the length of the target string. This will always resolve to a number between 0 and 1.0 which you can use for sorting results.

Filter and sort a list

One of the most common ways to use this function is for filtering and sorting a list of items (like in the aforementioned country suggestion feature).

The following function will take a search string, a list of target objects, and an array of keys to be searched (on each object). Using fuzzyMatch(), it will return a new array of matching items sorted by weight (descending order; highest weight first):

type FuzzyMatch = {
  weight: number;
  obj: any;
};

function fuzzyFilter(search: string, targets: any[], keys: string[]): any[] {
  const matches: FuzzyMatch[] = [];

  for (let i = 0; i < targets.length; i += 1) {
    const obj = targets[i];
    let highestWeight = 0;
    let matchIdx = -1;

    // call fuzzyMatch() on the value of each key to determine weight
    for (let j = 0; j < keys.length; j += 1) {
      const key = keys[j];

      // allow string and number values to be searched)
      if (typeof obj[key] === "string" || typeof obj[key] === "number") {
        const target = `${obj[key]}`;
        const weight = fuzzyMatch(search, target);

        if (weight) {
          if (matchIdx < 0) {
            // first key match was found; add obj to the list of matches
            matchIdx = matches.length;
            matches.push({ weight, obj });
          }

          // in case match was found in another key, check weight to see if
          // greater and update the weight if so
          if (weight > highestWeight) {
            matches[matchIdx].weight = weight;
            highestWeight = weight;
          }
          break;
        }
      }
    }
  }

  // sort matches; highest weight first
  matches.sort((mA: FuzzyMatch, mB: FuzzyMatch) => {
    const a = mA.weight;
    const b = mB.weight;
    if (a > b) return -1;
    if (a < b) return 1;
    return 0;
  });

  return matches.map((match) => match.obj);
}

To show this function in action, given an array of cats:

type Cat = {
  name: string;
  breed: string;
};

const cats: Cat[] = [
  {
    name: "Dwayne",
    breed: "Ragdoll",
  },
  {
    name: "Tabytha",
    breed: "Saimese",
  },
  {
    name: "Tom",
    breed: "Unknown",
  },
  {
    name: "Zelda",
    breed: "Tabby",
  },
];

If we filter the list of cats by the search string “taby”, we’ll get the following result:

fuzzyFilter("taby", cats, ["name", "breed"]);
[
  {
    "name": "Zelda",
    "breed": "Tabby"
  },
  {
    "name": "Tabytha",
    "breed": "Saimese"
  }
]

Only two of the items were returned, with the first match on the breed (Tabby) and another match on the name (Tabytha). The order has Zelda first because the ratio of matching characters to string length (weight) is greater than that of Tabytha’s.

And that’s it! There’s a lot to unpack here, but I hope you learned something new and find some use for these functions in your projects.

The code for both functions are available as a TypeScript module here: https://gist.github.com/jonbeebe/b8e573216f21581d0784fc172267f37d

And the same module, but in JavaScript: https://gist.github.com/jonbeebe/bb6e6df956c9c6a16f6b2f5a6492954a