jazzpeh

Longest Common Subsequence

Jazz Peh, April 30, 2020

Divide & Conquer algorithm can be used to solve Longest Common Subsequence problem . Let’s explore how it works below.

Problem Statement

  • We are given two strings s1 and s2.
  • We need to find the length of the longest subsequence which is common in both the strings.
  • Subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.

Example 1:

Input: "elephant", "erepat"
Output: 5
Explanation: The longest substring is "eepat".

Example 2:

Input: "houdini", "hdupti"
Output: 3
Explanation: The longest substring is "hui".

Solution

Using Example 1 as a reference, we need to determine the probablity. Typically when comparing 2 characters, there are 2 probablities:

Matches

Say, if we check the first characters of each string. Since they matches, we can say that 1 + F(2, 2).

Let’s go in depth on the subproblem above. There is an additional of 1 because we have a match on the first character. The second part F(2, 2) means that for the next iteration where we’re moving to the second character for each string.

Doesn’t match

Since it doesn’t match, we can say that 0 + F(3, 2), and also 0 + F(2, 3).

As the characters doesn’t match, we need to stay at the same character for one of the string and move to the next character for the other.

Getting the answer

Hence, if we manage to solve the subproblems, the end result would be the maximum value of these 3 subproblems.

Code algorithm

function lcs(str1: string, str2: string, i1: number, i2: number) {
  if (i1 === str1.length || i2 === str2.length)
    return 0

  let result1 = 0;
  if (str1[i1] == str2[i2]) {
    result1 = 1 + lcs(str1, str2, i1 + 1, i2 + 1);
  }

  const result2 = lcs(str1, str2, i1 + 1, i2);
  const result3 = lcs(str1, str2, i1, i2 + 1);

  return Math.max(result1, result2, result3);
}

Optimisation

So the solution is not optimised. Why would I say that? This is because, we are running certain calculation twice or more. Let’s take a look at the recursion tree below:

              lcs(0,0)
    /             |             \
lcs(0,1)        lcs(1,0)        lcs(1,1)
  |               |               |
  |-lcs(0,2)      |-lcs(1,1)      |-lcs(1,2)
  |-lcs(1,1)      |-lcs(2,0)      |-lcs(2,1)
  |-lcs(1,2)      |-lcs(2,1)      |-lcs(2,2)

We are actually running lcs(1,1) at least 3 times, lcs(1,2) at least twice, etc. This is a case of overlapping subproblems. Hence in order to optimise this solution we can use Dynamic Programming.

Top Down Approach

Top Down Approach is also known as Memoization. Below shows the code algorithm to achieve this approach.

type HashTable = {
  [x: number]: {
    [y: number]: number;
  };
};

const dp: HashTable = {};

function lcs(str1: string, str2: string, i1: number, i2: number): number {
  if (i1 === str1.length || i2 === str2.length)
    return 0;

  if (!(i1 in dp) && !(i2 in dp[i1])) {
    if (str1[i1] === str2[i2]) {
      result1 = 1 + lcs(str1, str2, i1 + 1, i2 + 1);
    }

    const result2 = lcs(str1, str2, i1 + 1, i2);
    const result3 = lcs(str1, str2, i1, i2 + 1);

    if (!dp[i1]) {
      dp[i1] = {};
    }

    dp[i1][i2] = Math.max(result1, result2, result3);
  }

  return dp[i1][i2];
}

As we can see above, it is almost the same as Divide & Conquer except for the additonal variable to store already-solved subproblems. This would ensure that we don’t run the recursion again to solve something that was already solved.

Bottom Up Approach

Bottom Up Approach is also known as Tabulation. This approach is more complex as compared to Top Down Approach. Hence, let’s explore the solution.

Firstly, let’s start from looking at the Top Down Approach using a matrix table with a sample test input.

B1B2B3B4B5B6B7B8
HOUDINI
A1Hmax(1+A2B2, A1B2, A2B1)max(A1B3, A2B2)max(A1B4, A2B3)max(A1B5, A2B4)max(A1B6, A2B5)max(A1B7, A2B6)max(A1B8, A2B7)0
A2Umax(A2B2, A3B1)max(A2B3, A3B2)max(1+A3K4, A2B4, A3B3)max(A2B5, A3B4)max(A2B6, A3B5)max(A2B7, A3B6)max(A2B8, A3B7)0
A3Imax(A3B2, A4B1)max(A3B3, A4B2)max(A3B4, A4B3)max(A3B5, A4B4)max(1+A4B6, A3B6, A4B5)max(A3B7, A4B6)max(A3B8, A4B7)0
A4Nmax(A4B2, A5B1)max(A4B3, A5B2)max(A4B4, A5B3)max(A4B5, A5B4)max(A4B6, A5B5)max(1+A5B7, A4B7, A5B6)max(A4B8, A5B7)0
A5Dmax(A5B2, A6B1)max(A5B3, A6B2)max(A5B4, A6B3)max(1+A6B5, A5B5, A6B4)max(A5B6, A6B5)max(A5B7, A6B6)max(A5B8, A6B7)0
A600000000

We then need to find the cell which doesn’t have any dependency as our starting point as we will have our first answer there. That would be the last cell A5B7. And we slowly move our way up to the rest of the cells until we reach the first cell A1B1 which will give us our final answer.

B8, A6 rows and columns are our base conditions where we return 0.

type HashTable = {
  [x: number]: {
    [y: number]: number;
  };
};

function lcs(str1: number, str2: number) {
  const dp: HashTable = {};

  for (let i = str1.length; i  > 0; --i) {
    if (!dp[i1]) {
      dp[i1] = {};
    }

    for (let j = str2.length; j > 0; --j) {
      if str1[i - 1] == str2[j - 1] {
        dp[i1][i2] = Math.max(1 + dp[i - 1][j - 1], dp[i][j + 1], dp[i + 1][j]);
      } else {
        dp[i1][i2] = Math.max(dp[i][j + 1], dp[i + 1][j]);
      }
    }
  }

  return dp[0][0];
}

Explore more like this

© 2020 Jazz Peh, Built with Gatsby