Difficulty: Medium, Asked-in: Google, Amazon, Uber, Hike
Key takeaways
Given two strings X and Y of size m and n, write a program to find the length of the longest common subsequence.
Example 1
Input: X[ ] = [A, B, C, B, D, A, B], Y[ ]= [B, D, C, A, B, A]
Output: 4
Explanation: There are many common subsequences of X and Y. For example, the sequence [B, C, A] is a common subsequence of length 3, but it is not the longest common subsequence of X and Y. The sequence [B, C, B, A], which is also common to both X and Y, has length 4. The sequence [B, C, B, A] is an LCS of X and Y, as is the sequence [B, D, A, B]. In other words, X and Y have no common subsequence of length 5 or greater.
Example 2
Input: X[] = [E, B, T, B, C, A, D, F], Y[] = [A, B, B, C, D, G, F]
Output: 5
Explanation: The longest common subsequence is [B, B, C, D, F].
Solution Idea and Steps
This is an optimization problem where the solution space is very large: there can be a huge number of common subsequences of the two strings, but we need to find one with the longest length. So one basic idea would be to explore all the subsequences of both strings and find the longest among the common ones. When we need to do an exhaustive search over all possibilities, we can try the idea of recursion. Now the critical question is: how do we explore all the subsequences recursively? Let's think!
Here we have one obvious choice at the start: the last characters of both strings are either equal or not. If they are equal, that common character must be part of the longest common subsequence. Think! Let's assume the function lcs(X, Y, m, n) returns the length of the longest common subsequence of the first m characters of X and the first n characters of Y.
Case 1: If the last characters of strings are equal i.e. if(X[m-1] == Y[n-1])
In this situation, the problem gets reduced to finding the longest common subsequence of the remaining prefixes of size m-1 and n-1, plus one for the matched character.
lcs(X, Y, m, n) = 1 + lcs(X, Y, m - 1, n - 1)
Case 2: If the last characters of strings are not equal i.e. if (X[m-1] != Y[n-1])
In this situation, there are two smaller subproblems, and we need to take the maximum of the two:
lcs(X, Y, m, n) = max ( lcs(X, Y, m , n - 1), lcs(X, Y, m - 1 , n) )
Base case: the recursion terminates when either string size gets reduced to 0, i.e. if (m == 0 || n == 0), we return 0.
Solution Pseudocode
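Since the pseudocode section is left open here, the two cases and the base case above can be sketched directly in Python (the function name lcs and parameter order are our own choice, mirroring the recurrence in the text):

```python
def lcs(X, Y, m, n):
    # Base case: an empty prefix has no common subsequence with anything.
    if m == 0 or n == 0:
        return 0
    # Case 1: last characters match, so this character is part of the LCS.
    if X[m - 1] == Y[n - 1]:
        return 1 + lcs(X, Y, m - 1, n - 1)
    # Case 2: last characters differ; drop the last character of either
    # string and take the better of the two smaller subproblems.
    return max(lcs(X, Y, m, n - 1), lcs(X, Y, m - 1, n))
```

For example, `lcs("ABCBDAB", "BDCABA", 7, 6)` returns 4, matching Example 1 above.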
Time and space complexity analysis
Space complexity depends on the size of the call stack, which equals the height of the recursion tree. Each recursive call decreases m or n by at most 1, so the longest chain of recursive calls has length O(m + n). Space complexity = O(m + n). Time complexity is exponential in the worst case, because each call can branch into two further recursive calls. (Think!)
Now the critical question would be: why is the time complexity exponential? The answer is simple: we have overlapping subproblems, i.e. we solve the same subproblems again and again during the recursion. This can be visualized by drawing the recursion tree.
In the following diagram of the recursion tree, the subproblem of size (m-2, n-2) appears twice. We can observe other repeated subproblems if we expand the recursion tree further.
Solution Idea
The problem with the recursive solution is that the same subproblems get solved many times. A subproblem is identified by two parameters, m and n, each of which decreases by at most 1 during a recursive call, so there are only (m+1)(n+1) distinct subproblems. Since the recursion tree is exponentially large, some of these subproblems must be solved over and over.
So one efficient idea would be to use the memoization approach of dynamic programming: store the solution of each subproblem in extra memory. When we encounter the same subproblem again during the recursion, we look it up in the extra memory and return the already calculated solution directly instead of computing it again. This avoids a lot of repeated computation and improves the time complexity significantly. Think!
Note: The dynamic programming solution is to check whenever we want to solve a subproblem, whether we've already done it before. If so, we look up the solution instead of recomputing it. Implemented in the most direct way, we just add some code to our recursive algorithm to do this look-up. This "top-down" recursive version of dynamic programming is known as "memoization".
Solution Pseudocode
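A minimal sketch of the memoized version, following the same recurrence (here we use a Python dictionary keyed by (m, n) as the extra memory; a 2-D table of size (m+1)×(n+1) would work equally well):

```python
def lcs(X, Y, m, n, memo=None):
    if memo is None:
        memo = {}
    # Base case: an empty prefix contributes nothing.
    if m == 0 or n == 0:
        return 0
    # Look up the subproblem before recomputing it.
    if (m, n) in memo:
        return memo[(m, n)]
    if X[m - 1] == Y[n - 1]:
        memo[(m, n)] = 1 + lcs(X, Y, m - 1, n - 1, memo)
    else:
        memo[(m, n)] = max(lcs(X, Y, m, n - 1, memo),
                           lcs(X, Y, m - 1, n, memo))
    return memo[(m, n)]
```

Each of the (m+1)(n+1) subproblems is now computed at most once.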
Time and space complexity analysis
There are O(mn) distinct subproblems, and each is solved only once (every later occurrence is a constant-time lookup), so time complexity = O(mn). Space complexity = O(mn) for the memoization table, plus O(m + n) for the recursion call stack.
Solution Idea and Steps
Since we have identified that this is a dynamic programming problem, we can also solve it efficiently using the bottom-up approach. Our aim is to calculate the solutions of the smaller subproblems iteratively and store their results in a table. But the critical question is: how do we build the solution in a bottom-up manner? Here are the important steps:
Table to store results: We use a table LCS[m+1][n+1], where LCS[i][j] stores the length of the longest common subsequence of the first i characters of X and the first j characters of Y. Based on the recursive base case, we initialize the first row and the first column with 0, because the LCS of any string with an empty string is 0.
Iterative structure to fill the table: Now, we need to define the iterative structure to fill the table LCS[][], i.e. a relation by which we build the solution of the larger problem using the solutions of smaller problems in a bottom-up manner. We can easily define it using the recursive structure of the recursive solution:
LCS[i][j] = 1 + LCS[i-1][j-1], if (X[i-1] == Y[j-1])
LCS[i][j] = max(LCS[i][j-1], LCS[i-1][j]), if (X[i-1] != Y[j-1])
Solution Pseudocode
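The steps above can be sketched in Python as follows (the table name LCS mirrors the recurrence in the text):

```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    # LCS[i][j] = length of the LCS of the first i chars of X
    # and the first j chars of Y. Row 0 and column 0 stay 0 (base case).
    LCS = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                # Last characters match: extend the diagonal subproblem.
                LCS[i][j] = 1 + LCS[i - 1][j - 1]
            else:
                # Otherwise take the best of dropping a char from X or Y.
                LCS[i][j] = max(LCS[i][j - 1], LCS[i - 1][j])
    return LCS[m][n]
```

For example, `lcs("ABCBDAB", "BDCABA")` returns 4.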
Time and space complexity analysis
Time complexity = time complexity of initializing the table + time complexity of filling the table in a bottom-up manner + time complexity of returning the final solution = O(m + n) + O(mn) + O(1) = O(mn).
Space complexity = O(mn) for storing the table of size (m+1)×(n+1).
If we observe the previous 2-D solution of the problem, we only use adjacent indexes in the table to build the solution in a bottom-up manner. In other words, we use LCS[i-1][j-1], LCS[i][j-1], and LCS[i-1][j] to fill the position LCS[i][j]. So there are two basic observations: filling row i requires values only from row i and row i-1, and rows before i-1 are never accessed again.
So there is no need to store all rows in our DP matrix; we can store just two rows at a time. In other words, we can use a two-row 2-D array LCS[2][n+1] to update the values and get the output.
Solution Pseudocode
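A minimal sketch of the two-row idea (we alternate between the rows of LCS[2][n+1] using the parity of i, which is one common way to implement this):

```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    # Only two rows: row i % 2 is the row currently being filled,
    # row (i - 1) % 2 holds the previous row's values.
    LCS = [[0] * (n + 1) for _ in range(2)]
    for i in range(1, m + 1):
        cur, prev = i % 2, (i - 1) % 2
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                LCS[cur][j] = 1 + LCS[prev][j - 1]
            else:
                LCS[cur][j] = max(LCS[cur][j - 1], LCS[prev][j])
    # The answer sits in the row filled during the last iteration.
    return LCS[m % 2][n]
```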
Time and space complexity analysis
There is no change in time complexity compared to the previous approach (we use two nested loops in the same way), so time complexity = O(mn). Space complexity gets reduced to O(n) because the number of rows is a constant.
Solution Idea and Steps
Now the critical question would be: can we optimize the space complexity further? The idea is: in the last approach, we use a two-row 2-D table and fill values row by row, using two values from the previous row. So can we store just one row of data and track the required previous-row values using two extra variables? Let's think!
Now we run nested loops similar to the last approach. At the ith iteration of the outer loop, we store the ith row values in the table LCS[n+1] using the inner loop. Suppose after the (i-1)th iteration of the outer loop, all the values of the (i-1)th row are stored in the table. During the ith iteration, before overwriting LCS[j] we save its old value (which is the (i-1)th row's value at column j) in a variable, and we keep the previously saved value (the (i-1)th row's value at column j-1) in another variable.
We highly recommend visualizing this approach on paper by drawing the flow of storing values in the table by updating variables. Think!
Solution Pseudocode
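A sketch of the single-row version: `row[j]` holds the previous row's value until it is overwritten, and the variables `prev_diag` and `temp` track the two previous-row values described above.

```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    row = [0] * (n + 1)  # before iteration i, row[j] holds LCS[i-1][j]
    for i in range(1, m + 1):
        prev_diag = 0  # LCS[i-1][j-1]; starts at 0 since column 0 is 0
        for j in range(1, n + 1):
            temp = row[j]  # save LCS[i-1][j] before overwriting
            if X[i - 1] == Y[j - 1]:
                row[j] = 1 + prev_diag
            else:
                row[j] = max(row[j - 1], row[j])
            prev_diag = temp  # becomes LCS[i-1][j-1] for the next j
    return row[n]
```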
Time and space complexity analysis
There is no change in time complexity as we use two nested loops similar to the previous approach. So time complexity = O(mn). Space complexity is still O(n), but it is the optimized version of the previous approach because we only use a 1-D table to store one row.
Why might we want to solve the longest common subsequence problem? There are several motivating applications.
Longest Palindromic Subsequence
Maximum Length of Repeated Subarray
Enjoy learning, Enjoy problem-solving!