Remove the minimum number of invalid parentheses in order to make the input string valid. Return all possible results.

Note: The input string may contain letters other than the parentheses ( and )

Example 1:

Input: "()())()"
Output: ["()()()", "(())()"]

Example 2:

Input: "(a)())()"
Output: ["(a)()()", "(a())()"]

Example 3:

Input: ")("
Output: [""]

Solution:

The trivial, brute-force solution would be to do the following:

  1. Generate all of the \(2^n\) possible strings that are generated from removing some number of parentheses from the original string (where \(n\) is the length of the input string).
    • \(O(2^n)\) complexity
  2. Filter out any strings that are not valid
    • \(O(n)\) complexity for each of the \(O(2^n)\) generated strings
  3. Sort the valid strings by length
    • \(O(2^n n)\) to sort the \(O(2^n)\) generated strings
  4. Return all the valid strings of maximum length
    • \(O(2^n)\) to find all the strings of maximum length

The issue with this approach is that we do a ton of work generating the \(2^n\) possible strings even if we don’t need to generate or check all of them. If we could instead traverse the removals in some way that allows us to generate and validate the minimum number of strings possible, that would be better.

I propose that we can do this using a BFS approach in which we treat the input string as the root of the graph, and the edge \((a, b)\) exists if and only if there is some parenthesis within the string \(a\) that can be deleted to yield \(b\). Thus, level \(i\) of the graph (i.e., the set of nodes with a distance of \(i\) from the root node) is the set of strings that result from removing exactly \(i\) parentheses from the input string. It follows, then, that this graph has a single leaf node containing the input string, but without any parentheses.

For example, the graph for the input ()) would look like this:

   LEVEL 0                         ())
                                  /  \  
                                 /    \
                                /      \
                               /        \
                              /          \
                             /            \
                            /              \
                           /                \
   LEVEL 1                ))                ()
                            \              /  \                          
                             \            /    |
                              \          /     |
                               \        /      |
                                \      /       |
                                 \    /        |
                                  \  /         |
   LEVEL 2                          )          (
                                     \        /
                                      \      /
                                       \    /
                                        \  /
   LEVEL 3                               '' 

The brute-force approach outlined above produces every string within this graph; however, with the BFS approach, we can stop at the first level which contains a valid string, reducing the amount of work we need to do in every case except for the one in which we need to remove every parenthesis.

def removeInvalidParentheses(s):
  """
  :type s: str
  :rtype: List[str]
  """
  level = {s}
  while True:
    valid_strs_within_level = filter(is_valid, level)
    if valid_strs_within_level:
      return valid_strs_within_level
    else:
      level = {
        s[:i] + s[i+1:]
        for s in level
        for i in xrange(len(s))
        if s[i] in ['(', ')']
      }


def is_valid(s):
  paren_sum = 0
  for c in s:
    if c == '(':
      paren_sum += 1
    elif c == ')':
      paren_sum -= 1
    
    if paren_sum < 0:
      return False
  
  return paren_sum == 0

Complexity:

  • Time: \(O(2^n)\)
    • In the worst case, we need to traverse every level of the BFS graph, in which case we generate all \(2^n\) strings, assuming that all \(n\) characters in the string are parentheses and each removal yields a distinct string.
  • Space: \(O(2^n)\)
    • The largest layer of the BFS graph is the middle layer. Again assuming that all \(n\) characters in the string are parentheses and each removal yields a distinct string, there are exactly \(\binom{n}{n/2}\) nodes within this layer. A quick look at Stirling’s Approximation and some simple algebra makes it clear that this quantity is asymptotically bounded by \(O(2^n)\). Since level holds all the distinct strings within a given level of the BFS graph, the space consumed by level is \(O(2^n)\).