Skip to content Skip to sidebar Skip to footer

Find The Shortest Substring Whose Replacement Makes The String Contain Equal Number Of Each Character

I have a string of length n composed of letters A,G,C and T. The string is steady if it contains equal number of A,G,C and T(each n/4 times). I need to find the minimum length of t

Solution 1:

The minimal substring can be found in O(N) time and O(N) space.

First count a frequency fr[i] of each character from the input of length n. Now, the most important thing to realise is that the necessary and sufficient condition for a substring to be considered minimal, it must contain each excessive character with a frequency of at least fr[i] - n/4. Otherwise, it won't be possible to replace the missing characters. So, our task is to go through each such substring and pick the one with the minimal length.

But how to find all of them efficiently?

At start, minLength is n. We introduce 2 pointer indices - left and right (initially 0) that define a substring from left to right in the original string str. Then, we increment right until the frequency of each excessive character in str[left:right] is at least fr[i] - n/4. But it's not all yet since str[left : right] may contain unnecessary chars to the left (for example, they're not excessive and so can be removed). So, we increment left as long as str[left : right] still contains enough excessive elements. When we're finished we update minLength if it's larger than right - left. We repeat the procedure until right >= n.

Let's consider an example. Let GAAAAAAA be the input string. Then, the algorithm steps are as below:

1.Count frequencies of each character:

['G'] = 1, ['A'] = 6, ['T'] = 0, ['C'] = 0 ('A'is excessive here)

2.Now iterate through the original string:

Step#1: |G|AAAAAAAsubstr = 'G' - no excessive chars (left = 0, right = 0) 
Step#2: |GA|AAAAAAsubstr = 'GA' - 1 excessive char, we need 5 (left = 0, right = 1)
Step#3: |GAA|AAAAAsubstr = 'GAA' - 2 excessive chars, we need 5 (left = 0, right = 2)
Step#4: |GAAA|AAAAsubstr = 'GAAA' - 3 excessive chars, we need 5 (left = 0, right = 3)
Step#5: |GAAAA|AAAsubstr = 'GAAAA' - 4 excessive chars, we need 5 (left = 0, right = 4)
Step#6: |GAAAAA|AAsubstr = 'GAAAAA' - 5 excessive chars, nice but can we remove something from left? 'G' is not excessive anyways. (left = 0, right = 5)
Step#7: G|AAAAA|AAsubstr = 'AAAAA' - 5 excessive chars, wow, it's smaller now. minLength = 5 (left = 1, right = 5)   
Step#8: G|AAAAAA|A
    substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 1, right = 6)
Step#9: GA|AAAAA|Asubstr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 2, right = 6)
Step#10: GA|AAAAAA|substr = 'AAAAAA' - 6 excessive chars, nice, but can we reduce the substr? There's a redundant 'A'(left = 2, right = 7)
Step#11: GAA|AAAAA|
    substr = 'AAAAA' - 5 excessive chars, nice, minLen = 5 (left = 3, right = 7)
Step#12: That's it as right >= 8

Or the full code below:

from collections import Counter

n =int(input())
gene = raw_input()
char_counts = Counter()
for i inrange(n):
    char_counts[gene[i]] +=1

n_by_4 = n /4
min_length = n
left=0right=0

substring_counts = Counter()
while right< n:
    substring_counts[gene[right]] +=1right+=1

    has_enough_excessive_chars =Truefor ch in "ACTG":
        diff = char_counts[ch] - n_by_4
        # the char cannot be used to replace other items
        if (diff >0) and (substring_counts[ch] < diff):
            has_enough_excessive_chars =False
            break

    if has_enough_excessive_chars:
        while left<rightand substring_counts[gene[left]] > (char_counts[gene[left]] - n_by_4):
            substring_counts[gene[left]] -=1left+=1

        min_length =min(min_length, right-left)

print (min_length)

Solution 2:

Here's one solution with limited testing done. This should give you some ideas on how to improve your code.

from collections import Counter
import sys
import math

n =int(input())
s1 = input()
s = Counter(s1)

if all(e <= n/4for e in s.values()):
    print(0)
    sys.exit(0)

result= math.inf
out=0for mnum inrange(n):
    s[s1[mnum]] -=1
    while all(e <= n/4for e in s.values()) andout<= mnum:
        result=min(result, mnum -out+1)
        s[s1[out]] +=1out+=1

print(result)

Post a Comment for "Find The Shortest Substring Whose Replacement Makes The String Contain Equal Number Of Each Character"