Archive

Monthly Archives: February 2012

Some time ago I was interviewed as python developer to appropriate position in Cogniance company. There were several test tasks:

# 1. get dict keys in descending order by values
from operator import itemgetter
d = {‘aa’: 12, ‘bb’: 32, ‘cc’: -45, ‘dd’: 43}
print [key for key, value in sorted(d.items(), key=itemgetter(1), reverse=True)]
# or
items = d.items()
items.sort(key=itemgetter(1), reverse=True) 

# 2. function to calculate sum provided as string like “i1+i2+…+in”
def calculate(str):
return reduce(lambda sum, x: sum + int(x), str.split(‘+’), 0)

“”” 3.
You are given ‘n’ strings w1, w2, ……, wn. Let Si denote the set of strings formed by considering all unique substrings of the string wi. A substring is defined as a contiguous sequence of one or more characters in the string. More information on substrings can be found here. Let ‘S’ = {S1 U S2 U …. Sn} .i.e ‘S’ is a set of strings formed by considering all the unique strings in all sets S1, S2, ….. Sn. You will be given many queries and for each query, you will be given an integer ‘k’. Your task is to output the lexicographically kth smallest string from the set ‘S’.

Input:The first line of input contains a single integer ‘n’, denoting the number of strings. Each of the next ‘n’ lines consists of a string. The string on the ith line (1<=i<=n) is denoted by wi and has a length mi. The next line consists of a single integer ‘q’, denoting the number of queries. Each of the next ‘q’ lines consists of a single integer ‘k’.

Note: The input strings consist only of lowercase english alphabets ‘a’ – ‘z’.

Output:
Output ‘q’ lines, where the ith line consists of a string which is the answer to the ith query. If the input is invalid (‘k’ > |S|), output “INVALID” (quotes for clarity) for that case.

Constraints:
1<=n<=50
1<=mi<=2000
1<=q<=500
1<=k<=1000000000

Sample Input:
2
aab
aac
3
3
8
23

Sample Output:
aab
c
INVALID

Explanation:

For the sample test case, we have 2 strings “aab” and “aac”.
S1 = {“a”, “aa”, “aab”, “ab”, “b”} . These are the 5 unique substrings of “aab”.
S2 = {“a”, “aa”, “aac”,  “ac”, “c” } . These are the 5 unique substrings of “aac”.
Now, S = {S1 U S2} = {“a”, “aa”, “aab”, “aac”, “ab”, “ac”, “b”, “c”}. Totally, 8 unique strings are present in the set ‘S’. 
The lexicographically 3rd smallest string in ‘S’ is “aab” and the lexicographically 8th smallest string in ‘S’ is “c”. Since there are only 8 distinct substrings, the answer to the last query is “INVALID”.

“””

import sys

def getline():
“”” interface for getting lines from some source
“””
return sys.stdin.readline()

def substrings(str):
 “”” generator: all substrings
 “””
 n_s = len(str)
 for start in xrange(n_s):
  for end in xrange(start + 1, n_s + 1):
   yield str[start:end]

# get input data and construct S

unique_set = set()
n = int(getline())
for i in xrange(n):
for substr in substrings(getline().strip()):
  unique_set.add(substr)
S = sorted(unique_set)

# get queries and output result strings for them
for i in xrange(int(getline())):
try:
  print S[int(getline()) – 1]
except IndexError:
  print ‘INVALID’

 

One year ago I’ve posted short note about imperative vs. functional approaches in python. Well, that one was incorrect 😉

The task was to get all unique tokens from several string separated by some delimiter.

regions_strs = (“Some, ST / More, HERE”, “Some, ST / More2, HERE2”)

So, result should be: (‘Some, ST’, ‘More, HERE’, ‘More2, HERE2’)

# 1. classic imperative solution
regions = set()
for s in regions_strs:
  for token in s.split(‘/’):
    regions.add(token.strip())

# 2. very bad still imperative ugly solution
regions = reduce(
  lambda regions_set, regions_str:
    reduce(
        lambda regions_set, region:
            regions_set | set((region.strip(),)),
        regions_str.split(‘/’),
        regions_set),
    regions_strs,
    set())

# 3. really functional way
regions = iunique(token.strip() for token in
    itertools.chain(*(s.split(‘/’) for s in regions_strs))
)

where iunique is a special generator that used as filter for abstract iterator. It’s a pure tool like those from itertools and collections standard python toolsets.

def iunique(iterable, key=lambda x: x):
  “”” unique filter “””
  seen = set()
  for elem, ekey in ((e, key(e)) for e in iterable):
    if ekey not in seen:
      yield elem
      seen.add(ekey)

2’nd implementation is still imerative because it provides the result as data. 3-rd approach provides iterator instead. So, real functional programming way is providing not data result but function. Result is something that can be evaluated/calculated to obtain actual result.

Besides, 2-nd code operates by data in its implementation and 3-rd functional code uses only source data (tuple of source strings and separator).

2-nd solution uses functional sugar like lambda, reduce, split, but those ones only made code ugly and huge. Keep in mind, the code should be as small as possible and very clear.

Python has powerful functional tools like iterators and generators; standard packages itertools and collections. So, it’s ok to do some functional programming in python that would be clear, correct and robust.