Archive

Tag Archives: python

Twitter API application for deleting latest tweets

About 2 weeks ago I had some bad accident with my twtitter account. So, I need to delete a lot of annoying pure spam tweets. I create simple console tool by python. It uses Twitter REST API to browse the tweets and delete some of them.

Advertisements

There is not a function for restoring several deleted files in standard dropbox interface. Fortunately, there is dropbox API. I created simple script to restore all dropbox files recursively.

  1. You need to register dropbox application at https://www.dropbox.com/developers with Full Dropbox access type.
  2. Then you need to put your App key and App secret to the dropbox_restore_all.py script.
  3. Run it under python 2.x
  4. Then it will ask you for authorization via provided URL.
  5. After authorization press Enter to start restoring process.

Some time ago I was interviewed as python developer to appropriate position in Cogniance company. There were several test tasks:

# 1. get dict keys in descending order by values
from operator import itemgetter
d = {‘aa’: 12, ‘bb’: 32, ‘cc’: -45, ‘dd’: 43}
print [key for key, value in sorted(d.items(), key=itemgetter(1), reverse=True)]
# or
items = d.items()
items.sort(key=itemgetter(1), reverse=True) 

# 2. function to calculate sum provided as string like “i1+i2+…+in”
def calculate(str):
return reduce(lambda sum, x: sum + int(x), str.split(‘+’), 0)

“”” 3.
You are given ‘n’ strings w1, w2, ……, wn. Let Si denote the set of strings formed by considering all unique substrings of the string wi. A substring is defined as a contiguous sequence of one or more characters in the string. More information on substrings can be found here. Let ‘S’ = {S1 U S2 U …. Sn} .i.e ‘S’ is a set of strings formed by considering all the unique strings in all sets S1, S2, ….. Sn. You will be given many queries and for each query, you will be given an integer ‘k’. Your task is to output the lexicographically kth smallest string from the set ‘S’.

Input:The first line of input contains a single integer ‘n’, denoting the number of strings. Each of the next ‘n’ lines consists of a string. The string on the ith line (1<=i<=n) is denoted by wi and has a length mi. The next line consists of a single integer ‘q’, denoting the number of queries. Each of the next ‘q’ lines consists of a single integer ‘k’.

Note: The input strings consist only of lowercase english alphabets ‘a’ – ‘z’.

Output:
Output ‘q’ lines, where the ith line consists of a string which is the answer to the ith query. If the input is invalid (‘k’ > |S|), output “INVALID” (quotes for clarity) for that case.

Constraints:
1<=n<=50
1<=mi<=2000
1<=q<=500
1<=k<=1000000000

Sample Input:
2
aab
aac
3
3
8
23

Sample Output:
aab
c
INVALID

Explanation:

For the sample test case, we have 2 strings “aab” and “aac”.
S1 = {“a”, “aa”, “aab”, “ab”, “b”} . These are the 5 unique substrings of “aab”.
S2 = {“a”, “aa”, “aac”,  “ac”, “c” } . These are the 5 unique substrings of “aac”.
Now, S = {S1 U S2} = {“a”, “aa”, “aab”, “aac”, “ab”, “ac”, “b”, “c”}. Totally, 8 unique strings are present in the set ‘S’. 
The lexicographically 3rd smallest string in ‘S’ is “aab” and the lexicographically 8th smallest string in ‘S’ is “c”. Since there are only 8 distinct substrings, the answer to the last query is “INVALID”.

“””

import sys

def getline():
“”” interface for getting lines from some source
“””
return sys.stdin.readline()

def substrings(str):
 “”” generator: all substrings
 “””
 n_s = len(str)
 for start in xrange(n_s):
  for end in xrange(start + 1, n_s + 1):
   yield str[start:end]

# get input data and construct S

unique_set = set()
n = int(getline())
for i in xrange(n):
for substr in substrings(getline().strip()):
  unique_set.add(substr)
S = sorted(unique_set)

# get queries and output result strings for them
for i in xrange(int(getline())):
try:
  print S[int(getline()) – 1]
except IndexError:
  print ‘INVALID’

 

One year ago I’ve posted short note about imperative vs. functional approaches in python. Well, that one was incorrect 😉

The task was to get all unique tokens from several string separated by some delimiter.

regions_strs = (“Some, ST / More, HERE”, “Some, ST / More2, HERE2”)

So, result should be: (‘Some, ST’, ‘More, HERE’, ‘More2, HERE2’)

# 1. classic imperative solution
regions = set()
for s in regions_strs:
  for token in s.split(‘/’):
    regions.add(token.strip())

# 2. very bad still imperative ugly solution
regions = reduce(
  lambda regions_set, regions_str:
    reduce(
        lambda regions_set, region:
            regions_set | set((region.strip(),)),
        regions_str.split(‘/’),
        regions_set),
    regions_strs,
    set())

# 3. really functional way
regions = iunique(token.strip() for token in
    itertools.chain(*(s.split(‘/’) for s in regions_strs))
)

where iunique is a special generator that used as filter for abstract iterator. It’s a pure tool like those from itertools and collections standard python toolsets.

def iunique(iterable, key=lambda x: x):
  “”” unique filter “””
  seen = set()
  for elem, ekey in ((e, key(e)) for e in iterable):
    if ekey not in seen:
      yield elem
      seen.add(ekey)

2’nd implementation is still imerative because it provides the result as data. 3-rd approach provides iterator instead. So, real functional programming way is providing not data result but function. Result is something that can be evaluated/calculated to obtain actual result.

Besides, 2-nd code operates by data in its implementation and 3-rd functional code uses only source data (tuple of source strings and separator).

2-nd solution uses functional sugar like lambda, reduce, split, but those ones only made code ugly and huge. Keep in mind, the code should be as small as possible and very clear.

Python has powerful functional tools like iterators and generators; standard packages itertools and collections. So, it’s ok to do some functional programming in python that would be clear, correct and robust.

In game development performance is very important factor, especially in case of hi-load on-line multi player games in social networks. As we know one of useful tricks is to hard code as much as possible to avoid huge reading or initialization.

In the other hand, it’s hard to support such hard-coded solution (they even call it anti-pattern). For example, we have list of items in the game. It should be the same on servers side and on client side. If list is pretty long then it’s easy to make a fatal mistake, so, lists would be different. Better approach is to hold all the data in single place handy to modify/support and automation system to synchronize the data with all other places in the source code. There are a lot of benefits. As it’s single place then it’s easy to support and avoid difference mistakes. As it automated then we don’t need developer to spend his time to update it.

I created such system. I called it transistor. There are 3 parts:

  • The python script transistor.py
  • Data composed into CSV
  • Source code with hard coded section and template to fill it

We can use one or more CSV tables. One table may have reference(s) as index(es) to another. CSV content should be available via HTTP or locally. I use Google Docs Spreadsheets for that. First line of each CSV table should have case-sensitive fields (columns) names. Other rows have the data. Later the script will parse those data into the list of dictionaries.

My first try was using of pure python string formatting as template. It means that the template and its logic were in python code on the script side. There were bad design and huge ugly implementation. Well, if you have a lot of code then it is almost always bad architecture. There should be very small amount of code. So, I declined that prototype. The second (and the best) design approach was to make that script universal and move all specific template data and logic to the template situated in the same source code file (in comments).

/*// gunTypes begin{% set gunTypes = tables.gunTypes -%}{%- for g in gunTypes -%}{%- if g.shoot_width -%}{%- if loop.index0 > 0 %},{% endif -%}new GunType({{g.id}}, new Shape({{g.base_width}}, {{g.base_height}}, new byte[]{                        {{g.base_shape|replace('n', 'n                                                        ')}}        }), new Shape({{g.shoot_width}}, {{g.shoot_height}}, new byte[]{                        {{g.shoot_shape|replace('n', 'n                                                        ')}}        })){%- endif %}{%- endfor %}// gunTypes template end */        // ... here would be generated content ...// gunTypes end

Hard-coded section in the source code file has header, footer and template delimiter. There are the template between header and template delimiter and the generated content between delimiter and the footer. Usage example:

python transistor.py --file=GunTypesDict.cs --section=gunTypes                --table=gunTypes                --url="https://docs.google.com/spreadsheet/pub?...&output=csv"
or more complicated with 2 tables:
python transistor.py --file=LocationsDict.cs --section=locations                --table=fields --table=locations                --url="https://docs.google.com/spreadsheet/pub?...&output=csv"                --url="https://docs.google.com/spreadsheet/pub?...&output=csv"
After such data publishing we have the same source code like:
/*// gunTypes begin{% set gunTypes = tables.gunTypes -%}{%- for g in gunTypes -%}{%- if g.shoot_width -%}{%- if loop.index0 > 0 %},{% endif -%}new GunType({{g.id}}, new Shape({{g.base_width}}, {{g.base_height}}, new byte[]{                        {{g.base_shape|replace('n', 'n                                                        ')}}        }), new Shape({{g.shoot_width}}, {{g.shoot_height}}, new byte[]{                        {{g.shoot_shape|replace('n', 'n                                                        ')}}        })){%- endif %}{%- endfor %}// gunTypes template end */new GunType(0, new Shape(1, 2, new byte[]{                        1,1        }), new Shape(1, 1, new byte[]{                        5        })),new GunType(1, new Shape(2, 1, new byte[]{                        1,1        }), new Shape(2, 2, new byte[]{                        0,5,                        5,0        }))// gunTypes end

If we take some text and shuffle a bit the letters in the words like this one below then we still be able to read it. Try:

If we tkae smoe txte and suhffel a bit the tletesr in the wodrs ilke htis one bwloe thne we sltil be albe to raed it.

It’s just one more magic feature of our mind. Well, all people can’t do that but about 50% can.

I implemented python script text_distor.py to shuffle the text in this way. It handles punctuation correctly, so, you have variant of the text exactly as it needs for such cognitive test.

Reverse algorythm to restore original text more interesting as it’s image recognition task. So, I’m going to do it as well when I have more free time.