If you are just starting with Python I hope the small problems in this blog are helpful. All you need to solve or follow them is very basic knowledge of Python and curiosity.
Each post uses trinkets to run python on the browser. You can modify and run the code on the blog post itself.

Saturday, May 23, 2015

Run Length Encoding a Swan

Run Length Encoding (RLE) is a simple data compression scheme.
This scheme compresses strings by replacing consecutive characters by the number of occurrences of the character followed by that character.
For example aaaBBccD@@@ would be represented as 3a2B2c1D3@ or 3a2B2cD3@, i.e. the number 1 can be omitted when, like above with D, there is only one character. In this post we will leave the 1 out.
Each sequence of a repeated character is called a run thus the name Run Length Encoding.

RLE or more complicated data compression schemes  make modern communication possible: by reducing the size of images and videos by 20 times or more, the compressed image/video is much faster and cheaper to transmit.
Consider the following swan "image"


The image consists of only the 11 symbols

#, ?, %, . , S, +, ., *, :, , , @

adding up to a total of 6400 characters (100 characters on each of 64 rows).
We can easily see that compressing this image using RLE will achieve a significant compression rate.


How do we perform such compression?
First in the trinket code below notice how we have 2 files: one is the usual main.py and the other is named data.py.
This data.py file contains a single list called swan. Each element in the list is a row of the swan image above. So this list has 64 elements (64 rows) each with a string 100 characters long.

Access to this list from main.py is achieved through the line

import data

at the top of main.py. Then whenever we want to access the swan list we do

swan = data.swan 

We then loop through every element of the swan list. For each element we loop through each one of its characters and keep a running counter of how many consecutive identical characters there are. Once we detect a change of character we append the count and the character it counts to a string variable (rowEncoded) representing the RLE of the row, reset the counter to 1 and keep counting how many of the new character there are. If there is only one character we do not prepend the count to the character as mentioned above.

This becomes perhaps clearer looking at the code

import data

swan = data.swan

Given an image (represented by a list of its rows)
Run Length Encodes each row of the image and returns
the result as a list of the encoded image rows
def encode(image):
    imageEncoded = []
    # loop through every image row
    for row in image:
        rowEncoded = ''
        count = 0
        char = row[0]
        # loop through every character in the row
        # and count how many consecutive elements of 
        # that character are on that image row
        for character in row:
            # switching to a new character
            if  char != character:
               # dont add 1 
               if count == 1:
                   rowEncoded += char
                   rowEncoded += str(count) + char
               char = character
               count = 1
               count += 1
        # encode the last character in the row
        if count == 1:
            rowEncoded += char
            rowEncoded += str(count) + char
        # add encoded row to list holding the image encoding
    return imageEncoded    
imageEncoded = encode(swan)

With this encoding the first 5 rows of the above swan image become

Notice how the characters in each line add up always to 100 which is the number of characters in each row of the original image.

In total there are now 1196 characters instead of the 6400 in the original image, i.e.  1196/6400 = 0.186 or 18.6 percent of the original characters.


To decode an image encoded with RLE (the inverse of what we just did) we again loop through each element of the list holding the encoded rows. For each such element we loop through its characters.
If the character is a digit we hold it in a counter variable. Otherwise we append the character following the digit as many times as the digit. If we find a character without a digit preceding it we assume it only appears once and thus we only append it once.

Some important details are in order in this process:  a character might appear more than 9 times in a row, i.e. 2 or more digits. For example above the second row is 18@3,79@. That is 18 @(s) followed by 3 , followed by 79 @(s). As we scan each of the characters we come across the 1 and store it in our count variable. The next character is also a digit, 8 so we want to store in our count variable

count = count*10+8 

we need to remember though that 1 and 8 will be represented by their ASCII characters (49 and 56).
Therefore we should instead do, in python,

count = count*10 + ord(character) - ord('0')

since the python function ord will return the ascii (more properly the unicode) value of the character we pass it to it.

Also because of the fact that the count can be greater than 9, we will assume for simplicity that our string does not have any numbers (otherwise a string like 222d that would be converted to 32d would seem to imply that d appears 32 times).

The decode code in python is then

Given a run length encoded image (represented by a list of its rows)
decode each row of the image and return
the result as a list of the unencoded image rows. It also
returns the total number of characters in both the 
original encoded image and the unencoded image
def decode(encodedImage):
    imageDecoded = []
    # loop through every image row
    for row in encodedImage:
        rowDecoded = ''
        count = 0
        char = row[0]
        # loop through every character in the row
        # and count how many consecutive elements of 
        # that character are on that image row
        for character in row:
            # switching to a new character
            if  character.isdigit():
               count = count*10 + ord(character)-ord('0')
               if count == 0:
                   count = 1
               rowDecoded += character*count
               count = 0
        # add decoded row to list holding the image decoding      
    return imageDecoded
Note how we do

rowDecoded += character*count

This will append count number of character to the string rowDecoded. It is a very short way of doing what we want rather than the more verbose way of looping count times and adding character each time to rowDecoded.

Assembling the encode and decode methods in a class RunLengthEncoding and a few tests in a Test class we end this post with the following python code

Friday, May 15, 2015

At how many intersections can the candy shop be?

The Reminiscences of Dr. Watson

During the first week or so we had no callers, and I had begun to think that my companion was as friendless a man as I was myself.
Presently, however, I found that he had many acquaintances, and those in the most different classes of society.
There was one little sallow rat-faced, dark-eyed fellow who was introduced to me as Mr. Lestrade, and who came three or four times in a single week.
One morning a young girl called, fashionably dressed, and stayed for half an hour or more.
The same afternoon brought a grey-headed, seedy visitor,  who appeared to me to be much excited, and who was closely followed by a slipshod elderly woman.
On another occasion an old white-haired gentleman had an interview with my companion; and on another a railway porter in his velveteen uniform.

When any of these nondescript individuals put in an appearance, Sherlock Holmes used to beg for the use of the sitting-room, and I would retire to my bed-room.
He always apologized to me for putting me to this inconvenience. "I have to use this room as a place of business," he said, "and these people are my clients."
Again I had an opportunity of asking him a point blank question, and again my delicacy prevented me from forcing another man to confide in me.
I imagined at the time that he had some strong reason for not alluding to it, but he soon dispelled the idea by coming round to the subject of his own accord.
It was upon the 4th of March, as I have good reason to remember, that I rose somewhat earlier than usual, and found that Sherlock Holmes had not yet finished his breakfast.
The landlady had become so accustomed to my late habits that my place had not been laid nor my coffee prepared.
With the unreasonable petulance of mankind I rang the bell and gave a curt intimation that I was ready. Then I picked up a magazine from the table and attempted to while away the time with it, while my companion munched silently at his toast.
Suddenly several strong knocks on the door arose me from my stupor. My companion sprang up quickly and opened the door. A very old gentleman stood at the entrance.
Mr. Sherlock Holmes motioned the visitor to the sitting-room and turning to me said "Watson would you be so kind as to follow us?".
 A bit intrigued, I inspected the old man carefully; despite his age, he moved confidently and still looked as if he was at the top of his faculties. There was, however, a sadness in his gaze.
"This is Mr. Radnoor" introduced my companion, "and this is Dr. Watson." "If am not mistaken Mr. Radnoor's story is of the particularly unusual sort". Sherlock pointed the visitor to one of the chairs, and we all sat. "Thank you Mr. Holmes." After a short pause and a sigh the old man started his story.
"It all started many many years ago when I was a small boy growing up in Stadford.
 I did small jobs for Dr. Cavetar that often carried me all across town.
During one of these assignments I came across a candy shop with the most unusual assortment of candy I had ever seen.
Behind the counter was the prettiest girl I had ever seen as well. Timidly at first, then growing in confidence I ordered this and that candy and soon I had enough for a month. The young lady neatly packed my order and handed it to me with the most beautiful of smiles.
Stuttering, I thanked her and walked away.
In the next few days whenever I ate one of the candies I couldn't dispel the memory of the girl's smile.
I returned often to the store and in due time the beautiful lady became Mrs. Radnoor.
We moved across country, then to India and lived a wonderful life together." 
At this point, the old man pulled his handkerchief and wiped away a tear. 
"Mrs. Radnoor died a week ago." he explained. 
"I have not had a night's sleep since then. I would like to return to the candy shop of my youth were it all started. It has been many years though and my memory is not what it used to be. I do not remember where exactly the shop is located and I thought you could help me."
Sherlock Holmes bent forward and prompted the gentleman "Please do tell us everything you still remember about that shop."

Mr. Radnoor looked pensive before saying "These are the facts I remember:"
  • "I do remember that from any given point (x,y) in the city of Statford I could move to the four neighboring (x-1,y), (x,y-1), (x+1,y),(x,y+1) points in a quarter hour.
  • I also recall that the ith time I went to the candy shop I went from point (x[i],y[i]) and drove for not more than a quarter hour * L[i]. 
  • The candy shop was located at a point with integer coordinates. 
  • I never walked more than 5 hours-this happened on the day I asked Mrs. Radnoor to marry me- (i.e. L[i] was at most 20).
  • The coordinates (x[i],y[i]) are within the rectangle (-20, 20), i.e. -20<=x[i], y[i]<= 20).
  •  Dr. Cavetar's office was at (0,0).
 I am afraid that is all I can tell you".

Sherlock Holmes rose and guaranteed to Mr. Radnoor that he would have a solution with the number of possible locations as well as corresponding paths from the starting position to those candy shop locations by next day. The gentleman's eyes glinted with gratitude and he thanked my companion profusely.

"What do you make of it, Watson?". 
"Well, lets say he did one visit to the candy shop and walked for at most a quarter hour. We know that he either didn't move, in which case the candy shop would be at Dr. Cavetar's office or he moved once in which case the candy shop could be at (1,0),(0,1),(-1,0),(0,-1). I.e. the candy shop could be at 5 different locations.
If he walked for at most half an hour, then besides these 5 locations he could have from each of the one step locations reached 8 more making it 13 locations. That is an awful lot of locations. How do you think we can help him?".
Dr. Watson second example: 13 locations for the candy shop. The paths in red take half an hour to reach from (0,0). The paths in black take only a quarter of an hour.

"Do not be so pessimistic Watson. The case can be less complicated. Lets imagine Mr. Radnoor starts at point (2,1) in his first visit and walks at most half an hour. Then we have 13 possible locations as you pointed out. In his second visit lets say he starts at (3,-1) and also walks at most half an hour. Then among the 13 possible locations from (2,1) only those that are reachable from (3,-1) are possible locations for the shop. There are only 4 locations like that: (2,-1), (3,0), (3,1) and (2,0)."
Sherlock example: There are only 4 possible locations for the candy shop. (3,1) and (2,0)  takes a quarter of an hour to reach from (2,1) but half an hour from (3,-1). On the contrary (2,-1)  and (3,0) takes a quarter of an hour to reach from (3,-1) but half an hour from (2,1).
"Interesting example Holmes. If he went a third time to the shop from (5,0) and walked at most three quarters of an hour, there would then be only 3 possible locations: (3,0), (3,1) and (2,0). He could not reach (2,-1) from (5,0) in at most three steps." 

"Exactly my dear Watson. How to solve the problem in general you may ask. To find the number of possible locations of the candy shop is elementary. Here it is the thought process:

  • Let (csx, csy) be the location of the candy shop (these are integers!). The maximum values of csx and csy are (x[i],y[i]) + L[i] since at most the gentleman moves L[i] units. Since x[i],y[i] and L[i] can at most be 20, this comes to a maximum value of 40. 
  • Similarly the minimum value of csx,csy will be -40 (when x[i]= -20 or/and  y[i]=-20 and L[i]=20 and we move 20 steps in the negative direction).
  • So we can  consider every single point inside or on the boundary of the rectangle -40, 40 as a candidate for the candy shop location. 
  • For each such point (csx,csy) in the rectangle, lets define the distance between the candy shop and (x[i],y[i]) as abs(x[i]-csx)+abs(y[i]-csy). This is really the number of quarter hours Mr. Radnoor walks from (x[i],y[i]). We know this number is at most L[i].
  • (csx,csy) will only be a valid location if this distance (which is called the Manhattan distance) is smaller or equal than L[i] for every i."
This translates to Python as 
def countPossibleLocations(self, P, L):
        count = 0
        candy_shop_locations = []
        for csx in range(-41,41):
           for csy in range(-41, 41):
               possible = True
               # For each possible (csx,csy) compute the Manhattan distance to 
               # each point (x[i],y[i]). If that distance is <= than L[i]
               # for every (x[[i],y[i]) then (csx,csy) is a possible location
               for i in range(0, len(P)):
                  manhattan = abs(P[i][0] - csx) + abs(P[i][1] - csy)
                  possible = possible and (manhattan <= L[i])
               if possible:
                  count += 1
        return count,candy_shop_locations

Notice how we store the coordinates of the possible candy shop locations in a list that is also returned by this method.

To find the paths that lead from each starting point (x[i],y[i]) to each (csx,csy) candy shop possible location in at most L[i] number of steps is also elementary: The Manhattan distance between (csx,csy)  and a starting point is the shortest distance between those two points. We know from the discussion above that this distance is always <=L[i].
A possible path from (x[i],y[i]) with this length is to move along say the x direction until the x coordinate matches csx and then move along the y direction until the y coordinate matches csy.
The only other detail we need to pay attention is that if csx is to the left of x[i] we need to move left otherwise we move right. Similarly for csy and y[i] (up or down).
  In Python
    def findPaths(self,Cs,P):       
        paths = []
        for i in range(0,len(P)):           
            for j in range(0,len(Cs)):
                path = []
                # move left
                if Cs[j][0]< P[i][0]:
                    stepx = -1
                # move right
                    stepx = 1
                stepsx = 0
                #go along the x axis until Cs[j][0] and P[i][0] match
                for x in range(0,abs(Cs[j][0]-P[i][0])):
                    stepsx += stepx
                # move down
                if Cs[j][1]< P[i][1]:
                    stepy = -1
                # move up
                    stepy = 1
                stepsy = 0
                #go along the y axis until Cs[j][1] and P[i][1] match
                for y in range(0,abs(Cs[j][1]-P[i][1])):
                    stepsy += stepy
                    path.append([P[i][0] + stepsx,P[i][1]+stepsy])
        return paths
For example for the example that you talked about before Watson where the starting locations (x[[i],y[i]) and the maximum number of steps L[i] are 
[(2, 1), (3, -1), (5, 0)] num of steps [2, 2, 3]
The possible candy shop locations are
Locations [(2, 0), (3, 0), (3, 1)]
And the possible shortest paths as computed by the method above will be
Possible paths 
[[2, 1], [2, 0]]
[[2, 1], [3, 1], [3, 0]]
[[2, 1], [3, 1]]
[[3, -1], [2, -1], [2, 0]]
[[3, -1], [3, 0]]
[[3, -1], [3, 0], [3, 1]]
[[5, 0], [4, 0], [3, 0], [2, 0]]
[[5, 0], [4, 0], [3, 0]]
[[5, 0], [4, 0], [3, 0], [3, 1]]
The first, third and fifth paths takes only a quarter hour (one step). 
The second, fourth, sixth and eighth paths take half an hour (2 steps).
The seventh and ninth paths take three quarters of an hour (3 steps).
One final point is that these paths are not the only possible shortest paths. 
They always follow first along the x axis then along the y axis. 
Another possibility would be to always start along the y axis and once the y coordinates match start moving along the x axis. 
And many other shortests paths exist that result from moving sometimes along the x axis sometimes along the y axis in a non predictable way (Mr. Radnoor could move in a random direction each step as long as he goes from (x[i],y[i]) to (csx,csy). 

Gathering all the pieces we have the following Python code

Monday, May 11, 2015

Blackjack wins and losses

Blackjack, also known as twenty-one, is the most widely played casino game in the world.
The basic rules of Blackjack are (there are several variations/extensions of these rules)
  • The goal of blackjack is to beat the dealer's hand without going over 21.
  • Face cards are worth 10. Aces are worth 1 or 11, whichever makes a better hand. All other cards are worth their numeric value.
  • Each player starts with two cards, one of the dealer's cards is hidden until the end. The player and dealer add the value of their two cards to figure out the value they have.
  • To 'Hit' is to ask for another card. To 'Stand' is to hold your total and end your turn.
  • If a player goes over 21 he/she busts, and the dealer wins regardless of the dealer's hand.
  • If a player scores lower than the dealer then he/she loses his/her bet.
  • If the player scores higher than the dealer or the dealer goes over 21, and the player does not go over 21, then the player wins his bet (in addition to keeping his original wager).
  • If the player is dealt 21 from the start (Ace & 10), he/she has a blackjack.
  • Blackjack usually means the player wins 1.5 the amount of his/her bet. Depends on the casino.
  • If both dealer and player have a blackjack, then the hand is a push. Note that if the dealer has a blackjack, and the player has a 21 (but does not have blackjack), then the dealer wins, and the player loses his wager.
  • Dealer will hit until his/her cards total 17 or higher.
Imagine we would like to know the amount of money a player wins or loses in a hand. We know the bet the player put on, the dealer and player's score and whether the dealer and the player have a blackjack or not.
There are only four possibilities based on the rules above: 

  • If the player has a blackjack and the dealer does not have a blackjack the player wins 3/2*bet.
  • If the player and the dealer both have a blackjack the player keeps his wager and neither side loses, so the player wins 0.
  • If the player goes over 21, he or she loses -bet.
  • If the player scores higher than the dealer or the dealer goes over 21 (but the player does not go over 21) the player wins +bet.

To avoid rounding issues when the player wins with blackjack, lets assume that the bet is always an even number of dollars. Then the four possible outcomes can be translated to a set of if statements in Python as follows
       def blackjackWinsAndLosses(self, bet, dealerScore, playerScore, dealerHasBlackjack, playerHasBlackjack):
            # player loses
            if playerScore > 21 or (playerScore < dealerScore and dealerScore<=21):
                return -bet
            # neither player nor dealer win
            if dealerHasBlackjack and playerHasBlackjack:
                return 0
            # dealer has blackjack but player does not, player loses
            if dealerHasBlackjack and not playerHasBlackjack:
                return -bet
            # player has blackjack but dealer does not
            if playerHasBlackjack and not dealerHasBlackjack:
                return 3/2.0*bet
            # dealer has more than 21 or player's score is higher than dealer
            if dealerScore > 21 or playerScore > dealerScore:
                return bet
            return 0
The only tricky part of this problem is taking into account all the conditions correctly.
For example, if we exchanged the last if statement with the first if statement we would get an incorrect result if the dealer had a blackjack (resulting in a loss for the player instead of a win).
Another potential source of confusion is if both player and dealer have more than 21. In that case the player loses even if its score is higher than the dealer's score.
The following Python code gathers all the pieces and tests the method above.

Tuesday, May 5, 2015

How tall is my tower?

Little Nicole just turned one. 
She can stand for the first time. 
She wonders the heights she can reach.
She peers through her dad's bookshelves.
She pulls the books one by one and stacks them up. 
She asks her mom for water and tries to grab the cup from the cupboard.
At night she stands on her crib and demands that a story be told or else she will climb out of it.
But her favorite pastime is to build a tower with her wooden blocks.
Her blocks have numbers 0,1,2,...painted on them and the most curious thing is that they have different heights. The heights are always an integer number of inches.
Little Nicole always piles the blocks in a single column and in order with the bottom block having the lowest number (0) and the top block having the highest number n-1. 

She stacks the blocks one by one and becomes ever more careful as she approaches the  highest she can reach. 
Sometimes there are accidents halfway. She gets too excited picking and placing the blocks, tumbles and the whole enterprise comes crashing down with Nicole.
Sometimes there are accidents because the tower gets too tall and little Nicole cant reach higher. When she tries to put one more while standing on her tip toes she loses her balance and it is a big disaster. 
Sometimes she will climb on top of her little chair and keep adding blocks. But with each new block the tower gets more and more tilted and suddenly boom the top half falls down.
One day her dad splits the blocks into two groups. One of the groups has only blocks with even heights. The other has only blocks with odd height.
He then gives Nicole a challenge: 
Following her rules (see above) build the tallest possible tower without ever putting a block of even height on top of a block of odd height. 
For example if block 0 has height 4 and block 1 has height 7 the tallest tower would be 11 with 4 at the bottom. But if we reverse the heights, i.e. block 0 has height 7 and block 1 has height 4, the tallest tower would be 7 because block 1 has an even height and cant go on top of block 0 which has an odd height.

Little Nicole starts eagerly alternating blocks from either group on top of each other. Her dad promptly tells her this will not work. 
He gives her a few hints: 

  • If she picks an odd height block first then all the blocks will have to be odd. Depending on the blocks' heights this might (or not) make the tallest tower.
  • If she picks an even block first it is possible that the tallest tower might be made entirely of even height blocks.
  • In general if she starts with an even block the pattern will be even, even,...,even, odd, odd,...,odd, i.e. once she switches from even to odd she can not switch back to even.
  • There might be several towers all with the same height but different placement of the even and odd blocks. 

He then shows her some examples with 2 and 3 blocks.
Nicole's dad realizes that it is not obvious how to find the sequence that yields the tallest tower when dealing with four or more blocks and decides to write some code to help him obtain this sequence given a set of blocks' heights.

A simple strategy to find the tallest tower is 
a) Compute how tall is the height of a tower with only n odd height blocks. 
b) Then compute the height of a tower with only one even height block at the bottom and n-1 odd height blocks above it. 
c) Then compute the height of a tower with only two even height blocks at the bottom and n-2 odd height blocks above these two. 
d) And so on until all the blocks have an even height. 
If we keep track of the tallest tower of these we will get the answer.

We can ask ourselves how does this compare to the tallest tower built under the reverse rule that an odd height block can not go on top of an even height block.  

And what if little Nicole simply put the blocks in order from 0 (at the bottom) to n (at the top)? We can convince ourselves that this will always result in a tower as tall as possible. That's because this tower will use all the blocks while the two other procedures might not.

The following code computes, for a given number of blocks with differing heights, what the tallest tower is under each of the three approaches: even blocks can never go on top of odd ones, odd blocks can never go on top of even ones or simply stack the blocks from label 0 to label n-1. It also displays the respective towers. 
For example in the last test the blocks are (from label 0 to label 10)

[44, 3, 44, 3, 44, 47, 2, 47, 2, 47, 2]

The highest tower with no even height block on top of an odd height block is 44,44,44,47,47,47. The tallest tower with no odd height block on top of an even height block is 3,3,47,47,47,2. But notice that the tower 47,47,47,3,3,2 has the same height as does the tower 3,47,3,47,47,2.
Basically given a tallest tower, any permutation of the even height blocks and any permutation of the odd height blocks will lead to a tower with the same (tallest) height.