I picked up the two sheets that lay at the bottom of the table I was eating.
It looked like a story or some sort of reminiscence. Curious, I began to read it.

###

###

###

Besides the three previous files there is now a fourth file QueueClass.py which contains a simple implementation of a Queue in python. A Queue (implemented using a list) is a data structure that accepts new elements at its end (back) but removes them from its front.*Everybody knows who the five of us are: Phoebe, Diane, CJ, Charlotte and Sydney. Our hair is blonde or brown or black or red, straight or curly, always movie-perfect. When we run our fingers through the silky strands or fat curls and hold it away from our faces you can just catch a glimpse of our striking eyes. When you do, you get shivers at the mere sight of them.*

*We sit on the benches outside school after a long day, seeing everyone who walks past us, in and out of our 200-year-old school. We sweep over you with our eyes as if you were an uninteresting landscape. We've seen everything the world has to offer, and we've dismissed it.*

*Our book bags spill into the corridor in front of us. They are our prized possessions. We reach into them to refold twenties into our Coach leather wallets, to idly rearrange a silk sweater that matches our socks, to lift and complain about that all-around too heavy bio textbook. We mention the biology teacher's name and flutter our lashes, rolling our eyes. We also discuss the theater teacher, and that one English teacher.*

*You can't get enough of us. You've seen girls like us every step of the way through school. We're way out of your league. We know we are royalty among a sea of plebeians.*

*We walk in the formation of migrating geese. Phoebe is at the head, with Sydney and Diane on her left and right, Charlotte and CJ last. Only Charlotte cares that she's last. We haven't figured out what CJ cares about; we don't spend much brain time on the subject.*

*Phoebe is in the student association board and today she is agitated. It seems in the board meeting tasked with dividing student's fees among different school groups, the geek girls claimed their programming club should get more money then our fashion group. Phoebe rightly claimed that our fashion soirees attracted far more people than the geeky programmer club could ever dream. Words flew and Phoebe at some point tossed out "We are far more popular than you little geeks, and that is enough to make our case".**But the geeks retorted "You might think that but the facts do not support your thesis."**A committee was setup to decide the popularity question. It was agreed that if indeed we are the most popular our club will receive more money than the programming club, and otherwise they will get it.*###
**Greatest Number of 2-Friends**

**Greatest Number of 2-Friends**

*The popularity measure we all agreed on was the number of friends or friends of friends of each student.**If our group had a higher number of 2-friends than the other group, we would win. Phoebe tasks Sydney and Diane to work with two girls from the programming club, Cathleen and Mira, to find out the number of 2-friends of each student of our school.*

*In their talks with**Cathleen and Mira,**Sydney and Diane spent some time learning about ways of representing friendships.**The first way is by means of an undirected graph.**Below is the friendship graph for the five of us, Cathleen and Mira, and Jasmine, Roberta and Michaela.*

*An edge means the two girls connected are friends. Absence of an edge means absence of friendship.**At the top of the graph there is our group. All girls are friends with each other. CJ surprisingly also became friends with Cathleen. Roberta and Mira are also friends with Cathleen.**Jasmine has only one direct friend which is Roberta. However because Roberta is friends with Michaela and Cathleen, Jasmine has three 2-friends.**CJ has five direct friends but because Mira and Roberta are friends with Cathleen she has seven 2 friends. Sydney, Daphne, Diane and Phoebe have four direct friends but because CJ is friends with Cathleen they all have five 2-friends.**Cathleen has 3 direct friends but through CJ she gets 4 more 2-friends and through Mira and Roberta she gets 2 more.**Based on this set of students, she would have the most 2-friends (9 = 3 direct + 6 friends of friends). CJ would be second with 7.**Here it is the ranking*Student | friends | friends of friends | 2-friends |
---|---|---|---|

Cathleen | 3 | 6 | 9 |

CJ | 5 | 2 | 7 |

Sydney, Daphne, Diane and Phoebe | 4 | 1 | 5 |

Roberta | 3 | 2 | 5 |

Mira, Michaela | 2 | 2 | 4 |

Jasmine | 1 | 2 | 3 |

*To decide if student A is a 2-friend of B the key is to check whether A and B are separated by at most two edges (i.e. whether there is a path between A and B with one or two edges). If there is they are 2-friends otherwise they are not.**According with the above table our team has a total of 27 two-friends. The remaining five girls have 21 two-friends. So if we based the results on just these 10 people our team would win.*

*Graphs are a nice visual representation of friendships but to compute the 2-friends of the hundreds of students required using a different representation called an adjacency matrix. In it, each student is numbered from 0 to n-1 with n being the number of students.**Student i (0 <= i<n) friends are represented by a string of n characters. Character j is T(True) if students i,j are friends and F(False) otherwise.**This string is element i in a list we will call**friends.**Lets illustrate this by translating the graph to this representation.**We number Daphne as 0, Diane as 1, Sydney as 2, Phoebe as 3, CJ as 4, Cathleen as 5, Mira as 6, Roberta as 7, Michaela as 8 and Jasmine as 9.**Then from the graph above we can see that the string representation for Daphne, is**friends[0]='FTTTTFFFFF'.**For Diane it would be friends[1]='TFTTTFFFFF'.**For Jasmine it would be friends[9]='FFFFFFFTFF'.**The complete list would be**friends = ['FTTTTFFFFF', # Daphne as 0*'TFTTTFFFFF', # Diane as 1'TTFTTFFFFF', # Sydney as 2'TTTFTFFFFF', # Phoebe as 3'TTTTFTFFFF', # CJ as 4'FFFFTFTTFF', # Cathleen as 5'FFFFFTFFTF', # Mira as 6'FFFFFTFFTT', # Roberta as 7'FFFFFFTTFF', # Michaela as 8'FFFFFFFTFF'] # Jasmine as 9

*Note how in this representation we don't consider a student to be friends with itself and so friends[0][0]='F', friends[1][1]='F', etc. (This is because, in this problem, we are only interested in 2 or higher friendship and a person being friends with herself is 1-friends).*

*Clearly the adjacency matrix friends[i][j] is symmetric (i.e. friends[i][j]=friends[j][i]).**It is a somewhat memory wasteful representation because most of the entries will be 'F'. In practice we could only keep the entries that are 'T' since those are the only relevant for the 2-friendship computation.*

*Given the list friends, a simple python method to compute the largest number of 2 friends, as well as the number of 2 friends of each person is*def mostTwoFriends(self,friends): best = 0 self.nFriends = {} for i in range(0,len(friends)): count = 0 for j in range(0,len(friends[i])): if i == j: continue # direct friends if friends[i][j] == 'T': count += 1 else: # check whether i and j have a friend k in common for k in range(0, len(friends)): if k == i or k == j: continue if friends[i][k] == 'T' and friends[j][k] == 'T': count += 1 break self.nFriends[i] = count best = max(best, count) return best

*nFriends is a dictionary whose keys are the person number and whose values are the number of 2-friends for each person.**A brief analysis of the code: We are trying to determine if i and j are 2-friends, i.e. if they are related directly or through a k common friend.**So for each pair of people i,j, we check first if they are direct friends. If they are we count it and move to the next pair. If they are not we check whether i,j have a friend k in common (the k loop).**If they are we count it and bail out**, i.e. we break out of the k loop above.**Notice that it is possible that i,j have more than one k friend in common.**For example Cathleen and Michaela have Mira and Roberta as common friends. The method above only counts one of these because once we figure out that Cathleen and Michaela are related through say Mira we know they are 2-friends count it and skip the other k friend.*###
**Greatest Number of 3-Friends**

**Greatest Number of 3-Friends**

*It turns out that the two groups tied up under the 2-friends popularity measure. It was proposed that we go one step further, measure the 3-friends number for each student and use it to untie.**Lets say you are person 0.**3-friends counts person 1 as your friend if you are her friend, or there is a person 2 that is your friend and is friends with person 1 or there is a person 3 that is friends with person 2 which in turn is friends with person 1.**Looking at the graph above the number of 3-friends for each girl is now*Student | friends | friends of friends | friends of friends of friends | 3-friends |
---|---|---|---|---|

Cathleen | 3 | 6 | 0 | 9 |

CJ | 5 | 2 | 2 | 9 |

Roberta | 3 | 2 | 4 | 9 |

Mira | 2 | 2 | 5 | 9 |

Sydney, Daphne, Diane and Phoebe | 4 | 1 | 2 | 7 |

Michaela | 2 | 2 | 1 | 5 |

Jasmine | 1 | 2 | 2 | 5 |

*To decide if student A is a 3-friend of B the key is to check whether A and B are separated by at most three edges (i.e. whether there is a path between A and B with one, two or three edges). If there is they are 3-friends otherwise they are not.**For this particular group of 10 students there is now a tie: our group has 37 three-friends (versus 27 two-friends) but the other 5 girls also have 37 three-friends (versus 21 two friends).**The maximum number of two and three friends is however the same (9) although there are now more girls with that maximum.**Whereas before Cathleen was the only one with 9 two-friends, now besides her, also CJ , Roberta and Mira have 9 three-friends.**It cant be higher since there are only 10 people and a person can at most have 9 2 and 3-friends.*

*Note that the number of 3-friends for each person will always be at least equal to the number of 2-friends because we are keeping all the 2-friends counts and adding an extra possibility: that i and j can be connected via a path of length 3.*

*We could extend the above method, which checks for paths between i and j of length 1 and 2, to also check for paths of length 3.**We would have to add a fourth nested loop and end up with this convoluted code*def mostThreeFriendsPoor(self,friends): best = 0 self.nFriends = {} for i in range(0,len(friends)): count = 0 for j in range(0,len(friends[i])): if i == j: continue # direct friends if friends[i][j] == 'T': count += 1 else: doneCounting3 = False # check whether i and j have a friend k in common for k in range(0, len(friends)): if doneCounting3: break if k == i or k == j: continue if friends[i][k] == 'T' and friends[j][k] == 'T': count += 1 break else: # check whether i has a friend p which is a friend of k which in turn is a friend of j for p in range(0, len(friends)): if p == k or p == i: continue if friends[i][p] == 'T' and friends[p][k] == 'T' and friends[k][j] == 'T': count += 1 doneCounting3 = True break self.nFriends[i] = count best = max(best, count) return best

*This works but**1) It is quite confusing and messy and most importantly**2) This pattern of nesting more and more loops inside the k loop will become very inefficient and slow if we use it for finding the number of four,five,six friends.**For n-friends it would lead to n+1 nested loops.**Lets try to understand the logic we do in loops k and p:**We are trying to determine if i and j are 3-friends, i.e. if they are related directly, through a k friend or through a k and p friends.**So if they are related through a k friend we count it and bail out (we dont care if k additionally is related to some other friend p which is friends with j).**If however k is not friends with i or j directly we want to see if there is a path of length 3 connecting i and j via k and p.**As soon as we find one we want to bail out from the k and p loops.**Note in particular the extra variable doneCounting3 whose only purpose is to break out of loop k if we find a 3-friend in loop p.**Python break statement only allows us to break from one loop at a time.**In this particular case we would like not just to break from loop p but also from loop k and one (bad) way of doing this is to signal through the boolean variable doneCounting3 loop k so that the upper loop can break out.**To summarize: We are trying to find if i and j are related through a path of length 1,2, or 3. As soon as we find one such path we bail out and move on to the next i,j pair. That is the purpose of the break statements.*

*To further illustrate this consider the above graph.**There are two paths of length 2 between Cathleen and Michaela.**One goes through Mira and the other goes through Roberta.**We only want to count one of them, so as soon as we find one (in the k loop) we get out.**Similarly there are two paths of length 3 between CJ and Michaela one through Mira and the other through Roberta. Once we find one we break from both the k and p loops.*

*We can make the code less messy by breaking it into two methods:*def countPathsOfLengthTwoOrThree(self,count,friends,i,j): # check whether i and j have a friend k in common for k in range(0, len(friends)): if k == i or k == j: continue if friends[i][k] == 'T' and friends[j][k] == 'T': count += 1 return count # check whether i has a friend p which is a friend of k which in turn is a friend of j, # i.e. look for a path of length 3 between i and j for p in range(0, len(friends)): if p == k or p == i: continue if friends[i][p] == 'T' and friends[p][k] == 'T' and friends[k][j] == 'T': count += 1 return count return count def mostThreeFriends(self,friends): best = 0 self.nFriends = {} for i in range(0,len(friends)): count = 0 for j in range(0,len(friends[i])): if i == j: continue # direct friends if friends[i][j] == 'T': count += 1 else: count = self.countPathsOfLengthTwoOrThree(count,friends,i,j) self.nFriends[i] = count best = max(best, count) return best

*Notice how instead of a break when we find a path of length 2 or instead of using the variable doneCounting3 when we find a path of length 3, we simply return from countPathsOfLengthTwoOrThree.**The code is now more readable.*

*Lets assemble the code to compute 2 and 3 friends we have so far.*

There are 3 files:There are 3 files:

*NFriendsClass.py has the class NFriends with the methods described previously.**TestClass.py has the class Test with 22 tests. The very first test corresponds to the graph we have used through this post.**Only the first 7 tests are run, otherwise your browser would freeze.**Press the run button to run those first 7 tests (be prepared to wait a minute).**You can run more or different tests by uncommenting them in the method run_tests.**Running the 22 tests will take 3 to 5 minutes. Your browser might freeze two or three times.**Just let it run and you will be rewarded by the wait.**The third file main.py merely creates a Test object and runs its run_tests method.*###
**Finding the greatest number of k-friends using Breadth First Search (BFS)**

**Finding the greatest number of k-friends using Breadth First Search (BFS)**

*Another tie ensues and again the answer is to go to the computation of the 4-friends of each student.**We should at this point pursue a different strategy that let us compute the k-friends of each student with a single method for any k>=2.**Lets go back to the graph above. Suppose we want to find the k-friends of person 0.**Starting from 0 we visit all the nodes that are directly connected to person 0: person 1,2,3,4.**This gives us 4 k-friends of person 0.**We next visit all the nodes that can be connected to 0 by a path of length 2: we find person 5.**Next we visit all the nodes that can be connected to 0 by a path of length 3: we find persons 6 and 7.**Next we visit all the nodes that can be connected to 0 by a path of length 4: we find persons 8 and 9.**Thus person 0 has 9 four-friends and in fact 9 k-friends for k>=4.**As a curiosity all people in the graph will have 9 k-friends for k>=4.**The graph traversal we just described is known as Breadth First Search (BFS) and gives a way of computing the number of k-friends for every person/node. We just have to loop through every node and traverse the graph starting at that node.**We can, like before, compute the maximum among all such counts.**One final remark: It is possible that besides the number of k-friends for each person we might be interested in knowing how each pair of people i,j is related through a k-friend.**The BFS graph traversal gives us these paths automatically as part of the traversal.**Lets take a look at the code (click the run button to see the number of 2,3,4,5-friends for two graphs: the 10 person graph we started this post and the 50 graph person we end this post with, see below; processing the 50 person graph might take 3-4 minutes and might freeze the browser, just be patient and you should see the result)*

The following two methods in NFriendsClass.py do BFS of a given graph.

def BFS(self,graph,start,end,q): # if start is not connected to any other node # there is nothing to do just return if not graph.has_key(start): return 0 temp_path = [start] q.enqueue(temp_path) while q.IsEmpty() == False: tmp_path = q.dequeue() last_node = tmp_path[len(tmp_path)-1] if last_node == end: if len(tmp_path)-1 <= self.n: self.paths.setdefault(start,[]).append(tmp_path) return 1 else: return 0 for link_node in graph[last_node]: if link_node not in tmp_path: new_path = [] new_path = tmp_path + [link_node] q.enqueue(new_path) return 0 def mostNFriendsBFS(self, friends): # convert list of friends to adjacency list # as a dictionary graph = {} self.nFriends = {} self.paths = {} for i in range(0,len(friends)): for j in range(0,len(friends)): if i == j: continue if friends[i][j]== 'T': if graph.has_key(i): graph[i].append(j) else: graph[i]=[] graph[i].append(j) # empty dictionaries evaluate to false # if the graph is empty there are no paths # so return right away if not graph: return 0 best = 0 for i in range(0,len(friends)): count = 0 for j in range(0,len(friends)): if i == j: continue pathQueue = Queue() count += self.BFS(graph,i, j,pathQueue) self.nFriends[i] = count best = max(best,count) return best

*In the method mostNFriendsBFS we first convert the graph from the adjacency matrix representation we have been using up to now to an adjacency list representation. This representation is more compact for these sparse graphs and it also makes the search slightly faster.*

*Then we loop through every pair of people i,j considering i as the start node and j as the end node of the path in the graph of length at most k (if finding k-friends).*

*For each pair, the method BFS then does the BFS graph traversal from i to j.*

*It places the start node (i.e. i) in a queue, then visits all nodes connected to it and puts the path from i to hose nodes in the queue (paths of length 1).*

*In the second round the method removes one of those paths of length 1 from the queue and visits every node connected to the last node in the path: i.e. visits all paths of length 2 from i.*

*The method exits if the last node in the path it just got from the queue is the same as the end node (i.e. j).*

*Additionally if that end node is on a path of length smaller or equal to k (k-friends search) it adds one to the count of k-friends of i and stores the path from i to j in a dictionary with the start nodes as keys.*

*Otherwise it returns 0 for the count (since i and j are connected by a path of length greater than k and therefore are not k-friends-instead they are k+1 or k+2, etc friends).*

*A look at the TestClass.py and in particular at its __init__ method, reveals that for each test graph we are computing 2,3,4,5-friends. For most test graphs going above 3-friendship does not change the max number of friends.*

*For example as we mentioned before, the max number of 3 and above friends for the 10 graph we have been using is always 9.*

*But lets look at the graph from test_15. It consists of 50 people (0 to 49) and is depicted below.*

*Some of the nodes are disconnected from the main graph (like 31-38 or 6-37-30).*

*Running the code above, we learn that the max number of 2-friends is 12. The code tells us that only person 9 has that many 2-friends and gives us the following paths (of length 2 or less) from 9 to 12 other people.*

person 9: 12 -------- Path from 9 to 1:[9, 35, 1] Path from 9 to 3:[9, 18, 3] Path from 9 to 12:[9, 35, 12] Path from 9 to 18:[9, 18] Path from 9 to 19:[9, 24, 19] Path from 9 to 21:[9, 18, 21] Path from 9 to 22:[9, 22] Path from 9 to 24:[9, 24] Path from 9 to 25:[9, 24, 25] Path from 9 to 34:[9, 24, 34] Path from 9 to 35:[9, 35] Path from 9 to 47:[9, 18, 47]

*These match with what we see in the graph below.*

Next up it tells us that the max number of 3-friends is 19. There are a few people with that many 3-friends: 15, 21 and 33 (person 9 has 17 3-friends).

Next up the max number of 4-friends is 27 with only person 21 achieving that number.

Finally the max number of 5-friends is 31 with persons 15 and 21 having this number of 5-friends.

Next up it tells us that the max number of 3-friends is 19. There are a few people with that many 3-friends: 15, 21 and 33 (person 9 has 17 3-friends).

Next up the max number of 4-friends is 27 with only person 21 achieving that number.

Finally the max number of 5-friends is 31 with persons 15 and 21 having this number of 5-friends.

*You can, like before, run other tests by uncommenting them in the run_tests method of TestClass.py.*

## No comments:

## Post a Comment