Addition of 2 big numbers (represented by 2 int arrays, assuming each integer can only be 0-9) ======== find the median of million rows in each of the 1000 servers. ============== What was the most challenging aspect of your previous job? ============== Write a class that iterates through a list of lists with methods next and hasNext. Next returns the next item in the list, and hasNext returns True or False depending on whether or not we are at the end of the list. Example: i = Iterator([[1, 2], [3, 4]]) i.next() returns 1 i.next() returns 2 i.hasNext() returns True i.next() returns 3 i.next() returns 4 i.hasNext returns False ================= Given an unlimited stream of input, output the median value at any moment. ============ What was the most challenging issue/bug you came across and how did you resolve it? ============== Millions of cache entries, each with a unique key, stored in several servers. how to sort all entries whose key falls within a specified range using Map Reduce? What if one server has a very small range of keys (say 1 to 100) and another server has a very large range of keys (say 100 to 1000000)? Sorry I don't remember the answer. ================= BST running time (creation/updating/traversal) ===================== Given a string and a list of character mappings of the form (a=>j,k), (d=>r), (t=>r,q,e,y), print out all the possible variants by applying the mapping to the original string. example: string="foo", mappings = "o=>1,2", "k=>d,l,w" output: foo fo1 fo2 f1o f11 f12 f2o f21 f22 =============== Describe a current project. What was challenging about it? What did you learn from it? ================== A quad tree is used to represent a black/white image. If you are provided with two such image representations, write a function to create a third tree that represents the merged image. (Black overrides white, mixed; mixed overrides white) ===================== Implement a random number generator such that the random number generated is always in a particular range. Perform through time complexity analysis of it. How would you improve the solution. ======================= If a person dials a sequence of numbers on the telephone, what possible words/strings can be formed from the letters associated with those numbers? ===================== Please tell us about the most satisfying project you worked on in the last year? ================= Please tell us about a personal challenge you had with someone who managed you? ====================== Find a repeated number in a list of numbers Naive way: brute force O(n^2) time Batter way: punch it into a hashtable/hashmap/dictionary, O(n) time ==================== How do you check if a URL is bad really fast in Google server. The point is for the user not to notice the lag in the checking Use Bloom Filter algorithm. I don't really know how to do this i don't think a bloom filter works here.. it works as an approximate cache, but when you're talking about arbitrary URLs, you'll end up missing the cache on a huge % of time... hence not valuable ================================ I was asked to determine whether two nodes were connected in a graph. use union-find data structure (also called disjoint sets) ================================= design a structure have these two functions: insert, getMedian, discuss the time complexity use a self balancing binary search tree (BST) that has the same size (or off by one) left and right subtrees from the root getMedian: O(1) since the tree is balanced the median will always be the root node insert: O(logN) BST gives us logN insertion, balancing the tree is O(logN) as well and works in the following way: (this is only the most difficult case) if upon insertion the left subtree has two more nodes then the right subtree we need to rebalance the BST, that is take the largest value on the left subtree (rightmost node) and make that the new root, that works fine as this value is greater than everything on the left subtree and less than everything on the right subtree, then take the old root value and insert it in the right subtree, then the tree is balanced again and the median value is the root, ta-da! :) I am not sure that the median is in root. You can draw a tree that is balanced ( the depth of the left child - the depth of the right child <= 1). Then it is very hard to see which is the element. You can solve this problem by using 2 heaps: - one max-heap for the 1/2 lowest elements; - one min-heap for the 1/2 highest elements; On insertion, you can insert in heap and balance the heap (if the difference of the sizes is greater than 1, then remove the maximum (minimum) from one and insert to another). This is done in O(log n). When you get the median if the sizes of these 2 heaps are the same, then take the average of the max (min) heaps, otherwise, return the element from top of the heap with the highest number of elements. =========================== Try to figure out the top unique search queries from a database with 100 billion entries Use a hash table then sort the counting results Not sure if this is a trick question. Are you trying to find the most queried search term? If so, then how is it "unique"? Assuming it's not unique, this is a typical application of the map/reduce paradigm. Assign each query a value of 1 (map) and then count the number of occurrences (reduce). Google sort of popularized the shift towards map/reduce anyway. ================== Sample uniformly on a circle centered at zero with radius 1. randomly generate an integer X. we know that cos(X)^2 + sin(X)^2 = 1, so just use xx = cos(X) and yy = sin(X) I think that was the whole point of the question (and I don't like it very much) ======================= Write code to check whether every element of an array has been visited (you are given a starting position and each position stores a pointer to a subsequent position) ================ You have a genealogy: 1) Describe a data structure to represent it. 2) Given any two people within the genealogy, describe an algorithm to determine if they share a common ancestor. You just need to return true/false, not all ancestors. 1) Each person is represented by an object, with two pointers: "mom" and "dad" that point to their respective parent. 2) Choose one of the two people arbitrarily. Perform depth-first traversal and create a set of ancestors reachable from that person. Perform depth-first traversal on the 2nd person and for each node visited check if they are in the set; if yes, return true. Use a hash-set for best performance. If you've optimizing for worst case, hash set is O(n) for search. You'd do better in worst case with a sorted structure. You'd do even better if you store a "visited" flag at each node. Alternately you can number each node and keep an array of visited flags. since depth first seach might find a longer link through dad before checking mom, you're better off with breadth first search. Anything reachable in one hop would be seen before anything reachable at best in 2 hops Yes. Good points. The second traversal should be breadth first. The first traversal it doesn't matter, as you need to visit all ancestors anyways. The use of a visited flag is a good optimization. Based upon the way the question was worded, it wasn't clear if you could design the data structure to optimize the algorithm or not. ===================== Print mth to last elements of the linkedlist ============================= Create a binary tree based on a sorted array If you already have a sorted array, then you could say that it is already a minimum heap and that is a binary tree. Use binary search to construct a balanced tree =================================== assume you are writing number in ascending order to an array of constant size. once you reach the end of the array, you start writing from the beginning, thus writing over the oldest entries. write an algorithm for finding a specific number in this array. Start with begining of array. If n 0: print k x,y,ptr1,ptr2 = v(x,y,ptr1,ptr2) break Well, we can make several observation: * a number multiplied by 8 (2^3) is always larger than that multiplied by 5. * if we seed the list with 2 consecutive multiple of 2 (e.g. 2 and 4), we do not need to multiply the number by 2 (2*2 = 4 is already in the list) and only need to multiply by 4 (see that 2*4 = 8, then 4*2 = 8, oops already in the list) * if we have a number that is divisible by 5, we simply need to multiply it by 5 since the results of multiplying the number by 2 or 4 would have been generated previously (we note that 5n * 4 is equivalent to 4n * 5, obviously 4n < 5n; similar argument for multiplication by 2). Hence, here is a very simple algorithms to generate the first n numbers in the sequence: 1. we start with a seed of [1, 2] 2. we iterate through the list until length of list = n, for each number k we see, (a) if k is divisible by 5, append to the list 5k, otherwise (b) append 4k and 5k def createPowerList(n): lst = [1, 2]; i = 0 while len(lst) < n: currentValue = lst[i] if currentValue % 5 == 0: lst.append(currentValue * 5) else: lst.append(currentValue * 4) lst.append(currentValue * 5) i = i + 1 return lst Forget that. It makes a small mistake. Instead this simpler version would work much better. :) def createPowerList2(n): lst = [1] power_of_2 = 2 multiply_by_5 = 0 while len(lst) < n: if power_of_2 < (lst[multiply_by_5] * 5): lst.append(power_of_2) power_of_2 *= 2 else: lst.append(lst[multiply_by_5] * 5) multiply_by_5 += 1 return lst ============================ Find the longest word in a dictionary, such that the word can be built one character at a time, and all substrings are words in the dictionary. A new character can be added anywhere. ============================= Deep search binary tree, what is the worst case memory requirement using Queue. =========================== Given a large web query log file, how to find the most frequent K queries ============================ Easier questions 1) Two variations of a program exist, A and B. When run on machine 1, A runs twice as fast as B. When run on machine 2, A runs 10x as fast as B. How would you investigate this? -same datasets -same versions of OS and libraries -same hardware 2) Had to code a method that calculated the factorial, first simply, then a second one recursively. One reason is The process has some random function, and it affects the speed. Perhaps second machine had more memory, so no swapping was necessary. Perhaps it had more CPUs, and one program was able to take care of that (using multi-threading) but the other one could not. Perhaps second machine had SSE instructions, and the programs running were doing something like decoding movies, perhaps the second machine had a more powerful GPU and one program could take advantage of that. Answer 1: Processor cache was smaller on slower machine ========================================= trickier question, code a method given the following method signature that will print out any numbers that intersect both arrays of numbers //Example arrays // 4, 18, 25, 40, 411 // 20, 25, 40, 320, 1009, 1100 void intersect(int[] arr1, int len1, int[] arr2, int len2) { void intersect(int[] arr1, int len1, int[] arr2, int len2) { int startIndex = 0; //have an outer loop of one of the arrays for( int i = 0; i < len1; i ++){ //have an inner loop of the other array for( int j = startIndex; j < len2; j++){ if(array1[i] == array2[j]){ System.out.println(array1[i] + "\t"); startIndex = j + 1; break; } else if(array1[i] < array2[j]){ //skip because less than the number in the second array break; } else{ startIndex = j++; } } } } }}} Even if they're not sorted, the first commenter's answer is suboptimal. Assume array sizes are n and m with n > m. Then sorting both arrays is O(n log n). Once sorted, you just need to make one pass over each array --- O(n + m) --- to find common elements. Just have two pointers at the head of each list, and use these to move in sorted order through the union of lists. Whenever both pointers point to the same value, you have an intersection. If you want a very simple answer, use hashtable to store the contents of the first list (O(n)). Then, for each member of the second list, check whether the value is in the hashtable (each check is O(1)). We have O(n + m) algorithm that is extremely simple to code. if sorted then you can avoid the nsquare with binary search ============================================ Give 2 coding solutions on returning an array by removing duplicates. One solution with O(n^2) and the other Linear. Create a new array. Run through the original array, on each element insert to new array in sorted order. O(n^2) hash select is O(n). What if the hash function is "return 0"? it degenerates into a linked list. Not sure how you can do this in O(n). If the values are integers of a set size, you can use a lookup table or sort in O(size*n) . Otherwise sort O(nlogn) You can use a hash. Either assume that you have a good hash function or write one (I doubt the interviewer would ask you to do a very complex hashing, but he'd appreciate you telling him about the assumption you made). Instead of the solution by venk T, I'd suggest another easier one. For every values you have, push it to the hashtable, usually a standard hashtable will return true or false depending on whether the value is already in the hashtable. If the value is not already in hashtable, output it in your answer, otherwise don't output. This will cut the last step of getting all the values off the hashtable. ====================================== Describe the implementation (along with data structures) involved in making a program that: 1. inserts a number 2. returns the median of all unique numbers ===================================== a very long single linked list, how to find the n-th entry from the end. this is a frequently asked question. The answer is to use two pointers to maintain the locations, but i didn't realize why this is better than another method: Scan the whole list once and calculate the length, N. Then move a pointer from the head by N-n steps. Answer is the IO efficiency. If n is small, an n-entry list can be saved in cache and in the first method, the 2nd pointer can directly read cache. set two pointers first and second 1. second pointer step forward for n step, first point stay at the head of the list 2. first and second pointer step forward at the same time until second point hit tail 3. first point is your answer Interview Candidate is right.. it's the same thing. in the 2-pointer method, you are basically iterating over the entire list with that 2nd pointer, and the 1st pointer is iterating N-n times. ==================================== Difference between function overloading and overriding. explained one uses polymorphism and other inheritence.He said right thats what he wanted to know. ============================= how would you find maximum element in an array of numbers. 2 loops and comparing each element with all other as its not sorted array. Honestly, something like this is probably the best (assuming an unsorted array): int getBest(int[] nums){ int max; for(int i = 0; i < nums.length; i++){ if(i == 0){ max = nums[0]; // ensures it's always initialized } int curr = nums[i]; if(curr > max) max = curr; } return max; } You have to look at every element once, because if you skip an element you can't be sure it isn't the biggest. The lowest bound we can do is O(n). However, two loops makes this an O(n^2) problem, doing much more work without helping us solve the problem. sort the list. quicksort gives average of O(nlogn) complexity. return the last element in the list O(1) complexity. =============================== How would you find maximum element in a list of items whcih implements comparator. Unless we want to know something else, this is the same as the array question. Iterate through the list, keeping track of a "max" item, and compare each element to the max so far. The comparator is a red herring, probably meant to get you to think about sorting. Sorting before we search is not the answer here. For a comparison based sort, the best you can do is O(n log n), which isn't as good as linear search over the unsorted array. Sorting wins if we need to make arbitrary queries after the first. I guess then that there is no one, correct answer, since we are defining the question. I'd define the "best" element as the element which was scored as "better" than the most others, and compare each element with each other one. For that, the best you can do is O(n^2). You could, though, redefine this - for instance, the "best" element is the element which beat the most other elements that it lost to. It's a more vague problem than originally described. I think the analogy I'd use would be sports teams over the course of a regular season, where team A might beat every other team except team B, who beats team A but loses several other games. I would definitely ask what the heck this guy was talking about, though - because using words like "maximum", "best" and "comparison" definitely imply some sort of logical ranking. ============================= Base class has one method takes argument as Object and inherited class overrides it to pass String as parameter.If called with argument of Object whcih method will be called functions calls go from instance class first and then if no match found then towards base class based on arguments ...so anwer is base class method will be called becuase base class only has method matching argument of Object. He was happy. =========================== Give us an example where you solved and handled some complex issue in current project I explained one scenerio for a design change.I felt he was looking for something else but based on my current job thats what i do. ==================== Find if there are two members of an array that sum to 10 (10 and 0 count, but 10 alone does not). My first guess is something like the following, where 10 would be passed as the sum parameter: int[] findSum(int[] nums, int sum){ int[] pair = null; for(int i = 0; i < nums.length; i++){ int a = nums[i]; int target = sum - a; for(int j = 0; j < nums.length; j++){ if(nums[ j ] == target){ pair = new int[2]; pair[0] = a; pair[1] = target; return pair; } } } return pair; // returns null if no suitable pair found } } I would use a hash to count how often the numbers between 0 and sum/10 show up. then go through the has to see if we have two that match. it is O(n^2) also, and doesn't work if sum is large. int[] findSum(int[] nums, int sum) { int[] hash = new int[sum+1]; //count how many numbers for (int i=0; i 0) && (hash[sum-i]>0)) { return new int[] {i, sum-i}; } } return null; }} Basically sort the elements in n log n . Then for every element k search for (Sum - k) in the array using a binary search. Hence total complexity is n log n. You can use a hashmap from value to # of occurrences of the number in the hashmap. Enter all numbers to a the hashmap (O(n)), go through the array and calculate 10 - n for each member and determine whether that number has occurrences > 0 in the hashmap. If so, you have a match, don't forget to decrease the occurrence by 1 (O(n)). Total O(n). Note that if duplicate is allowed (such that 5+5=10 is possible), you may need to check whether n = Sum - n, if so you must make sure the # of occurrences is > 1 and then reduce the occurrences by 2 instead. ================================ What cool things did you do in your previous job. Basically the interviewer wants to know if there can be a good fit between me and the team. ================================ You're given a binary tree and pointers to two nodes in the tree. Describe the fastest algorithm you can come up with to determine the closest common ancestor. This one is covered a lot on the web and in technical books, but every one I've seen assumes it's a binary search tree. Here he didn't want me to assume it was a binary search tree. I think I would go with a modified dfs which terminates early. I would need a global variable to indicate the closest ancestor; //return 0 - none found, 1 - a found, 2 - b found, 3 - both found public int modifiedDFS(Node node, Node a, Node b, int ) { int res=0; if (node) { if ((node.left == a) || (node.right == a)) { res += 1; } if ((node.left == b) || (node.right == b)) { res += 2; } if (res == 3) { lowest = node; } else { int l = modifiedDFS(node.right, a, b); int r = modifiedDFS(node.left, a, b); if ((l==3) || (r==3)) { //lower node is the one res = 3; } else { if ((3 == (l+r)) || (3 == (l+res)) || (3 == (r+res)) { //i am the lowest lowest = node; res = 3; } else { if ((l==1)||(r==1)) res = 1; if ((2==l)||(2==r)) res = 2; } } } } return res; } )}}}} That may or may not work (I didn't trace through the code), but there's a much simpler and faster way to do it. Worst-case running time is O(lg n). With yours, worst-case running time is O(n). Remember: you're given a pointer to the nodes in question. To traverse from a leaf node to the root is worst-case O(lg n) operations. Think of something you could do to take advantage of that. If there is parent pointer, the answer is straightforward. The simplest of which is just to traverse up from 1 node and record all the nodes, then traverse up from the other node and find the first node that match the recorded nodes. This is O(lg n) in a balanced tree, O(n) in worst-case. If there is no parent pointer, I'd use tree traversal (pre-, in-, or post- doesn't matter much). The closest common ancestor would be the only node that find one node on its left children (or itself) and the other node on its right children (or itself). This is O(n). ============================= You're writing an application that receives a stream of individual items of data. The stream may be very long or very short, but you have no way of knowing how long it is (i.e. there's no trick to figuring out the size of the stream of data). How would you go about choosing m items such that any subset of m items was equally likely? (Not an even distribution of values, but just that any m items are equally likely to be chosen). So for example, m=1000, and the number of items in the stream, n, may be 1000, or 10000, or 100000000, or much much larger; there is no way to know how many. The algorithm is rather straightforward: 1. Pick the first m members in the stream (if there is less than m, just return the entire list). 2. For each of the subsequent member of the list, pick the member with m/n probability, where m is the number of elements already seen. If the member is picked, replace 1 member at random from the already selected m members with the new member. We can prove that this is random via induction. If there is m members (or less), each member has probability 1, which satisfy the requirement that probability is m/N. For the induction step, we assume that for the first k-1 members, we pick them with probability m/k-1, so in step k, we have 2 cases to consider: (a) the k-th member has m/k probability of being picked, so the induction step holds for k-th member. (b) for all the previous member, the probability of being picked is now P(k-th member not picked)*P(original) + P(k-th member picked)*P(original)*P(member is not dropped) = (k-m)/k * m/(k-1) + m/k * m/(k-1) * (m-1)/m = m/(k-1) * [(k-m)/k + (m-1)/k] = m/k, which is the required probability. One idea is for each member, we run random to get a number associate with it. We also keep the minimum m members at any time. For worst case, the algorithm take O(n*log(m)) runtime. But considering the kept m members has very small numbers associated with them after a long run. So the random generated for subsequent members has large probability to be larger, so the comparison can be very quickly. ================================= Given a stream of integers of unknown (possibly large) length, how would you pick one at random? Now prove its random. Question was not worded very well, however we eventually reached mutual understanding. Answer: Store one integer at most. For each pop, decide whether or not to store it in the one slot with probability 1/N (N is current count of integers encountered). proof is easy: probability of an object appearing (being chosen) to be in some previous slot K starts with 1/K as per the algorithm. To "survive" to the Nth slot, the probability is (let M be distance of K to N: M = N - K or N = K + M) 1/K * K/(K+1) * (K+1)/(K+2) * ... * (K+M-1)/(K+M) , the last denominator simplifies to N All the numerators and denominators cancel except 1/N. You can prove by induction, probably simpler. If the list only contains 1 member, probability is 1/1 = 1, which is correct. For the induction step, consider what happen when k-th member is encountered: case(a): for the k-th member, probability of being picked is 1/k, which is what we want. case(b): for each of the previous (k-1) members, probability of being picked become P(k-th not picked) * P(original, i.e. 1/(k-1)) = (k-1)/k * 1/(k-1) = 1/k, which is what we want. [] It is more fun if we were to picked m members at random, then the prove becomes less trivial (though almost equally as straightforward). ====================== Define binary search tree. Develop a procedure to verify a binary search tree. @Bala Recursively: Check node.left < node.value and node.right > node.value. Also, that all subvalues of node.left < node.value and that the values of the subtree of node.right > node.value. Make sure you do checking for edge cases (ie. leaf nodes, etc). recursive solution is a good option i guess.. its definitely easy to code. However i think we can have it done more efficiently if we just perform a BFS on the given tree. and every time just check tht the left child is smaller and right chiled is greater than or equal to the current node. If we successfully search the entire tree then it is a proper BST otherwise its not. public static boolean checkTree(BSTNode node, Integer leftLimit, Integer rightLimit) { if(node == null) return true; if(leftLimit != null && leftLimit > node.info) { return false; } if(rightLimit != null && rightLimit < node.info) { return false; } return checkTree(node.left,leftLimit,node.info) && checkTree(node.right,node.info,rightLimit); } ================================== Write a function to find intersection of 2 sorted arrays. void intersection(int* arr1, int* arr2, int len1, int len2, int** res, int& len) { *res = NULL; int i1 = 0; int i2 = 0; len = 0; while(1) { if(arr1[i1] < arr2[i2]) { while((arr1[i1] < arr2[i2]) && (i1 < len1)) i1++; if(i1 == len1) return; } if(arr1[i1] == arr2[i2]) break; if((arr1[i1] > arr2[i2]) && (i2 < len2)) { while(arr2[i2] < arr1[i1]) i2++; if(i2 == len2) return; } if(arr1[i1] == arr2[i2]) break; } // i1, i2 - pos for arr1, arr2 where (arr1[i1] == arr2[i2]) int i3, i4; i3 = i1; i4 = i2; while((arr1[i3] == arr2[i4]) && (i3 < len1) && (i4 < len2)) { i3++; i4++; len++; } if(len > 0) { *res = new int[len]; for(int i=0; (i < len) && (i1+i < len1); i++) (*res)[i] = arr1[i1+i]; } }}}}}}}}}}}}}}}}}}}}}}}}}}}}}} void intersection(int *a, int la, int *b, int lb, int **res, int *lres) { int min_n = la < lb ? la:lb; (*res) = new int[ min_n ]; int ca = 0; int cb = 0; *lres = 0; while (ca < la && cb < lb) { while (a[ca] < b[cb] && ca < la) ca++; if (a[ca] == b[cb]) { (*res)[*lres] = a[ca]; (*lres)++; } cb++; } }} ublic static void main(String[] args) { // TODO Auto-generated method stub int [] A = {1,2,3,4}; int [] B = {4,5}; System.out.println(checkIntersection(A,B)); } private static String checkIntersection(int[] A, int[] B) { // TODO Auto-generated method stub int curA =0, curB=0; // indices for A and B String result = "No intersection"; while(curA < A.length && curB < B.length && result.compareTo("No intersection") == 0) { if(A[curA] < B[curB]){ curA++; } else if(A[curA] > B[curB]){ curB++; } else{ result = String.valueOf(A[curA]); } } return result; } ============================================= Find the lowest common ancestor for BST One solution is traversing the BST from top to bottom and for every node, you check its children. Then you recursively go down the chain until you reach the leaf. In BST you use the ordering property of BST: http://rmandvikar.blogspot.com/2008/08/lowest-common-ancestor-lca-in-binary.html The lowest common ancestor will be between the two values, so perform a dfs until you reach a node that is between the two values. We note that the lowest common ancestor must have a value between the 2 values you are given. Any other ancestors higher than that will have a number that is either greater or smaller than _both_ values. Hence the problem is simplified a lot. We can now simply traverse from the root, as if we are searching for the smallest of the 2 numbers. For each traversal, check whether the number is between the 2 values. If it is, return that node. =========================================== What is the relationship between equals(Object o) and hashCode() in Java? How would you implement each method? A class which overrides one must override the other such that objects which .equals() each other generate identical hash codes. Typically, what this would mean for implementation is that every field or attribute (including type depending on the constraints - maybe you want two objects of a given parent type to be equal to each other, so you check the parent type instead of the exact type) which is compared in the .equals() method must be used as input in the hashCode() algorithm. Something which doesn't affect object equality should never be hashed. You need to be very careful to preserve transitivity if you allow a subclass to be equal to a super class. The super class needs to be designed for this and you have to rely on all other subclasses also being implemented correctly ============================================== What is your favorite data structure? What opertions does it have? [Then some detailed questions about the algorithms based on this data structure] I will go for hash and heap. ============================ Design and describe a system/application that will most efficiently produce a report of the top 1 million Google search requests. You are given: You are given 12 servers to work with. They are all dual-processor machines with 4Gb of RAM, 4x400GB hard drives and networked together.(Basically, nothing more than high-end PC's) The log data has already been cleaned for you. It consists of 100 Billion log lines, broken down into 12 320 GB files of 40-byte search terms per line. You can use only custom written applications or available free open-source software. This is a great question. I would probably use some sort of map-reduce algorithm. Divide the job up between all of the servers (2 jobs per server, since they're dual-core machines), and calculate how large each job can be from the size of the dataset, and the memory size of each server. Have each server count the occurences of each search request, then send up the counts. Map-reduce will take care of the rest. Map reduce is the solution. But one should try to have tree structure(bottom up) of map tasks, so that you can filter out at every step. There are 12 machines, and 12 files of 320 GB each. Presumably there is one file on each machine. Since hard-drives are slow, loading a chunk of data should take much more time than processing it. So let's do lots of processing. On each machine: 1. Load a big chunk of 40-byte entries at a time, say 0.5GB each. Compress each entry, say by using in-memory zip. 2. Put the entries in the chunk in buckets, using a hash function whose range is a big number multiple of 12 (say 3 times a power of 2). The reason for this comes later. Use exactly the same hash function on all machines at all times. 3. Once that chunk is processed, append it to 12 files on disk, by taking the hash function value mod 12. 4. At the end we have 12 files on each machine, each having a (compressed) 40-byte query and its frequency. The same query-frequency pair can show up multiple times in each file, since we append to each file multiple times. 5. Each of those files is at most 3GB (assuming that our hash function has good randomness) and each file could be much less due to compression. 6. Process each file again to combine any repetitions. 7. Now, from each machine, send first file to first machine (by compressing it perhaps), send second file to second machine, etc. The beauty here is that due to the same hashing employed at all times a query will hash to exactly one of the 12 files, the same file on each machine. 8. So, after each machine gets 12 files each, all it needs to do is combine them once again. We will find the frequency of each query. 9. The last step is to output the one million most frequent queries. I did not think of this yet, but will require some more communication among the machines. ========================================= How would you find the most searched for phrase in Google, assuming that you could use 10000 computer in parallel? Distribute phrases randomly to each of the 10000 computers, sort list of phrases on each computer, find the phrase that came up the most number of times on each computer. Then compare the top searches across the computers. If you have 10000 computers, then the data for each map job will become N/10000 and emit only the top element. In the reduce step, you will sort 10000 and emit the top frequency word. =============================================== How do you delete a given node from a single linked list? template class LList { public: T val; LList* next; public: static void Delete(LList** head, T& node) { if(NULL == (*head)) return; LList* next; // the case when the element to delete is a head of the list if((*head)->val == node) { next = (*head)->next; delete (*head); (*head) = next; } else { LList* cur = *head; while(cur->next) { if(node == cur->next->val) { next = cur->next->next; delete cur->next; cur->next = next; break; } cur = cur->next; } } } };} ========================== Implement an LRU cache. ========================== You have a bag of random numbers which can be negative, positive or zero, and it might contain duplicates. Describe an algorithm for finding all pairs whose sum equals some value m. Devide and conquer algorithm Assume that m > 0 1. Devide all numbers for 5 buckets: 1st {n<0}; 2nd {n==0}; 3d {n>0 && n m} 2. To check if 2 numbers sum is equal to m we need to work with buckets this way: a) if 2nd and 4th are not empty: sum of all numbers from 2nd and 4th are equal to m b) if 1st and 5th are not empty check what sum of these negative and positive numbers are equal to m c) Alpply devide and conquer algorithm to the 3d bucket. ================================== Describe how you would reverse a single linked list. template class LList { public: T val; LList* next; public: static void Reverse(LList** head) { if(NULL == (*head)) return; LList* headNew = NULL; LList* cur = *head; LList* next; while(cur) { // to insert the pointer to the current node at the head of a new list next = cur->next; cur->next = headNew; headNew = cur; cur = next; } *head = headNew; } }; } and same thing but recursive: reverse(head) { if(head.next==null) return head next = head.next newlist = reverse(next) next.next = head }} ====================== I am playing a card game called 24. Cards ace to king are numbered 1 to 13. During a given round, I am provided four cards to play with from the shuffled pack. If the numbers from the four cards result in 24 then I win the round if I shout '24' first. How would you code a function for this? Four numbers and three operators at a time. Consider permutations of such expressions in postfix notation. Evaluate these permutations till you get 24. It's a naturally recursive operation. Take the four cards, A, B, C, D. In the different steps you have (target = 24): (target) : using cards A, B, C, D (target - A): using cards B, C, D (target - A - B): using cards C, D (target - A - B - C): using card D Repeat for each of the other cards in the step's grouping if the target isn't met, ie. (target) : using cards B, C, D (target - B): using cards C, D (target - B - C): using card D And so on. Pass true up the stack when any of the target values reach zero. Once you have that down, it's trivial to write the function. If we know the rules won't change (always four cards, always summing to 24) and speed REALLY matters, then the correct answer is to ignore complexity and elegance and let the CPU burn through this. Just add the first two cards and check the sum. If sum == 24, shout (return). If sum < 24, add third card and check. If sum < 24, add fourth card and check. Sorting or structuring will take too much time even though it could be a big win in the general case - someone else will shout while we're moving our cards around. It's a naturally recursive operation. Take the four cards, A, B, C, D. In the different steps you have (target = 24): (target) : using cards A, B, C, D (target - A): using cards B, C, D (target - A - B): using cards C, D (target - A - B - C): using card D Repeat for each of the other cards in the step's grouping if the target isn't met, ie. (target) : using cards B, C, D (target - B): using cards C, D (target - B - C): using card D And so on. Pass true up the stack when any of the target values reach zero. Once you have that down, it's trivial to write the function. You only consider the + operator, which is not true in 24 points game. There are +,-,* and / 4 operators. ======================================= Given the list of points of the skyline of a city in order (from East to West) Find the maximal rectangle contained in this skyline. I was asked to write the code. I managed to find the algorithm but was not sufficient. Here's an O(n^3) dynamic programming solution. It's pretty ugly, but it seems to work. More elegant/efficient solutions would be greatly appreciated. -------------------------------------------- import java.util.ArrayList; public class Skyline { private int maxX; private int maxY; public String[] findLargestRectangle(String[] points){ // First translate the points into a boolean matrix. // The cell [i][j] in the matrix should be true if // it appears under the skyline and false otherwise. boolean[][] cells = this.pointsToMatrix(points); System.out.println("Matrix is: "); this.printBooleanMatrix(cells,maxX,maxY); // Now we calculate rectangle size. Define // s[i][j] to be the size of the rectangle // with (i,j) at its upper left corner. int[][] s = new int[maxX+1][maxY+1]; int maxSizeX = 0; int maxSizeY = 0; int maxSize = 0; for (int i = 0; i < maxX; i++){ for (int j = 1; j < maxY+1; j++){ if (! cells[i][j]) { s[i][j] = 0; continue; } int maxHSize = 1; int maxVSize = 1; int i1 = i+1; int j1 = j-1; // Add one to the maximum horizontal size for // each cell that is true towards the right. while (cells[i1][j]){ i1++; maxHSize++; } // Add one to the maximum vertical size for each // cell that is true towards the bottom. while (cells[i][j1]){ j1--; maxVSize++; } int max = maxHSize * maxVSize; s[i][j] = max; if (max > maxSize){ maxSize = max; maxSizeX = i; maxSizeY = j; } } } System.out.println("Size matrix is: "); this.printIntegerMatrix(s,maxX,maxY); System.out.println("Maximum rectangle has size " + maxSize + "."); System.out.println("Its upper left corner is (" + maxSizeX + "," + maxSizeY + ")."); return null; } // Points to be given as "1 0", "2 5", etc. private boolean[][] pointsToMatrix(String[] points){ ArrayList list = new ArrayList(); // The maximum x-value will be that of the last point. maxX = Integer.parseInt(points[points.length-1].split(" ")[0]); maxY = 0; for (int i = 0; i < points.length; i++){ String[] s = points[i].split(" "); int x = Integer.parseInt(s[0]); int y = Integer.parseInt(s[1]); if (y > maxY) maxY = y; list.add(new Point(x,y)); // Assume there are no duplicates } System.out.println("Points are: " + list); System.out.println(" maxX = " + maxX + ", maxY = " + maxY); boolean[][] m = new boolean[maxX+1][maxY+1]; int prevX = 0; int prevY = -1; for (int k = 0; k < list.size(); k++){ Point p = list.get(k); //System.out.println("Next point: " + p); int x = p.x; int y = p.y; int j = y; // If y is zero, ignore the point. if (y == 0){ // Do nothing. } // If x is zero, this is the first cell. Its value and the value // of the cells below it is true. else if (x == 0){ m[x][y] = true; while (j > 0){ m[x][j] = true; j--; } } else { // Look left. If the cell to the left is false, then this cell // is true (start of rectangle top). The value of the cells below // is also true. // // If the cell to the left is true and the previous point had the same value // of x, then this cell represents the start of another rectangle. Set its value // to true and the values of the cells below to true. if ((! m[x-1][y]) || (prevX == x)){ m[x][y] = true; while (j > 0){ m[x][j] = true; j--; } // If the previous value of y is the same as this value of y and the // difference between x values is greater than 1, fill in the cells // in between. if (prevY == y && (x - prevX > 1)){ int i = x-1; while (i > prevX){ j = y; while (j > 0){ m[i][j] = true; j--; } i--; } } } // If the cell to the left is true and the previous point had a different // value of x, then this cell is false (end of rectangle top). Set it and // the cells below it to false. This cell may be inside another rectangle, // in which case its value will be reset to true. else { m[x][y] = false; while (j > 0){ m[x][j] = false; j--; } } } prevX = x; prevY = y; //System.out.println("After " + p + ", matrix is: "); //this.printBooleanMatrix(m,maxX,maxY); } return m; } } private void printIntegerMatrix(int[][] b, int maxX, int maxY){ for (int j = maxY; j >=0; j--){ for (int i = 0; i <= maxX; i++){ System.out.print(String.format("%8d",b[i][j])); } System.out.println(); } } private void printBooleanMatrix(boolean[][] b, int maxX, int maxY){ for (int j = maxY; j >=0; j--){ for (int i = 0; i <= maxX; i++){ System.out.print(String.format("%8b",b[i][j])); } System.out.println(); } } class Point implements Comparable{ int x; int y; Point(int x, int y){ this.x = x; this.y = y; } public int compareTo(Object o){ Point p = (Point) o; return (this.x != p.x ? this.x - p.x : this.y - p.y); } public boolean equals(Object o){ if (! (o instanceof Point)) return false; Point p = (Point) o; return (this.x == p.x && this.y == p.y); } public int hashCode(){ return x*37 + y*19 + x*11 + y*7; } public String toString(){ return "(" + x + "," + y + ")"; } } /** * @param args */ public static void main(String[] args) { String[] points = {"0 2","1 2","1 1","2 1","2 3","3 3","3 0"}; Skyline s = new Skyline(); s.findLargestRectangle(points); System.out.println(); String[] points1 = {"0 1","1 1","1 4","2 4","2 3","3 3","3 6","4 6","5 3","7 3","7 7","8 7"}; s = new Skyline(); s.findLargestRectangle(points1); } } Output: Points are: [(0,2), (1,2), (1,1), (2,1), (2,3), (3,3), (3,0)] maxX = 3, maxY = 3 Matrix is: false false true false true false true false true true true false false false false false Size matrix is: 0 0 3 0 2 0 2 0 3 2 1 0 0 0 0 0 Maximum rectangle has size 3. Its upper left corner is (0,1). Points are: [(0,1), (1,1), (1,4), (2,4), (2,3), (3,3), (3,6), (4,6), (5,3), (7,3), (7,7), (8,7)] maxX = 8, maxY = 7 Matrix is: false false false false false false false true false false false false true false false false true false false false false true false false false true false false true false true false false false true false false true true true false true true true false false true true true false true true true false true true true true false true true true false false false false false false false false false false Size matrix is: 0 0 0 0 0 0 0 7 0 0 0 0 6 0 0 0 6 0 0 0 0 5 0 0 0 5 0 0 4 0 4 0 0 0 4 0 0 9 6 3 0 9 6 3 0 0 6 4 2 0 6 4 2 0 4 3 2 1 0 3 2 1 0 0 0 0 0 0 0 0 0 0 Maximum rectangle has size 9. Its upper left corner is (1,3). Helpful Answer? Yes | No Inappropriate? Dec 7, 2009 by ellemeno: Found bug: public *int* findLargestRectangle(String[] points){ ... *return maxSize*; } (obviously) For others' reference: http://www.topcoder.com/stat?c=problem_statement&pm=7473&rd=10661 http://www.topcoder.com/tc?module=Static&d1=match_editorials&d2=srm337 https://www.spoj.pl/problems/HISTOGRA/ http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html ere's a much nicer O(n^2) solution. As Anonymous said, there's an O(n) solution, too. /* * In this input array a, the ith building * has height a[i], which means its lower left * corner is at (i,0) and its upper right corner * is at (i+1,a[i]). All buildings have width 1. * The total width of the skyline is n. */ public long getMax(long[] a){ int n = a.length; int[] leftWidths = new int[n]; int[] rightWidths = new int[n]; // For each building, check how many units width to the left of the // top block of this building we can move before we are no longer // in a building. for (int i = 0; i < n; i++){ int x = i-1; // Look left int count = 0; // Until we have passed the leftmost building and while the // height of the next left building is at least as high as our // building while ((x >= 0) && a[x] >= a[i]) { x--; count++; } leftWidths[i] = count; } // For each building, check how many units width to the right of the // top block of this building we can move before we are no longer // in a building. for (int i = 0; i < n; i++){ int x = i+1; // Look right int count = 0; // Until we have passed the rightmost building and while the // height of the next right building is at least as high as our // building while ((x < n) && a[x] >= a[i]) { x++; count++; } rightWidths[i] = count; } // Now find the maximum rectangle. The value of the ith building's // rectangle is its horizontal width times its height, or // (1 + leftWidths[i] + rightWidths[i]) * a[i]. long maxArea = 1; for (int i = 0; i < n; i++){ int totalWidth = 1 + leftWidths[i] + rightWidths[i]; long nextArea = totalWidth * a[i]; if (nextArea > maxArea) { maxArea = nextArea; } } return maxArea; } Helpful Answer? Yes | No Inappropriate? Dec 8, 2009 by ellemeno: O(n) solution: /* * Basic idea: use a stack to store the buildings. Look at * the buildings in left-to-right order (west to east). If a * building is taller than the building on the top of the stack * (the tallest building to its left), push it onto * the stack. If a building is equal in height to the building on the * top, skip it. If a building is shorter than the building on the top, * it is not part of the maximum rectangle that is topped by the tallest * building to its left. Pop that tallest building, calculate its area and * compare it to the current main area, then repeat the comparison * procedure with the new tallest building. * * Along the way, track the number of buildings to the left and right of a * given building that would participate in that building's maximum * rectangle. The number to the left is equal to the number of buildings * that are popped off the stack before this building is pushed - that is * the number of buildings to the left of this building that are taller. * We do not need to worry about the buildings that are equal in height * since they are discarded (they are accounted for in the topBuilding's * rightWidth count). * * The number of buildings to the right of this building that participate * in this building's maximum rectangle is equal to the number of buildings * that are discarded because they are equal to this building's height * plus the number of buildings that are pushed onto the stack because they * are taller than this building while this building is on the top of the * stack. * * In this input array a, the ith building has height a[i], which means * its lower left corner is at (i,0) and its upper right corner is at * (i+1,a[i]). All buildings have width 1. The total width of the skyline * is n. */ public long getMax_useStack(long[] a){ Stack stack = new Stack(); int n = a.length; long maxArea = 1; // Process the buildings in left-to-right order. for (int i= 0; i < n; i++){ Building nextBuilding = new Building(a[i]); // Keep track of the number of buildings that we pop before we // push nextBuilding. That number will be equal to the number // of buildings to the immediate left of nextBuilding that are // taller in size. int popCount = 0; // If the stack is empty, push the next building onto the stack. // There are no buildings to its left, so we do not need to // update nextBuilding.leftWidth. if (stack.empty()) { stack.push(nextBuilding); continue; } Helpful Answer? Yes | No Inappropriate? Dec 8, 2009 by ellemeno: (cont'd) // Otherwise, compare the new building's height to the building on // the top of the stack until the new building is either // discarded (if it is equal in size) or pushed (if it is taller). while (! stack.empty()){ Building topBuilding = stack.peek(); long heightDiff= nextBuilding.height - topBuilding.height; // If the new building is equal in height, skip it. Increment // the rightWidths count of topBuilding as its largest // rectangle goes through the new building. if (heightDiff == 0) { topBuilding.rightWidth++; break; } // If the new building is greater in height, push it onto the // stack. The number of buildings to the immediate left of it // that are taller is equal to the number of buildings that // were popped before this point, its popCount. Set its // leftWidth equal to its popCount. Increment the rightWidths // count of the top building as its largest rectangle goes // through the new building. if (heightDiff > 0) { nextBuilding.leftWidth = popCount; topBuilding.rightWidth++; stack.push(nextBuilding); break; } // If the new building is less in height, update the maximum area // with regards to the element at the top of the stack. long topArea = topBuilding.getArea(); if (topArea > maxArea){ maxArea = topArea; } // Then discard the top element and repeat the comparison // procedure with the current next building. stack.pop(); popCount++; } } // If all buildings have been processed and the stack is not yet empty, // finish the remaining subproblems by updating the maximum area // with regards to the building at the top of the stack. while (! stack.empty()){ Building topBuilding = stack.pop(); long topArea = topBuilding.getArea(); if (topArea > maxArea){ maxArea = topArea; } } return maxArea; } class Building { long height; int leftWidth; int rightWidth; Building(long y){ this.height = y; leftWidth = 0; rightWidth = 0; } long getArea(){ return height * (1 + leftWidth + rightWidth); } } Helpful Answer? Yes | No Inappropriate? Dec 9, 2009 by logimech: ellemeno, Nice solution/explanation. However, maxArea should be initialized to 0. Helpful Answer? Yes | No Inappropriate? Dec 9, 2009 by ellemeno: @logimech Right. :) Thanks. Helpful Answer? Yes | No Inappropriate? Dec 10, 2009 by tabatabaye: ellemeno: The program has bug. We need to store the building's x cordinate. And when the nextbuilding is shorter than the top one, we need to update each popped building's rightwidth as topbuilding.x - nextbuilding.x Helpful Answer? Yes | No Inappropriate? Dec 13, 2009 by Pratyus: Isnt this the convex hull problem? Helpful Answer? Yes | No Inappropriate? Jan 19, 2011 by Victor: The O(n) algorithm does not work. I have another solution for it, however. Suppose we have the array of heights a of length n. We are looking for the smallest building index (let's say x and divide the problem in 2: the left part ( between 0 and i - 1) and the right part (between i + 1 and n - 1). Then we compute the area that can be build using the building indexed with i: a[i] * (n - 1 - 0 + 1) = n * a[i]. We compare this area with the ones obtained from the left and from the right part and return the result. Therefore, the complexity will be: N * time to find the index of the smallest building in a range - which is RMQ problem that can be solved in O(logn) and O(n) preporcessing time (and O(N) space). Therefore, the complexity is O( N log N). It works for 3 6 5 6 2 4: the solution is 3 * 5 = 15. The problem is known as the biggest rectangle below a histogram. 0) =========================== find the ntersection of two integer arrays in increasing order. Simple O(n log n) solution: Sort the arrays individually, then compare. O(n) solution: Hash the elements in the first array to a hash table. Walk through the second array. If an element in the second array is in the hash table, add the element to a priority queue where priority = ~ value. Read the elements out of the priority queue. We could also use a bit array instead of a priority queue for unique, nonnegative integers. ================================ Write a program the generates the power set of a set of numbers How about something like this? Running time is O(n!). (Java) /** * The power set of S is the set of all subsets of S, including the empty * set and S itself. It is parallel to the idea of all combinations of * characters in a string. */ public Set> getPowerSet(Set