109search Hash Malik Ch09


Data Structures Using C++ 2E

Chapter 9
Searching and Hashing Algorithms
Objectives
Learn the various search algorithms
Explore how to implement the sequential and
binary search algorithms
Discover how the sequential and binary search
algorithms perform
Become aware of the lower bound on
comparison-based search algorithms
Learn about hashing
Search Algorithms
Item key
Unique member of the item
Used in searching, sorting, insertion, deletion
Number of key comparisons
Comparing the key of the search item with the key of
an item in the list
Where/when to use?
Determine if a data item exists
Insert a data item
Delete a data item
Performance of search algorithms
Sequential Search
Sequential search in
Array-based list (Chapter 3):
class arrayListType
Linked lists (Chapter 5):
class unorderedLinkedList
class orderedLinkedList
Works the same for array-based lists and
linked lists
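
A minimal sketch of sequential search, assuming the list is held in a std::vector rather than in the book's arrayListType or linked list classes. The function name seqSearch follows the slides; the signature here is an assumption made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first element equal to item, or -1 if absent.
// Makes at most list.size() key comparisons.
template <typename T>
int seqSearch(const std::vector<T>& list, const T& item) {
    for (std::size_t loc = 0; loc < list.size(); ++loc) {
        if (list[loc] == item)          // one key comparison per element
            return static_cast<int>(loc);
    }
    return -1;                          // unsuccessful search
}
```

The same loop shape applies to a linked list, with a node pointer advancing in place of the index.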

Sequential Search Analysis
Examine effect of for loop in seqSearch of
Array-based lists (page 499)
Different programmers might implement same
algorithm differently
Number of key comparisons: typically the same
Computer speed affects performance
Does not affect the number of key comparisons

Exercise: recursive sequential search
Sequential Search Analysis (contd.)
Sequential search algorithm performance
Examine worst case and average case
Count number of key comparisons
Unsuccessful search
Search item not in list
Make n comparisons
Successful search depending on the location
Best case: make one key comparison
Worst case: algorithm makes n comparisons
Sequential Search Analysis (contd.)
Determining the average number of comparisons
Consider all possible cases
Find number of comparisons for each case
Add number of comparisons, divide by number of cases
O(n)
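
The averaging steps above can be worked out for a successful search, assuming the search item is equally likely to be at any of the n positions and that finding it at position k costs k key comparisons:

```latex
\[
\frac{1 + 2 + \cdots + n}{n} \;=\; \frac{n(n+1)}{2n} \;=\; \frac{n+1}{2}
\]
```

On average, then, a successful sequential search examines about half the list, which is O(n).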
Sequential Search Analysis (contd.)
TABLE 3-1 Time complexity of array-based list operations
Sequential Search Analysis (contd.)
TABLE 5-7 Time-complexity of the operations of the
class unorderedLinkedList
Sequential Search Analysis (contd.)
TABLE 5-8 Time-complexity of the operations of
the class orderedLinkedList
Can we do better?
Ordered Lists
Elements ordered according to some criteria
Usually ascending order
Operations
Same as those on an unordered list
Determining if list is empty or full, determining list length,
printing the list, clearing the list
Defining ordered list as an abstract data type
(ADT)
Use inheritance to derive the class to implement the
ordered lists from arrayListType or
linkedListType
Ordered Lists (contd.)
Binary Search
Performed only on ordered lists
Uses the divide-and-conquer technique
FIGURE 9-1 List of length 12. Searching for 75 in the list
FIGURE 9-2 Search list: list[0]...list[11]
FIGURE 9-3 Search list: list[6]...list[11]
Binary Search (contd.)
C++ function implementing binary search algorithm
Each iteration:
Item not found at mid: 2 key comparisons
Item found at mid: 1 key comparison
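
A sketch of an iterative binary search over a sorted std::vector, written to match the comparison counts above. The names first, last, and mid follow the slides' tables; the use of std::vector and the exact signature are assumptions.

```cpp
#include <vector>

// Returns the index of item in an ascending list, or -1 if absent.
template <typename T>
int binarySearch(const std::vector<T>& list, const T& item) {
    int first = 0;
    int last = static_cast<int>(list.size()) - 1;
    while (first <= last) {
        int mid = first + (last - first) / 2;
        if (list[mid] == item)      // first key comparison
            return mid;
        if (list[mid] > item)       // second key comparison
            last = mid - 1;         // continue in the lower half
        else
            first = mid + 1;        // continue in the upper half
    }
    return -1;
}
```

Each pass through the loop halves the search range, which is where the order log_2 n behavior comes from.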
Binary Search (contd.)
Example 9-1. Searching for 89
FIGURE 9-4 Sorted list for a binary search
TABLE 9-1 Values of first, last, and mid and the number of
comparisons for search item 89
Binary Search (contd.)
TABLE 9-2 Values of first, last, and mid and the
number of comparisons for search item 34
TABLE 9-3 Values of first, last, and mid and the number of
comparisons for search item 22
Insertion into an Ordered List
After insertion: resulting list must be ordered
Find place in the list to insert item
Use algorithm similar to binary search algorithm
Slide list elements one array position down to make
room for the item to be inserted
Insert the item
Use function insertAt of the class arrayListType
Insertion into an Ordered List (contd.)
Algorithm to insert the item
Function insertOrd implements algorithm
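
The insertion steps can be sketched as follows. This is not the book's insertOrd: it assumes a std::vector in place of arrayListType, lets vector::insert do the sliding that insertAt would do, and assumes duplicates are rejected.

```cpp
#include <vector>

// Inserts item into an ascending list, keeping it sorted.
// Returns false and leaves the list unchanged if item is already present
// (rejecting duplicates is an assumption of this sketch).
template <typename T>
bool insertOrd(std::vector<T>& list, const T& item) {
    int first = 0;
    int last = static_cast<int>(list.size()) - 1;
    while (first <= last) {                 // binary-search-like loop
        int mid = first + (last - first) / 2;
        if (list[mid] == item)
            return false;
        if (list[mid] > item)
            last = mid - 1;
        else
            first = mid + 1;
    }
    // On exit, first is the position where item belongs.
    list.insert(list.begin() + first, item);  // slides later elements down
    return true;
}
```

Finding the position takes order log_2 n comparisons, but sliding the elements still costs order n moves in the worst case.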
Insertion into an Ordered List (contd.)
Add binary search algorithm and the insertOrd
algorithm to the class orderedArrayListType
Why?
Insertion into an Ordered List (contd.)
class orderedArrayListType
Derived from class arrayListType
List elements of orderedArrayListType
Ordered
Must override functions insertAt and
insertEnd of class arrayListType in
class orderedArrayListType
If these functions are used by an object of type
orderedArrayListType, list elements will remain
in order
Insertion into an Ordered List (contd.)
Can also override function seqSearch
Perform sequential search on an ordered list
Takes into account that elements are ordered
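
One way a sequential search can exploit the ordering is to stop as soon as it reaches an element that is not smaller than the search item. A minimal sketch, assuming an ascending std::vector; the name seqSearchOrdered is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sequential search on an ascending list with early termination.
template <typename T>
int seqSearchOrdered(const std::vector<T>& list, const T& item) {
    for (std::size_t loc = 0; loc < list.size(); ++loc) {
        if (list[loc] >= item)   // reached item, or passed where it would be
            return list[loc] == item ? static_cast<int>(loc) : -1;
    }
    return -1;                   // item is larger than every element
}
```

An unsuccessful search no longer has to scan the whole list, although the worst case is still order n.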







Exercise: recursive binary search
TABLE 9-4 Number of comparisons for a list of length n
Lower Bound on Comparison-Based
Search Algorithms
Comparison-based search algorithms
Search list by comparing target element with list
elements
Sequential search: order n
Binary search: order log_2 n
Devising a search algorithm with order less than log_2 n
Obtain lower bound on number of comparisons
Such an algorithm cannot be comparison based

Hashing
Algorithm of order one (on average)
Requires data to be specially organized
Hash table
Helps organize data
Stored in an array
Denoted by HT
Hash function
Arithmetic function denoted by h
Applied to key X
Compute h(X): read as h of X
h(X) gives address of the item
Hashing (contd.)
Organizing data in the hash table
Store data within the hash table (array)
Store data in linked lists
Hash table HT divided into b buckets
HT[0], HT[1], . . ., HT[b - 1]
Each bucket capable of holding r items
Follows that br = m, where m is the size of HT
Generally r = 1
Each bucket can hold one item
The hash function h maps key X onto an integer t
h(X) = t, such that 0 <= h(X) <= b - 1
Hashing (contd.)
See Examples 9-2 and 9-3
Synonym
Occurs if h(X_1) = h(X_2)
Given two keys X_1 and X_2, such that X_1 != X_2
Overflow
Occurs if bucket t is full
Collision
Occurs if h(X_1) = h(X_2)
Given X_1 and X_2 non-identical keys
Hashing (contd.)
Overflow and collision occur at same time
If r = 1 (bucket size = one)
Choosing a hash function
Main objectives
Choose an easy to compute hash function
Minimize number of collisions
If HTSize denotes the size of hash table (array
size holding the hash table)
Assume bucket size = one
Each bucket can hold one item
Overflow and collision occur simultaneously
Hash Functions: Some Examples
Mid-square
Compute by squaring the key, then using the appropriate number of bits
from the middle of the square, which usually depend on all characters of the key
Folding
The key is divided into parts of equal size (except possibly the last part);
the parts are then added
Division (modular arithmetic)
In C++
h(X) = i_X % HTSize;
where i_X is the key X converted to an integer
C++ function
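
A sketch of a division-method hash function for string keys. Converting the key to an integer by summing its character codes is one simple choice and is an assumption here; the name hashFunction and passing HTSize as a parameter are also illustrative.

```cpp
#include <string>

// Converts the key to an integer by summing its character codes,
// then reduces the sum modulo the table size.
int hashFunction(const std::string& key, int HTSize) {
    int sum = 0;
    for (unsigned char ch : key)
        sum += ch;
    return sum % HTSize;    // always in 0 .. HTSize - 1
}
```

For example, with HTSize = 101 the key "AB" gives (65 + 66) % 101 = 30. Keys that are anagrams of each other collide under this conversion, which illustrates why the choice of conversion affects the number of collisions.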
Collision Resolution
Desirable to minimize number of collisions
Collisions unavoidable in reality
Hash function always maps a larger domain onto a smaller
range
Collision resolution technique categories
Open addressing (closed hashing)
Data stored within the hash table
Chaining (open hashing)
Data organized in linked lists
Hash table: array of pointers to the linked lists
Collision Resolution: Open Addressing
Data stored within the hash table
For each key X, h(X) gives the index in the array
where the item with key X is likely to be stored
Open Addressing: Linear Probing
Starting at location t
Search array sequentially to find next available slot
Assume circular array
If lower portion of array full
Can continue search in top portion of array using mod
operator
Starting at t, check array locations using probe
sequence
t, (t + 1) % HTSize, (t + 2) % HTSize, . . ., (t + j) % HTSize
Open Addressing: Linear Probing (contd.)
The next array slot is given by
(h(X) + j) % HTSize for the j-th probe
See Example 9-4
C++ code implementing linear probing
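
A minimal sketch of insertion with linear probing, assuming non-negative integer keys, the division method for h, and a sentinel value EMPTY marking free slots. None of these choices is taken from the book's code.

```cpp
#include <vector>

const int EMPTY = -1;  // marks a free slot; keys are assumed non-negative

// Inserts key using linear probing.
// Returns the slot used, or -1 if the table is full.
int insertLinear(std::vector<int>& HT, int key) {
    const int HTSize = static_cast<int>(HT.size());
    const int t = key % HTSize;               // home position h(X)
    for (int j = 0; j < HTSize; ++j) {
        const int slot = (t + j) % HTSize;    // j-th probe, wrapping around
        if (HT[slot] == EMPTY) {
            HT[slot] = key;
            return slot;
        }
    }
    return -1;                                // every slot was occupied
}
```

With a table of size 10, inserting 18, 28, 38 in that order places them in slots 8, 9, and 0: the third key wraps around the end of the array.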
Open Addressing: Linear Probing (contd.)
Causes clustering (primary clustering)
More and more new keys would likely be hashed to
the array slots already occupied
FIGURE 9-5 Hash table of size 20
FIGURE 9-6 Hash table of size 20 with certain positions occupied
FIGURE 9-7 Hash table of size 20 with certain positions occupied
Slot 9 will be occupied if h(X) = 6, 7, 8, or 9. Probability = 4/20
Open Addressing: Linear Probing (contd.)
Improving linear probing
Skip array positions by fixed constant (c) instead of
one
New hash address: (h(X) + i * c) % HTSize
If c = 2 and h(X) = 2k, i.e., h(X) even
Only even-numbered array positions visited
If c = 2 and h(X) = 2k + 1, i.e., h(X) odd
Only odd-numbered array positions visited
To visit all the array positions
Constant c must be relatively prime to HTSize
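
The effect of the step constant c can be checked with a small experiment. The helper below is hypothetical; it simply counts how many distinct slots the probe sequence (t + i * c) % HTSize reaches.

```cpp
#include <vector>

// Counts the distinct slots reached by (t + i * c) % HTSize
// for i = 0, 1, ..., HTSize - 1.
int slotsVisited(int t, int c, int HTSize) {
    std::vector<bool> seen(HTSize, false);
    int count = 0;
    for (int i = 0; i < HTSize; ++i) {
        int slot = (t + i * c) % HTSize;
        if (!seen[slot]) {
            seen[slot] = true;
            ++count;
        }
    }
    return count;
}
```

With HTSize = 20, a step of c = 2 reaches only 10 slots, while c = 3, which is relatively prime to 20, reaches all 20.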
Open Addressing: Random Probing
Uses random number generator to find next
available slot
i-th slot in probe sequence: (h(X) + r_i) % HTSize
Where r_i is the i-th value in a random permutation of the
numbers 1 to HTSize - 1
All insertions, searches use same random
numbers sequence
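
A sketch of how the shared random sequence might be produced, assuming std::mt19937 with a fixed seed so that every insertion and search sees the same permutation. The function names and the seed parameter are illustrative, not from the book.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Builds r_1 .. r_(HTSize-1): a random permutation of 1 .. HTSize - 1.
// The same seed always yields the same permutation.
std::vector<int> makeProbeOffsets(int HTSize, unsigned seed) {
    std::vector<int> r(HTSize - 1);
    std::iota(r.begin(), r.end(), 1);          // 1, 2, ..., HTSize - 1
    std::mt19937 gen(seed);
    std::shuffle(r.begin(), r.end(), gen);
    return r;
}

// i-th slot in the probe sequence for home position t (i = 0 is t itself).
int randomProbe(int t, int i, const std::vector<int>& r, int HTSize) {
    return i == 0 ? t : (t + r[i - 1]) % HTSize;
}
```

Because the offsets are a permutation of 1 to HTSize - 1, the probe sequence visits every slot exactly once.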
Open Addressing: Rehashing
If collision occurs with hash function h
Use a series of hash functions: h_1, h_2, . . ., h_s
If collision occurs at h(X)
Array slots h_i(X), 1 <= i <= s, are examined
Open Addressing: Quadratic Probing
Suppose
Item with key X hashed at t, i.e., h(X) = t and 0 <= t <= HTSize - 1
Position t already occupied
Starting at position t
Linearly search array at locations
(t + 1) % HTSize,
(t + 2^2) % HTSize = (t + 4) % HTSize,
(t + 3^2) % HTSize = (t + 9) % HTSize, . . .,
(t + i^2) % HTSize
Probe sequence: t, (t + 1) % HTSize, (t + 2^2) % HTSize, (t + 3^2) % HTSize, . . ., (t + i^2) % HTSize
Open Addressing: Quadratic Probing (contd.)
See Example 9-6
Reduces primary clustering
Does not probe all positions in the table
When HTSize is a prime, probes about half the table
before repeating probe sequence
Collisions can be resolved if HTSize is a prime at
least twice the number of items
After a considerable number of probes
Assume full table
Stop insertion (and search)
Open Addressing: Quadratic Probing (contd.)
Generating the probe sequence
Open Addressing: Quadratic Probing (contd.)
Consider probe sequence
t, t + 1, t + 2^2, t + 3^2, . . . , (t + i^2) % HTSize
C++ code computes i-th probe
(t + i^2) % HTSize

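// Uses i^2 = 1 + 3 + ... + (2i - 1), so no multiplication is needed.
int inc = 1;     // next odd increment
int pCount = 0;  // number of increments applied so far
while (pCount < i) {
    t = (t + inc) % HTSize;
    inc = inc + 2;
    pCount++;
}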
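
The fragment above can be wrapped in a function and checked against the direct formula. The function name quadraticProbe and its parameter list are assumptions made for this sketch.

```cpp
// Returns (t + i*i) % HTSize using only additions,
// relying on i*i = 1 + 3 + ... + (2i - 1).
// Assumes 0 <= t < HTSize and i >= 0.
int quadraticProbe(int t, int i, int HTSize) {
    int inc = 1;
    for (int pCount = 0; pCount < i; ++pCount) {
        t = (t + inc) % HTSize;   // add the next odd number
        inc += 2;
    }
    return t;
}
```

Reducing modulo HTSize at each step keeps t small, so the running sum does not overflow even for large i.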
Open Addressing: Quadratic Probing (contd.)
Pseudocode implementing quadratic probing
Open Addressing: Quadratic Probing (contd.)
Random, quadratic probings eliminate primary
clustering
Secondary clustering
If two non-identical keys (X_1 and X_2) are hashed to the same
home position (h(X_1) = h(X_2))
Same probe sequence followed for both keys
If hash function causes a cluster at a particular home
position
Cluster remains under these probings
Probe sequence is a function of the home position, not the original key
Open Addressing: Quadratic Probing (contd.)
Solve secondary clustering with double hashing
Use linear probing with an increment value
that is a function of the key
If collision occurs at h(X)
Probe sequence generation




See Examples 9-7 and 9-8
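
The probe rule itself did not survive on the slide, so the sketch below uses a commonly taught form of double hashing: the i-th probe is (h(X) + i * g(X)) % HTSize, where g is a second hash function. Both h and g as written here are assumptions, as are non-negative integer keys and a prime table size.

```cpp
// First hash: home position (division method).
int h(int key, int HTSize) { return key % HTSize; }

// Second hash (assumed form): a step in 1 .. HTSize - 2, never zero.
int g(int key, int HTSize) { return 1 + key % (HTSize - 2); }

// i-th slot in the probe sequence for key (i = 0 is the home position).
int doubleHashProbe(int key, int i, int HTSize) {
    return (h(key, HTSize) + i * g(key, HTSize)) % HTSize;
}
```

With HTSize = 19, the keys 19 and 38 share home position 0 but step by 3 and 5 respectively, so their probe sequences diverge after the first slot. That is how the increment depending on the key breaks up secondary clustering.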
Deletion: Open Addressing
Two items R and R' hash to the same HT index. What
happens when we
delete R and then search for R'?
Solution: Use two arrays
One stores the data
One, indexStatusList (described in the previous section),
indicates whether a position in the hash table is free (0), occupied (1), or
used previously (-1)
See code on pages 521 and 522
Class template implementing hashing as an ADT
Definition of function insert
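
A compact sketch of the two-array idea. It is not the book's class template: it assumes non-negative integer keys, the division method, and linear probing to keep the example short. Only the array names HT and indexStatusList and the status codes come from the slides.

```cpp
#include <vector>

struct HashTable {
    std::vector<int> HT;               // the data
    std::vector<int> indexStatusList;  // 0 = free, 1 = occupied, -1 = used previously

    explicit HashTable(int size) : HT(size, 0), indexStatusList(size, 0) {}

    int size() const { return static_cast<int>(HT.size()); }

    // Returns the slot holding key, or -1 if it is not in the table.
    int search(int key) const {
        const int t = key % size();
        for (int j = 0; j < size(); ++j) {
            const int slot = (t + j) % size();
            if (indexStatusList[slot] == 0)
                return -1;             // never used: key cannot be further along
            if (indexStatusList[slot] == 1 && HT[slot] == key)
                return slot;
            // status -1, or a different key: keep probing
        }
        return -1;
    }

    bool insert(int key) {
        if (search(key) != -1) return false;     // no duplicates
        const int t = key % size();
        for (int j = 0; j < size(); ++j) {
            const int slot = (t + j) % size();
            if (indexStatusList[slot] != 1) {    // free or used previously
                HT[slot] = key;
                indexStatusList[slot] = 1;
                return true;
            }
        }
        return false;                            // table is full
    }

    bool remove(int key) {
        const int slot = search(key);
        if (slot == -1) return false;
        indexStatusList[slot] = -1;              // mark, do not reset to free
        return true;
    }
};
```

Marking a deleted slot as used previously rather than free is what lets a later search continue past it and still find a key that collided with the deleted one.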
After Goldy is removed
After Danny is removed
Collision Resolution: Chaining (Open Hashing)
Hash table HT: array of pointers
For each j, where 0 <= j <= HTSize - 1
HT[j] is a pointer to a linked list
Hash table size (HTSize): less than or equal to the number of
items
FIGURE 9-10 Linked hash table
Collision Resolution: Chaining (contd.)
Item insertion and collision
For each key X (in the item)
First find h(X) = t, where 0 <= t <= HTSize - 1
Item with this key inserted in linked list pointed to by HT[t]
For nonidentical keys X_1 and X_2
If h(X_1) = h(X_2)
Items with keys X_1 and X_2 inserted in same linked list
Collision handled quickly, effectively
Collision Resolution: Chaining (contd.)
Search
Determine whether item R with key X is in the hash
table
First calculate h(X)
Example: h(X) = t
Linked list pointed to by HT[t] searched sequentially
Deletion
Delete item R from the hash table
Search hash table to find where in a linked list R exists
Adjust pointers at appropriate locations
Deallocate memory occupied by R
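
Insertion, search, and deletion under chaining can be sketched with the standard library. This version assumes non-negative integer keys and uses std::forward_list in place of hand-written linked lists, so pointer adjustment and deallocation happen inside the container. The class name ChainedHashTable is hypothetical.

```cpp
#include <algorithm>
#include <forward_list>
#include <vector>

class ChainedHashTable {
public:
    explicit ChainedHashTable(int HTSize) : HT(HTSize) {}

    // Adds key at the front of the list for its home position.
    void insert(int key) { HT[h(key)].push_front(key); }

    // Sequentially searches the one list that key can be in.
    bool search(int key) const {
        const std::forward_list<int>& chain = HT[h(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Unlinks and frees every node in the list that holds key.
    void remove(int key) { HT[h(key)].remove(key); }

private:
    int h(int key) const { return key % static_cast<int>(HT.size()); }

    std::vector<std::forward_list<int>> HT;  // HT[t] is the list for home position t
};
```

Keys 7 and 12 both hash to 2 in a table of size 5 and simply share a list, so removing one does not disturb lookups for the other.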
Collision Resolution: Chaining (contd.)
Overflow
No longer a concern
Data stored in linked lists
Memory space to store data allocated dynamically
Hash table size
No longer needs to be greater than number of items
If hash table size is less than the number of items
Some linked lists contain more than one item
With a good hash function, average linked list length stays small
(search is efficient)
Collision Resolution: Chaining (contd.)
Advantages of chaining
Item insertion and deletion: straightforward
Efficient hash function
Few keys hashed to same home position
Short linked list (on average)
Shorter search length
If item size is large
Saves a considerable amount of space
Collision Resolution: Chaining (contd.)
Disadvantage of chaining
Small item size wastes space
Example: 1000 items, each requiring one word of
storage
Chaining
Requires 3000 words of storage (1000 for the hash table of pointers, 1000 for the items, 1000 for the link in each node)
Quadratic probing
If hash table size twice number of items: 2000 words
If table size three times number of items
Keys reasonably spread out
Results in fewer collisions
Hashing Analysis
Load factor
Parameter alpha = (number of filled array positions) / HTSize
TABLE 9-5 Average number of comparisons in hashing
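
The contents of the table are not reproduced here. As an illustration of how the load factor drives the cost, the sketch below evaluates the classical approximations for linear probing: about (1/2)(1 + 1/(1 - alpha)) comparisons for a successful search and (1/2)(1 + 1/(1 - alpha)^2) for an unsuccessful one. These are the standard results for linear probing and may not match every entry of Table 9-5; the function names are hypothetical.

```cpp
// Load factor: fraction of array positions that are filled.
double loadFactor(int filled, int HTSize) {
    return static_cast<double>(filled) / HTSize;
}

// Approximate average comparisons for linear probing; requires alpha < 1.
double linearProbingSuccessful(double alpha) {
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

double linearProbingUnsuccessful(double alpha) {
    const double d = 1.0 - alpha;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

At alpha = 0.5 these give 1.5 and 2.5 comparisons. Both grow quickly as alpha approaches 1, which is why open-addressing tables are usually kept well below full.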
Summary
Sequential search
Order n
Ordered lists
Elements ordered according to some criteria
Binary search
Order log_2 n
Hashing
Data organized using a hash table
Apply hash function to determine if item with a key is
in the table
Two ways to organize data
Summary (contd.)
Hash functions
Mid-square
Folding
Division (modular arithmetic)
Collision resolution technique categories
Open addressing (closed hashing)
Chaining (open hashing)
Search analysis
Review number of key comparisons
Worst case, best case, average case
Self Exercises
Programming Exercises: 1, 2, 3, 6, 8