Web Search Trends and User Behavior Analysis

This document summarizes research projects conducted by Amanda Spink at the University of Pittsburgh School of Information Sciences. The research focuses on human information behavior, cognitive information retrieval, and web search. Specific projects examine topics such as evolutionary human information behavior, multitasking frameworks, and analyzing large datasets of web search queries from 1997 to 2004 to identify trends in terms, subjects, and user behavior.

Uploaded by

imad
Copyright
© Attribution Non-Commercial (BY-NC)

RESEARCH PROJECTS

Amanda Spink
School of Information Sciences
University of Pittsburgh

1
Research Areas

 Human Information Behavior


 Cognitive/Interactive Information Retrieval
 Web Search

 Charles Oppenheim & Amanda Spink (Eds.). Handbook of Information Science and Information Management. Sage.

2
Human Information Behavior Projects

 Evolutionary Human Information Behavior


 Integrated Human Information Behavior Framework
 Multitasking Framework for Information Behavior
– Web search
– Public libraries study
– Nuclear power plant crews study
 Amanda Spink & Charles Cole. (2005). New Directions in Human Information Behavior. Springer.

3
Cognitive Information Retrieval Projects

 Multitasking Framework for IR

 Multitasking Web Search study

 IR Evaluation Measures study – Monica Landoni (Strathclyde University)

 Amanda Spink & Charles Cole (2005). New Directions in Cognitive Information Retrieval. Springer.
4
Web Search
 Vivisimo.com – clustering
 InfoSpace, Inc. – meta-search
 Alioplex – personalization
 Reuters Ltd – search redevelopment
 Amanda Spink & Jim Jansen (2nd Edition, 2006). Web Search: Public Searching of the Web. Springer.

5
Web Searching Trends: 1997-2004

Jim Jansen
IST, Penn State

Sherry Koshman
SIS, University of Pittsburgh

6
Web Search Book

 Amanda Spink & Bernard J. Jansen (2004). Web Search: Public Searching of the Web. Springer.

7
Web Search Engines

 Large-scale Web transaction logs: 1997 – 2004

 Excite.com
 AlltheWeb.com
 AltaVista.com
 AskJeeves.com
 Vivisimo.com

 No Google or MSN! They haven’t been collaborating with academics

8
Research Goals

 Track Web search trends – user focus

 Identify characteristics of Web searching – session length, query length, and use of query operators.

 Examine the distribution of query topics, terms, queries, sources, and languages used on Web search engines.

 Implications for theoretical and user models, and Web services, interfaces and systems design.
9
Web Search Engines

 Search capabilities:
– Up to 10 terms per query; default OR
– Advanced search:
  Boolean AND, OR, AND NOT & parentheses
  “phrase”: the phrase must appear in the answer
  + or - before a term: the term must or must not be in the answer
– Proprietary algorithms & concept linking method, but follow basic information retrieval
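
A minimal sketch of how the advanced-search features listed above could be detected in a raw query string. The regular expressions, the `query_features` name, and the choice to treat only uppercase AND / OR / AND NOT as operators are assumptions made for illustration; the slides do not describe how the engines or the log analysis recognised operators.

```python
import re

# Assumed syntax, based on the slide: Boolean AND / OR / AND NOT,
# parentheses, "quoted phrases", and + or - directly before a term.
BOOLEAN_RE = re.compile(r"\b(AND NOT|AND|OR)\b")   # uppercase only (assumption)
PHRASE_RE = re.compile(r'"[^"]+"')                  # at least one quoted phrase
PLUS_MINUS_RE = re.compile(r"(?:^|\s)[+-]\w")       # + or - at the start of a term


def query_features(query: str) -> dict:
    """Report which advanced-search features a raw query string uses."""
    return {
        "boolean": bool(BOOLEAN_RE.search(query)),
        "phrase": bool(PHRASE_RE.search(query)),
        "plus_minus": bool(PLUS_MINUS_RE.search(query)),
        "parentheses": "(" in query and ")" in query,
    }


if __name__ == "__main__":
    for q in ('"real estate" AND (pittsburgh OR ohio)', "+free -nude pictures", "new york"):
        print(q, query_features(q))
```

Requiring whitespace (or the start of the string) before + or - keeps hyphenated words such as "e-commerce" from being counted as operator use.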

10
Web Query Datasets

 51,000 queries by 18,000 Excite users collected


in 1997
 1 million query transaction logs from various Web
search engines – 1997 to 2004

 Dataset of 20 million+ Web queries

11
Terms Per Query: 1997-2001

Terms     1997    1999    2001
Mean      2.32    2.4     2.35
1 term    31%     26%     26%
2         31%     31%     26%
3         18%     18%     15%

12
Terms Per Query: 2002-2004

 Alta Vista (2002) – 2.9 terms per query


 1 term=20.4% 2 terms=30.8% 3+ terms=48.5%

 Vivisimo (2004) – 3.1 terms per query


 1 term=20.3% 2 terms=29.4% 3+ terms=50.2%

 Most queries include 2-3 terms
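
The terms-per-query figures above can be reproduced from a list of raw queries along these lines. This is a sketch only: splitting on whitespace is an assumed definition of a "term", and `terms_per_query` is a hypothetical helper, not the tooling used in the study.

```python
from collections import Counter


def terms_per_query(queries):
    """Return (mean terms per query, share of queries with 1, 2, and 3+ terms)."""
    # A term is any whitespace-separated token (assumption); blank queries are skipped.
    lengths = [len(q.split()) for q in queries if q.strip()]
    if not lengths:
        raise ValueError("no non-empty queries")
    buckets = Counter(min(n, 3) for n in lengths)  # bucket 3 stands for "3 or more"
    total = len(lengths)
    shares = {label: buckets[k] / total for k, label in ((1, "1"), (2, "2"), (3, "3+"))}
    return sum(lengths) / total, shares


if __name__ == "__main__":
    sample = ["sex", "new york", "free greeting cards", "britney spears", "windows xp service pack"]
    mean, shares = terms_per_query(sample)
    print(f"mean={mean:.2f}", shares)
```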

13
Queries Per User: 1997-2001

Queries    1997    1999    2001
Mean       2.8     2.5     2
1 query    67%     48%     78%
2          19%     21%     13%
3          7%      11%     4%

14
Queries Per User: 2002 & 2004

 Alta Vista (2002) – mean 2.9 queries per user

 Vivisimo (2004) – mean 3.8 queries per user

 Short user sessions, but growing in length

15
Session Duration (Minutes): 2002 & 2004

 Alta Vista (2002)


 71.6% sessions less than 5 minutes

 Vivisimo (2004)
 50% sessions less than 1 minute
 10% sessions 1-5 minutes
 45% sessions longer than 5 minutes

 Most sessions less than 5 minutes
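
Queries per user and session duration can both be derived from a transaction log in one pass. The sketch below assumes a hypothetical record layout of (user or IP identifier, ISO timestamp, query) and defines a session as everything between a user's first and last record; the slides do not state how sessions were delimited, so both choices are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def session_stats(records):
    """records: iterable of (user_id, ISO timestamp, query) tuples (hypothetical layout)."""
    by_user = defaultdict(list)
    for user_id, stamp, _query in records:
        by_user[user_id].append(datetime.fromisoformat(stamp))
    stats = {}
    for user_id, stamps in by_user.items():
        # Session duration = time from the user's first record to the last (assumption).
        minutes = (max(stamps) - min(stamps)).total_seconds() / 60
        stats[user_id] = {"queries": len(stamps), "minutes": minutes}
    return stats


def share_under(stats, limit_minutes):
    """Fraction of sessions shorter than limit_minutes."""
    return sum(s["minutes"] < limit_minutes for s in stats.values()) / len(stats)


if __name__ == "__main__":
    log = [
        ("a", "2004-03-28T10:00:00", "windows xp"),
        ("a", "2004-03-28T10:07:00", "windows xp download"),
        ("b", "2004-03-28T11:00:00", "britney spears"),
    ]
    stats = session_stats(log)
    print(stats, share_under(stats, 5))
```

Under this definition a single-query session has a duration of zero, which inflates the share of very short sessions; a real analysis would need an explicit rule for that case.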


16
Use of Boolean Operators: 1997-2001

                    1997    1999    2001
Share of queries    >10%    >5%     20%

17
Use of Boolean Operators: 2002 & 2004

 Alta Vista (2002) – 20% sessions included Boolean operators

 Vivisimo (2004) – 2.6% sessions included Boolean operators

 Varied use of Boolean operators, but use may be incorrect

18
Query Reformulation: 2002 & 2004

 Alta Vista (2002) – 52.4%

 Vivisimo (2004) – 62%

 More users modifying queries

19
Pages Viewed Per User: 1997-2001

Pages     1997    1999    2001
Mean      2.3     1.8     1.7
1 page    58%     29%     43%
2         19%     19%     21%
3         9%      14%

20
Pages Viewed Per User: 2002 & 2004

 Alta Vista (2002) – mean 1 page viewed


 1 page=72.8% 2 pages=13% 3+ pages=14.1%

 Vivisimo (2004) – 1 page viewed

 Limited page viewing – few click-through studies

21
Term Distribution

22
Top 10 Query Terms: 1997-2001

Rank    1997          1999          2001
1       sex           sex           sex
2       nude          free          Christmas
3       free          nude          nude
4       pictures      pictures      pictures
5       new           university    new
6       university    pics          pics
7       women         chat          music
8       chat          adult         university
9       gay           women         games
10      girls         new           porn

23
Top 10 Query Terms: Alta Vista (2002) & Vivisimo (2004)

Rank    Alta Vista (2002)    Vivisimo (2004)
1       Download             Free
2       Sex                  New
3       Pictures             Software
4       New                  Windows
5       Nude                 Sex
6       Music                School
7       School               History
8       How                  Online
9       Lyrics               Video
10      Home                 What

24
Top 10 Co-Occurring Query Terms: 1997-1999

Rank    1997               1999
1       free - pics        new - york
2       university - of    free - sex
3       new - york         free - pics
4       free - sex         university - of
5       real - estate      pictures - of
6       home - page        greeting - cards
7       free - nude        britney - spears
8       pictures - of      free - nude
9       free - pictures    free - pictures
10      high - school      real - estate

25
Top 10 Co-Occurring Vivisimo Query Terms - 2004

 and & and
 free & download
 for & the
 for & sale
 windows & xp
 to & in
 britney & spears
 what & the
 high & school
 for & in
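
A sketch of one way to count co-occurring term pairs in a set of queries. It counts each unordered pair of distinct terms once per query, which is an assumption: the study's own counting rule is not given in the slides, and a pair such as "and & and" in the list above suggests it also counted repeated terms, which this version does not.

```python
from collections import Counter
from itertools import combinations


def top_cooccurring_pairs(queries, n=10):
    """Count unordered pairs of distinct terms that appear in the same query."""
    pairs = Counter()
    for query in queries:
        # Lowercase and de-duplicate so each pair is counted once per query.
        terms = sorted(set(query.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs.most_common(n)


if __name__ == "__main__":
    sample = ["britney spears", "Britney Spears pics", "free pics"]
    for (a, b), count in top_cooccurring_pairs(sample, n=3):
        print(f"{a} & {b}: {count}")
```

No stop-word filtering is applied, so function-word pairs like "for & the" would surface in this count just as they do in the Vivisimo list.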
26
Query Subjects: 1997-1999
Subject Category                   1997     1999 (rank)
1. Entertainment, recreation       16.9%    7.5% (6)
2. Sex, porn, preferences          16.8%    7.5% (4)
3. Commerce, travel, economy       13.3%    24.4% (1)
4. Computers & the Internet        12.5%    10.9% (3)
5. Health & the sciences           9.5%     7.8% (5)
6. People, places, things          6.7%     20.3% (2)
7. Society, culture, religion      5.7%     4.2% (9)
8. Education & the humanities      5.6%     5.3% (8)
9. Performing & fine arts          5.4%     1.1% (11)
10. Government                     3.4%     1.6% (10)
11. Incomprehensible               4.1%     6.8% (7)
27
Query Subjects – Alta Vista 2002 & Vivisimo 2004

Rank    Alta Vista (2002)                 Vivisimo (2004)
1.      People/Places, etc.      49.2%    Commerce, etc.           21%
2.      Commerce, etc.           12.5%    Indiscernible            19%
3.      Computers, etc.          12.4%    People/Places, etc.      15%
4.      Health/sciences          7.4%     Computers/Internet       13%
5.      Education/Humanities     5%       Social/Culture           9%
6.      Entertainment, etc.      4.5%     Health/Sciences          6%
7.      Sex/Pornography          3.2%     Education/Humanities     5%
8.      Society/Culture, etc.    3.1%     Sex/Pornography          4%
9.      Government               1.5%     Performing/Fine Arts     3%
10.     Performing/Fine Arts     0.6%     Government               3%
11.                                       Entertainment, etc.      2%

28
Web Query Trends: 1997-2004

 From 1997 to 1999 – shift from entertainment/sex queries to e-commerce/people queries
 Growth of non-English queries
 Sex/pornography queries – U.S. less than 5% &
Europe 8%

29
Web Search Trends: 1997-2004

 Users: not many queries per search

 Terms: not many per query
– traditional IR queries are 3 to 7 times longer

 Low Boolean use

30
Web Search Trends: 1997-2004

 Users do not view many pages

 Growing query reformulation

 From 1997 to 2004 – many aspects of Web searching did NOT change dramatically

31
Web Search Trends: 1997-2004

 Frequency of use of terms is highly skewed


– terms that were used only once
– Web query language quite unique
 Sex represents a small proportion of all categories
– great many other topics searched
– diversity of subjects searched very high
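
The skew in term usage can be quantified with a simple frequency count. The sketch below reports the share of distinct terms used only once and the share of all term occurrences taken by the most frequent terms; the `term_skew` helper, whitespace tokenisation, and lowercasing are illustrative assumptions rather than the study's method.

```python
from collections import Counter


def term_skew(queries, top_n=10):
    """Summarise how unevenly terms are used across a set of queries."""
    counts = Counter(t for q in queries for t in q.lower().split())
    if not counts:
        raise ValueError("no terms found")
    total = sum(counts.values())                       # all term occurrences
    once = sum(1 for c in counts.values() if c == 1)   # terms seen exactly once
    top = sum(c for _, c in counts.most_common(top_n))
    return {
        "unique_terms": len(counts),
        "share_used_once": once / len(counts),
        "top_share": top / total,
    }


if __name__ == "__main__":
    sample = ["free sex", "free pics", "free music", "new york"]
    print(term_skew(sample, top_n=1))
```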

32
Web Search Trends: 1997-2004

 Move to more complex searches – Alta Vista

 Successive searches – related searches over time

 Multitasking searches – multiple topic sessions

33
Vivisimo Project

34
Vivisimo Project

 CMU spin off company June 2000


 Meta-searching environment
 Dynamically generates clusters

35
36
Research Questions

 What are the general characteristics of Vivisimo searching?

 What are the characteristics of cluster-based searching?

 What is the extent of cluster expansion by users?

 What is the distribution of clusters?

37
Vivisimo Transaction Log Data

 March 28 - April 04, 2004

 1,200,000 Records

 193,277 IP Addresses

 Quantitative and Qualitative Analyses

38
Vivisimo Queries 2004

 Mean queries per day 135,304

 80% of queries on weekdays

 1 term (20.3%), 2 terms (29.4%), 3+ terms (22.6%)

 Mean sessions < 1 minute (45%)

39
Language and Source Requests

 Indiscernible and Non-English 19%

 Language: none specified (90%)

 Source: Web (87.6%) GermanWeb (6.8%)

40
Cluster Analysis Transaction Log

 April 25 to May 02, 2004

 196,802 IP addresses

 4,219,925 records

41
Queries Per IP Address

[Figure: number of IP addresses (y-axis, 0 to 90,000) by number of queries (x-axis, 1 to 20)]
42
Log File Records

 Frame Structure
– 44% List Records
– 29% Tree Records

 Post Query Records
– 48.2% records show clusters clicked

43
Cluster Expansion

[Figure: percentage of records (y-axis, 0.00% to 2.50%) by number of clusters (x-axis, 1 to 26)]
44
Vivisimo User Summary

 Majority of searches on weekdays

 Over 100,000 searches entered per day

 Highest % of queries contained 2 terms

 Web primary information source

45
Vivisimo Summary

 Low use of language preferences

 Infrequent cluster tree manipulation

 50% of post-query records show cluster clicking activity to view the results pages of a cluster.

46
47
Future Vivisimo Research

 Cluster analysis on per query basis

 Usability – including cluster label selection, cluster depth, “find in clusters”

 Comparative analysis with Clusty transaction log

48
Conclusions

 Web search technology is changing; however, many user search characteristics are relatively stable

 More successive and multitasking searches

 Will the introduction of new Web search technology (e.g. history, visualization) impact user search characteristics?

49
Conclusions

 Need for more comparison of Web search engine performance

 Comparison of single versus meta-search engines

 Need for better user-based evaluation measures

 Better usability testing of Web search engine interfaces and techniques

50
Conclusions

 Need for improved users and better Web technology


 Technology not the complete answer – more user awareness of search process
 Many spelling and Boolean errors
 Many Web search features need to be studied to determine the way searchers use the Web (e.g. Advanced Search?)

51
Ongoing Web Research Projects
 Vivisimo.com – clustering
 InfoSpace, Inc. – meta-search
 Alioplex – personalization
 Reuters Ltd – search redevelopment

 Amanda Spink & Jim Jansen (2006). Web Search: Public Searching of the Web. Springer. [2nd Edition]

52
