Extra Material for the
paper :
“Regret-based Online
Ranking for a Growing Digital Library”
(pdf)
Abstract :
The most common
environment in which ranking is used takes a very specific form. Users
sequentially
generate queries in a digital library. For each query, ranking is applied to
order
a set of
relevant items from which the user selects his favorite. This is the case when
ranking
search results for pages on the World Wide Web or for merchandize on an
e-commerce site. In this work, we present a new online ranking algorithm, called NoRegret
KLRank. Our
algorithm is designed to use “clickthrough” information as it is provided by
the users
to improve future ranking decisions. More importantly, we show that its long
term
average
performance will converge to the best rate achievable by any competing fixed
ranking
policy selected with the benefit of hindsight. We show how to ensure that this
property
continues to hold as new items are added to the set thus requiring a richer
class of
ranking
policies. Finally, our empirical results show that, while in some context
NoRegret
KLRank
might be considered conservative, a greedy variant of this algorithm actually
outperforms
many popular ranking algorithms.
Original
CORA database can be found in : http://www.cs.umass.edu/~mccallum/data/cora-classify.tar.gz
(A.
McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the construction of
internet portals with machine learning. Information Retrieval, 3(2):127–163,
2000.)
List of
22000 queries used in our paper : CoraOnlineRankingQueries_22000.tar.gz
Updated May
1st, 2009