Dagstuhl Seminar Proceedings, Volume 7071
Dagstuhl Seminar Proceedings
DagSemProc
https://www.dagstuhl.de/dagpub/1862-4405
https://dblp.org/db/series/dagstuhl
1862-4405
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
7071
2007
https://drops.dagstuhl.de/entities/volume/DagSemProc-volume-7071
07071 Abstracts Collection – Web Information Retrieval and Linear Algebra Algorithms
From 12th to 16th February 2007, the Dagstuhl Seminar 07071 ``Web Information Retrieval and Linear Algebra Algorithms'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
Information retrieval
Markov chains
PageRank
numerical linear algebra
low rank approximations
sparsity
ranking
1-20
Regular Paper
Andreas
Frommer
Andreas Frommer
Michael W.
Mahoney
Michael W. Mahoney
Daniel B.
Szyld
Daniel B. Szyld
10.4230/DagSemProc.07071.1
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
07071 Report on Dagstuhl Seminar – Web Information Retrieval and Linear Algebra Algorithms
A seminar concentrating on the intersection of the fields of information retrieval
and other web-related aspects with numerical and applied linear algebra techniques
was held, attended by scientists from industry and academia.
Information retrieval
Markov chains
PageRank
numerical linear algebra
low rank approximations
sparsity
ranking
1-3
Regular Paper
Andreas
Frommer
Andreas Frommer
Michael W.
Mahoney
Michael W. Mahoney
Daniel B.
Szyld
Daniel B. Szyld
10.4230/DagSemProc.07071.2
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
A Deeper Investigation of PageRank as a Function of the Damping Factor
PageRank is defined as the stationary state of a Markov chain. The chain is
obtained by perturbing the transition matrix induced by a web graph with a
damping factor $\alpha$ that spreads uniformly part of the rank. The choice
of $\alpha$ is eminently empirical, and in most cases the original suggestion
$\alpha=0.85$ by Brin and Page is still used.
In this paper, we give a mathematical analysis of PageRank when
$\alpha$ changes. In particular, we show that, contrary to popular belief,
for real-world graphs values of $\alpha$ close to $1$ do not give a more
meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of
any order,
and by proving that the $k$-th iteration of the Power Method gives exactly the
PageRank value obtained using a Maclaurin polynomial of degree $k$, we show
how to obtain an approximation of the derivatives. Finally, we view PageRank
as a linear operator acting on the preference vector and
show a tight connection between iterated computation and derivation.
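The power-method characterization can be checked numerically. The sketch below is an illustrative example, not the authors' code: the 4-page graph and uniform preference vector are assumptions, and it simply iterates $x_{k+1} = \alpha x_k P + (1-\alpha)v$ and compares the limit with a direct solve of the linear-system formulation $x(I - \alpha P) = (1-\alpha)v$.

```python
import numpy as np

# Hypothetical 4-page web graph; P is row-stochastic, v is the uniform
# preference vector (row-vector convention, as in the abstract's setting).
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
])
alpha = 0.85
n = P.shape[0]
v = np.full(n, 1.0 / n)

# Power method: x_{k+1} = alpha * x_k P + (1 - alpha) * v
x = v.copy()
for _ in range(200):
    x = alpha * (x @ P) + (1 - alpha) * v

# Direct solve of x (I - alpha P) = (1 - alpha) v for comparison.
exact = np.linalg.solve((np.eye(n) - alpha * P).T, (1 - alpha) * v)
```

Since $P$ is row stochastic, each iterate remains a probability vector, and the iteration contracts with factor $\alpha$ per step.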
PageRank
damping factor
Markov chains
1-19
Regular Paper
Paolo
Boldi
Paolo Boldi
Massimo
Santini
Massimo Santini
Sebastiano
Vigna
Sebastiano Vigna
10.4230/DagSemProc.07071.3
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
A Fast Algorithm for Matrix Balancing
As long as a square nonnegative matrix A contains sufficient nonzero elements, the matrix can be balanced, that is, we can find a diagonal scaling of A that is doubly stochastic. A number of algorithms have been proposed to achieve the balancing, the best known of these being the Sinkhorn-Knopp algorithm. In this paper we derive new algorithms based on inner-outer iteration
schemes. We show that the Sinkhorn-Knopp algorithm belongs to this family, but other members can converge much more quickly. In particular, we show that while stationary iterative methods offer little or no improvement in many cases, a scheme using a preconditioned conjugate gradient method as the inner
iteration can give quadratic convergence at low cost.
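For reference, the classical Sinkhorn-Knopp iteration that the paper takes as its starting point alternately rescales columns and rows. The sketch below uses an arbitrary small matrix with total support (an assumption made for this illustration; it is not from the paper, and it does not show the paper's inner-outer or conjugate gradient variants).

```python
import numpy as np

# Classical Sinkhorn-Knopp iteration: alternately rescale columns and
# rows of a nonnegative matrix A.  A here is an arbitrary example with
# total support, so diag(r) A diag(c) converges to a doubly stochastic
# matrix (convergence is linear, which motivates faster alternatives).
A = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 1.0, 1.0],
    [0.0, 1.0, 3.0],
])
r = np.ones(3)
for _ in range(500):
    c = 1.0 / (A.T @ r)   # makes columns of diag(r) A diag(c) sum to 1
    r = 1.0 / (A @ c)     # makes rows of diag(r) A diag(c) sum to 1
B = np.diag(r) @ A @ np.diag(c)
```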
Matrix balancing
Sinkhorn-Knopp algorithm
doubly stochastic matrix
conjugate gradient iteration
1-18
Regular Paper
Philip A.
Knight
Philip A. Knight
Daniel
Ruiz
Daniel Ruiz
10.4230/DagSemProc.07071.4
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
An Inner/Outer Stationary Iteration for Computing PageRank
We present a stationary iterative scheme for PageRank computation. The algorithm is based on a linear system formulation of the problem, uses inner/outer iterations, and amounts to a simple preconditioning technique. It is easy to implement and parallelize, and requires minimal storage overhead. Convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not particularly sensitive to the choice of the parameters involved. Numerical examples featuring matrices of dimensions up to approximately $10^7$ confirm the analytical results and demonstrate the accelerated convergence of the algorithm compared to the power method.
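A minimal sketch of the inner/outer idea, under illustrative assumptions (the tiny graph, the value of $\beta$, and a fixed inner sweep count are choices for this example, not the authors' settings): the outer step uses a smaller damping factor $\beta < \alpha$, and the resulting inner system $(I - \beta P)y = (\alpha-\beta)Px_k + (1-\alpha)v$ is itself solved only inexactly.

```python
import numpy as np

# Inner/outer iteration for PageRank, column-stochastic convention:
# the fixed point satisfies x = alpha*P@x + (1-alpha)*v.
P = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
alpha, beta = 0.85, 0.5
n = P.shape[0]
v = np.full(n, 1.0 / n)

x = v.copy()
for _ in range(100):                  # outer iterations
    f = (alpha - beta) * (P @ x) + (1 - alpha) * v
    y = x.copy()
    for _ in range(10):               # inexact inner solve of (I - beta*P) y = f
        y = beta * (P @ y) + f
    x = y

# Reference solution from the linear-system formulation.
exact = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
```

If the inner system were solved exactly, the outer fixed point satisfies $(I-\beta P)x = (\alpha-\beta)Px + (1-\alpha)v$, which rearranges to the PageRank equation; the point of the analysis in the paper is that a crude inner tolerance already suffices.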
PageRank
power method
stationary method
inner/outer iterations
damping factor
1-8
Regular Paper
Andrew P.
Gray
Andrew P. Gray
Chen
Greif
Chen Greif
Tracy
Lau
Tracy Lau
10.4230/DagSemProc.07071.5
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Asynchronous Computation of PageRank in an interactive multithreading environment
Numerical Linear Algebra has become almost indispensable in Web
Information Retrieval.
In this presentation we suggest that the asynchronous computation model
is an attractive paradigm for organizing concurrent computations
spanning data on Web scale. This suggestion is supported by
experiments which highlight some interesting characteristics of this
model as applied to 'page ranking' methods.
After an introduction on asynchronous computing in general and 'page
ranking' in particular, we present results from the asynchronous
computation of PageRank using typical combinations of execution units
(processes, threads) and communication mechanisms (message passing,
shared memory). Sound convergence properties predicted by theory are
numerically verified and interesting patterns of behavior are
unveiled. Our experiments were performed on Jylab, an evolving
environment enabling interactive multithreading and multiprocessing
computations. This work is supported by a Pythagoras-EPEAEK-II grant
and is conducted in collaboration with Daniel Szyld.
Asynchronous
PageRank
multithreading
multiprocessing
1-13
Regular Paper
Giorgios
Kollias
Giorgios Kollias
Efstratios
Gallopoulos
Efstratios Gallopoulos
10.4230/DagSemProc.07071.6
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Convergence of iterative aggregation/disaggregation methods based on splittings with cyclic iteration matrices
Iterative aggregation/disaggregation (IAD) methods belong to the competitive tools for computing the characteristics of Markov chains, as shown in several publications devoted to testing and comparing the various methods designed for this purpose. According to Dayar T., Stewart W.J., ``Comparison of
partitioning techniques for two-level iterative solvers on large, sparse Markov chains,'' SIAM J. Sci. Comput., Vol. 21, No. 5, 1691-1705 (2000), the IAD methods are effective in particular when applied to large ill-posed problems. One of the purposes of this
paper is to contribute to a possible explanation of this fact. The
novelty consists in the fact that the IAD algorithms converge independently of whether the iteration matrix of the corresponding process is primitive or not. Some numerical tests
are presented and possible applications are mentioned, e.g., computing the PageRank.
Iterative aggregation methods
stochastic matrix
stationary probability vector
Markov chains
cyclic iteration matrix
Google matrix
PageRank
1-27
Regular Paper
Ivo
Marek
Ivo Marek
Ivana
Pultarová
Ivana Pultarová
Petr
Mayer
Petr Mayer
10.4230/DagSemProc.07071.7
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Exploiting Community Behavior for Enhanced Link Analysis and Web Search
Methods for Web link analysis and authority ranking such as PageRank are based on the assumption that a user endorses a Web page when creating a hyperlink to this page. There is a wealth of additional user-behavior information that could be considered for improving authority analysis, for example, the history of queries that a user community posed to a search engine over an extended time period, or observations about which query-result pages were clicked on and which ones were not clicked on after a user saw the summary snippets of the top-10 results.
We study enhancements of link analysis methods by incorporating additional user assessments based on query logs and click streams, including negative feedback when a query-result page does not satisfy the user demand or is even perceived as spam. Our methods use various novel forms of Markov models whose states correspond to users and queries in addition to Web pages and whose links also reflect the relationships derived from query-result clicks, query refinements, and explicit ratings.
Query logs
link analysis
Markov reward model
1-17
Regular Paper
Julia
Luxenburger
Julia Luxenburger
Gerhard
Weikum
Gerhard Weikum
10.4230/DagSemProc.07071.8
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Extrapolation and minimization procedures for the PageRank vector
An important problem in Web search is to determine the importance of
each page. This problem consists in computing, by the power method,
the left principal eigenvector (the PageRank vector) of a matrix
depending on a parameter $c$ which has to be chosen close to 1.
However, when $c$ is close to 1, the problem is ill-conditioned, and
the power method converges slowly. The idea developed in this
paper consists in computing the PageRank vector for several values
of $c$, and then extrapolating them, by a conveniently chosen
rational function, at a point near 1. The choice of this
extrapolating function is based on mathematical considerations
about the PageRank vector.
Extrapolation
PageRank
Web matrix
eigenvector computation
1-6
Regular Paper
Claude
Brezinski
Claude Brezinski
Michela
Redivo-Zaglia
Michela Redivo-Zaglia
10.4230/DagSemProc.07071.9
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Google Pageranking Problem: The Model and the Analysis
Let $A$ be a given $n$-by-$n$ complex matrix with eigenvalues $\lambda, \lambda_{2}, \ldots, \lambda_{n}$. Suppose there are nonzero vectors $x, y \in \mathbb{C}^{n}$ such that $Ax = \lambda x$, $y^{\ast}A = \lambda y^{\ast}$, and $y^{\ast}x = 1$. Let $v \in \mathbb{C}^{n}$ be such that $v^{\ast}x = 1$, let $c \in \mathbb{C}$, and assume that $\lambda \neq c\lambda_{j}$ for
each $j = 2, \ldots, n$. Define $A(c) := cA + (1-c)\lambda x v^{\ast}$. The eigenvalues of
$A(c)$ are $\lambda, c\lambda_{2}, \ldots, c\lambda_{n}$. Every
left eigenvector of $A(c)$ corresponding to $\lambda$ is a scalar multiple of
$y - z(c)$, in which the vector $z(c)$ is an explicit rational
function of $c$. If a standard form such as the Jordan canonical
form or the Schur triangular form is known for $A$, we show how to
obtain the corresponding standard form of $A(c)$.
The web hyper-link matrix $G(c)$ used by Google for computing the
PageRank is a special case in which $A$ is real, nonnegative, and
row stochastic (taking into consideration the dangling nodes),
$c \in (0,1)$, $x$ is the vector of all ones, and $v$ is a positive
probability vector. The PageRank vector (the normalized dominant
left eigenvector of $G(c)$) is therefore an explicit rational
function of $c$. Extrapolation procedures on the complex field may
give a practical and efficient way to compute the PageRank vector
when $c$ is close to $1$.
The model, its adherence to reality, and possible variations are
also discussed.
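The stated spectrum of $A(c)$ can be verified numerically on a small example. In the sketch below the matrix, its eigenvalues, and the value of $c$ are arbitrary illustrative choices (real case, so $v^{\ast}x = 1$ reduces to $v^{T}x = 1$).

```python
import numpy as np

# Build a real diagonalizable A with known distinct eigenvalues d and a
# known eigenpair (lam, x); d, V, and c are arbitrary for this check.
rng = np.random.default_rng(0)
n = 5
d = np.array([3.0, 1.5, 1.0, -0.5, 0.25])
V = rng.standard_normal((n, n))
A = V @ np.diag(d) @ np.linalg.inv(V)

lam, x = d[0], V[:, 0]           # A x = lam x
v = rng.standard_normal(n)
v = v / (v @ x)                  # enforce v^T x = 1
c = 0.85

# A(c) = c A + (1-c) lam x v^T should have spectrum {lam, c d_2, ..., c d_n}.
Ac = c * A + (1 - c) * lam * np.outer(x, v)
expected = np.sort(np.concatenate(([lam], c * d[1:])))
got = np.sort(np.linalg.eigvals(Ac).real)
```

The key step is that $A(c)x = c\lambda x + (1-c)\lambda x(v^{T}x) = \lambda x$, so $\lambda$ survives the perturbation while the remaining eigenvalues are scaled by $c$.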
Google matrix
rank-one perturbation
Jordan canonical form
extrapolation formulae
1-34
Regular Paper
Stefano
Serra Capizzano
Stefano Serra Capizzano
10.4230/DagSemProc.07071.10
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Graph matching with type constraints on nodes and edges
In this paper, we consider two particular problems of directed
graph matching. The first problem concerns graphs with nodes that
have been subdivided into classes of different type. The second
problem treats graphs with edges of different types. In the two
cases, the matching process is based on a constrained projection
of the nodes and of the edges of both graphs in a lower
dimensional space. The procedures are formulated as non-convex
optimization problems. The objective functions use the adjacency
matrices and the constraints on the problem impose the isometry of
the so-called projections. Iterative algorithms are proposed to
solve the optimization problems. As illustration, we give an
example of graph matching for graphs with two types of nodes and
graphs with two types of edges.
Graph matching
Optimization
Typed nodes
Typed edges
1-16
Regular Paper
Catherine
Fraikin
Catherine Fraikin
Paul
Van Dooren
Paul Van Dooren
10.4230/DagSemProc.07071.11
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Iteration at Different Levels: Multi-Level Methods for Structured Markov Chains
For the stationary analysis of large Markov chains in continuous and discrete time a wide
variety of solution techniques has been applied in the past. Empirical
comparisons show that in particular the so-called multi-level approaches that
perform iterations at different levels are the most efficient
solvers for a wide class of Markov chains. The methods combine ideas from
aggregation/disaggregation methods and algebraic multigrid.
The talk gives an overview of the basic ideas of multi-level approaches and
shows which design alternatives for the algorithms exist. In particular, it considers
different forms of defining levels, available alternatives to realize
prolongation and interpolation operations, different cycle types and different
stopping criteria for the smoothing operations at each level. The last part of
the talk is devoted to implementation issues and data structures that are
necessary for an efficient realization of multi-level methods.
Stationary Analysis
Multi-Level Techniques
Kronecker Representation
1-10
Regular Paper
Peter
Buchholz
Peter Buchholz
10.4230/DagSemProc.07071.12
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Multidamping simulation framework for link-based ranking
We review methods for the
approximate computation of PageRank. Standard methods are based on
the eigenvector and linear system characterizations. Our starting
point is recent methods based on series representations whose
coefficients are damping functions, for example Linear Rank,
HyperRank, and TotalRank. We propose a multidamping framework
for interpreting PageRank and these methods. Multidamping is based
on some new useful properties of Google-type matrices. The approach can
be generalized and could help in the exploration of new
approximations for list-based ranking. This is joint work with Georgios Kollias and is supported by a Pythagoras-EPEAEK-II grant.
PageRank
Google
power method
eigenvalues
teleportation
list-based ranking
TotalRank
1-17
Regular Paper
Giorgios
Kollias
Giorgios Kollias
Efstratios
Gallopoulos
Efstratios Gallopoulos
10.4230/DagSemProc.07071.13
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Nonlinear Approximation and Image Representation using Wavelets
We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing provably-approximate instance-specific representations minimizing general $\ell_p$ distances under a wide variety of compactly-supported wavelet bases. More specifically, given a vector $f \in \mathbb{R}^n$, a compactly-supported wavelet basis, a sparsity constraint $B \in \mathbb{Z}$, and $p \in [1,\infty]$, our algorithm returns a $B$-term representation (a linear combination of $B$ vectors from the given basis) whose $\ell_p$ distance from $f$ is a $O(\log n)$ factor away from that of the optimal such representation of $f$. Our algorithm applies in the one-pass sublinear-space data streaming model of computation, and it generalizes to weighted $p$-norms and multidimensional signals. Our technique also generalizes to a version of the problem where we are given a bit-budget rather than a term-budget. Furthermore, we use it to construct a \emph{universal representation} that consists of at most $B(\log n)^2$ terms and gives a $O(\log n)$-approximation under all $p$-norms simultaneously.
Nonlinear approximation
wavelets
approximation algorithms
streaming algorithms
1-18
Regular Paper
Sudipto
Guha
Sudipto Guha
Boulos
Harb
Boulos Harb
10.4230/DagSemProc.07071.14
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Stanford Matrix Considered Harmful
I discuss the implications of using small data sets for experiments related to the web graph.
Web graph
PageRank
HITS
1-3
Regular Paper
Sebastiano
Vigna
Sebastiano Vigna
10.4230/DagSemProc.07071.15
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
The Sinkhorn-Knopp Algorithm: Convergence and Applications
As long as a square nonnegative matrix $A$ contains sufficient nonzero
elements, the Sinkhorn-Knopp algorithm can be used to balance the matrix,
that is, to find a diagonal scaling of $A$ that is doubly stochastic.
We relate balancing to problems in traffic flow and describe how balancing
algorithms can be used to give a two-sided measure of nodes in a graph. We
show that with an appropriate modification, the Sinkhorn-Knopp algorithm is a
natural candidate for computing the measure on enormous data sets.
Matrix balancing
Sinkhorn-Knopp algorithm
PageRank
doubly stochastic matrix
1-18
Regular Paper
Philip A.
Knight
Philip A. Knight
10.4230/DagSemProc.07071.16
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Three results on the PageRank vector: eigenstructure, sensitivity, and the derivative
The three results on the PageRank vector are preliminary but shed light on the eigenstructure of a PageRank modified Markov chain and what happens when changing the teleportation parameter in the PageRank model.
Computations with the derivative of the PageRank vector with respect to the teleportation parameter show predictive ability and identify an interesting set of pages from Wikipedia.
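The derivative used here has a simple closed form in the linear-system formulation: differentiating $(I - \alpha P)x = (1-\alpha)v$ with respect to the teleportation parameter gives $x'(\alpha) = (I - \alpha P)^{-1}(Px - v)$. The sketch below checks this against a central finite difference on a small hypothetical graph (column-stochastic convention; an illustration, not the authors' code).

```python
import numpy as np

# Column-stochastic transition matrix of a tiny hypothetical graph.
P = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)

def pagerank(a):
    """Solve (I - a P) x = (1 - a) v directly."""
    return np.linalg.solve(np.eye(n) - a * P, (1 - a) * v)

a = 0.85
x = pagerank(a)
# Closed-form derivative: x'(a) = (I - a P)^{-1} (P x - v)
deriv = np.linalg.solve(np.eye(n) - a * P, P @ x - v)
# Central finite-difference check
h = 1e-6
fd = (pagerank(a + h) - pagerank(a - h)) / (2 * h)
```

Because $x(\alpha)$ is a probability vector for every $\alpha$, the derivative's entries sum to zero: an increase in one page's PageRank as $\alpha$ varies is exactly offset elsewhere, which is what makes the derivative useful for identifying sensitive pages.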
PageRank
PageRank derivative
PageRank sensitivity
PageRank eigenstructure
1-10
Regular Paper
David
Gleich
David Gleich
Peter
Glynn
Peter Glynn
Gene
Golub
Gene Golub
Chen
Greif
Chen Greif
10.4230/DagSemProc.07071.17
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode