eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
20
10.4230/DagSemProc.07071.1
article
07071 Abstracts Collection – Web Information Retrieval and Linear Algebra Algorithms
Frommer, Andreas
Mahoney, Michael W.
Szyld, Daniel B.
From 12th to 16th February 2007, the Dagstuhl Seminar 07071 ``Web Information Retrieval and Linear Algebra Algorithms'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.1/DagSemProc.07071.1.pdf
Information retrieval
Markov chains
PageRank
numerical linear algebra
low rank approximations
sparsity
ranking
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
3
10.4230/DagSemProc.07071.2
article
07071 Report on Dagstuhl Seminar – Web Information Retrieval and Linear Algebra Algorithms
Frommer, Andreas
Mahoney, Michael W.
Szyld, Daniel B.
A seminar concentrating on the intersection of the fields of information retrieval
and other web-related aspects with numerical and applied linear algebra techniques
was held with the attendance of scientists from industry and academia.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.2/DagSemProc.07071.2.pdf
Information retrieval
Markov chains
PageRank
numerical linear algebra
low rank approximations
sparsity
ranking
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
19
10.4230/DagSemProc.07071.3
article
A Deeper Investigation of PageRank as a Function of the Damping Factor
Boldi, Paolo
Santini, Massimo
Vigna, Sebastiano
PageRank is defined as the stationary state of a Markov chain. The chain is
obtained by perturbing the transition matrix induced by a web graph with a
damping factor $\alpha$ that spreads part of the rank uniformly. The choice
of $\alpha$ is eminently empirical, and in most cases the original suggestion
$\alpha=0.85$ by Brin and Page is still used.
In this paper, we give a mathematical analysis of PageRank when
$\alpha$ changes. In particular, we show that, contrary to popular belief,
for real-world graphs values of $\alpha$ close to $1$ do not give a more
meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of
any order,
and by proving that the $k$-th iteration of the Power Method gives exactly the
PageRank value obtained using a Maclaurin polynomial of degree $k$, we show
how to obtain an approximation of the derivatives. Finally, we view PageRank
as a linear operator acting on the preference vector and
show a tight connection between iterated computation and derivation.
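The link between the Power Method and the Maclaurin polynomial can be checked numerically on a toy graph. The following is a minimal sketch, not the authors' code: it assumes a row-stochastic matrix `P` without dangling nodes, a uniform preference vector `v`, and a Power Method started from `v`, and it uses the series $r(\alpha) = (1-\alpha)\sum_{j \ge 0} \alpha^j v P^j$. The 3-node matrix and all function names are hypothetical.

```python
def vec_mat(x, P):
    """Row vector times matrix: returns x P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


def power_method(P, v, alpha, k):
    """k steps of x <- alpha * x P + (1 - alpha) * v, started from x = v."""
    x = list(v)
    for _ in range(k):
        xP = vec_mat(x, P)
        x = [alpha * a + (1 - alpha) * b for a, b in zip(xP, v)]
    return x


def maclaurin_polynomial(P, v, alpha, k):
    """Degree-k truncation of r(alpha) = (1 - alpha) * sum_j alpha^j v P^j.

    Expanding the product gives the coefficients c_0 = v and
    c_j = v P^j - v P^(j-1) for j >= 1.
    """
    result = list(v)
    prev = list(v)  # holds v P^(j-1)
    for j in range(1, k + 1):
        cur = vec_mat(prev, P)  # v P^j
        result = [r + alpha ** j * (c - p) for r, c, p in zip(result, cur, prev)]
        prev = cur
    return result


# Hypothetical 3-node row-stochastic transition matrix (no dangling nodes).
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
v = [1 / 3, 1 / 3, 1 / 3]

x5 = power_method(P, v, alpha=0.85, k=5)
m5 = maclaurin_polynomial(P, v, alpha=0.85, k=5)
max_gap = max(abs(a - b) for a, b in zip(x5, m5))  # agreement up to rounding
```

Under these assumptions the two vectors coincide up to floating-point rounding for every `k`, which is the identity the abstract refers to.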
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.3/DagSemProc.07071.3.pdf
PageRank
damping factor
Markov chains
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
18
10.4230/DagSemProc.07071.4
article
A Fast Algorithm for Matrix Balancing
Knight, Philip A.
Ruiz, Daniel
As long as a square nonnegative matrix A contains sufficiently many nonzero elements, it can be balanced; that is, we can find a diagonal scaling of A that is doubly stochastic. A number of algorithms have been proposed to achieve the balancing, the best known of these being the Sinkhorn-Knopp algorithm. In this paper we derive new algorithms based on inner-outer iteration
schemes. We show that the Sinkhorn-Knopp algorithm belongs to this family, but other members can converge much more quickly. In particular, we show that while stationary iterative methods offer little or no improvement in many cases, a scheme using a preconditioned conjugate gradient method as the inner
iteration can give quadratic convergence at low cost.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.4/DagSemProc.07071.4.pdf
Matrix balancing
Sinkhorn-Knopp algorithm
doubly stochastic matrix
conjugate gradient iteration
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
8
10.4230/DagSemProc.07071.5
article
An Inner/Outer Stationary Iteration for Computing PageRank
Gray, Andrew P.
Greif, Chen
Lau, Tracy
We present a stationary iterative scheme for PageRank computation. The algorithm is based on a linear system formulation of the problem, uses inner/outer iterations, and amounts to a simple preconditioning technique. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not particularly sensitive to the choice of the parameters involved. Numerical examples featuring matrices of dimensions up to approximately $10^7$ confirm the analytical results and demonstrate the accelerated convergence of the algorithm compared to the power method.
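One way an inner/outer stationary scheme of this kind can be organised is sketched below. The abstract does not spell out the splitting, so everything here is an assumption: a column-stochastic matrix `P`, the linear system $(I - \alpha P)x = (1-\alpha)v$, an outer parameter `beta` smaller than `alpha`, and an inner Richardson iteration stopped at a crude tolerance. The names `inner_outer_pagerank`, `beta`, and `inner_tol` are hypothetical.

```python
def mat_vec(P, x):
    """Matrix times column vector: returns P x."""
    n = len(P)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def inner_outer_pagerank(P, v, alpha=0.85, beta=0.5,
                         outer_tol=1e-12, inner_tol=1e-2, max_outer=1000):
    """Solve (I - alpha P) x = (1 - alpha) v with an inner/outer iteration.

    Outer step (assumed splitting):
        (I - beta P) x_new = (alpha - beta) P x + (1 - alpha) v
    Inner step: Richardson iteration y <- beta P y + f, stopped loosely.
    """
    x = list(v)
    for _ in range(max_outer):
        Px = mat_vec(P, x)
        # 1-norm residual of the original PageRank system.
        residual = sum(abs(alpha * p + (1 - alpha) * b - a)
                       for p, b, a in zip(Px, v, x))
        if residual < outer_tol:
            break
        # Right-hand side of the outer system, fixed during the inner loop.
        f = [(alpha - beta) * p + (1 - alpha) * b for p, b in zip(Px, v)]
        y = x
        while True:
            y_new = [beta * p + fi for p, fi in zip(mat_vec(P, y), f)]
            change = sum(abs(a - b) for a, b in zip(y_new, y))
            y = y_new
            if change < inner_tol:  # crude inner tolerance
                break
        x = y
    return x


# Hypothetical 3-node column-stochastic matrix (each column sums to 1).
P = [[0.0, 1.0, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.0, 0.0]]
v = [1 / 3, 1 / 3, 1 / 3]
alpha = 0.85
x = inner_outer_pagerank(P, v, alpha=alpha)
```

In this sketch the first inner step coincides with one power-method step, so the iteration is never slower per outer step than the power method; the extra inner steps are cheap because `beta` is well below `alpha`.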
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.5/DagSemProc.07071.5.pdf
PageRank
power method
stationary method
inner/outer iterations
damping factor
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
13
10.4230/DagSemProc.07071.6
article
Asynchronous Computation of PageRank computation in an interactive multithreading environment
Kollias, Giorgios
Gallopoulos, Efstratios
Numerical Linear Algebra has become almost indispensable in Web
Information Retrieval.
In this presentation we suggest that the asynchronous computation model
is an attractive paradigm for organizing concurrent computations
spanning data on Web scale. This suggestion is supported by
experiments which highlight some interesting characteristics of this
model as applied to 'page ranking' methods.
After an introduction on asynchronous computing in general and 'page
ranking' in particular, we present results from the asynchronous
computation of PageRank using typical combinations of execution units
(processes, threads) and communication mechanisms (message passing,
shared memory). Sound convergence properties predicted by theory are
numerically verified and interesting patterns of behavior are
unveiled. Our experiments were performed on Jylab, an evolving
environment enabling interactive multithreading and multiprocessing
computations. This work is supported by a Pythagoras-EPEAEK-II grant
and is conducted in collaboration with Daniel Szyld.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.6/DagSemProc.07071.6.pdf
Asynchronous
PageRank
multithreading
multiprocessing
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
27
10.4230/DagSemProc.07071.7
article
Convergence of iterative aggregation/disaggregation methods based on splittings with cyclic iteration matrices
Marek, Ivo
Pultarová, Ivana
Mayer, Petr
Iterative aggregation/disaggregation (IAD) methods are competitive tools for computing the characteristics of Markov chains, as shown in several publications devoted to testing and comparing methods designed for this purpose. According to Dayar T., Stewart W.J., ``Comparison of
partitioning techniques for two-level iterative solvers on large, sparse Markov chains,'' SIAM J. Sci. Comput., Vol. 21, No. 5, 1691-1705 (2000), the IAD methods are particularly effective when applied to large ill-posed problems. One of the purposes of this
paper is to contribute to a possible explanation of this fact. The
novelty may consist of the fact that the IAD algorithms converge independently of whether the iteration matrix of the corresponding process is primitive or not. Some numerical tests
are presented and possible applications are mentioned, e.g. computing PageRank.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.7/DagSemProc.07071.7.pdf
Iterative aggregation methods
stochastic matrix
stationary probability vector
Markov chains
cyclic iteration matrix
Google matrix
PageRank
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
17
10.4230/DagSemProc.07071.8
article
Exploiting Community Behavior for Enhanced Link Analysis and Web Search
Luxenburger, Julia
Weikum, Gerhard
Methods for Web link analysis and authority ranking such as PageRank are based on the assumption that a user endorses a Web page when creating a hyperlink to this page. There is a wealth of additional user-behavior information that could be considered for improving authority analysis, for example, the history of queries that a user community posed to a search engine over an extended time period, or observations about which query-result pages were clicked on and which ones were not clicked on after a user saw the summary snippets of the top-10 results.
We study enhancements of link analysis methods by incorporating additional user assessments based on query logs and click streams, including negative feedback when a query-result page does not satisfy the user demand or is even perceived as spam. Our methods use various novel forms of Markov models whose states correspond to users and queries in addition to Web pages and whose links also reflect the relationships derived from query-result clicks, query refinements, and explicit ratings.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.8/DagSemProc.07071.8.pdf
Query logs
link analysis
Markov reward model
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
6
10.4230/DagSemProc.07071.9
article
Extrapolation and minimization procedures for the PageRank vector
Brezinski, Claude
Redivo-Zaglia, Michela
An important problem in Web search is to determine the importance of
each page. This problem consists in computing, by the power method,
the left principal eigenvector (the PageRank vector) of a matrix
depending on a parameter $c$ which has to be chosen close to 1.
However, when $c$ is close to 1, the problem is ill-conditioned, and
the power method converges slowly. So, the idea developed in this
paper consists in computing the PageRank vector for several values
of $c$, and then to extrapolate them, by a conveniently chosen
rational function, at a point near 1. The choice of this
extrapolating function is based on the mathematical considerations
about the PageRank vector.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.9/DagSemProc.07071.9.pdf
Extrapolation
PageRank
Web matrix
eigenvector computation
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
34
10.4230/DagSemProc.07071.10
article
Google Pageranking Problem: The Model and the Analysis
Serra Capizzano, Stefano
Let $A$ be a given $n$-by-$n$ complex matrix with eigenvalues
$\lambda, \lambda_{2}, \ldots, \lambda_{n}$. Suppose there are nonzero vectors
$x, y \in \mathbb{C}^{n}$ such that $Ax = \lambda x$, $y^{\ast}A = \lambda y^{\ast}$,
and $y^{\ast}x = 1$. Let $v \in \mathbb{C}^{n}$ be such that $v^{\ast}x = 1$,
let $c \in \mathbb{C}$, and assume that $\lambda \neq c\lambda_{j}$ for
each $j = 2, \ldots, n$. Define $A(c) := cA + (1-c)\lambda x v^{\ast}$. The eigenvalues of
$A(c)$ are $\lambda, c\lambda_{2}, \ldots, c\lambda_{n}$. Every
left eigenvector of $A(c)$ corresponding to $\lambda$ is a scalar multiple of
$y - z(c)$, in which the vector $z(c)$ is an explicit rational
function of $c$. If a standard form such as the Jordan canonical
form or the Schur triangular form is known for $A$, we show how to
obtain the corresponding standard form of $A(c)$.
The web hyper-link matrix $G(c)$ used by Google for computing the
PageRank is a special case in which $A$ is real, nonnegative, and
row stochastic (taking into consideration the dangling nodes),
$c \in (0,1)$, $x$ is the vector of all ones, and $v$ is a positive
probability vector. The PageRank vector (the normalized dominant
left eigenvector of $G(c)$) is therefore an explicit rational
function of $c$. Extrapolation procedures on the complex field may
give a practical and efficient way to compute the PageRank vector
when $c$ is close to $1$.
A discussion on the model, on its adherence to reality, and on
possible variations is also considered.
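The eigenvalue statement for $A(c)$ can be verified by hand on a small case. The sketch below is only an illustration under stated assumptions: a hypothetical $2 \times 2$ row-stochastic matrix `A` with eigenvalues $1$ and $0.3$, `x` the vector of all ones, a uniform `v`, and `c = 0.85`. The helper names are made up for this example.

```python
import math


def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial.

    Assumes the spectrum is real (true for the matrices used below).
    Returns the eigenvalues in decreasing order.
    """
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]


def rank_one_update(A, v, c):
    """A(c) = c*A + (1 - c) * x v^T with x = all ones and lambda = 1."""
    n = len(A)
    return [[c * A[i][j] + (1 - c) * v[j] for j in range(n)] for i in range(n)]


# Hypothetical row-stochastic matrix: trace 1.3, determinant 0.3.
A = [[0.5, 0.5],
     [0.2, 0.8]]
v = [0.5, 0.5]   # positive probability vector, so v^T x = 1
c = 0.85

lam = eigenvalues_2x2(A)                          # approximately [1.0, 0.3]
lam_c = eigenvalues_2x2(rank_one_update(A, v, c))  # approximately [1.0, 0.255]
```

The dominant eigenvalue stays at $1$ while the second one is scaled from $0.3$ to $0.85 \cdot 0.3 = 0.255$, as the statement predicts.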
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.10/DagSemProc.07071.10.pdf
Google matrix
rank-one perturbation
Jordan canonical form
extrapolation formulae
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
16
10.4230/DagSemProc.07071.11
article
Graph matching with type constraints on nodes and edges
Fraikin, Catherine
Van Dooren, Paul
In this paper, we consider two particular problems of directed
graph matching. The first problem concerns graphs with nodes that
have been subdivided into classes of different type. The second
problem treats graphs with edges of different types. In the two
cases, the matching process is based on a constrained projection
of the nodes and of the edges of both graphs in a lower
dimensional space. The procedures are formulated as non-convex
optimization problems. The objective functions use the adjacency
matrices and the constraints on the problem impose the isometry of
the so-called projections. Iterative algorithms are proposed to
solve the optimization problems. As illustration, we give an
example of graph matching for graphs with two types of nodes and
graphs with two types of edges.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.11/DagSemProc.07071.11.pdf
Graph matching
Optimization
Typed nodes
Typed edges
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
10
10.4230/DagSemProc.07071.12
article
Iteration at Different Levels: Multi-Level Methods for Structured Markov Chains
Buchholz, Peter
For the stationary analysis of large Markov chains in continuous and discrete time a wide
variety of solution techniques has been applied in the past. Empirical
comparisons show that, in particular, so-called multi-level approaches that
perform iterations at different levels are the most efficient
solvers for a wide class of Markov chains. The methods combine ideas from
aggregation/disaggregation methods and algebraic multigrid.
The talk gives an overview of the basic ideas of multi-level approaches and
shows which design alternatives for the algorithms exist. In particular it considers
different forms of defining levels, available alternatives to realize
prolongation and interpolation operations, different cycle types and different
stopping criteria for the smoothing operations at each level. The last part of
the talk is devoted to implementation issues and data structures that are
necessary for an efficient realization of multi-level methods.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.12/DagSemProc.07071.12.pdf
Stationary Analysis
Multi-Level Techniques
Kronecker Representation
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
17
10.4230/DagSemProc.07071.13
article
Multidamping simulation framework for link-based ranking
Kollias, Giorgios
Gallopoulos, Efstratios
We review methods for the
approximate computation of PageRank. Standard methods are based on
the eigenvector and linear system characterizations. Our starting
point is a set of recent methods based on series representations whose
coefficients are damping functions, for example Linear Rank,
HyperRank and TotalRank. We propose a multidamping framework
for interpreting PageRank and these methods. Multidamping is based
on some new useful properties of Google type matrices. The approach can
be generalized and could help in the exploration of new
approximations for list-based ranking. This is joint work with Georgios Kollias and is supported by a Pythagoras-EPEAEK-II grant.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.13/DagSemProc.07071.13.pdf
PageRank
Google
power method
eigenvalues
teleportation
list-based ranking
TotalRank
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
18
10.4230/DagSemProc.07071.14
article
Nonlinear Approximation and Image Representation using Wavelets
Guha, Sudipto
Harb, Boulos
We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing provably-approximate instance-specific representations minimizing general $\ell_p$ distances under a wide variety of compactly-supported wavelet bases. More specifically, given a vector $f \in \mathbb{R}^n$, a compactly-supported wavelet basis, a sparsity constraint $B \in \mathbb{Z}$, and $p \in [1,\infty]$, our algorithm returns a $B$-term representation (a linear combination of $B$ vectors from the given basis) whose $\ell_p$ distance from $f$ is an $O(\log n)$ factor away from that of the optimal such representation of $f$. Our algorithm applies in the one-pass sublinear-space data streaming model of computation, and it generalizes to weighted $p$-norms and multidimensional signals. Our technique also generalizes to a version of the problem where we are given a bit-budget rather than a term-budget. Furthermore, we use it to construct a \emph{universal representation} that consists of at most $B(\log n)^2$ terms and gives an $O(\log n)$-approximation under all $p$-norms simultaneously.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.14/DagSemProc.07071.14.pdf
Nonlinear approximation
wavelets
approximation algorithms
streaming algorithms
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
3
10.4230/DagSemProc.07071.15
article
Stanford Matrix Considered Harmful
Vigna, Sebastiano
I discuss the implications of using small data sets for experiments related to the web graph.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.15/DagSemProc.07071.15.pdf
Web graph
PageRank
HITS
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
18
10.4230/DagSemProc.07071.16
article
The Sinkhorn-Knopp Algorithm: Convergence and Applications
Knight, Philip A.
As long as a square nonnegative matrix $A$ contains sufficient nonzero
elements, the Sinkhorn-Knopp algorithm can be used to balance the matrix,
that is, to find a diagonal scaling of $A$ that is doubly stochastic.
We relate balancing to problems in traffic flow and describe how balancing
algorithms can be used to give a two sided measure of nodes in a graph. We
show that with an appropriate modification, the Sinkhorn-Knopp algorithm is a
natural candidate for computing the measure on enormous data sets.
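The basic balancing iteration can be sketched in a few lines. This is the textbook alternating row/column normalisation, without the modification the abstract mentions, and it assumes a strictly positive square matrix so that convergence is not in question. The function name, tolerance, and the sample matrix are hypothetical.

```python
def sinkhorn_knopp(A, tol=1e-10, max_iter=10000):
    """Return scalings (r, c) so that diag(r) A diag(c) is nearly doubly stochastic.

    Assumes A is square with strictly positive entries.
    """
    n = len(A)
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(max_iter):
        # Pick c so that every column of diag(r) A diag(c) sums to 1.
        c = [1.0 / sum(r[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Pick r so that every row sums to 1 (this perturbs the columns).
        r = [1.0 / sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        # Rows are exact now, so only the column sums need checking.
        col_err = max(abs(c[j] * sum(r[i] * A[i][j] for i in range(n)) - 1.0)
                      for j in range(n))
        if col_err < tol:
            break
    return r, c


# Hypothetical positive matrix to balance.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
r, c = sinkhorn_knopp(A)
S = [[r[i] * A[i][j] * c[j] for j in range(3)] for i in range(3)]
row_sums = [sum(row) for row in S]
col_sums = [sum(S[i][j] for i in range(3)) for j in range(3)]
```

Each sweep costs two matrix-vector products, which is why the iteration scales to very large sparse matrices.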
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.16/DagSemProc.07071.16.pdf
Matrix balancing
Sinkhorn-Knopp algorithm
PageRank
doubly stochastic matrix
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2007-06-28
7071
1
10
10.4230/DagSemProc.07071.17
article
Three results on the PageRank vector: eigenstructure, sensitivity, and the derivative
Gleich, David
Glynn, Peter
Golub, Gene
Greif, Chen
The three results on the PageRank vector are preliminary but shed light on the eigenstructure of a PageRank modified Markov chain and what happens when changing the teleportation parameter in the PageRank model.
Computations with the derivative of the PageRank vector with respect to the teleportation parameter show predictive ability and identify an interesting set of pages from Wikipedia.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.17/DagSemProc.07071.17.pdf
PageRank
PageRank derivative
PageRank sensitivity
PageRank eigenstructure