07071 Abstracts Collection – Web Information Retrieval and Linear Algebra Algorithms

eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 20 10.4230/DagSemProc.07071.1 article 07071 Abstracts Collection – Web Information Retrieval and Linear Algebra Algorithms Frommer, Andreas Mahoney, Michael W. Szyld, Daniel B. From 12th to 16th February 2007, the Dagstuhl Seminar 07071 ``Web Information Retrieval and Linear Algebra Algorithms'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.1/DagSemProc.07071.1.pdf Information retrieval Markov chains PageRank numerical linear algebra low rank approximations sparsity ranking eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 3 10.4230/DagSemProc.07071.2 article 07071 Report on Dagstuhl Seminar – Web Information Retrieval and Linear Algebra Algorithms Frommer, Andreas Mahoney, Michael W. Szyld, Daniel B. A seminar concentrating on the intersection of the fields of information retrieval and other web-related aspects with numerical and applied linear algebra techniques was held with the attendance of scientists from industry and academia. 
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.2/DagSemProc.07071.2.pdf Information retrieval Markov chains PageRank numerical linear algebra low rank approximations sparsity ranking eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 19 10.4230/DagSemProc.07071.3 article A Deeper Investigation of PageRank as a Function of the Damping Factor Boldi, Paolo Santini, Massimo Vigna, Sebastiano PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor $\alpha$ that uniformly spreads part of the rank. The choice of $\alpha$ is eminently empirical, and in most cases the original suggestion $\alpha=0.85$ by Brin and Page is still used. In this paper, we give a mathematical analysis of PageRank when $\alpha$ changes. In particular, we show that, contrary to popular belief, for real-world graphs values of $\alpha$ close to $1$ do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and by proving that the $k$-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree $k$, we show how to obtain an approximation of the derivatives. Finally, we view PageRank as a linear operator acting on the preference vector and show a tight connection between iterated computation and derivation. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.3/DagSemProc.07071.3.pdf PageRank damping factor Markov chains eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 18 10.4230/DagSemProc.07071.4 article A Fast Algorithm for Matrix Balancing Knight, Philip A.
Ruiz, Daniel As long as a square nonnegative matrix A contains sufficient nonzero elements, the matrix can be balanced; that is, we can find a diagonal scaling of A that is doubly stochastic. A number of algorithms have been proposed to achieve the balancing, the best known of these being the Sinkhorn-Knopp algorithm. In this paper we derive new algorithms based on inner-outer iteration schemes. We show that the Sinkhorn-Knopp algorithm belongs to this family, but other members can converge much more quickly. In particular, we show that while stationary iterative methods offer little or no improvement in many cases, a scheme using a preconditioned conjugate gradient method as the inner iteration can give quadratic convergence at low cost. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.4/DagSemProc.07071.4.pdf Matrix balancing Sinkhorn-Knopp algorithm doubly stochastic matrix conjugate gradient iteration eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 8 10.4230/DagSemProc.07071.5 article An Inner/Outer Stationary Iteration for Computing PageRank Gray, Andrew P. Greif, Chen Lau, Tracy We present a stationary iterative scheme for PageRank computation. The algorithm is based on a linear system formulation of the problem, uses inner/outer iterations, and amounts to a simple preconditioning technique. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not particularly sensitive to the choice of the parameters involved. Numerical examples featuring matrices of dimensions up to approximately $10^7$ confirm the analytical results and demonstrate the accelerated convergence of the algorithm compared to the power method.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.5/DagSemProc.07071.5.pdf PageRank power method stationary method inner/outer iterations damping factor eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 13 10.4230/DagSemProc.07071.6 article Asynchronous Computation of PageRank computation in an interactive multithreading environment Kollias, Giorgios Gallopoulos, Efstratios Numerical Linear Algebra has become almost indispensable in Web Information Retrieval. In this presentation we suggest that the asynchronous computation model is an attractive paradigm for organizing concurrent computations spanning data on Web scale. This suggestion is supported by experiments which highlight some interesting characteristics of this model as applied to 'page ranking' methods. After an introduction on asynchronous computing in general and 'page ranking' in particular, we present results from the asynchronous computation of PageRank using typical combinations of execution units (processes, threads) and communication mechanisms (message passing, shared memory). Sound convergence properties predicted by theory are numerically verified and interesting patterns of behavior are unveiled. Our experiments were performed on Jylab, an evolving environment enabling interactive multithreading and multiprocessing computations. This work is supported by a Pythagoras-EPEAEK-II grant and is conducted in collaboration with Daniel Szyld.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.6/DagSemProc.07071.6.pdf Asynchronous pagerank multithreading multiprocessing eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 27 10.4230/DagSemProc.07071.7 article Convergence of iterative aggregation/disaggregation methods based on splittings with cyclic iteration matrices Marek, Ivo Pultarová, Ivana Mayer, Petr Iterative aggregation/disaggregation (IAD) methods are among the competitive tools for computing the characteristics of Markov chains, as shown in some publications devoted to testing and comparing various methods designed for this purpose. According to Dayar T., Stewart W.J., ``Comparison of partitioning techniques for two-level iterative solvers on large, sparse Markov chains,'' SIAM J. Sci. Comput., Vol. 21, No. 5, 1691-1705 (2000), the IAD methods are effective in particular when applied to large ill-posed problems. One of the purposes of this paper is to contribute to a possible explanation of this fact. The novelty may consist of the fact that the IAD algorithms do converge independently of whether the iteration matrix of the corresponding process is primitive or not. Some numerical tests are presented and possible applications are mentioned, e.g. computing the PageRank. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.7/DagSemProc.07071.7.pdf Iterative aggregation methods stochastic matrix stationary probability vector Markov chains cyclic iteration matrix Google matrix PageRank
eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 17 10.4230/DagSemProc.07071.8 article Exploiting Community Behavior for Enhanced Link Analysis and Web Search Luxenburger, Julia Weikum, Gerhard Methods for Web link analysis and authority ranking such as PageRank are based on the assumption that a user endorses a Web page when creating a hyperlink to this page. There is a wealth of additional user-behavior information that could be considered for improving authority analysis, for example, the history of queries that a user community posed to a search engine over an extended time period, or observations about which query-result pages were clicked on and which ones were not clicked on after a user saw the summary snippets of the top-10 results. We study enhancements of link analysis methods by incorporating additional user assessments based on query logs and click streams, including negative feedback when a query-result page does not satisfy the user demand or is even perceived as spam. Our methods use various novel forms of Markov models whose states correspond to users and queries in addition to Web pages and whose links also reflect the relationships derived from query-result clicks, query refinements, and explicit ratings. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.8/DagSemProc.07071.8.pdf Query logs link analysis Markov reward model eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 6 10.4230/DagSemProc.07071.9 article Extrapolation and minimization procedures for the PageRank vector Brezinski, Claude Redivo-Zaglia, Michela An important problem in Web search is to determine the importance of each page. This problem consists in computing, by the power method, the left principal eigenvector (the PageRank vector) of a matrix depending on a parameter $c$ which has to be chosen close to 1. 
However, when $c$ is close to 1, the problem is ill-conditioned, and the power method converges slowly. So, the idea developed in this paper consists in computing the PageRank vector for several values of $c$, and then extrapolating them, by a conveniently chosen rational function, at a point near 1. The choice of this extrapolating function is based on mathematical considerations about the PageRank vector. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.9/DagSemProc.07071.9.pdf Extrapolation PageRank Web matrix eigenvector computation eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 34 10.4230/DagSemProc.07071.10 article Google Pageranking Problem: The Model and the Analysis Serra Capizzano, Stefano Let $A$ be a given $n$-by-$n$ complex matrix with eigenvalues $\lambda, \lambda_{2}, \ldots, \lambda_{n}$. Suppose there are nonzero vectors $x, y \in \mathbb{C}^{n}$ such that $Ax = \lambda x$, $y^{\ast}A = \lambda y^{\ast}$, and $y^{\ast}x = 1$. Let $v \in \mathbb{C}^{n}$ be such that $v^{\ast}x = 1$, let $c \in \mathbb{C}$, and assume that $\lambda \neq c\lambda_{j}$ for each $j = 2, \ldots, n$. Define $A(c) := cA + (1-c)\lambda x v^{\ast}$. The eigenvalues of $A(c)$ are $\lambda, c\lambda_{2}, \ldots, c\lambda_{n}$. Every left eigenvector of $A(c)$ corresponding to $\lambda$ is a scalar multiple of $y - z(c)$, in which the vector $z(c)$ is an explicit rational function of $c$. If a standard form such as the Jordan canonical form or the Schur triangular form is known for $A$, we show how to obtain the corresponding standard form of $A(c)$. The web hyper-link matrix $G(c)$ used by Google for computing the PageRank is a special case in which $A$ is real, nonnegative, and row stochastic (taking into consideration the dangling nodes), $c \in (0,1)$, $x$ is the vector of all ones, and $v$ is a positive probability vector.
The PageRank vector (the normalized dominant left eigenvector of $G(c)$) is therefore an explicit rational function of $c$. Extrapolation procedures on the complex field may give a practical and efficient way to compute the PageRank vector when $c$ is close to $1$. A discussion on the model, on its adherence to reality, and on possible variations is also considered. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.10/DagSemProc.07071.10.pdf Google matrix rank-one perturbation Jordan canonical form extrapolation formulae. eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 16 10.4230/DagSemProc.07071.11 article Graph matching with type constraints on nodes and edges Fraikin, Catherine Van Dooren, Paul In this paper, we consider two particular problems of directed graph matching. The first problem concerns graphs with nodes that have been subdivided into classes of different type. The second problem treats graphs with edges of different types. In the two cases, the matching process is based on a constrained projection of the nodes and of the edges of both graphs in a lower dimensional space. The procedures are formulated as non-convex optimization problems. The objective functions use the adjacency matrices and the constraints on the problem impose the isometry of the so-called projections. Iterative algorithms are proposed to solve the optimization problems. As illustration, we give an example of graph matching for graphs with two types of nodes and graphs with two types of edges. 
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.11/DagSemProc.07071.11.pdf Graph matching Optimization Typed nodes Typed edges eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 10 10.4230/DagSemProc.07071.12 article Iteration at Different Levels: Multi-Level Methods fro Structured Markov Chains Buchholz, Peter For the stationary analysis of large Markov chains in continuous and discrete time a wide variety of solution techniques has been applied in the past. Empirical comparisons show that in particular so called multi-level approaches that perform iterations at different levels are the most efficient solvers for a wide class of Markov chains. The methods combine ideas from aggregation disaggregation methods and algebraic multigrid. The talk gives an overview of the basic ideas of multi level approaches and shows which design alternatives for the algorithms exist. In particular it considers different forms of defining levels, available alternatives to realize prolongation and interpolation operations, different cycle types and different stopping criteria for the smoothing operations at each level. The last part of the talk is devoted to implementation issues and data structures that are necessary for an efficient realization of multi-level methods. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.12/DagSemProc.07071.12.pdf Stationary Analysis Multi-Level Techniques Kronecker Representation eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 17 10.4230/DagSemProc.07071.13 article Multidamping simulation framework for link-based ranking Kollias, Giorgios Gallopoulos, Efstratios We review methods for the approximate computation of PageRank. Standard methods are based on the eigenvector and linear system characterizations. 
Our starting point is recent methods based on series representation whose coefficients are damping functions, for example Linear Rank, HyperRank and TotalRank. We propose a multidamping framework for interpreting PageRank and these methods. Multidamping is based on some new useful properties of Google type matrices. The approach can be generalized and could help in the exploration of new approximations for list-based ranking. This is joint work with Georgios Kollias and is supported by a Pythagoras-EPEAEK-II grant. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.13/DagSemProc.07071.13.pdf PageRank Google power method eigenvalues teleportation list-based ranking TotalRank eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 18 10.4230/DagSemProc.07071.14 article Nonlinear Approximation and Image Representation using Wavelets Guha, Sudipto Harb, Boulos We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing provably-approximate instance-specific representations minimizing general $\ell_p$ distances under a wide variety of compactly-supported wavelet bases. More specifically, given a vector $f \in \mathbb{R}^n$, a compactly-supported wavelet basis, a sparsity constraint $B \in \mathbb{Z}$, and $p \in [1,\infty]$, our algorithm returns a $B$-term representation (a linear combination of $B$ vectors from the given basis) whose $\ell_p$ distance from $f$ is an $O(\log n)$ factor away from that of the optimal such representation of $f$. Our algorithm applies in the one-pass sublinear-space data streaming model of computation, and it generalizes to weighted $p$-norms and multidimensional signals. Our technique also generalizes to a version of the problem where we are given a bit-budget rather than a term-budget.
Furthermore, we use it to construct a universal representation that consists of at most $B(\log n)^2$ terms and gives an $O(\log n)$-approximation under all $p$-norms simultaneously. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.14/DagSemProc.07071.14.pdf Nonlinear approximation wavelets approximation algorithms streaming algorithms eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 3 10.4230/DagSemProc.07071.15 article Stanford Matrix Considered Harmful Vigna, Sebastiano I discuss the implications of using small data sets for experiments related to the web graph. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.15/DagSemProc.07071.15.pdf Web graph PageRank HITS eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 18 10.4230/DagSemProc.07071.16 article The Sinkhorn-Knopp Algorithm: Convergence and Applications Knight, Philip A. As long as a square nonnegative matrix $A$ contains sufficient nonzero elements, the Sinkhorn-Knopp algorithm can be used to balance the matrix, that is, to find a diagonal scaling of $A$ that is doubly stochastic. We relate balancing to problems in traffic flow and describe how balancing algorithms can be used to give a two sided measure of nodes in a graph. We show that with an appropriate modification, the Sinkhorn-Knopp algorithm is a natural candidate for computing the measure on enormous data sets.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.16/DagSemProc.07071.16.pdf Matrix balancing Sinkhorn-Knopp algorithm PageRank doubly stochastic matrix eng Schloss Dagstuhl – Leibniz-Zentrum für Informatik Dagstuhl Seminar Proceedings 1862-4405 2007-06-28 7071 1 10 10.4230/DagSemProc.07071.17 article Three results on the PageRank vector: eigenstructure, sensitivity, and the derivative Gleich, David Glynn, Peter Golub, Gene Greif, Chen The three results on the PageRank vector are preliminary but shed light on the eigenstructure of a PageRank modified Markov chain and what happens when changing the teleportation parameter in the PageRank model. Computations with the derivative of the PageRank vector with respect to the teleportation parameter show predictive ability and identify an interesting set of pages from Wikipedia. https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.17/DagSemProc.07071.17.pdf PageRank PageRank derivative PageRank sensitivity PageRank eigenstructure

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.1</doi>

<documentType>article</documentType>

<title language="eng">07071 Abstracts Collection – Web Information Retrieval and Linear Algebra Algorithms</title>

<authors>

<author>

<name>Frommer, Andreas</name>

</author>

<author>

<name>Mahoney, Michael W.</name>

</author>

<author>

<name>Szyld, Daniel B.</name>

</author>

</authors>

<abstract language="eng">From 12th to 16th February 2007, the Dagstuhl Seminar 07071 ``Web Information Retrieval and Linear Algebra Algorithms'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.1/DagSemProc.07071.1.pdf</fullTextUrl>

<keywords>

<keyword>Information retrieval</keyword>

<keyword>Markov chains</keyword>

<keyword>PageRank</keyword>

<keyword>numerical linear algebra</keyword>

<keyword>low rank approximations</keyword>

<keyword>sparsity</keyword>

<keyword>ranking</keyword>

</keywords>

</record>

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.2</doi>

<documentType>article</documentType>

<title language="eng">07071 Report on Dagstuhl Seminar – Web Information Retrieval and Linear Algebra Algorithms</title>

<authors>

<author>

<name>Frommer, Andreas</name>

</author>

<author>

<name>Mahoney, Michael W.</name>

</author>

<author>

<name>Szyld, Daniel B.</name>

</author>

</authors>

<abstract language="eng">A seminar concentrating on the intersection of the fields of information retrieval and other web-related aspects with numerical and applied linear algebra techniques was held with the attendance of scientists from industry and academia.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.2/DagSemProc.07071.2.pdf</fullTextUrl>

<keywords>

<keyword>Information retrieval</keyword>

<keyword>Markov chains</keyword>

<keyword>PageRank</keyword>

<keyword>numerical linear algebra</keyword>

<keyword>low rank approximations</keyword>

<keyword>sparsity</keyword>

<keyword>ranking</keyword>

</keywords>

</record>

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.3</doi>

<documentType>article</documentType>

<title language="eng">A Deeper Investigation of PageRank as a Function of the Damping Factor</title>

<authors>

<author>

<name>Boldi, Paolo</name>

</author>

<author>

<name>Santini, Massimo</name>

</author>

<author>

<name>Vigna, Sebastiano</name>

</author>

</authors>

<abstract language="eng">PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor $\alpha$ that uniformly spreads part of the rank. The choice of $\alpha$ is eminently empirical, and in most cases the original suggestion $\alpha=0.85$ by Brin and Page is still used. In this paper, we give a mathematical analysis of PageRank when $\alpha$ changes. In particular, we show that, contrary to popular belief, for real-world graphs values of $\alpha$ close to $1$ do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and by proving that the $k$-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree $k$, we show how to obtain an approximation of the derivatives. Finally, we view PageRank as a linear operator acting on the preference vector and show a tight connection between iterated computation and derivation.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.3/DagSemProc.07071.3.pdf</fullTextUrl>

<keywords>

<keyword>PageRank</keyword>

<keyword>damping factor</keyword>

<keyword>Markov chains</keyword>

</keywords>

</record>

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.4</doi>

<documentType>article</documentType>

<title language="eng">A Fast Algorithm for Matrix Balancing</title>

<authors>

<author>

<name>Knight, Philip A.</name>

</author>

<author>

<name>Ruiz, Daniel</name>

</author>

</authors>

<abstract language="eng">As long as a square nonnegative matrix A contains sufficient nonzero elements, the matrix can be balanced; that is, we can find a diagonal scaling of A that is doubly stochastic. A number of algorithms have been proposed to achieve the balancing, the best known of these being the Sinkhorn-Knopp algorithm. In this paper we derive new algorithms based on inner-outer iteration schemes. We show that the Sinkhorn-Knopp algorithm belongs to this family, but other members can converge much more quickly. In particular, we show that while stationary iterative methods offer little or no improvement in many cases, a scheme using a preconditioned conjugate gradient method as the inner iteration can give quadratic convergence at low cost.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.4/DagSemProc.07071.4.pdf</fullTextUrl>

<keywords>

<keyword>Matrix balancing</keyword>

<keyword>Sinkhorn-Knopp algorithm</keyword>

<keyword>doubly stochastic matrix</keyword>

<keyword>conjugate gradient iteration</keyword>

</keywords>

</record>

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.5</doi>

<documentType>article</documentType>

<title language="eng">An Inner/Outer Stationary Iteration for Computing PageRank</title>

<authors>

<author>

<name>Gray, Andrew P.</name>

</author>

<author>

<name>Greif, Chen</name>

</author>

<author>

<name>Lau, Tracy</name>

</author>

</authors>

<abstract language="eng">We present a stationary iterative scheme for PageRank computation. The algorithm is based on a linear system formulation of the problem, uses inner/outer iterations, and amounts to a simple preconditioning technique. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not particularly sensitive to the choice of the parameters involved. Numerical examples featuring matrices of dimensions up to approximately $10^7$ confirm the analytical results and demonstrate the accelerated convergence of the algorithm compared to the power method.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.5/DagSemProc.07071.5.pdf</fullTextUrl>

<keywords>

<keyword>PageRank</keyword>

<keyword>power method</keyword>

<keyword>stationary method</keyword>

<keyword>inner/outer iterations</keyword>

<keyword>damping factor</keyword>

</keywords>

</record>

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.6</doi>

<documentType>article</documentType>

<title language="eng">Asynchronous Computation of PageRank computation in an interactive multithreading environment</title>

<authors>

<author>

<name>Kollias, Giorgios</name>

</author>

<author>

<name>Gallopoulos, Efstratios</name>

</author>

</authors>

<abstract language="eng">Numerical Linear Algebra has become almost indispensable in Web Information Retrieval. In this presentation we suggest that the asynchronous computation model is an attractive paradigm for organizing concurrent computations spanning data on Web scale. This suggestion is supported by experiments which highlight some interesting characteristics of this model as applied to 'page ranking' methods. After an introduction on asynchronous computing in general and 'page ranking' in particular, we present results from the asynchronous computation of PageRank using typical combinations of execution units (processes, threads) and communication mechanisms (message passing, shared memory). Sound convergence properties predicted by theory are numerically verified and interesting patterns of behavior are unveiled. Our experiments were performed on Jylab, an evolving environment enabling interactive multithreading and multiprocessing computations. This work is supported by a Pythagoras-EPEAEK-II grant and is conducted in collaboration with Daniel Szyld.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.6/DagSemProc.07071.6.pdf</fullTextUrl>

<keywords>

<keyword>Asynchronous</keyword>

<keyword>PageRank</keyword>

<keyword>multithreading</keyword>

<keyword>multiprocessing</keyword>

</keywords>

</record>

<record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.7</doi>

<documentType>article</documentType>

<title language="eng">Convergence of iterative aggregation/disaggregation methods based on splittings with cyclic iteration matrices</title>

<authors>

<author>

<name>Marek, Ivo</name>

</author>

<author>

<name>Pultarová, Ivana</name>

</author>

<author>

<name>Mayer, Petr</name>

</author>

</authors>

<abstract language="eng">Iterative aggregation/disaggregation (IAD) methods are among the competitive tools for computing the characteristics of Markov chains, as shown in some publications devoted to testing and comparing various methods designed for this purpose. According to Dayar T., Stewart W.J., ``Comparison of partitioning techniques for two-level iterative solvers on large, sparse Markov chains,'' SIAM J. Sci. Comput., Vol. 21, No. 5, 1691-1705 (2000), the IAD methods are effective in particular when applied to large ill-posed problems. One of the purposes of this paper is to contribute to a possible explanation of this fact. The novelty may consist of the fact that the IAD algorithms do converge independently of whether the iteration matrix of the corresponding process is primitive or not. Some numerical tests are presented and possible applications are mentioned, e.g. computing the PageRank.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.7/DagSemProc.07071.7.pdf</fullTextUrl>

<keywords>

<keyword>Iterative aggregation methods</keyword>

<keyword>stochastic matrix</keyword>

<keyword>stationary probability vector</keyword>

<keyword>Markov chains</keyword>

<keyword>cyclic iteration matrix</keyword>

<keyword>Google matrix</keyword>

<keyword>PageRank</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.8</doi>

<documentType>article</documentType>

<title language="eng">Exploiting Community Behavior for Enhanced Link Analysis and Web Search</title>

<name>Luxenburger, Julia</name>

</author>

<name>Weikum, Gerhard</name>

</author>

</authors>

<abstract language="eng">Methods for Web link analysis and authority ranking such as PageRank are based on the assumption that a user endorses a Web page when creating a hyperlink to this page. There is a wealth of additional user-behavior information that could be considered for improving authority analysis, for example, the history of queries that a user community posed to a search engine over an extended time period, or observations about which query-result pages were clicked on and which ones were not clicked on after a user saw the summary snippets of the top-10 results. We study enhancements of link analysis methods by incorporating additional user assessments based on query logs and click streams, including negative feedback when a query-result page does not satisfy the user demand or is even perceived as spam. Our methods use various novel forms of Markov models whose states correspond to users and queries in addition to Web pages and whose links also reflect the relationships derived from query-result clicks, query refinements, and explicit ratings.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.8/DagSemProc.07071.8.pdf</fullTextUrl>

<keyword>Query logs</keyword>

<keyword>link analysis</keyword>

<keyword>Markov reward model</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.9</doi>

<documentType>article</documentType>

<title language="eng">Extrapolation and minimization procedures for the PageRank vector</title>

<name>Brezinski, Claude</name>

</author>

<name>Redivo-Zaglia, Michela</name>

</author>

</authors>

<abstract language="eng">An important problem in Web search is to determine the importance of each page. This problem consists in computing, by the power method, the left principal eigenvector (the PageRank vector) of a matrix depending on a parameter $c$ which has to be chosen close to 1. However, when $c$ is close to 1, the problem is ill-conditioned, and the power method converges slowly. The idea developed in this paper is therefore to compute the PageRank vector for several values of $c$, and then to extrapolate these vectors, by a conveniently chosen rational function, at a point near 1. The choice of this extrapolating function is based on mathematical considerations about the PageRank vector.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.9/DagSemProc.07071.9.pdf</fullTextUrl>

<keyword>Extrapolation</keyword>

<keyword>PageRank</keyword>

<keyword>Web matrix</keyword>

<keyword>eigenvector computation</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.10</doi>

<documentType>article</documentType>

<title language="eng">Google Pageranking Problem: The Model and the Analysis</title>

<name>Serra Capizzano, Stefano</name>

</author>

</authors>

<abstract language="eng">Let $A$ be a given $n$-by-$n$ complex matrix with eigenvalues $\lambda ,\lambda _{2},\ldots ,\lambda _{n}$. Suppose there are nonzero vectors $x,y\in \mathbb{C}^{n}$ such that $Ax=\lambda x$, $y^{\ast }A=\lambda y^{\ast }$, and $y^{\ast }x=1$. Let $v\in \mathbb{C}^{n}$ be such that $v^{\ast }x=1$, let $c\in \mathbb{C}$, and assume that $\lambda \neq c\lambda _{j}$ for each $j=2,\ldots ,n$. Define $A(c):=cA+(1-c)\lambda xv^{\ast }$. The eigenvalues of $A(c)$ are $\lambda ,c\lambda _{2},\ldots ,c\lambda _{n}$. Every left eigenvector of $A(c)$ corresponding to $\lambda $ is a scalar multiple of $y-z(c)$, in which the vector $z(c)$ is an explicit rational function of $c$. If a standard form such as the Jordan canonical form or the Schur triangular form is known for $A$, we show how to obtain the corresponding standard form of $A(c)$. The web hyper-link matrix $G(c)$ used by Google for computing the PageRank is a special case in which $A$ is real, nonnegative, and row stochastic (taking into consideration the dangling nodes), $c\in (0,1)$, $x$ is the vector of all ones, and $v$ is a positive probability vector. The PageRank vector (the normalized dominant left eigenvector of $G(c)$) is therefore an explicit rational function of $c$. Extrapolation procedures on the complex field may give a practical and efficient way to compute the PageRank vector when $c$ is close to $1$. A discussion of the model, of its adherence to reality, and of possible variations is also provided.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.10/DagSemProc.07071.10.pdf</fullTextUrl>

<keyword>Google matrix</keyword>

<keyword>rank-one perturbation</keyword>

<keyword>Jordan canonical form</keyword>

<keyword>extrapolation formulae</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.11</doi>

<documentType>article</documentType>

<title language="eng">Graph matching with type constraints on nodes and edges</title>

<name>Fraikin, Catherine</name>

</author>

<name>Van Dooren, Paul</name>

</author>

</authors>

<abstract language="eng">In this paper, we consider two particular problems of directed graph matching. The first problem concerns graphs with nodes that have been subdivided into classes of different types. The second problem treats graphs with edges of different types. In both cases, the matching process is based on a constrained projection of the nodes and of the edges of both graphs in a lower dimensional space. The procedures are formulated as non-convex optimization problems. The objective functions use the adjacency matrices, and the constraints on the problem impose the isometry of the so-called projections. Iterative algorithms are proposed to solve the optimization problems. As an illustration, we give an example of graph matching for graphs with two types of nodes and graphs with two types of edges.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.11/DagSemProc.07071.11.pdf</fullTextUrl>

<keyword>Graph matching</keyword>

<keyword>Optimization</keyword>

<keyword>Typed nodes</keyword>

<keyword>Typed edges</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.12</doi>

<documentType>article</documentType>

<title language="eng">Iteration at Different Levels: Multi-Level Methods for Structured Markov Chains</title>

<name>Buchholz, Peter</name>

</author>

</authors>

<abstract language="eng">For the stationary analysis of large Markov chains in continuous and discrete time, a wide variety of solution techniques has been applied in the past. Empirical comparisons show that, in particular, so-called multi-level approaches that perform iterations at different levels are the most efficient solvers for a wide class of Markov chains. The methods combine ideas from aggregation/disaggregation methods and algebraic multigrid. The talk gives an overview of the basic ideas of multi-level approaches and shows which design alternatives for the algorithms exist. In particular, it considers different forms of defining levels, available alternatives to realize prolongation and interpolation operations, different cycle types, and different stopping criteria for the smoothing operations at each level. The last part of the talk is devoted to implementation issues and data structures that are necessary for an efficient realization of multi-level methods.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.12/DagSemProc.07071.12.pdf</fullTextUrl>

<keyword>Stationary Analysis</keyword>

<keyword>Multi-Level Techniques</keyword>

<keyword>Kronecker Representation</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.13</doi>

<documentType>article</documentType>

<title language="eng">Multidamping simulation framework for link-based ranking</title>

<name>Kollias, Giorgios</name>

</author>

<name>Gallopoulos, Efstratios</name>

</author>

</authors>

<abstract language="eng">We review methods for the approximate computation of PageRank. Standard methods are based on the eigenvector and linear system characterizations. Our starting point is a set of recent methods based on series representations whose coefficients are damping functions, for example Linear Rank, HyperRank and TotalRank. We propose a multidamping framework for interpreting PageRank and these methods. Multidamping is based on some new useful properties of Google-type matrices. The approach can be generalized and could help in the exploration of new approximations for list-based ranking. This is joint work with Georgios Kollias and is supported by a Pythagoras-EPEAEK-II grant.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.13/DagSemProc.07071.13.pdf</fullTextUrl>

<keyword>PageRank</keyword>

<keyword>Google</keyword>

<keyword>power method</keyword>

<keyword>eigenvalues</keyword>

<keyword>teleportation</keyword>

<keyword>list-based ranking</keyword>

<keyword>TotalRank</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.14</doi>

<documentType>article</documentType>

<title language="eng">Nonlinear Approximation and Image Representation using Wavelets</title>

<name>Guha, Sudipto</name>

</author>

<name>Harb, Boulos</name>

</author>

</authors>

<abstract language="eng">We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing provably-approximate instance-specific representations minimizing general $\ell_p$ distances under a wide variety of compactly-supported wavelet bases. More specifically, given a vector $f \in \mathbb{R}^n$, a compactly-supported wavelet basis, a sparsity constraint $B \in \mathbb{Z}$, and $p\in[1,\infty]$, our algorithm returns a $B$-term representation (a linear combination of $B$ vectors from the given basis) whose $\ell_p$ distance from $f$ is an $O(\log n)$ factor away from that of the optimal such representation of $f$. Our algorithm applies in the one-pass sublinear-space data streaming model of computation, and it generalizes to weighted $p$-norms and multidimensional signals. Our technique also generalizes to a version of the problem where we are given a bit-budget rather than a term-budget. Furthermore, we use it to construct a \emph{universal representation} that consists of at most $B(\log n)^2$ terms and gives an $O(\log n)$-approximation under all $p$-norms simultaneously.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.14/DagSemProc.07071.14.pdf</fullTextUrl>

<keyword>Nonlinear approximation</keyword>

<keyword>wavelets</keyword>

<keyword>approximation algorithms</keyword>

<keyword>streaming algorithms</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.15</doi>

<documentType>article</documentType>

<title language="eng">Stanford Matrix Considered Harmful</title>

<name>Vigna, Sebastiano</name>

</author>

</authors>

<abstract language="eng">I discuss the implications of using small data sets for experiments related to the web graph.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.15/DagSemProc.07071.15.pdf</fullTextUrl>

<keyword>Web graph</keyword>

<keyword>PageRank</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.16</doi>

<documentType>article</documentType>

<title language="eng">The Sinkhorn-Knopp Algorithm: Convergence and Applications</title>

<name>Knight, Philip A.</name>

</author>

</authors>

<abstract language="eng">As long as a square nonnegative matrix $A$ contains sufficient nonzero elements, the Sinkhorn-Knopp algorithm can be used to balance the matrix, that is, to find a diagonal scaling of $A$ that is doubly stochastic. We relate balancing to problems in traffic flow and describe how balancing algorithms can be used to give a two-sided measure of nodes in a graph. We show that, with an appropriate modification, the Sinkhorn-Knopp algorithm is a natural candidate for computing the measure on enormous data sets.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.16/DagSemProc.07071.16.pdf</fullTextUrl>

<keyword>Matrix balancing</keyword>

<keyword>Sinkhorn-Knopp algorithm</keyword>

<keyword>PageRank</keyword>

<keyword>doubly stochastic matrix</keyword>

</keywords>

</record>

<publisher>Schloss Dagstuhl – Leibniz-Zentrum für Informatik</publisher>

<journalTitle>Dagstuhl Seminar Proceedings</journalTitle>

<doi>10.4230/DagSemProc.07071.17</doi>

<documentType>article</documentType>

<title language="eng">Three results on the PageRank vector: eigenstructure, sensitivity, and the derivative</title>

<name>Gleich, David</name>

</author>

<name>Glynn, Peter</name>

</author>

<name>Golub, Gene</name>

</author>

<name>Greif, Chen</name>

</author>

</authors>

<abstract language="eng">The three results on the PageRank vector are preliminary but shed light on the eigenstructure of a PageRank-modified Markov chain and on what happens when the teleportation parameter in the PageRank model is changed. Computations with the derivative of the PageRank vector with respect to the teleportation parameter show predictive ability and identify an interesting set of pages from Wikipedia.</abstract>

<fullTextUrl format="pdf">https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol07071/DagSemProc.07071.17/DagSemProc.07071.17.pdf</fullTextUrl>

<keyword>PageRank</keyword>

<keyword>PageRank derivative</keyword>

<keyword>PageRank sensitivity</keyword>

<keyword>PageRank eigenstructure</keyword>

</keywords>

</record>

</records>