Dagstuhl Seminar Proceedings, Volume 9061
Dagstuhl Seminar Proceedings
DagSemProc
https://www.dagstuhl.de/dagpub/1862-4405
https://dblp.org/db/series/dagstuhl
1862-4405
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
9061
2009
https://drops.dagstuhl.de/entities/volume/DagSemProc-volume-9061
09061 Abstracts Collection – Combinatorial Scientific Computing
From 01.02.2009 to 06.02.2009, the Dagstuhl Seminar 09061 ``Combinatorial Scientific Computing'' was held in Schloss Dagstuhl – Leibniz Center for Informatics.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
Graphs
combinatorics
high-performance scientific computing
1-49
Regular Paper
Uwe
Naumann
Uwe Naumann
Olaf
Schenk
Olaf Schenk
Horst D
Simon
Horst D Simon
Sivan
Toledo
Sivan Toledo
10.4230/DagSemProc.09061.1
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
A Model for Task Repartitioning under Data Replication
We propose a two-phase model for solving the problem of
task repartitioning under data replication with memory constraints. The
hypergraph-partitioning-based model proposed for the first phase aims
to minimize the total message volume that will be incurred due to the
replication/migration of input data while maintaining balance on computational
and receive-volume loads of processors. The network-flow-based
model proposed for the second phase aims to minimize the maximum
message volume handled by processors via utilizing the flexibility in assigning
send-communication tasks to processors, which is introduced by
data replication. The validity of our proposed model is verified on parallelization of a direct volume rendering algorithm.
Task repartitioning
data replication
hypergraph partitioning with fixed vertices
assignment flow network
1-3
Regular Paper
Cevdet
Aykanat
Cevdet Aykanat
Erkan
Okuyan
Erkan Okuyan
B. Barla
Cambazoglu
B. Barla Cambazoglu
10.4230/DagSemProc.09061.2
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
A Nearly-Linear Time Algorithm for Approximately Solving Linear Systems in a Symmetric M-Matrix
We present an algorithm for solving a linear system in a symmetric M-matrix.
In particular, for an $n \times n$ symmetric M-matrix $M$, we show how to find a diagonal matrix $D$ such that
$DMD$ is diagonally-dominant. To compute $D$, the algorithm must solve $O(\log n)$ linear systems in diagonally-dominant matrices. If we solve these diagonally-dominant systems approximately using the Spielman-Teng
nearly-linear time solver, then we obtain an algorithm for approximately solving linear systems in symmetric M-matrices, for which the expected running time is also nearly-linear.
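The key structural fact behind the abstract can be illustrated in a few lines. This is not the paper's algorithm for computing $D$ (which solves $O(\log n)$ diagonally-dominant systems); it only checks the property the algorithm relies on: for a symmetric M-matrix $M$, any positive vector $d$ with $Md > 0$ entrywise gives a $D = \mathrm{diag}(d)$ such that $DMD$ is strictly diagonally dominant. The matrix and the vector $d$ below are hand-picked for illustration.

```python
# Illustrative check, not the paper's algorithm: for a symmetric M-matrix M,
# a positive vector d with (M d)_i > 0 certifies that D = diag(d) makes
# DMD strictly diagonally dominant.

M = [[ 2.0, -1.0,  0.0],
     [-1.0,  2.0, -1.0],
     [ 0.0, -1.0,  2.0]]       # 1D Laplacian: a symmetric M-matrix
d = [1.5, 2.0, 1.5]            # hand-picked so that M d = (1, 1, 1) > 0

n = len(M)
Md = [sum(M[i][j] * d[j] for j in range(n)) for i in range(n)]
assert all(v > 0 for v in Md)  # the M-matrix certificate

DMD = [[d[i] * M[i][j] * d[j] for j in range(n)] for i in range(n)]

def strictly_diagonally_dominant(A):
    return all(abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))

print(strictly_diagonally_dominant(DMD))  # True
```

The check works because, with nonpositive off-diagonals, row $i$ dominance of $DMD$ reduces exactly to $(Md)_i > 0$.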
M-matrix
diagonally-dominant matrix
linear system solver
iterative algorithm
randomized algorithm
network flow
gain graph
1-4
Regular Paper
Samuel I.
Daitch
Samuel I. Daitch
Daniel A.
Spielman
Daniel A. Spielman
10.4230/DagSemProc.09061.3
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Algorithmic Differentiation Through Automatic Graph Elimination Ordering (ADTAGEO)
Algorithmic Differentiation Through Automatic Graph Elimination
Ordering (ADTAGEO) is based on the principle of Instant
Elimination: At runtime we dynamically maintain a DAG representing
only active variables that are alive at any time. Whenever an
active variable is deallocated or its value is overwritten, the
corresponding vertex in the Live-DAG will be eliminated
immediately by the well-known vertex elimination rule [1].
Consequently, the total memory requirement is equal to that of the
sparse forward mode. Assuming that local variables are destructed
in the opposite order of their construction (as in C++), a single
assignment code is in effect differentiated in reverse mode. If
compiler-generated temporaries are destroyed in reverse order too,
then Instant Elimination yields the statement level reverse mode of
ADIFOR [2] naturally.
The user determines the elimination order intentionally (or
unintentionally) by the order in which variables are declared,
which makes hybrid modes of AD possible by combining forward and
reverse differentiated parts.
By annotating the Live-DAG with local Hessians and applying second
order elimination rules, Hessian-vector products can be computed
efficiently since the annotated Live-DAG stores one half of the
symmetric Hessian graph only (as suggested in [1]).
Nested automatic differentiation is done easily by subsequent
propagations, since sensitivities between live variables can be
obtained at any point in time within the Live-DAG.
The concept of maintaining a Live-DAG fits optimally into the
strategy of operator overloading for classes; it is a very natural
example of object-oriented programming. A proof-of-concept
implementation in C++ is available (contact the first author).
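The vertex elimination rule [1] at the heart of Instant Elimination can be sketched compactly. This is an illustrative toy, not the ADTAGEO implementation: the function $y = x \sin(x)$, its intermediate $v = \sin(x)$, and the edge dictionary are our own choices.

```python
import math

# Vertex elimination on a linearized DAG: edges carry local partials,
# and eliminating an intermediate vertex j updates c[i, k] += c[i, j] * c[j, k]
# for every predecessor i and successor k of j. Example: y = x * sin(x) at x = 2.

x = 2.0
v = math.sin(x)
edges = {
    ('x', 'v'): math.cos(x),   # dv/dx
    ('x', 'y'): v,             # direct partial of y = x * v with respect to x
    ('v', 'y'): x,             # partial of y with respect to v
}

def eliminate(edges, j):
    """Remove vertex j by the vertex elimination rule."""
    ins = [(i, c) for (i, t), c in edges.items() if t == j]
    outs = [(k, c) for (s, k), c in edges.items() if s == j]
    for i, cij in ins:
        for k, cjk in outs:
            edges[(i, k)] = edges.get((i, k), 0.0) + cij * cjk
    for i, _ in ins:
        del edges[(i, j)]
    for k, _ in outs:
        del edges[(j, k)]

eliminate(edges, 'v')          # as if v were deallocated
dydx = edges[('x', 'y')]       # sin(x) + x*cos(x)
```

After the elimination the surviving edge holds the full derivative $\sin(x) + x\cos(x)$, exactly what Instant Elimination accumulates when the intermediate goes out of scope.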
References
1. Griewank, A.: Evaluating Derivatives. Principles and
Techniques of Algorithmic Differentiation.
SIAM (2000)
2. Bischof, C.H., Carle, A., Khademi, P., Mauer, A.: ADIFOR 2.0:
Automatic differentiation of Fortran 77 programs.
IEEE Computational Science & Engineering 3 (1996) 18-32
Automatic Differentiation
Instant Elimination
Live-DAG
symmetric Hessian-DAG
forward mode
reverse mode
checkpointing
ADTAGEO
1-2
Regular Paper
Jan
Riehme
Jan Riehme
Andreas
Griewank
Andreas Griewank
10.4230/DagSemProc.09061.4
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Assessing an approximation algorithm for the minimum fill-in problem in practice
We investigate an implementation of an approximation algorithm for the minimum fill-in problem. The algorithm has some degree of freedom since it is composed of several subtasks for which one can choose between different algorithms. The goal of the present work is to study the impact of these components and to carefully examine the practicability of the overall approximation algorithm on a set of numerical examples.
Sparse linear algebra
1-1
Regular Paper
H. Martin
Bücker
H. Martin Bücker
Michael
Lülfesmann
Michael Lülfesmann
Arno
Rasch
Arno Rasch
10.4230/DagSemProc.09061.5
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Combinatorial Problems in High-Performance Computing: Partitioning
This extended abstract presents a survey of combinatorial problems
encountered in scientific computations on today's
high-performance architectures, with sophisticated memory
hierarchies, multiple levels of cache, and multiple processors
on chip as well as off-chip.
For parallelism, the most important problem is to partition
sparse matrices, graphs, or hypergraphs into nearly equal-sized
parts while trying to reduce inter-processor communication.
Common approaches to such problems involve multilevel
methods based on coarsening and uncoarsening (hyper)graphs,
matching of similar vertices, searching for good separator sets
and good splittings, dynamic adjustment of load imbalance,
and two-dimensional matrix splitting methods.
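The objective described above can be made concrete with a minimal computation. This is an illustration of the metrics, not of any particular partitioner; the example graph and partition are our own.

```python
# For a graph and a 2-way vertex partition, compute the edge cut
# (a proxy for inter-processor communication volume) and the load
# imbalance (largest part size over the ideal size).

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # small example graph
part = {0: 0, 1: 0, 2: 1, 3: 1}                    # vertex -> part id

cut = sum(1 for u, v in edges if part[u] != part[v])
sizes = [sum(1 for p in part.values() if p == k) for k in (0, 1)]
imbalance = max(sizes) / (len(part) / 2)

print(cut, imbalance)   # 3 1.0
```

A partitioner searches over assignments like `part` to minimize `cut` subject to `imbalance` staying below a tolerance.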
Partitioning
sparse matrix
hypergraph
parallel
HPC
1-5
Regular Paper
Rob
Bisseling
Rob Bisseling
Tristan
van Leeuwen
Tristan van Leeuwen
Umit V.
Catalyurek
Umit V. Catalyurek
10.4230/DagSemProc.09061.6
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Combinatorial Problems in OpenAD
Computing derivatives using automatic differentiation methods entails
a variety of combinatorial problems. The OpenAD tool implements automatic
differentiation as source transformation of a program that represents a numerical
model. We select three combinatorial problems and discuss the solutions
implemented in OpenAD. Our intention is to explain the specific parts of the implementation so that readers can easily use OpenAD to investigate and develop
their own solutions to these problems.
Automatic differentiation
combinatorial problem
tool tutorial
1-12
Regular Paper
Jean
Utke
Jean Utke
Uwe
Naumann
Uwe Naumann
10.4230/DagSemProc.09061.7
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Combinatorial problems in solving linear systems
Numerical linear algebra and combinatorial optimization are vast
subjects, as is their interaction. In virtually all cases there should
be a notion of sparsity for a combinatorial problem to arise. Sparse
matrices, therefore, form the basis of the interaction of these two
seemingly disparate subjects. As the core of many of today's numerical linear
algebra computations consists of sparse linear system solutions, we
will cover combinatorial problems, notions, and algorithms relating to
those computations.
This talk is thus concerned with direct and iterative methods for sparse
linear systems and their interaction with combinatorial optimization.
On the direct methods side, we discuss matrix ordering; bipartite matching
and matrix scaling for better pivoting; task assignment and scheduling
for parallel multifrontal solvers. On the iterative method side, we discuss
preconditioning techniques including incomplete factor preconditioners
(notion of level of fill-in), support graph preconditioners (graph
embedding concepts), and algebraic multigrids (independent sets in
undirected graphs).
In a separate part of the talk, we discuss methods that aim to exploit
sparsity during linear system solution. These methods include block
diagonalization of the matrix; efficient triangular system solutions
for right-hand side vectors with a single nonzero entry. Towards the
end, we mention, quite briefly as they are topics of other invited
talks, some other areas whose interactions with combinatorial
optimization are of great benefit to numerical linear algebra. These
include graph and hypergraph partitioning for load balancing problems,
and colouring problems in numerical optimization. On closing, we
compile and list a set of open problems.
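The sparse right-hand-side idea mentioned above admits a short sketch. This is not from the talk: the column-wise storage scheme and the example matrix are our own, and real solvers compute the reachable set symbolically (in the style of Gilbert and Peierls) before doing any arithmetic.

```python
# Solving L x = b for lower triangular L when b has a single nonzero in
# position k: only entries of x reachable from k in the directed graph of L
# (edge j -> i whenever L has a nonzero in row i of column j) can be nonzero,
# so forward substitution may skip every untouched column.

L = {  # column-wise sparse lower triangular matrix: col -> {row: value}
    0: {0: 2.0, 2: -1.0},
    1: {1: 1.0, 3: 0.5},
    2: {2: 4.0, 3: -2.0},
    3: {3: 1.0},
}

def solve_sparse_rhs(L, k, bk):
    """Solve L x = bk * e_k, touching only the reachable entries of x."""
    x = {k: bk}
    for j in sorted(L):        # columns in order: already topological for L
        if j not in x:         # column never reached: skip all its work
            continue
        x[j] /= L[j][j]
        for i, lij in L[j].items():
            if i != j:
                x[i] = x.get(i, 0.0) - lij * x[j]
    return x

x = solve_sparse_rhs(L, 0, 1.0)
print(sorted(x))   # [0, 2, 3] -- row 1 is never touched
```

On a large matrix this turns the cost from proportional to the matrix size into proportional to the reachable set, which is the point of exploiting right-hand-side sparsity.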
Combinatorial scientific computing
graph theory
combinatorial optimization
sparse matrices
linear system solution
1-37
Regular Paper
Iain S.
Duff
Iain S. Duff
Bora
Ucar
Bora Ucar
10.4230/DagSemProc.09061.8
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Distillating knowledge about SCOTCH
The design of the Scotch library for static mapping, graph
partitioning and sparse matrix ordering is highly modular,
so as to allow users and potential contributors to tweak it
and easily add new static mapping, graph bipartitioning,
vertex separation or graph ordering methods to match their
particular needs.
The purpose of this tutorial is twofold. It will start with a
description of the interface of Scotch, presenting its visible
objects and data structures.
Then, we will step into the API mirror and have a look at the inside:
the internal representation of graphs, mappings and orderings, and the
basic sequential and parallel building blocks: graph induction and graph
coarsening, which can be reused by third-party software. As an
example, we will show how to add a simple genetic algorithm routine to
the graph bipartitioning methods.
Scotch
graph algorithms
data structures
1-12
Regular Paper
Francois
Pellegrini
Francois Pellegrini
10.4230/DagSemProc.09061.9
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Getting Started with ADOL-C
The C++ package ADOL-C described in this paper facilitates the evaluation of
first and higher derivatives of vector functions that are defined
by computer programs written in C or C++.
The numerical values of derivative vectors are obtained free
of truncation errors at a cost of only a small multiple of the run time and
a fixed small multiple of the random access memory required by the given
function evaluation program.
Derivative matrices are obtained by columns, by rows or in sparse
format. This tutorial describes the source code modification required
for the application of ADOL-C, the most frequently used drivers to
evaluate derivatives and some recent developments.
ADOL-C
algorithmic differentiation of C/C++ programs
1-10
Regular Paper
Andrea
Walther
Andrea Walther
10.4230/DagSemProc.09061.10
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Getting Started with Zoltan: A Short Tutorial
The Zoltan library is a toolkit of parallel combinatorial algorithms for
unstructured and/or adaptive computations. In this paper, we briefly describe
the most significant tools in Zoltan: dynamic partitioning, graph
coloring and ordering. We also describe how to obtain, build, and use
Zoltan in parallel applications.
Parallel computing
load balancing
partitioning
coloring
ordering
software
1-10
Regular Paper
Karen D.
Devine
Karen D. Devine
Erik G.
Boman
Erik G. Boman
Lee Ann
Riesen
Lee Ann Riesen
Umit V.
Catalyurek
Umit V. Catalyurek
Cédric
Chevalier
Cédric Chevalier
10.4230/DagSemProc.09061.11
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Low-Memory Tour Reversal in Directed Graphs
We consider the problem of reversing a \emph{tour} $(i_1,i_2,\ldots,i_l)$ in a directed graph $G=(V,E)$ with positive integer vertices $V$ and edges
$E \subseteq V \times V$, where $i_j \in V$ and $(i_j,i_{j+1}) \in E$ for
all $j=1,\ldots,l-1$. The tour can be processed in last-in-first-out order as long as the size of the corresponding stack does not exceed the available memory. This constraint is violated in most cases when considering control-flow graphs of large-scale numerical simulation programs. The tour reversal problem also arises in adjoint programs used, for example, in the context of derivative-based nonlinear optimization, sensitivity analysis, or
other, often inverse, problems. The intention is to compress the tour in order not to run out of memory. As the general optimal compression problem has been proven NP-hard, and since large control-flow graphs result from loops in programs, we do not want to use general-purpose algorithms to compress the tour. Rather, we compress the tour by finding loops and replacing the redundant information with a proper representation of the loops.
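The loop-finding idea can be illustrated with a toy compressor. This is not the authors' algorithm; it merely shows how consecutive repeats of a loop body collapse into a (body, count) pair, which is the kind of redundancy a program loop induces in a tour.

```python
def compress_tour(tour, max_body=4):
    """Greedy toy loop detection: replace consecutive repeats of a block
    by (block, repeat_count). Illustrative only; bodies longer than
    max_body are not considered."""
    out, i = [], 0
    while i < len(tour):
        best_body, best_reps = tour[i:i + 1], 1
        for blen in range(1, max_body + 1):
            body = tour[i:i + blen]
            reps = 1
            while tour[i + reps * blen : i + (reps + 1) * blen] == body:
                reps += 1
            # prefer a longer compressed span, but only for genuine repeats
            if reps > 1 and reps * blen > len(best_body) * best_reps:
                best_body, best_reps = body, reps
        out.append((best_body, best_reps))
        i += len(best_body) * best_reps
    return out

tour = [1, 2, 3, 2, 3, 2, 3, 4]
print(compress_tour(tour))   # [([1], 1), ([2, 3], 3), ([4], 1)]
```

Reversing the compressed form then needs memory proportional to the number of distinct loop bodies rather than the tour length.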
Directed graph
tour reversal
offline algorithm
dynamic programming
online algorithm
control-flow reversal
adjoint program
1-3
Regular Paper
Viktor
Mosenkis
Viktor Mosenkis
Uwe
Naumann
Uwe Naumann
Elmar
Peise
Elmar Peise
10.4230/DagSemProc.09061.12
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Multifrontal multithreaded rank-revealing sparse QR factorization
SuiteSparseQR is a sparse multifrontal QR factorization algorithm.
Dense matrix methods within each frontal matrix enable
the method to obtain high performance on multicore architectures. Parallelism
across different frontal matrices is handled with Intel's Threading Building
Blocks library.
Rank-detection is performed within each
frontal matrix using Heath's method, which does not require column pivoting.
The resulting sparse QR factorization obtains a substantial fraction of the
theoretical peak performance of a multicore computer.
Sparse matrix algorithms
QR factorization
multifrontal
1-3
Regular Paper
Timothy
Davis
Timothy Davis
10.4230/DagSemProc.09061.13
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Parallelization of Mapping Algorithms for Next Generation Sequencing Applications
With the advent of next-generation high throughput sequencing
instruments, large volumes of short sequence data are generated at an
unprecedented rate. Processing and analyzing these massive data
requires overcoming several challenges. A particular challenge
addressed in this abstract is the mapping of short sequences (reads)
to a reference genome while allowing mismatches. This is a significantly
time-consuming combinatorial problem in many applications, including
whole-genome resequencing, targeted sequencing, transcriptome/small
RNA, DNA methylation, and ChIP sequencing, and takes time on the order
of days using existing sequential techniques on large scale
datasets. In this work, we introduce six parallelization methods each
having different scalability characteristics to speedup short sequence
mapping. We also address an associated load balancing problem that
involves grouping nodes of a tree from different levels. This problem
arises due to a trade-off between computational cost and granularity
while partitioning the workload. We comparatively present the
proposed parallelization methods and give theoretical cost models for
each of them. Experimental results on real datasets demonstrate the
effectiveness of the methods and indicate that they are successful at
reducing the execution time from the order of days to just a few
hours for large datasets.
To the best of our knowledge, this is the first study on
parallelization of the short sequence mapping problem.
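The core combinatorial task, mapping a read with mismatches, is easy to state in code. The sketch below is a brute-force baseline of our own, not one of the abstract's parallel methods; real mappers use index structures precisely because this scan is too slow at genome scale.

```python
def map_read(reference, read, max_mismatches=1):
    """Report every position where `read` aligns to `reference` with at
    most `max_mismatches` substitutions. Brute force, for illustration."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits

ref = "ACGTACGTGA"
print(map_read(ref, "ACGT", 0))   # [0, 4]
print(map_read(ref, "ACGA", 1))   # [0, 4]
```

Parallelization then amounts to distributing reads, reference regions, or index-tree nodes across processors, which is where the load-balancing problem described above arises.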
Genome sequencing
sequence mapping
1-1
Regular Paper
Doruk
Bozdag
Doruk Bozdag
Catalin C.
Barbacioru
Catalin C. Barbacioru
Umit V.
Catalyurek
Umit V. Catalyurek
10.4230/DagSemProc.09061.14
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Randomized Heuristics for Exploiting Jacobian Scarcity
Griewank and Vogel introduced the notion of Jacobian scarcity, which
generalizes the properties of sparsity and rank to capture a kind of
deficiency in the degrees of freedom of the Jacobian matrix $F'(\mathbf{x})$.
We describe new randomized heuristics that exploit scarcity for the
optimized evaluation of collections of Jacobian-vector or
Jacobian-transpose-vector products.
Jacobian
scarcity
accumulation
directed acyclic graph
1-2
Regular Paper
Andrew
Lyons
Andrew Lyons
Ilya
Safro
Ilya Safro
10.4230/DagSemProc.09061.15
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Short Tutorial: Getting Started With Ipopt in 90 Minutes
Ipopt is an open-source software package for large-scale nonlinear optimization. This tutorial gives a short introduction that should allow the reader to install and test the package on a UNIX-like system, and to run simple examples in a short period of time.
Nonlinear Optimization
Tutorial
Ipopt
COIN-OR
1-17
Regular Paper
Andreas
Wächter
Andreas Wächter
10.4230/DagSemProc.09061.16
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Stabilising aggregation AMG
When applied to linear systems arising from scalar elliptic partial differential equations, algebraic multigrid (AMG) schemes based on aggregation exhibit a mesh size dependent convergence behaviour. As the number of iterations increases with the number of unknowns in the linear system, the computational complexity of such a scheme is non-optimal. This contribution presents a stabilisation of the aggregation AMG algorithm which adds a number of subspace projection steps at different stages of the algorithm and
allows for variable cycling strategies. Numerical results illustrate the advantage of the stabilised algorithm over its original formulation.
Algebraic multigrid
aggregation
stabilisation
1-4
Regular Paper
Frank
Hülsemann
Frank Hülsemann
10.4230/DagSemProc.09061.17
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
The CPR Method and Beyond: Prologue
In 1974, A.R. Curtis, M.J.D. Powell, and J.K. Reid published a seminal paper on the estimation of Jacobian matrices in what was later termed the CPR method. Central to the CPR method is the effective utilization of a priori known sparsity information. Only recently has the optimal CPR method been characterized in its general form, with the theoretical underpinning for its optimality established. In this short note we provide an overview of the development of computational techniques and software tools for the estimation of Jacobian
matrices.
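The CPR idea fits in a short sketch: columns of the Jacobian whose sparsity patterns share no row ("structurally orthogonal") can be estimated from a single function evaluation. Everything below is invented for illustration; the function `F`, its sparsity pattern, and the column grouping are not from the note.

```python
# CPR-style Jacobian estimation by forward differences. F: R^3 -> R^2 with
# nonzeros {(0,0), (0,2), (1,1), (1,2)}: columns 0 and 1 never share a row,
# so two directional differences recover all four nonzero entries.

def F(x):
    return [x[0]**2 + 3*x[2], x[1] + x[2]**2]

sparsity = {(0, 0), (0, 2), (1, 1), (1, 2)}
groups = [[0, 1], [2]]            # structurally orthogonal column groups
x0, h = [1.0, 2.0, 3.0], 1e-6
f0 = F(x0)

J = {}
for group in groups:
    xp = list(x0)
    for j in group:
        xp[j] += h                # one seed direction per group
    fp = F(xp)
    for (i, j) in sparsity:
        if j in group:            # row i is owned by exactly one column in the group
            J[(i, j)] = (fp[i] - f0[i]) / h

print(J)   # approximately {(0,0): 2, (1,1): 1, (0,2): 3, (1,2): 6}
```

With a full Jacobian this would need one evaluation per column; the grouping cuts it to one per group, which is the saving the CPR method formalizes.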
Structural Orthogonality
Optimal CPR
Sparse Jacobian Estimation Software
1-3
Regular Paper
Shahadat
Hossain
Shahadat Hossain
Trond
Steihaug
Trond Steihaug
10.4230/DagSemProc.09061.18
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
The Enabling Power of Graph Coloring Algorithms in Automatic Differentiation and Parallel Processing
Combinatorial scientific computing (CSC) is founded
on the recognition of the enabling power of combinatorial algorithms
in scientific and engineering computation and in high-performance computing.
The domain of CSC extends beyond traditional scientific computing---the
three major branches of which are numerical linear algebra,
numerical solution of differential equations, and
numerical optimization---to include a range of emerging and
rapidly evolving computational and information science disciplines.
Orthogonally, CSC problems could also emanate from
infrastructural technologies for supporting high-performance computing.
Despite the apparent disparity in their origins,
CSC problems and scenarios are unified by the following common features:
(A) The overarching goal is often to make computation
efficient---by minimizing overall execution time, memory usage,
and/or storage space---or to facilitate knowledge discovery or analysis.
(B) Identifying the most accurate combinatorial abstractions that
help achieve this goal is usually a part of the challenge.
(C) The abstractions are often expressed, with advantage, as graph
or hypergraph problems.
(D) The identified combinatorial problems are typically NP-hard to
solve optimally. Thus, fast, often linear-time, approximation (or
heuristic) algorithms are the methods of choice.
(E) The combinatorial algorithms themselves often need to be
parallelized, to avoid their being bottlenecks within a larger
parallel computation.
(F) Implementing the algorithms and deploying them via software
toolkits is critical.
This talk attempts to illustrate the aforementioned features of CSC
through an example: we consider the enabling role graph coloring
models and their algorithms play in efficient computation of
sparse derivative matrices via automatic differentiation (AD).
The talk focuses on efforts being made on this topic within
the SciDAC Institute for Combinatorial Scientific Computing
and Petascale Simulations (CSCAPES).
Aiming to provide an overview rather than details, we discuss
the various coloring models used in sparse Jacobian and Hessian computation,
the serial and parallel algorithms developed in CSCAPES
for solving the coloring problems, and a
case study that demonstrates the efficacy of the coloring techniques
in the context of an optimization problem in a Simulated Moving Bed process.
Implementations of our serial algorithms for the coloring
and related problems in derivative computation are assembled
and made publicly available in a package called ColPack.
Implementations of our parallel coloring algorithms are
incorporated into and deployed via the load-balancing toolkit Zoltan.
ColPack has been interfaced with ADOL-C, an operator overloading-based
AD tool that has recently acquired improved capabilities for
automatic detection of sparsity patterns of Jacobians and Hessians
(sparsity pattern detection is the first step in derivative matrix
computation via coloring-based compression).
Further information on ColPack and Zoltan is available
at their respective websites, which can be accessed via
http://www.cscapes.org
Graph coloring
sparse derivative computation
automatic differentiation
parallel computing
1-3
Regular Paper
Assefaw H.
Gebremedhin
Assefaw H. Gebremedhin
10.4230/DagSemProc.09061.19
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
The Past, Present and Future of High Performance Computing
In this overview paper we start by looking at the birth of what is called
``High Performance Computing'' today. It all began over 30 years ago
when the Cray 1 and CDC Cyber 205 ``supercomputers'' were introduced.
This had a huge impact on scientific computing. A very turbulent time at both
the hardware and software level was to follow. Eventually the situation
stabilized, but not for long.
Today, two different trends in hardware architecture
have created a bifurcation in the market. On one hand, the GPGPU quickly
found a place in the marketplace, but is still the domain of the expert. In
contrast to this, multicore processors make hardware parallelism available
to the masses. Each has its own set of issues to deal with.
In the last section we make an attempt to look into the future, but this is
of course a highly personal opinion.
High-Performance Scientific Computing
1-7
Regular Paper
Ruud
van der Pas
Ruud van der Pas
10.4230/DagSemProc.09061.20
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode
Weighted aggregation for multi-level graph partitioning
Graph partitioning is a well-known optimization problem of great interest in theoretical and applied studies. Since the 1990s, many multilevel schemes have been introduced as a practical tool to solve this problem. A multilevel algorithm may be viewed as a process of graph topology learning at different scales in order to generate a better approximation for any approximation method incorporated at the uncoarsening stage in the framework. In this work we compare two multilevel frameworks based on the geometric and the algebraic multigrid schemes for the partitioning problem.
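One coarsening step of such a multilevel scheme can be sketched with heavy-edge matching, a common aggregation rule (the example graph and weights are our own, and this is the classical matching variant rather than the weighted-aggregation scheme the abstract compares against):

```python
# Heavy-edge matching coarsening: merge the endpoints of heavy edges so
# the coarse graph preserves the large-scale structure of the fine graph.

def heavy_edge_matching(n, edges):
    """edges: dict (u, v) -> weight with u < v. Returns vertex -> coarse id."""
    matched = {}
    coarse = 0
    # visit edges by decreasing weight; merge endpoints if both are free
    for (u, v), w in sorted(edges.items(), key=lambda e: -e[1]):
        if u not in matched and v not in matched:
            matched[u] = matched[v] = coarse
            coarse += 1
    for u in range(n):            # leftover vertices map to singletons
        if u not in matched:
            matched[u] = coarse
            coarse += 1
    return matched

edges = {(0, 1): 5, (1, 2): 1, (2, 3): 4, (0, 3): 2}
print(heavy_edge_matching(4, edges))   # {0: 0, 1: 0, 2: 1, 3: 1}
```

Repeating this step yields the hierarchy of coarse graphs on which an initial partition is computed and then refined during uncoarsening.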
Graph partitioning
multilevel
coarsening
weighted aggregation
1-1
Regular Paper
Cédric
Chevalier
Cédric Chevalier
Ilya
Safro
Ilya Safro
10.4230/DagSemProc.09061.21
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode