Fast (Parallel) Dense Linear Interval Systems Solvers in C-XSC Using Error Free Transformations and BLAS

Zimmer, Michael; Krämer, Walter

doi:10.4230/DagSemProc.08021.11

Abstract

The traditional solver for linear interval systems available in C-XSC [6,1] is mathematically based on the Krawczyk[12] operator and modifications introduced by Rump[17]. The Krawczyk operator is composed of matrix/vector operations. These operations are realized in C-XSC with higest accuracy (only one final rounding) using a so called long accumulator (dotprecision variable). C-XSC dotprecision variables allow the error free computation of sums of floating point numbers as well as the error free computation of scalar products of floating point vectors. Thus, from a mathematical point of view these operations are perfect. Because actual hardware does not support these perfect scalar products all operations have to be realized by software. This fact leads to a tremendous time penalty (note: it has been shown that with modest additional hardware costs perfect scalar products can be made as fast as simple floating-point loops). To speed up the C-XSC scalar product software-operations we adapt the so called DotK algorithm as published in [14]. Error free transformations[14,3,4,10] are used as basic building blocks to develop summation and scalar product algorithms simulating a K-fold precision. Compared to the perfect C-XSC operations these operations are fast. They are more accurate than simple floating-point loops (but of course no longer perfect in the mathematical sense). The fast operations are available in C-XSC via the new data types DotK, IDotK, CDotk and CIDotK. These new data types are composed in such a way that traditional C-XSC code using dotprecision variables can be adapted with minimal effort. It is possible to switch (at runtime!) from perfect computations to fast operations using K-fold precision (K equal 0 means traditional dotprecision computations) and it is possible to hold intermediate results with corresponding error bounds for further summations or scalar product updates. The details are described in [19]. Additionaly, based on similar algorithms used in Intlab[16], BLAS and LAPACK libraries [2] are used in the O(nÃ‚Â³) parts of the linear system solver. For matrix-matrix products, manipulation of the rounding mode of the processor is used to compute enclosures of the correct result. Comparing the traditional solver with the new version shows that the class of problems which are solvable with the new version is smaller than the class of problems which can be solved using the solver based on perfect operations. But it seems that for real world problems also the new solver is appropriate. Using the new solver based on BLAS and simulating a quadrupel precision (i.e. k==2) the speedup comes close to 200(!). The new solver is nearly as fast as the corresponding IntLab[16] solver verifylss. Solving a real linaer system of dimension 1000 on a Pentium 4 with 3.2GHz takes about 2.8 seconds. In all cases tested the accuracy of our new solver was better and in some cases significantly better than the accuracy of the corresponding IntLab results. The new solver also allows solving larger (dense) problems than its IntLab counterpart. We also show some examples where IntLab falls down whereas our new solver still works. A parallel version of this solver, based on ScaLAPACK, is also available. Unlike the previous parallel solver in C-XSC[5], this new solver does not depend on a root-node, which makes it possible to compute a verified solution even of very large linear systems. In the talk we will discuss the new data types in more detail, we will emphasize our modifications to the DotK algorithm taken from the literature [14,15], we will show time measurements and we will present results concerning the accuracy of the computed enclosures. Our results will also be compared to corresponding results computed with the IntLab package. We also will comment on hardware features and compiler options which can/should be used to get reliable results on different platforms efficiently. References: [1] Downloads: C-XSC library: http://www.math.uni-wuppertal.de/~xsc/xsc/cxsc.html Solvers: http://www.math.uni-wuppertal.de/~xsc/xsc/cxsc_software.html [2] L.S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, R. C. Whaley, An Updated Set of Basic Linear Algebra Subprograms (BLAS), ACM Trans. Math. Soft., 28-2 (2002), pp. 135--151. [3] Bohlender, G.; Walter, W.; Kornerup, P.; Matula, D.W.; Kornerup, P.; Matula, D.W.: Semantics for Exact Floating Point Operations. Proceedings, 10th IEEE Symposium on Computer Arithmetic, 26-28 June 1991, IEEE, 1991. [4] Dekker, T.J.: A floating-point technique for extending the available precision. Numer. Math., 18:224, 1971. [5] Grimmer, M.: Selbstverifizierende Mathematische Softwarewerkzeuge im High-Performance Computing. Konzeption, Entwicklung und Analyse am Beispiel der parallelen verifizierten Loesung linearer Fredholmscher Integralgleichungen zweiter Art. Logos Verlag, 2007. [6] Hofschuster, W.; Kraemer, W.: C-XSC 2.0: A C++ Library for Extended Scientific Computing. Numerical Software with Result Verification, Lecture Notes in Computer Science, Volume 2991/2004, Springer-Verlag, Heidelberg, pp. 15 - 35, 2004. [7] Kersten, Tim: Verifizierende rechnerinvariante Numerikmodule, Dissertation, University of Karlsruhe, 1998 [8] Klatte, Kulisch, Wiethoff, Lawo, Rauch: "C-XSC - A C++ Class Library for Extended Scientific Computing", Springer-Verlag, Heidelberg, 1993. Due to the C++ standardization (1998) and dramatic changes in C++ compilers over the last years this documentation describes no longer the actual C-XSC environment. Please refer to more accurate documentation (e.g.[1]) available from the web site of our research group: http... [9] Kirchner, R., Kulisch, U.: Hardware Support for Interval Arithmetic. Reliable Computing, Volume 12, Number 3, June 2006 , pp. 225-237(13). [10] Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms. Addison Wesley, 1969, vol. 2. [11] Kulisch, U.: Computer Arithmetic and Validity - Theory, Implementation. To appear. [12] Krawczyk, R.: Newton-Algorithmen zur Bestimmung von Nullstellen mit Fehlerschranken, Computing, 4:187-201, 1969. [13] Lerch, M.; Tischler, G.; Wolff von Gudenberg, J.; Hofschuster, W; Kraemer, W.: filib++, a Fast Interval Library Supporting Containment Computations. ACM TOMS, volume 32, number 2, pp. 299-324, 2006. [14] Ogita, T., Rump, S.M., Oishi, S.: Accurate sum and dot product. SIAM Journal on Scientific Computing, 26:6, 2005. [15] Oishi, S., Tanabe, K., Ogita, T., Rump, S.M., Yamanaka, N.: A Parallel Algorithm of Accurate Dot Product. Submitted for publication, 2007. [16] Rump, S.M.: Intlab - Interval Laboratory. Developments in Reliable Computing, pp. 77-104, 1999. [17] Rump, S.M.: Kleine Fehlerschranken bei Matrixproblemen, Dissertation, University of Karlsruhe, 1980 [18] Stroustrup, Bjarne: The C++-Programming Language, 3rd Edition, Addison-Wesley, 2000. [19] Zimmer, Michael: Laufzeiteffiziente, parallele Loeser fuer lineare Intervallgleichungssysteme in C-XSC, Master thesis, University of Wuppertal, 2007. AMS subject classification: 65H10, 15-04, 65G99, 65G10, 65-04

Fast (Parallel) Dense Linear Interval Systems Solvers in C-XSC Using Error Free Transformations and BLAS

Authors Michael Zimmer, Walter Krämer

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Keywords

Metrics