An Algebraic Approach to Vectorial Programs

Paperman, Charles; Salvati, Sylvain; Soyez-Martin, Claire

doi:10.4230/LIPIcs.STACS.2023.51

Abstract

Vectorial programming, the combination of SIMD instructions with usual processor instructions, is known to speed up many standard algorithms. Simple regular languages have benefited from this technology. This paper is a first step towards pushing these benefits further. We take advantage of the inner algebraic structure of regular languages and produce high-level representations of efficient vectorial programs that recognize certain classes of regular languages. As a technical ingredient, we establish equivalences between classes of vectorial circuits and logical formalisms, namely unary temporal logic and first-order logic. The main result is the construction of compilation procedures that turn syntactic semigroups into vectorial circuits. The circuits we obtain are small in that they improve known upper bounds on representations of automata within the logical formalisms. The gain is mostly due to a careful sharing of sub-formulas based on algebraic tools.
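
As a rough illustration of the flavor of bit-vector recognition the abstract alludes to, the following minimal sketch checks membership in the language of words containing the factor "ab" using one shift and one bitwise AND. It is not taken from the paper: the function names `letter_mask` and `contains_ab` are hypothetical, and Python's unbounded integers stand in for SIMD registers.

```python
def letter_mask(word: str, letter: str) -> int:
    """Return a bit-vector whose bit i is set iff word[i] == letter."""
    mask = 0
    for i, c in enumerate(word):
        if c == letter:
            mask |= 1 << i
    return mask


def contains_ab(word: str) -> bool:
    """Decide whether 'ab' occurs as a factor of word (illustrative only)."""
    a = letter_mask(word, "a")
    b = letter_mask(word, "b")
    # (a << 1) marks every position whose predecessor holds an 'a';
    # AND-ing with b keeps the positions holding a 'b' right after an 'a'.
    return ((a << 1) & b) != 0
```

The whole word is processed by a constant number of bitwise operations on the masks, which is the kind of data parallelism that SIMD instructions provide on fixed-width blocks.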

