Engineering Motif Search for Large Motifs

Authors Petteri Kaski, Juho Lauri, Suhas Thejaswi



PDF
Thumbnail PDF

File

LIPIcs.SEA.2018.28.pdf
  • Filesize: 0.73 MB
  • 19 pages

Document Identifiers

Author Details

Petteri Kaski
  • Department of Computer Science, Aalto University, Espoo, Finland
Juho Lauri
  • Nokia Bell Labs, Dublin, Ireland
Suhas Thejaswi
  • Department of Computer Science, Aalto University, Espoo, Finland

Cite As Get BibTex

Petteri Kaski, Juho Lauri, and Suhas Thejaswi. Engineering Motif Search for Large Motifs. In 17th International Symposium on Experimental Algorithms (SEA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 103, pp. 28:1-28:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.SEA.2018.28

Abstract

Given a vertex-colored graph H and a multiset M of colors as input, the graph motif problem asks us to decide whether H has a connected induced subgraph whose multiset of colors agrees with M. The graph motif problem is NP-complete but known to admit randomized algorithms based on constrained multilinear sieving over GF(2^b) that run in time O(2^kk^2m {M({2^b})}) and with a false-negative probability of at most k/2^{b-1} for a connected m-edge input and a motif of size k. On modern CPU microarchitectures such algorithms have practical edge-linear scalability to inputs with billions of edges for small motif sizes, as demonstrated by Björklund, Kaski, Kowalik, and Lauri [ALENEX'15]. This scalability to large graphs prompts the dual question whether it is possible to scale to large motif sizes. 
We present a vertex-localized variant of the constrained multilinear sieve that enables us to obtain, in time O(2^kk^2m{M({2^b})}) and for every vertex simultaneously, whether the vertex participates in at least one match with the motif, with a per-vertex probability of at most k/2^{b-1} for a false negative. Furthermore, the algorithm is easily vector-parallelizable for up to 2^k threads, and parallelizable for up to 2^kn threads, where n is the number of vertices in H. Here {M({2^b})} is the time complexity to multiply in GF(2^b).
We demonstrate with an open-source implementation that our variant of constrained multilinear sieving can be engineered for vector-parallel microarchitectures to yield hardware utilization that is bound by the available memory bandwidth. 
Our main engineering contributions are (a) a version of the recurrence for tightly labeled arborescences that can be executed as a sequence of memory-and-arithmetic coalescent parallel workloads on multiple GPUs, and (b) a bit-sliced low-level implementation for arithmetic in characteristic 2 to support (a).

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Graph algorithms
  • Mathematics of computing → Probabilistic algorithms
  • Theory of computation → Parallel algorithms
Keywords
  • algorithm engineering
  • constrained multilinear sieving
  • graph motif problem
  • multi-GPU
  • vector-parallel
  • vertex-localization

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Raphael Yuster, and Uri Zwick. Color-coding. J. ACM, 42(4):844-856, 1995. Google Scholar
  2. Nadja Betzler, Michael R. Fellows, Christian Komusiewicz, and Rolf Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 5029 of Lecture Notes in Computer Science, pages 31-43. Springer, 2008. Google Scholar
  3. Eli Biham. A fast new DES implementation in software. In Proceedings of the 4th International Workshop on Fast Software Encryption (FSE), volume 1267 of Lecture Notes in Computer Science, pages 260-272. Springer, 1997. Google Scholar
  4. Andreas Björklund. Determinant sums for undirected hamiltonicity. SIAM J. Comput., 43(1):280-299, 2014. Google Scholar
  5. Andreas Björklund, Thore Husfeldt, Petteri Kaski, and Mikko Koivisto. Narrow sieves for parameterized paths and packings. J. Comput. Syst. Sci., 87:119-139, 2017. Google Scholar
  6. Andreas Björklund, Petteri Kaski, and Łukasz Kowalik. Constrained multilinear detection and generalized graph motifs. Algorithmica, 74(2):947-967, 2016. Google Scholar
  7. Andreas Björklund, Petteri Kaski, Łukasz Kowalik, and Juho Lauri. Engineering motif search for large graphs. In Proceedings of the 17th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 104-118. SIAM, 2015. Google Scholar
  8. Sharon Bruckner, Falk Hüffner, Richard M. Karp, Ron Shamir, and Roded Sharan. Topology-free querying of protein interaction networks. Journal of Computational Biology, 17(3):237-252, 2010. Google Scholar
  9. Ferdinando Cicalese, Travis Gagie, Emanuele Giaquinta, Eduardo Sany Laber, Zsuzsanna Lipták, Romeo Rizzi, and Alexandru I. Tomescu. Indexes for jumbled pattern matching in strings, trees and graphs. In Proceedings of the 20th International Symposium on String Processing and Information Retrieval (SPIRE), volume 8214 of Lecture Notes in Computer Science, pages 56-63. Springer, 2013. Google Scholar
  10. Marek Cygan, Fedor V. Fomin, Łukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michał Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. Google Scholar
  11. Riccardo Dondi, Guillaume Fertin, and Stéphane Vialette. Maximum motif problem in vertex-colored graphs. In Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 5577 of Lecture Notes in Computer Science, pages 221-235. Springer, 2009. Google Scholar
  12. Édouard Bonnet and Florian Sikora. The graph motif problem parameterized by the structure of the input graph. Discrete Applied Mathematics, 231:78-94, 2017. Google Scholar
  13. Michael R. Fellows, Guillaume Fertin, Danny Hermelin, and Stéphane Vialette. Upper and lower bounds for finding connected motifs in vertex-colored graphs. J. Comput. Syst. Sci., 77(4):799-811, 2011. URL: http://dx.doi.org/10.1016/j.jcss.2010.07.003.
  14. Travis Gagie, Danny Hermelin, Gad M. Landau, and Oren Weimann. Binary jumbled pattern matching on trees and tree-like structures. In Proceedings of the 21st Annual European Symposium on Algorithms (ESA), volume 8125 of Lecture Notes in Computer Science, pages 517-528. Springer, 2013. Google Scholar
  15. Emanuele Giaquinta and Szymon Grabowski. New algorithms for binary jumbled pattern matching. Inf. Process. Lett., 113(14-16):538-542, 2013. Google Scholar
  16. Shay Gueron and Michael E. Kounavis. Intel (R) Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode - Rev 2.02. Intel Corporation, April 2014. URL: https://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode.
  17. Sylvain Guillemot and Florian Sikora. Finding and counting vertex-colored subtrees. Algorithmica, 65(4):828-844, 2013. Google Scholar
  18. Tomasz Kociumaka, Jakub Radoszewski, and Wojciech Rytter. Efficient indexes for jumbled pattern matching with constant-sized alphabet. In Proceedings of the 21st Annual European Symposium on Algorithms (ESA), volume 8125 of Lecture Notes in Computer Science, pages 625-636. Springer, 2013. Google Scholar
  19. Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Rev., 51(3):455-500, 2009. Google Scholar
  20. Ioannis Koutis. Faster algebraic algorithms for path and packing problems. In Proceedings of the 35th International Colloquium on Automata, Languages and Programming (ICALP), volume 5125 of Lecture Notes in Computer Science, pages 575-586. Springer, 2008. Google Scholar
  21. Ioannis Koutis. Constrained multilinear detection for faster functional motif discovery. Inf. Process. Lett., 112(22):889-892, 2012. URL: http://dx.doi.org/10.1016/j.ipl.2012.08.008.
  22. Ioannis Koutis and Ryan Williams. Algebraic fingerprints for faster algorithms. Commun. ACM, 59(1):98-105, 2016. Google Scholar
  23. Jérôme Kunegis. KONECT: the Koblenz network collection. In Proceedings of the 22nd International World Wide Web Conference (WWW), pages 1343-1350, 2013. URL: http://konect.uni-koblenz.de/networks/.
  24. Vincent Lacroix, Cristina G. Fernandes, and Marie-France Sagot. Motif search in graphs: Application to metabolic networks. IEEE/ACM Trans. Comput. Biology Bioinform., 3(4):360-368, 2006. URL: http://dx.doi.org/10.1109/TCBB.2006.55.
  25. Sian-Jheng Lin, Tareq Y. Al-Naffouri, Yunghsiang S. Han, and Wei-Ho Chung. Novel polynomial basis with fast Fourier transform and its application to Reed-Solomon erasure codes. IEEE Trans. Inform. Theory, 62(11):6284-6299, 2016. Google Scholar
  26. Edoardo Mastrovito. VLSI Architectures for Computations in Galois Fields. PhD thesis, Department of Electrical Engineering, Linköping University, 1991. Google Scholar
  27. Xinxin Mei and Xiaowen Chu. Dissecting GPU memory hierarchy through microbenchmarking. IEEE Trans. Parallel Distrib. Syst., 28(1):72-86, 2017. Google Scholar
  28. Jesper Nederlof. Fast polynomial-space algorithms using inclusion-exclusion. Algorithmica, 65(4):868-884, 2013. Google Scholar
  29. NVIDIA Corporation. CUDA C Programming Guide, Version 9. URL: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
  30. OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 4.5 - November 2015. URL: http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf.
  31. Ron Y. Pinter, Hadas Shachnai, and Meirav Zehavi. Deterministic parameterized algorithms for the graph motif problem. Discrete Applied Mathematics, 213:162-178, 2016. Google Scholar
  32. Ron Y. Pinter and Meirav Zehavi. Partial information network queries. In Proceedings of the 24th International Workshop on Combinatorial Algorithms (IWOCA), volume 8288 of Lecture Notes in Computer Science, pages 362-375. Springer, 2013. Google Scholar
  33. Ron Y. Pinter and Meirav Zehavi. Algorithms for topology-free and alignment network queries. J. Discrete Algorithms, 27:29-53, 2014. Google Scholar
  34. Atri Rudra, Pradeep K. Dubey, Charanjit S. Jutla, Vijay Kumar, Josyula R. Rao, and Pankaj Rohatgi. Efficient Rijndael encryption implementation with composite field arithmetic. In Proceedings of the 3rd International Workshop on Cryptographic Hardware and Embedded Systems (CHES), volume 2162 of Lecture Notes in Computer Science, pages 171-184. Springer, 2001. Google Scholar
  35. Vasily Volkov. Understanding Latency Hiding on GPUs. PhD thesis, Electrical Engineering and Computer Sciences, University of California at Berkeley, 2016. Technical Report No. UCB/EECS-2016-143. Google Scholar
  36. Ryan Williams. Finding paths of length k in O^*(2^k) time. Inf. Process. Lett., 109(6):315-318, 2009. URL: http://dx.doi.org/10.1016/j.ipl.2008.11.004.
  37. Meirav Zehavi. Parameterized algorithms for module motif. In Proceedings of the 38th International Symposium on Mathematical Foundations of Computer Science (MFCS), volume 8087 of Lecture Notes in Computer Science, pages 825-836. Springer, 2013. Google Scholar
  38. Meirav Zehavi. Parameterized algorithms for the module motif problem. Inf. Comput., 251:179-193, 2016. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail