Algorithmic Data Science (Invited Talk)

Author Petra Mutzel



PDF
Thumbnail PDF

File

LIPIcs.STACS.2019.3.pdf
  • Filesize: 1.02 MB
  • 15 pages

Document Identifiers

Author Details

Petra Mutzel
  • TU Dortmund, Department of Computer Science, Otto-Hahn-Str. 14, 44221 Dortmund, Germany

Cite As Get BibTex

Petra Mutzel. Algorithmic Data Science (Invited Talk). In 36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 126, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.STACS.2019.3

Abstract

The area of algorithmic data science provides new opportunities for researchers in the algorithmic community. In this paper we will see examples that demonstrate that algorithm engineering is the perfect basis for algorithmic data science. But there are also many open interesting questions for purely theoretically interested computer scientists. In my opinion, these opportunities should be taken because this will be fruitful for both areas, algorithmics as well as data sciences. I like to call for more participation in algorithmic data science by our community. Now we have the opportunity to shape this new emerging field.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Graph algorithms analysis
  • Theory of computation → Complexity theory and logic
  • Mathematics of computing → Combinatorial algorithms
Keywords
  • Algorithmic Data Science
  • Graph Similarity
  • Weisfeiler-Leman

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Marcel R. Ackermann, Johannes Blömer, Daniel Kuntze, and Christian Sohler. Analysis of Agglomerative Clustering. Algorithmica, 69(1):184-215, 2014. URL: http://dx.doi.org/10.1007/s00453-012-9717-4.
  2. Charu C. Aggarwal. Data Mining: The Textbook. Springer Publishing Company, Incorporated, 2015. Google Scholar
  3. Charu C. Aggarwal and Jiawei Han. Frequent Pattern Mining. Springer Publishing Company, Incorporated, 2014. Google Scholar
  4. Tatsuya Akutsu and Takeyuki Tamura. A Polynomial-Time Algorithm for Computing the Maximum Common Subgraph of Outerplanar Graphs of Bounded Degree. In Branislav Rovan, Vladimiro Sassone, and Peter Widmayer, editors, 37th International Symposium on Mathematical Foundations of Computer Science, MFCS 2012, pages 76-87, Berlin, Heidelberg, 2012. Springer. Google Scholar
  5. Albert Atserias and Elitza Maneva. Sherali-Adams Relaxations and Indistinguishability in Counting Logics. SIAM Journal on Computing, 42(1):112-137, 2013. URL: http://dx.doi.org/10.1137/120867834.
  6. László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, STOC 2016, pages 684-697, New York, NY, USA, 2016. ACM. Google Scholar
  7. László Babai, Paul Erdös, and Stanley M. Selkow. Random Graph Isomorphism. SIAM Journal on Computing, 9(3):628-635, 1980. URL: http://dx.doi.org/10.1137/0209047.
  8. Shai Ben-David, Pavel Hrubes, Shay Moran, Amir Shpilka, and Amir Yehudayoff. Learnability can be undecidable. Nature Machine Intelligence, 1:44-48, 2019. URL: http://dx.doi.org/10.1038/s42256-018-0002-3.
  9. Avrim Blum, John Hopcroft, and Ravi Kannan. Foundations of Data Science, 2018. URL: https://www.cs.cornell.edu/jeh/book.pdf.
  10. Hans L. Bodlaender. Polynomial algorithms for graph isomorphism and chromatic index on partial k-trees. Journal of Algorithms, 11(4):631-643, 1990. URL: http://dx.doi.org/10.1016/0196-6774(90)90013-5.
  11. Franz J. Brandenburg. Subgraph isomorphism problems for k-connected partial k-trees. Unpublished manuscript, 2000. Google Scholar
  12. Coen Bron and Joep Kerbosch. Algorithm 457: finding all cliques of an undirected graph. In CACM, 1973. Google Scholar
  13. Jin-yi Cai, Martin Fürer, and Neil Immerman. An optimal lower bound on the number of variables for graph identificationS. Combinatorica, 12(4):389-410, 1992. Google Scholar
  14. Artur Czumaj, Pan Peng, and Christian Sohler. Testing Cluster Structure of Graphs. In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, pages 723-732. ACM, 2015. URL: http://dx.doi.org/10.1145/2746539.2746618.
  15. Holger Dell, Martin Grohe, and Gaurav Rattan. Lovász Meets Weisfeiler and Leman. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), volume 107 of Leibniz International Proceedings in Informatics (LIPIcs), pages 40:1-40:14, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2018.40.
  16. Andre Droschinsky, Nils Kriege, and Petra Mutzel. Finding Largest Common Substructures of Molecules in Quadratic Time. In Bernhard Steffen, Christel Baier, Mark van den Brand, Johann Eder, Mike Hinchey, and Tiziana Margaria, editors, SOFSEM 2017: Theory and Practice of Computer Science - 43rd International Conference on Current Trends in Theory and Practice of Computer Science, volume 10139 of Lecture Notes in Computer Science, pages 309-321. Springer, 2017. URL: http://dx.doi.org/10.1007/978-3-319-51963-0_24.
  17. Andre Droschinsky, Nils M. Kriege, and Petra Mutzel. Largest Weight Common Subtree Embeddings with Distance Penalties. In Igor Potapov, Paul G. Spirakis, and James Worrell, editors, 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018, volume 117 of LIPIcs, pages 54:1-54:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.MFCS.2018.54.
  18. Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In Sanjeev Khanna, editor, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, pages 1434-1453. SIAM, 2013. URL: http://dx.doi.org/10.1137/1.9781611973105.103.
  19. Fedor V. Fomin, Fabrizio Grandoni, and Dieter Kratsch. Measure and Conquer: A Simple O(2^0.288N) Independent Set Algorithm. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA '06, pages 18-25, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1109557.1109560.
  20. Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural Message Passing for Quantum Chemistry. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research (PMLR). PMLR, 2017. Google Scholar
  21. Martin Grohe. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory. Lecture Notes in Logic. Cambridge University Press, 2017. Google Scholar
  22. Martin Grohe and Daniel Marx. Structure Theorem and Isomorphism Test for Graphs with Excluded Topological Subgraphs. SIAM Journal on Computing, 44(1):114-159, 2015. URL: http://dx.doi.org/10.1137/120892234.
  23. Martin Grohe, Daniel Neuen, Pascal Schweitzer, and Daniel Wiebking. An Improved Isomorphism Test for Bounded-Tree-Width Graphs. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 67:1-67:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2018.67.
  24. Martin Grohe and Martin Otto. Pebble games and linear equations. The Journal of Symbolic Logic, 80(3):797-844, 2015. URL: http://dx.doi.org/10.1017/jsl.2015.28.
  25. Martin Grohe, Gaurav Rattan, and Gerhard J. Woeginger. Graph Similarity and Approximate Isomorphism. In Igor Potapov, Paul G. Spirakis, and James Worrell, editors, 43rd International Symposium on Mathematical Foundations of Computer Science, MFCS 2018, volume 117 of LIPIcs, pages 20:1-20:16. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.MFCS.2018.20.
  26. John E. Hopcroft and Joseph K. Wong. Linear Time Algorithm for Isomorphism of Planar Graphs (Preliminary Report). In Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, STOC 1974, pages 172-184, New York, NY, USA, 1974. ACM. URL: http://dx.doi.org/10.1145/800119.803896.
  27. Viggo Kann. On the approximability of the maximum common subgraph problem. In Alain Finkel and Matthias Jantzen, editors, STACS 92, pages 375-388, Berlin, Heidelberg, 1992. Springer Berlin Heidelberg. Google Scholar
  28. Nils Kriege, Florian Kurpicz, and Petra Mutzel. On maximum common subgraph problems in series-parallel graphs. European Journal of Combinatorics, 68:79-95, 2018. Combinatorial Algorithms, Dedicated to the Memory of Mirka Miller. URL: http://dx.doi.org/10.1016/j.ejc.2017.07.012.
  29. Nils M. Kriege, Lina Humbeck, and Oliver Koch. Chemical Similarity and Substructure Searches. In Shoba Ranganathan, Michael Gribskov, Kenta Nakai, and Christian Schönbach, editors, Encyclopedia of Bioinformatics and Computational Biology, pages 640-649. Academic Press, Oxford, 2019. URL: http://dx.doi.org/10.1016/B978-0-12-809633-8.20195-7.
  30. Nils Morten Kriege. Comparing Graphs: Algorithms &Applications. Phd thesis, TU Dortmund, 2015. Google Scholar
  31. Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, 2 edition, 2014. URL: http://dx.doi.org/10.1017/CBO9781139924801.
  32. Daniel Lokshtanov, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Fixed-Parameter Tractable Canonization and Isomorphism Test for Graphs of Bounded Treewidth. SIAM Journal on Computing, 46(1):161-189, 2017. URL: http://dx.doi.org/10.1137/140999980.
  33. Eugene M. Luks. Isomorphism of graphs of bounded valence can be tested in polynomial time. Journal of Computer and System Sciences, 25:42-65, 1982. Google Scholar
  34. Peter N. Malkin. Sherali-Adams relaxations of graph isomorphism polytopes. Discrete Optimization, 12:73-97, 2014. Google Scholar
  35. David W. Matula. Subtree Isomorphism in O(n^5/2). In B. Alspach, P. Hell, and D.J. Miller, editors, Algorithmic Aspects of Combinatorics, volume 2 of Annals of Discrete Mathematics, pages 91-106. Elsevier, 1978. URL: http://dx.doi.org/10.1016/S0167-5060(08)70324-8.
  36. Brendan D. McKay and Adolfo Piperno. Practical Graph Isomorphism, II. J. Symb. Comput., 60:94-112, 2014. URL: http://dx.doi.org/10.1016/j.jsc.2013.09.003.
  37. Christopher Morris, Kristian Kersting, and Petra Mutzel. Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs. In Vijay Raghavan, Srinivas Aluru, George Karypis, Lucio Miele, and Xindong Wu, editors, IEEE International Conference on Data Mining (ICDM), 2017, pages 327-336. IEEE Computer Society, 2017. URL: http://dx.doi.org/10.1109/ICDM.2017.42.
  38. Christopher Morris and Petra Mutzel. Towards a practical k-dimensional Weisfeiler-Leman algorithm. Unpublished manuscript, 2019. Google Scholar
  39. Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. CoRR, abs/1810.02244, 2018. to appear at AAAI 2019. URL: http://arxiv.org/abs/1810.02244.
  40. John M. Robson. Finding a maximum independent set in time O(2^n/4). Technical report, LaBRI, Université Bordeaux I, 2001. Google Scholar
  41. Uwe Schöning. Graph isomorphism is in the low hierarchy. Journal of Computer and System Sciences, 37(3):312-323, 1988. URL: http://dx.doi.org/10.1016/0022-0000(88)90010-4.
  42. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. URL: http://dx.doi.org/10.1017/CBO9781107298019.
  43. Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research, 12:2539-2561, 2011. Google Scholar
  44. Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. Efficient graphlet kernels for large graph comparison. In David van Dyk and Max Welling, editors, Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pages 488-495, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 2009. PMLR. URL: http://proceedings.mlr.press/v5/shervashidze09a.html.
  45. Steven S. Skiena. The Data Science Design Manual. Springer, 2017. URL: https://www.springer.com/de/book/9783319554433.
  46. Gottfried Tinhofer. A note on compact graphs. Discrete Applied Mathematics, 30(2):253-264, 1991. Google Scholar
  47. University Consortium for Geographic Information Science. A UCGIS Call to Action: Bringing the Geospatial Perspective to Data Science Degrees and Curricula, 2018. URL: https://www.ucgis.org/assets/docs/UCGIS-Statement-on-Data-Science-Summer2018.pdf.
  48. Mohammed J. Zaki and Meira Wagner Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, New York, NY, USA, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail