Feature Cross Search via Submodular Optimization

Chen, Lin; Esfandiari, Hossein; Fu, Gang; Mirrokni, Vahab S.; Yu, Qian

doi:10.4230/LIPIcs.ESA.2021.31

Abstract

In this paper, we study feature cross search as a fundamental primitive in feature engineering. The importance of feature cross search especially for the linear model has been known for a while, with well-known textbook examples. In this problem, the goal is to select a small subset of features, combine them to form a new feature (called the crossed feature) by considering their Cartesian product, and find feature crosses to learn an accurate model. In particular, we study the problem of maximizing a normalized Area Under the Curve (AUC) of the linear model trained on the crossed feature column. First, we show that it is not possible to provide an n^{1/log log n}-approximation algorithm for this problem unless the exponential time hypothesis fails. This result also rules out the possibility of solving this problem in polynomial time unless 𝖯 = NP. On the positive side, by assuming the naïve Bayes assumption, we show that there exists a simple greedy (1-1/e)-approximation algorithm for this problem. This result is established by relating the AUC to the total variation of the commutator of two probability measures and showing that the total variation of the commutator is monotone and submodular. To show this, we relate the submodularity of this function to the positive semi-definiteness of a corresponding kernel matrix. Then, we use Bochner’s theorem to prove the positive semi-definiteness by showing that its inverse Fourier transform is non-negative everywhere. Our techniques and structural results might be of independent interest.

L Douglas Baker and Andrew Kachites McCallum. Distributional clustering of words for text classification. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 96-103. ACM, 1998.
Mohammadhossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, and Afshin Rostamizadeh. Categorical feature compression via submodular optimization. In International Conference on Machine Learning, pages 515-523, 2019.
Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an o (n^1/4) approximation for densest k-subgraph. In STOC, pages 201-210. ACM, 2010.
Simon Byrne. A note on the use of empirical auc for evaluating probabilistic forecasts. Electronic Journal of Statistics, 10(1):380-393, 2016.
Lin Chen, Hossein Esfandiari, Gang Fu, Vahab S Mirrokni, and Qian Yu. Feature cross search via submodular optimization. arXiv preprint arXiv:2107.02139, 2021.
Yuxin Chen, S Hamed Hassani, Amin Karbasi, and Andreas Krause. Sequential information maximization: When is greedy near-optimal? In Conference on Learning Theory, pages 338-363, 2015.
Inderjit S Dhillon, Subramanyam Mallela, and Rahul Kumar. A divisive information-theoretic feature clustering algorithm for text classification. Journal of machine learning research, 3(Mar):1265-1287, 2003.
Ethan R Elenberg, Rajiv Khanna, Alexandros G Dimakis, and Sahand Negahban. Restricted strong convexity implies weak submodularity. The Annals of Statistics, 46(6B):3539-3568, 2018.
Susana Eyheramendy, David D Lewis, and David Madigan. On the naive bayes model for text categorization. In 9th International Workshop on Artificial Intelligence and Statistics. Citeseer, 2003.
Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of machine learning research, 3(Mar):1157-1182, 2003.
Nazrul Hoque, Dhruba K Bhattacharyya, and Jugal K Kalita. Mifs-nd: A mutual information-based feature selection method. Expert Systems with Applications, 41(14):6371-6385, 2014.
Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-sat. Journal of Computer and System Sciences, 62(2):367-375, 2001.
Ken-ichi Iwata and Shin-ya Ozawa. Quantizer design for outputs of binary-input discrete memoryless channels using smawk algorithm. In 2014 IEEE International Symposium on Information Theory, pages 191-195. IEEE, 2014.
Andreas Krause and Daniel Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems, pages 71-104. Cambridge University Press, 2014.
Andreas Krause and Carlos Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, pages 324-331. AUAI Press, 2005.
Andreas Krause, Carlos Guestrin, Anupam Gupta, and Jon Kleinberg. Near-optimal sensor placements: Maximizing information while minimizing communication cost. In Proceedings of the 5th international conference on Information processing in sensor networks, pages 2-10. ACM, 2006.
Brian M Kurkoski and Hideki Yagi. Quantization of binary-input discrete memoryless channels. IEEE Transactions on Information Theory, 60(8):4544-4552, 2014.
Nojun Kwak and Chong-Ho Choi. Input feature selection by mutual information based on parzen window. IEEE transactions on pattern analysis and machine intelligence, 24(12):1667-1671, 2002.
Hui Lin. Submodularity in natural language processing: algorithms and applications. PhD thesis, University of Washington, 2012.
Yuanfei Luo, Mengshuo Wang, Hao Zhou, Quanming Yao, Wei-Wei Tu, Yuqiang Chen, Wenyuan Dai, and Qiang Yang. Autocross: Automatic feature crossing for tabular data in real-world applications. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019., pages 1936-1945, 2019.
Pasin Manurangsi. Almost-polynomial ratio eth-hardness of approximating densest k-subgraph. In STOC, pages 954-961. ACM, 2017.
Tom Mitchell. Machine Learning. McGraw-Hill International Editions. McGraw-Hill, 1997.
Marko Mitrovic, Ehsan Kazemi, Morteza Zadimoghaddam, and Amin Karbasi. Data summarization at scale: A two-stage submodular approach. In ICML, pages 3593-3602, 2018.
George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions—i. Mathematical programming, 14(1):265-294, 1978.
Feiping Nie, Heng Huang, Xiao Cai, and Chris H Ding. Efficient and robust feature selection via joint 𝓁_2,1-norms minimization. In Advances in neural information processing systems, pages 1813-1821, 2010.
Monica Rogati and Yiming Yang. High-performing feature selection for text classification. In Proceedings of the eleventh international conference on Information and knowledge management, pages 659-661. ACM, 2002.
Henry Scheffé. A useful convergence theorem for probability distributions. The Annals of Mathematical Statistics, 18(3):434-438, 1947.
Noam Slonim and Naftali Tishby. The power of word clusters for text classification. In 23rd European Colloquium on Information Retrieval Research, volume 1, page 200, 2001.
Burak Turhan and Ayse Bener. Analysis of naive bayes’ assumptions on software fault data: An empirical study. Data & Knowledge Engineering, 68(2):278-290, 2009.
Dennis Wei, Sanjeeb Dash, Tian Gao, and Oktay Günlük. Generalized linear rule models. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 6687-6696, 2019.
Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In International Conference on Machine Learning, pages 1954-1963. PMLR, 2015.
Jason Weston, Sayan Mukherjee, Olivier Chapelle, Massimiliano Pontil, Tomaso Poggio, and Vladimir Vapnik. Feature selection for svms. In Advances in neural information processing systems, pages 668-674, 2001.
Sepehr Abbasi Zadeh, Mehrdad Ghadiri, Vahab Mirrokni, and Morteza Zadimoghaddam. Scalable feature selection via distributed diversity maximization. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
Yuanxing Zhang, Yichong Bai, Lin Chen, Kaigui Bian, and Xiaoming Li. Influence maximization in messenger-based social networks. In GLOBECOM, pages 1-6. IEEE, 2016.

Feature Cross Search via Submodular Optimization

Authors Lin Chen, Hossein Esfandiari, Gang Fu, Vahab S. Mirrokni, Qian Yu

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Feature Cross Search via Submodular Optimization

Authors Lin Chen, Hossein Esfandiari, Gang Fu, Vahab S. Mirrokni, Qian Yu

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message