Overparameterization: A Connection Between Software 1.0 and Software 2.0

Author: Michael Carbin



  • Filesize: 0.55 MB
  • 13 pages


Author Details

Michael Carbin
  • MIT CSAIL, Cambridge, MA, USA


Acknowledgements

We would like to thank Jonathan Frankle, Benjamin Sherman, Jesse Michel, and Sahil Verma for the research contributions summarized in this work.

Cite As

Michael Carbin. Overparameterization: A Connection Between Software 1.0 and Software 2.0. In 3rd Summit on Advances in Programming Languages (SNAPL 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 136, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

A new ecosystem of machine-learning-driven applications, termed Software 2.0, has arisen that integrates neural networks into a variety of computational tasks. Such applications include image recognition, natural language processing, and other traditional machine learning tasks. However, these techniques have also grown to include other structured domains, such as program analysis and program optimization, for which novel, domain-specific insights pair with model design. In this paper, we connect the world of Software 2.0 with that of traditional software (Software 1.0) through overparameterization: a program may provide more computational capacity and precision than is necessary for the task at hand. In Software 2.0, overparameterization (a machine learning model having more parameters than there are datapoints in its dataset) has arisen as a contemporary explanation of the ability of modern, gradient-based learning methods to learn high-accuracy models over complex datasets. Specifically, the more parameters a model has, the better it learns. In Software 1.0, results from the approximate computing community show that traditional software is also overparameterized, in that software often computes results that are more precise than the user requires. Approximate computing exploits this overparameterization to improve performance by eliminating unnecessary, excess computation. For example, one technique among many is to reduce the precision of arithmetic in the application. In this paper, we argue that the gap between the precision that is available and the precision that is required, for either Software 1.0 or Software 2.0, is a fundamental aspect of software design that illustrates the balance between software designed for general purposes and domain-adapted solutions. A general-purpose solution is easier to develop and maintain than a domain-adapted solution; however, that ease comes at the expense of performance.
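To make the approximate computing side of this argument concrete, the following is a minimal sketch (not code from the paper) of one such technique, loop perforation: skipping a fraction of a loop's iterations and rescaling the result, trading a small accuracy loss for proportionally less work. All names here are illustrative.

```python
import random

def exact_sum(xs):
    """Baseline: the fully precise computation."""
    return sum(xs)

def perforated_sum(xs, skip=2):
    """Loop perforation: execute only every `skip`-th iteration,
    then rescale the partial result to estimate the full sum.
    Does roughly 1/skip of the work of the exact version."""
    return sum(xs[::skip]) * skip

random.seed(0)
data = [random.random() for _ in range(10_000)]
exact = exact_sum(data)
approx = perforated_sum(data)
relative_error = abs(approx - exact) / exact
```

For well-behaved inputs such as this one, the perforated result lands within a few percent of the exact sum, illustrating the excess precision that approximate computing trades away for performance.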
We show that the approximate computing community and the machine learning community have developed overlapping techniques to improve performance by reducing overparameterization. We also show that because of these shared techniques, questions, concerns, and answers on how to construct software can translate from one software variant to the other.
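One example of such a shared technique is pruning, which removes excess parameters from a trained network much as approximate computing removes excess computation from a program. The sketch below (illustrative only; the weights and function name are hypothetical) shows magnitude pruning, the common variant that zeroes out the weights with the smallest absolute values.

```python
def magnitude_prune(weights, fraction):
    """Magnitude pruning: zero out the given fraction of weights
    with the smallest absolute values, keeping the rest intact."""
    k = int(len(weights) * fraction)
    # Indices of the k smallest-magnitude weights.
    smallest = set(sorted(range(len(weights)),
                          key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]

# A toy weight vector; in practice this would be a trained layer's weights.
w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, 0.5)
# The three smallest-magnitude weights (0.01, -0.05, 0.2) are zeroed.
```

The pruned vector keeps only the largest-magnitude weights; in practice the sparsified model is then fine-tuned or retrained to recover accuracy.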

Subject Classification

ACM Subject Classification
  • Software and its engineering → General programming languages
  • Computing methodologies → Machine learning

Keywords and Phrases
  • Approximate Computing
  • Machine Learning
  • Software 2.0

