Compression in a Distributed Setting

Authors Badih Ghazi, Elad Haramaty, Pritish Kamath, Madhu Sudan

Cite As

Badih Ghazi, Elad Haramaty, Pritish Kamath, and Madhu Sudan. Compression in a Distributed Setting. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 19:1-19:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem, a sequence of pairs of players from a set of K players is chosen and tasked to communicate messages drawn from an unknown distribution Q. Arguably, languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution Q comes from previously drawn samples, but these samples differ from player to player. The common knowledge between the players is restricted to a common prior distribution P and some constant number of bits of information (such as a learning algorithm). Letting T_epsilon denote the number of iterations it would take for a typical player to obtain an epsilon-approximation to Q in total variation distance, we ask whether T_epsilon iterations suffice to compress the messages down roughly to their entropy, and we give a partial positive answer. We show that a natural uniform algorithm can compress the communication down to an average cost per message of O(H(Q) + log(D(P || Q))) in Õ(T_epsilon) iterations while allowing for O(epsilon)-error, where D(. || .) denotes the KL-divergence between distributions. For large divergences this compares favorably with the static algorithm that ignores all samples and compresses down to H(Q) + D(P || Q) bits, while not requiring the T_epsilon * K iterations that it would take players to develop optimal but separate compressions for each pair of players. Along the way we introduce a "data-structural" view of the task of communicating with a natural language and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of Q and whose query complexity is comparable to the length of the message to be compressed.
Our results give a plausible mathematical analogy to the mechanisms by which human languages get created and evolve, and in particular highlight the possibility of coordination towards a joint task (agreeing on a language) while engaging in distributed learning.
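
The quantities named in the abstract (entropy, KL-divergence, total variation distance, and the cost of the static algorithm) can be made concrete with a small numerical sketch. It is not the paper's algorithm. The toy distributions Q and P and all function names are hypothetical. The divergence is computed in the textbook argument order, which is an assumption about how the abstract's D(P || Q) should be read. The last printed value is only the expression inside the O(.), with constants ignored.

```python
import math
from collections import Counter


def entropy(q):
    """Shannon entropy H(q) in bits."""
    return -sum(p * math.log2(p) for p in q.values() if p > 0)


def kl_divergence(a, b):
    """Textbook KL-divergence of a from b, in bits.

    Assumes b[m] > 0 wherever a[m] > 0.
    """
    return sum(pa * math.log2(pa / b[m]) for m, pa in a.items() if pa > 0)


def cross_entropy(q, p):
    """Expected bits per message when messages drawn from q are
    encoded with ideal code lengths log2(1 / p[m]) designed for p."""
    return sum(qm * math.log2(1 / p[m]) for m, qm in q.items() if qm > 0)


def total_variation(a, b):
    """Total variation distance between two distributions."""
    support = set(a) | set(b)
    return 0.5 * sum(abs(a.get(m, 0.0) - b.get(m, 0.0)) for m in support)


def empirical(samples):
    """Empirical distribution of a list of samples (one player's view of Q)."""
    counts = Counter(samples)
    n = len(samples)
    return {m: c / n for m, c in counts.items()}


# Hypothetical toy instance: the prior P is badly mismatched with Q,
# so the divergence is large.
Q = {"a": 0.5, "b": 0.5}
P = {"a": 2 ** -16, "b": 1 - 2 ** -16}

h = entropy(Q)
d = kl_divergence(Q, P)

# Static algorithm: ignore all samples and code with the prior.
print("entropy of Q:", h)
print("divergence:", d)
print("static cost (entropy + divergence):", cross_entropy(Q, P))
# Expression inside the O(.) of the abstract's bound; only meaningful
# for large divergences, since log2(d) is negative when d < 1.
print("entropy + log2(divergence):", h + math.log2(d))
```

On this toy instance the divergence dominates the static cost, which is the regime in which the abstract says the logarithmic dependence compares favorably. A player's estimate of Q could be compared against Q itself with `total_variation(empirical(samples), Q)`, mirroring the epsilon-approximation condition that defines T_epsilon.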

Keywords

  • Distributed Compression
  • Communication
  • Language Evolution
  • Isolating Hash Families

