DROPS

Document

DOI: 10.4230/LIPIcs.ECOOP.2023.27

An Efficient Vectorized Hash Table for Batch Computations

Authors: Hesam Shahrokhi and Amir Shaikhha

Published in: LIPIcs, Volume 263, 37th European Conference on Object-Oriented Programming (ECOOP 2023)

Abstract

In recent years, the increasing demand for high-performance analytics on big data has led the research on batch hash tables. It is shown that this type of hash table can benefit from the cache locality and multi-threading more than ordinary hash tables. Moreover, the batch design for hash tables is amenable to using advanced features of modern processors such as prefetching and SIMD vectorization. While state-of-the-art research and open-source projects on batch hash tables made efforts to propose improved designs by better usage of mentioned hardware features, their approaches still do not fully exploit the existing opportunities for performance improvements. Furthermore, there is a gap for a high-level batch API of such hash tables for wider adoption of these high-performance data structures. In this paper, we present Vec-HT, a parallel, SIMD-vectorized, and prefetching-enabled hash table for fast batch processing. To allow developers to fully take advantage of its performance, we recommend a high-level batch API design. Our experimental results show the superiority and competitiveness of this approach in comparison with the alternative implementations and state-of-the-art for the data-intensive workloads of relational join processing, set operations, and sparse vector processing.

Cite as

Hesam Shahrokhi and Amir Shaikhha. An Efficient Vectorized Hash Table for Batch Computations. In 37th European Conference on Object-Oriented Programming (ECOOP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 263, pp. 27:1-27:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{shahrokhi_et_al:LIPIcs.ECOOP.2023.27,
  author =	{Shahrokhi, Hesam and Shaikhha, Amir},
  title =	{{An Efficient Vectorized Hash Table for Batch Computations}},
  booktitle =	{37th European Conference on Object-Oriented Programming (ECOOP 2023)},
  pages =	{27:1--27:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-281-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{263},
  editor =	{Ali, Karim and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2023.27},
  URN =		{urn:nbn:de:0030-drops-182203},
  doi =		{10.4230/LIPIcs.ECOOP.2023.27},
  annote =	{Keywords: Hash tables, Vectorization, Parallelization, Prefetching}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2023.28

Hinted Dictionaries: Efficient Functional Ordered Sets and Maps

Authors: Amir Shaikhha, Mahdi Ghorbani, and Hesam Shahrokhi

Published in: LIPIcs, Volume 263, 37th European Conference on Object-Oriented Programming (ECOOP 2023)

Abstract

This paper introduces hinted dictionaries for expressing efficient ordered sets and maps functionally. As opposed to the traditional ordered dictionaries with logarithmic operations, hinted dictionaries can achieve better performance by using cursor-like objects referred to as hints. Hinted dictionaries unify the interfaces of imperative ordered dictionaries (e.g., C++ maps) and functional ones (e.g., Adams' sets). We show that such dictionaries can use sorted arrays, unbalanced trees, and balanced trees as their underlying representations. Throughout the paper, we use Scala to present the different components of hinted dictionaries. We also provide a C++ implementation to evaluate the effectiveness of hinted dictionaries. Hinted dictionaries provide superior performance for set-set operations in comparison with the standard library of C++. Also, they show a competitive performance in comparison with the SciPy library for sparse vector operations.

Cite as

Amir Shaikhha, Mahdi Ghorbani, and Hesam Shahrokhi. Hinted Dictionaries: Efficient Functional Ordered Sets and Maps. In 37th European Conference on Object-Oriented Programming (ECOOP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 263, pp. 28:1-28:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{shaikhha_et_al:LIPIcs.ECOOP.2023.28,
  author =	{Shaikhha, Amir and Ghorbani, Mahdi and Shahrokhi, Hesam},
  title =	{{Hinted Dictionaries: Efficient Functional Ordered Sets and Maps}},
  booktitle =	{37th European Conference on Object-Oriented Programming (ECOOP 2023)},
  pages =	{28:1--28:30},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-281-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{263},
  editor =	{Ali, Karim and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2023.28},
  URN =		{urn:nbn:de:0030-drops-182214},
  doi =		{10.4230/LIPIcs.ECOOP.2023.28},
  annote =	{Keywords: Functional Collections, Ordered Dictionaries, Sparse Linear Algebra}
}

Document

Extended Abstract

DOI: 10.4230/LIPIcs.ECOOP.2022.33

Hinted Dictionaries: Efficient Functional Ordered Sets and Maps (Extended Abstract)

Authors: Amir Shaikhha, Mahdi Ghorbani, and Hesam Shahrokhi

Published in: LIPIcs, Volume 222, 36th European Conference on Object-Oriented Programming (ECOOP 2022)

Abstract

Sets and maps are two essential collection types for programming used widely in data analytics [Shaikhha et al., 2022]. The underlying implementation for both are normally based on 1) hash tables or 2) ordered data structures. The former provides (average-case) constant-time lookup, insertion, and deletion operations, while the latter performs these operations in a logarithmic time. The trade-off between these two approaches has been heavily investigated in systems communities [Kim et al., 2009]. An important class of operations are those dealing with two collection types, such as set-set-union or the merge of two maps. One of the main advantages of hash-based implementations is a straightforward implementation for such operations with a linear computational complexity. However, naïvely using ordered dictionaries results in an implementation with a computational complexity of O(n log(n)). Motivating Example. The following C++ code computes the intersection of two sets, implemented by std::unordered_set, a hash-table-based set: std::unordered_set<K> result; for(auto& e : set1) { if(set2.count(e)) result.emplace(e); } However, the same fact is not true for ordered data structures; changing the dictionary type to std::set, an ordered implementation, results in a program with O(n log(n)) computational complexity. This is because both the count (lookup) and emplace (insertion) methods have logarithmic computational complexity. As a partial remedy, the standard library of C++ provides an alternative insertion method that can take linear time, if used appropriately. The emplace_hint method takes a hint for the position that the element will be inserted. If the hint correctly specifies the insertion point, the computational complexity will be amortized to constant time. std::set<K> result; auto hint = result.begin(); for(auto& e : set1) { if(set2.count(e)) hint = result.emplace_hint(hint, e); } However, the above implementation still suffers from an O(n log(n)) computational complexity, due to the logarithmic computational complexity of the lookup operation (count) of the second set. Thanks to the orderedness of the second set, one can observe that once an element is looked up, there is no longer any need to search its preceding elements at the next iterations. By leveraging this feature, we can provide a hinted lookup method with an amortized constant run-time. Hinted Data Structures. The following code, shows an alternative implementation for set intersection that uses such hinted lookup operations: hinted_set<K> result; hinted_set<K>::hint_t hint = result.begin(); for(auto& e : set1) { hinted_set<K>::hint_t hint2 = set2.seek(e); if(hint2.found) hint = result.insert_hint(hint, e); set2.after(hint2); } The above hinted set data-structure enables faster insertion and lookup by providing a cursor through a hint object (of type hint_t). The seek method returns the hint object hint2 pointing to element e. Thanks to the invocation of set2.after(hint2), the irrelevant elements of set2 (which are smaller than e) are no longer considered in the next iterations. The expression hint2.found specifies if the element exists in set2 or not. Finally, if an element exists in the second set (specified by hint2.found), it is inserted into its correct position using insert_hint. The existing work on efficient ordered dictionaries can be divided into two categories. First, in the imperative world, there are C++ ordered dictionaries (e.g., std::map) with limited hinting capabilities only for insertion through emplace_hint, but not for deletion and lookup, as observed previously. Second, from the functional world, Adams' sets [Adams, 1993] provide efficient implementations for set-set operators. Functional languages such as Haskell have implemented ordered sets and maps based on them for more than twenty years [Straka, 2010]. Furthermore, it has been shown [Blelloch et al., 2016] that Adams' maps can be used to provide a parallel implementation for balanced trees such as AVL, Red-Black, Weight-Balanced, and Treaps. However, Adams' maps do not expose any hint-based operations to the programmer. At first glance, these two approaches seem completely irrelevant to each other. The key contribution of this paper is hinted dictionaries, an ordered data structure that unifies the techniques from both imperative and functional worlds. The essential building block of hinted dictionaries are hint objects, that enable faster operations (than the traditional O(log n) complexity) by maintaining a pointer into the data structure. The underlying representation for hinted dictionaries can be sorted arrays, unbalanced trees, and balanced trees by sharing the same interface. In our running example, alternative data structures can be provided by simply changing the type signature of the hinted set from hinted_set to another implementation, without modifying anything else.

Cite as

Amir Shaikhha, Mahdi Ghorbani, and Hesam Shahrokhi. Hinted Dictionaries: Efficient Functional Ordered Sets and Maps (Extended Abstract). In 36th European Conference on Object-Oriented Programming (ECOOP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 222, pp. 33:1-33:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{shaikhha_et_al:LIPIcs.ECOOP.2022.33,
  author =	{Shaikhha, Amir and Ghorbani, Mahdi and Shahrokhi, Hesam},
  title =	{{Hinted Dictionaries: Efficient Functional Ordered Sets and Maps}},
  booktitle =	{36th European Conference on Object-Oriented Programming (ECOOP 2022)},
  pages =	{33:1--33:3},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-225-9},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{222},
  editor =	{Ali, Karim and Vitek, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2022.33},
  URN =		{urn:nbn:de:0030-drops-162610},
  doi =		{10.4230/LIPIcs.ECOOP.2022.33},
  annote =	{Keywords: Functional Collections, Ordered Dictionaries, Sparse Linear Algebra}
}

3 Search Results for "Shahrokhi, Hesam"

An Efficient Vectorized Hash Table for Batch Computations

Abstract

Cite as

Hinted Dictionaries: Efficient Functional Ordered Sets and Maps

Abstract

Cite as

Hinted Dictionaries: Efficient Functional Ordered Sets and Maps (Extended Abstract)

Abstract

Cite as