2 Search Results for "Kärkkäinen, Tommi"


Document
Command Similarity Measurement Using NLP

Authors: Zafar Hussain, Jukka K. Nurminen, Tommi Mikkonen, and Marcin Kowiel

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)


Abstract
Process invocations happen with almost every activity on a computer. To distinguish user input and potentially malicious activities, we need to better understand program invocations caused by commands. To achieve this, one must understand commands’ objectives, possible parameters, and valid syntax. In this work, we collected commands’ data by scraping commands’ manual pages, including command description, syntax, and parameters. Then, we measured command similarity using two of these, description and parameters, based on commands’ natural language documentation. We used Term Frequency-Inverse Document Frequency (TFIDF) of a word to compare the commands, followed by measuring cosine similarity to find the similarity of the commands’ descriptions. For parameters, after measuring TFIDF and cosine similarity, the Hungarian method is applied to solve the assignment of different parameters’ combinations. Finally, commands are clustered based on their similarity scores. The results show that these methods have efficiently clustered the commands in smaller groups (commands with aliases or close counterparts), and in a bigger group (commands belonging to a larger set of related commands, e.g., bitsadmin for Windows and systemd for Linux). To validate the clustering results, we applied topic modeling on the commands’ data, which confirms that 84% of the Windows commands and 98% of the Linux commands are clustered correctly.
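
The parameter-matching pipeline described in the abstract (TFIDF weighting, cosine similarity, then a one-to-one assignment) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the whitespace tokenizer, the `log(N/df) + 1` weighting, and the example parameter descriptions are all assumptions. The assignment step here uses a brute-force search over permutations, which yields the same optimal matching as the Hungarian method on small inputs but does not scale.

```python
import math
from collections import Counter
from itertools import permutations


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (assumed weighting: tf * (log(N/df) + 1))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_assignment(sim):
    """Brute-force optimal one-to-one matching of rows to columns (needs rows <= columns).

    Stand-in for the Hungarian method: same result, exponential cost.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    best_total, best_cols = -1.0, None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(sim[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    return list(enumerate(best_cols)), best_total / n_rows


# Hypothetical parameter descriptions for two commands (not taken from the paper).
params_a = ["recursively copy directories", "force overwrite of existing files"]
params_b = ["overwrite existing files without prompting",
            "copy directories and subdirectories recursively"]

vectors = tfidf_vectors(params_a + params_b)
vec_a, vec_b = vectors[:len(params_a)], vectors[len(params_a):]
sim = [[cosine(a, b) for b in vec_b] for a in vec_a]
pairs, score = best_assignment(sim)
print(pairs, round(score, 3))
```

For larger parameter sets, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` (called with `maximize=True`) would be the natural replacement for the brute-force step.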

Cite as

Zafar Hussain, Jukka K. Nurminen, Tommi Mikkonen, and Marcin Kowiel. Command Similarity Measurement Using NLP. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 13:1-13:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{hussain_et_al:OASIcs.SLATE.2021.13,
  author =	{Hussain, Zafar and Nurminen, Jukka K. and Mikkonen, Tommi and Kowiel, Marcin},
  title =	{{Command Similarity Measurement Using NLP}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{13:1--13:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.13},
  URN =		{urn:nbn:de:0030-drops-144305},
  doi =		{10.4230/OASIcs.SLATE.2021.13},
  annote =	{Keywords: Natural Language Processing, NLP, Windows Commands, Linux Commands, Textual Similarity, Command Term Frequency, Inverse Document Frequency, TFIDF, Cosine Similarity, Linear Sum Assignment, Command Clustering}
}
Document
Feature Extractors for Describing Vehicle Routing Problem Instances

Authors: Jussi Rasku, Tommi Kärkkäinen, and Nysret Musliu

Published in: OASIcs, Volume 50, 5th Student Conference on Operational Research (SCOR 2016)


Abstract
The vehicle routing problem comes in varied forms. In addition to usual variants with diverse constraints and specialized objectives, the problem instances themselves, even from a single shared source, can be distinctly different. Heuristic, metaheuristic, and hybrid algorithms that are typically used to solve these problems are sensitive to this variation and can exhibit erratic performance when applied to new, previously unseen instances. To mitigate this, and to improve their applicability, algorithm developers often choose to expose parameters that allow customization of the algorithm behavior. Unfortunately, finding a good set of values for these parameters can be a tedious task that requires extensive experimentation and experience. By deriving descriptors for the problem classes and instances, one would be able to apply learning and adaptive methods that, when taught, can effectively exploit the idiosyncrasies of a problem instance. Furthermore, these methods can generalize from previously learnt knowledge by inferring suitable values for these parameters. As a necessary intermediate step towards this goal, we propose a set of feature extractors for vehicle routing problems. The descriptors include dimensionality of the problem; statistical descriptors of distances, demands, etc.; clusterability of the vertex locations; and measures derived using fitness landscape analysis. We show the relevance of these features by performing clustering on classical problem instances and instance-specific algorithm configuration of vehicle routing metaheuristics.
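
To make the simplest of the listed descriptor families concrete (dimensionality and statistical descriptors of distances and demands), a minimal sketch follows. It is not the paper's feature set: the function name, the feature names, the Euclidean distance assumption, the capacity-based vehicle bound, and the toy instance are all assumptions chosen for illustration.

```python
import math
import statistics
from itertools import combinations


def describe_instance(depot, customers, demands, capacity):
    """Compute a few illustrative descriptors of a capacitated VRP instance.

    Assumes 2D coordinates and Euclidean distances; feature names are hypothetical.
    """
    points = [depot] + customers
    # All pairwise distances between vertices (depot included).
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]
    return {
        "n_customers": len(customers),                      # dimensionality
        "dist_mean": statistics.mean(distances),            # distance statistics
        "dist_stdev": statistics.pstdev(distances),
        "demand_mean": statistics.mean(demands),            # demand statistics
        "demand_stdev": statistics.pstdev(demands),
        # Simple lower bound on the number of vehicles needed.
        "min_vehicles_bound": math.ceil(sum(demands) / capacity),
    }


# Toy instance (hypothetical data).
features = describe_instance(
    depot=(0, 0),
    customers=[(3, 4), (6, 8), (0, 5)],
    demands=[2, 3, 5],
    capacity=5,
)
print(features)
```

A feature dictionary of this kind could then be fed to a clustering or algorithm-configuration method; clusterability and fitness-landscape measures would need considerably more machinery than shown here.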

Cite as

Jussi Rasku, Tommi Kärkkäinen, and Nysret Musliu. Feature Extractors for Describing Vehicle Routing Problem Instances. In 5th Student Conference on Operational Research (SCOR 2016). Open Access Series in Informatics (OASIcs), Volume 50, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)



@InProceedings{rasku_et_al:OASIcs.SCOR.2016.7,
  author =	{Rasku, Jussi and K\"{a}rkk\"{a}inen, Tommi and Musliu, Nysret},
  title =	{{Feature Extractors for Describing Vehicle Routing Problem Instances}},
  booktitle =	{5th Student Conference on Operational Research (SCOR 2016)},
  pages =	{7:1--7:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-004-0},
  ISSN =	{2190-6807},
  year =	{2016},
  volume =	{50},
  editor =	{Hardy, Bradley and Qazi, Abroon and Ravizza, Stefan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SCOR.2016.7},
  URN =		{urn:nbn:de:0030-drops-65193},
  doi =		{10.4230/OASIcs.SCOR.2016.7},
  annote =	{Keywords: Metaheuristics, Vehicle Routing Problem, Feature extraction, Unsupervised learning, Automatic Algorithm Configuration}
}
