Towards Scope Detection in Textual Requirements

Authors: Ole Magnus Holter, Basil Ell

  • Filesize: 0.66 MB
  • 15 pages

Author Details

Ole Magnus Holter
  • Department of Informatics, University of Oslo, Norway
Basil Ell
  • Department of Informatics, University of Oslo, Norway

Cite As

Ole Magnus Holter and Basil Ell. Towards Scope Detection in Textual Requirements. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 31:1-31:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Requirements are an integral part of industrial operations and projects. Not only do requirements dictate industrial operations, but they are also used in legally binding contracts between supplier and purchaser. Some companies even have requirements as their core business. Most requirements are found in textual documents, which brings several challenges, such as ambiguity, scalability, maintenance, and finding relevant and related requirements. Having the requirements in a machine-readable format would address these challenges; however, existing requirements must first be transformed into machine-readable requirements using NLP technology. Applying state-of-the-art NLP methods based on end-to-end neural modelling to such documents is not trivial, because the language is technical and domain-specific and training data is not available. In this paper, we focus on one step in that direction, namely scope detection in textual requirements, using weak supervision and a simple classifier based on general-domain BERT word embeddings. We show that, using openly available data, it is possible to obtain promising results on domain-specific requirements documents.
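The abstract combines two ingredients: weak supervision to obtain training labels without manual annotation, and a simple classifier over word embeddings. The sketch below illustrates both under loud assumptions and is not the authors' pipeline. It treats scope detection as token classification, derives noisy labels by matching a hypothetical term list (`SCOPE_TERMS`, standing in for an openly available equipment-class vocabulary), and trains a plain logistic regression on token vectors. The 2-d vectors in the hypothetical `TOY_EMBEDDINGS` table are placeholders for general-domain BERT embeddings, which in practice would come from a pretrained BERT model.

```python
import math
import re
from typing import List, Sequence

# HYPOTHETICAL scope vocabulary: a stand-in for an openly available list of
# equipment classes used as a weak (distant) supervision source.
SCOPE_TERMS = ["pressure transmitter", "pump", "valve", "pipeline"]

# HYPOTHETICAL 2-d vectors standing in for general-domain BERT embeddings.
TOY_EMBEDDINGS = {
    "pump": [1.0, 0.9], "valve": [0.9, 1.0], "compressor": [0.95, 0.85],
    "the": [-1.0, -0.8], "shall": [-0.9, -1.0], "be": [-0.8, -0.9], "tested": [-0.7, -1.0],
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def weak_labels(tokens: Sequence[str], terms: Sequence[str] = SCOPE_TERMS) -> List[int]:
    """Label tokens covered by a longest-first match of a known term with 1, else 0.

    The labels are noisy: unlisted equipment is missed, and a listed term may
    match a mention that is not actually the scope of the requirement.
    """
    patterns = sorted((t.split() for t in terms), key=len, reverse=True)
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        for pat in patterns:
            if list(tokens[i:i + len(pat)]) == pat:
                for j in range(i, i + len(pat)):
                    labels[j] = 1
                i += len(pat)
                break
        else:  # no term starts at position i
            i += 1
    return labels


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit logistic regression by batch gradient descent; the bias is the last weight."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - label
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w: List[float], x: List[float]) -> int:
    """Return 1 if the token vector is classified as part of the scope."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return int(z > 0.0)


if __name__ == "__main__":
    tokens = tokenize("The pump shall be tested")
    labels = weak_labels(tokens)  # [0, 1, 0, 0, 0]
    weights = train_logreg([TOY_EMBEDDINGS[t] for t in tokens], labels)
    # The classifier can generalise to a term absent from the weak-label list.
    print(predict(weights, TOY_EMBEDDINGS["compressor"]))
```

The point of the design is that the term list only has to produce training labels; the classifier operating on embeddings may then recognise scope terms that the list never contained, such as `compressor` in the toy example.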

Subject Classification

ACM Subject Classification
  • Computing methodologies → Natural language processing

Keywords
  • Scope Detection
  • Textual requirements
  • NLP



