4 Search Results for "Anderson, Paul"


Document
Media Forensics and the Challenge of Big Data (Dagstuhl Seminar 23021)

Authors: Irene Amerini, Anderson Rocha, Paul L. Rosin, and Xianfang Sun

Published in: Dagstuhl Reports, Volume 13, Issue 1 (2023)


Abstract
With demanding and sophisticated crimes and terrorist threats becoming more pervasive, together with the advent and widespread dissemination of fake news, it becomes paramount to design and develop objective, scientifically based criteria to identify the characteristics of investigated materials associated with potential criminal activities. We need effective approaches to help us answer the four most important questions in forensics regarding an event: "who," "in what circumstances," "why," and "how." In recent years, the rise of social media has resulted in a flood of media content. As well as posing a challenge due to the increase in data that needs fact-checking, this flood also allows big-data techniques to be leveraged for forensic analysis. The seminar included sessions on traditional and deep learning-based methods, big data, benchmarks and performance evaluation, applications, and future directions. It aimed to orchestrate the research community’s efforts so that we harness different tools to fight misinformation and the spread of fake content.

Cite as

Irene Amerini, Anderson Rocha, Paul L. Rosin, and Xianfang Sun. Media Forensics and the Challenge of Big Data (Dagstuhl Seminar 23021). In Dagstuhl Reports, Volume 13, Issue 1, pp. 1-35, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


BibTeX

@Article{amerini_et_al:DagRep.13.1.1,
  author =	{Amerini, Irene and Rocha, Anderson and Rosin, Paul L. and Sun, Xianfang},
  title =	{{Media Forensics and the Challenge of Big Data (Dagstuhl Seminar 23021)}},
  pages =	{1--35},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2023},
  volume =	{13},
  number =	{1},
  editor =	{Amerini, Irene and Rocha, Anderson and Rosin, Paul L. and Sun, Xianfang},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagRep.13.1.1},
  URN =		{urn:nbn:de:0030-drops-191177},
  doi =		{10.4230/DagRep.13.1.1},
  annote =	{Keywords: Digital forensics, Image and video authentication, Image and video forensics, Image and video forgery detection, Tampering detection}
}
Document
muPuppet: A Declarative Subset of the Puppet Configuration Language

Authors: Weili Fu, Roly Perera, Paul Anderson, and James Cheney

Published in: LIPIcs, Volume 74, 31st European Conference on Object-Oriented Programming (ECOOP 2017)


Abstract
Puppet is a popular declarative framework for specifying and managing complex system configurations. The Puppet framework includes a domain-specific language with several advanced features inspired by object-oriented programming, including user-defined resource types, ‘classes’ with a form of inheritance, and dependency management. Like most real-world languages, the language has evolved in an ad hoc fashion, resulting in a design with numerous features, some of which are complex, hard to understand, and difficult to use correctly. We present an operational semantics for muPuppet, a representative subset of the Puppet language that covers the distinctive features of Puppet, while excluding features that are either deprecated or work-in-progress. Formalising the semantics sheds light on difficult parts of the language, identifies opportunities for future improvements, and provides a foundation for future analysis or debugging techniques, such as static typechecking or provenance tracking. Our semantics leads straightforwardly to a reference implementation in Haskell. We also discuss some of Puppet’s idiosyncrasies, particularly its handling of classes and scope, and present an initial corpus of test cases supported by our formal semantics.
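
The paper's semantics itself is not reproduced here, but the flavour of the distinctive features it formalises can be sketched in a few lines. The following Python sketch is illustrative only (the authors' reference implementation is in Haskell, and every name below — Resource, PClass, compile_catalog — is invented for this example); it shows two Puppet behaviours the abstract highlights: classes with single inheritance, and evaluation of a manifest into a flat catalog of resources in which each included class is evaluated at most once.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Resource:
    type: str     # e.g. "file" or "package"
    title: str    # unique identifier within its type
    attrs: tuple  # attribute name/value pairs, e.g. (("ensure", "present"),)

@dataclass
class PClass:
    name: str
    parent: str | None  # single inheritance, as in Puppet classes
    resources: list[Resource] = field(default_factory=list)

def compile_catalog(classes: dict[str, PClass], includes: list[str]) -> set[Resource]:
    """Evaluate 'include' statements into a catalog of resources.

    Each class is evaluated at most once, even if included repeatedly
    or reached both directly and via inheritance.
    """
    catalog: set[Resource] = set()
    evaluated: set[str] = set()

    def evaluate(name: str) -> None:
        if name in evaluated:       # idempotent inclusion
            return
        evaluated.add(name)
        cls = classes[name]
        if cls.parent is not None:  # evaluate the parent class first
            evaluate(cls.parent)
        catalog.update(cls.resources)

    for name in includes:
        evaluate(name)
    return catalog

base = PClass("base", None, [Resource("package", "ssh", (("ensure", "installed"),))])
server = PClass("server", "base", [Resource("file", "/etc/motd", (("content", "hi"),))])
# Including "server" twice still yields each resource exactly once.
print(compile_catalog({"base": base, "server": server}, ["server", "server"]))

Running the example prints each resource exactly once even though "server" is included twice, mirroring Puppet's singleton treatment of classes.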

Cite as

Weili Fu, Roly Perera, Paul Anderson, and James Cheney. muPuppet: A Declarative Subset of the Puppet Configuration Language. In 31st European Conference on Object-Oriented Programming (ECOOP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 74, pp. 12:1-12:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{fu_et_al:LIPIcs.ECOOP.2017.12,
  author =	{Fu, Weili and Perera, Roly and Anderson, Paul and Cheney, James},
  title =	{{muPuppet: A Declarative Subset of the Puppet Configuration Language}},
  booktitle =	{31st European Conference on Object-Oriented Programming (ECOOP 2017)},
  pages =	{12:1--12:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-035-4},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{74},
  editor =	{M\"{u}ller, Peter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2017.12},
  URN =		{urn:nbn:de:0030-drops-72656},
  doi =		{10.4230/LIPIcs.ECOOP.2017.12},
  annote =	{Keywords: configuration languages, Puppet, operational semantics}
}
Document
Tagging Historical Corpora - the problem of spelling variation

Authors: Paul Rayson, Dawn Archer, Alistair Baron, and Nicholas Smith

Published in: Dagstuhl Seminar Proceedings, Volume 6491, Digital Historical Corpora - Architecture, Annotation, and Retrieval (2007)


Abstract
Spelling issues tend to create relatively minor (though still complex) problems for corpus linguistics, information retrieval and natural language processing tasks that use "standard" or modern varieties of English. For example, in corpus annotation, we have to decide how to deal with tokenisation issues such as whether (i) periods represent sentence boundaries or acronyms and (ii) apostrophes represent quote marks or contractions (Grefenstette and Tapanainen, 1994; Grefenstette, 1999). The issue of spelling variation becomes more problematic when utilising corpus linguistic techniques on non-standard varieties of English, not least because variation can be due to differences in spelling habits, transcription or compositing practices, and morpho-syntactic customs, as well as "misspelling". Examples of non-standard varieties include:
- Scottish English [1] (Anderson et al., forthcoming), and dialects such as Tyneside English [2] (Allen et al., forthcoming)
- Early Modern English (Archer and Rayson, 2004; Culpeper and Kytö, 2005)
- Emerging varieties such as SMS or CMC in weblogs (Ooi et al., 2006)

In the Dagstuhl workshop we focussed on historical corpora. Vast quantities of searchable historical material are being created in electronic form through large digitisation initiatives already underway, e.g. the Open Content Alliance [3], Google Book Search [4], and Early English Books Online [5]. Annotation, typically at the part-of-speech (POS) level, is carried out on modern corpora for linguistic analysis, information retrieval and natural language processing tasks such as named entity extraction. Increasingly, researchers wish to carry out similar tasks on historical data (Nissim et al., 2004). However, historical data is considered noisy for tasks such as this. The problems faced when applying corpus annotation tools trained on modern language data to historical texts are the motivation for the research described in this paper. Previous research has adopted the approach of adding historical variants to the POS tagger lexicon, for example in TreeTagger annotation of GerManC (Durrell et al., 2006), or of "back-dating" the lexicon in the Constraint Grammar Parser of English (ENGCG) when annotating the Helsinki corpus (Kytö and Voutilainen, 1995).

Our aim was to develop an historical semantic tagger in order to facilitate studies on historical data similar to those we had previously been performing on modern data using the USAS semantic analysis system (Rayson et al., 2004). The USAS tool relies on POS tagging as a prerequisite to carrying out semantic disambiguation. Hence we were faced with the task of retraining or back-dating two tools, a POS tagger and a semantic tagger. Our proposed solution incorporates a corpus pre-processor that detects historical spelling variants and inserts modern equivalents alongside them. This enables retrieval as well as annotation tasks, and to some extent avoids the need to retrain each annotation tool that is applied to the corpus. The modern tools can then be applied to the modern spelling equivalents rather than the historical variants, and thereby achieve higher levels of accuracy. The resulting variant detector tool (VARD) employs a number of techniques derived from spell-checking tools, as we wished to evaluate their applicability to historical data. The current version of the tool uses known-variant lists, SoundEx, edit distance and letter replacement heuristics to match Early Modern English variants with modern forms.

The techniques are combined using a scoring mechanism to enable preferred candidates to be selected using likelihood values (a rough sketch of such a combination is given after the footnotes below). The current known-variant lists and letter replacement rules are manually created. In a cross-language study with English and German texts we found that similar techniques could be used to derive letter replacement heuristics from corpus examples (Pilz et al., forthcoming). Our experiments show that VARD can successfully deal with:
- Apostrophes signalling missing letter(s) or sound(s): ’fore ("before"), hee’l ("he will")
- Irregular apostrophe usage: again’st ("against"), whil’st ("whilst")
- Contracted forms: ’tis ("it is"), thats ("that is"), youle ("you will"), t’anticipate ("to anticipate")
- Hyphenated forms: acquain-tance ("acquaintance")
- Variation due to different use of graphs <v>, <u>, <i>, <y>: aboue ("above"), abyde ("abide")
- Doubling of vowels and consonants, e.g. <-oo->, <-ll>: triviall ("trivial")

Variants that are not in the modern lexicon are easy to identify by direct comparison; however, our studies show that a significant portion of variants cannot be discovered this way. Inconsistencies in the use of the genitive, and "then" appearing instead of "than" (or vice versa), require contextual information for their detection. We will outline our approach to resolving this problem: the use of contextually sensitive template rules that contain lexical, grammatical and semantic information.

Footnotes
[1] http://www.scottishcorpus.ac.uk/
[2] http://www.ncl.ac.uk/necte/
[3] http://www.opencontentalliance.org/
[4] http://books.google.com/
[5] http://eebo.chadwyck.com/home
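
As a rough illustration of the candidate-matching approach described above, here is a minimal Python sketch combining letter-replacement heuristics with edit distance under a simple scoring scheme. It is not the VARD implementation: the real tool also uses known-variant lists and SoundEx, its rules are manually curated, and the rule set, scores and function names below are invented for this example.

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Illustrative letter-replacement heuristics for Early Modern English
# graph variation, e.g. <u>/<v> interchange and doubled letters.
REPLACEMENTS = [("v", "u"), ("u", "v"), ("ie", "y"), ("ll", "l"), ("oo", "o")]

def candidates(variant: str, lexicon: set[str], max_dist: int = 2):
    """Score modern forms; lower scores are preferred candidates."""
    scored = {}
    for old, new in REPLACEMENTS:            # rule-based rewrites first
        rewritten = variant.replace(old, new)
        if rewritten in lexicon:
            scored[rewritten] = min(scored.get(rewritten, 99), 0)
    for word in lexicon:                     # fall back to edit distance
        d = edit_distance(variant, word)
        if d <= max_dist:
            scored[word] = min(scored.get(word, 99), d)
    return sorted(scored.items(), key=lambda kv: kv[1])

lexicon = {"above", "abide", "trivial", "against"}
print(candidates("aboue", lexicon))     # [('above', 0), ...]
print(candidates("triviall", lexicon))  # [('trivial', 0), ...]

In this toy scheme, rule-derived matches are given the best (lowest) score so that systematic graph variation such as <u>/<v> outranks a coincidentally small edit distance.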

Cite as

Paul Rayson, Dawn Archer, Alistair Baron, and Nicholas Smith. Tagging Historical Corpora - the problem of spelling variation. In Digital Historical Corpora - Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)


BibTeX

@InProceedings{rayson_et_al:DagSemProc.06491.15,
  author =	{Rayson, Paul and Archer, Dawn and Baron, Alistair and Smith, Nicholas},
  title =	{{Tagging Historical Corpora - the problem of spelling variation}},
  booktitle =	{Digital Historical Corpora - Architecture, Annotation, and Retrieval},
  pages =	{1--2},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06491.15},
  URN =		{urn:nbn:de:0030-drops-10553},
  doi =		{10.4230/DagSemProc.06491.15},
  annote =	{Keywords: Corpus annotation, spelling variation, historical variants}
}
Document
Subjectivity in Clone Judgment: Can We Ever Agree?

Authors: Cory Kapser, Paul Anderson, Michael Godfrey, Rainer Koschke, Matthias Rieger, Filip van Rysselberghe, and Peter Weißgerber

Published in: Dagstuhl Seminar Proceedings, Volume 6301, Duplication, Redundancy, and Similarity in Software (2007)


Abstract
An objective definition of what constitutes a code clone currently eludes the field. A small study was performed at an international workshop to elicit judgments and discussions from world experts regarding what characteristics define a code clone. Fewer than half of the clone candidates judged had 80% agreement amongst the judges. Judges appeared to differ primarily in their criteria for judgment rather than in their interpretation of the clone candidates. In subsequent open discussion the judges provided several reasons for their judgments. The study casts additional doubt on the reliability of experimental results in the field when the full criteria for clone judgment are not spelled out.
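
The agreement figure quoted above is simple to compute: for each candidate, take the fraction of judges giving the majority verdict. A minimal Python sketch (with made-up judgments, not the study's data) follows.

from collections import Counter

def majority_agreement(judgments: list[str]) -> float:
    """Fraction of judges who gave the most common verdict."""
    (_, count), = Counter(judgments).most_common(1)
    return count / len(judgments)

# Rows: clone candidates; entries: one yes/no verdict per judge.
candidates = [
    ["yes", "yes", "yes", "yes", "no"],  # 80% agreement
    ["yes", "no", "yes", "no", "no"],    # 60% agreement
]
at_80 = sum(majority_agreement(c) >= 0.8 for c in candidates)
print(f"{at_80}/{len(candidates)} candidates reached 80% agreement")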

Cite as

Cory Kapser, Paul Anderson, Michael Godfrey, Rainer Koschke, Matthias Rieger, Filip van Rysselberghe, and Peter Weißgerber. Subjectivity in Clone Judgment: Can We Ever Agree? In Duplication, Redundancy, and Similarity in Software. Dagstuhl Seminar Proceedings, Volume 6301, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)


BibTeX

@InProceedings{kapser_et_al:DagSemProc.06301.12,
  author =	{Kapser, Cory and Anderson, Paul and Godfrey, Michael and Koschke, Rainer and Rieger, Matthias and van Rysselberghe, Filip and Wei{\ss}gerber, Peter},
  title =	{{Subjectivity in Clone Judgment: Can We Ever Agree?}},
  booktitle =	{Duplication, Redundancy, and Similarity in Software},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6301},
  editor =	{Rainer Koschke and Ettore Merlo and Andrew Walenstein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06301.12},
  URN =		{urn:nbn:de:0030-drops-9701},
  doi =		{10.4230/DagSemProc.06301.12},
  annote =	{Keywords: Code clone, study, inter-rater agreement, ill-defined problem}
}
  • Refine by Author
  • 2 Anderson, Paul
  • 1 Amerini, Irene
  • 1 Archer, Dawn
  • 1 Baron, Alistair
  • 1 Cheney, James

  • Refine by Classification
  • 1 Applied computing → Computer forensics
  • 1 Computing methodologies → Image manipulation

  • Refine by Keyword
  • 1 Code clone
  • 1 Corpus annotation
  • 1 Digital forensics
  • 1 Image and video authentication
  • 1 Image and video forensics

  • Refine by Type
  • 4 document

  • Refine by Publication Year
  • 2 2007
  • 1 2017
  • 1 2023
