11 Search Results for "He, Paul"


Document
Semantics for Noninterference with Interaction Trees

Authors: Lucas Silver, Paul He, Ethan Cecchetti, Andrew K. Hirsch, and Steve Zdancewic

Published in: LIPIcs, Volume 263, 37th European Conference on Object-Oriented Programming (ECOOP 2023)


Abstract
Noninterference is the strong information-security property that a program does not leak secrets through publicly-visible behavior. In the presence of effects such as nontermination, state, and exceptions, reasoning about noninterference quickly becomes subtle. We advocate using interaction trees (ITrees) to provide compositional mechanized proofs of noninterference for multi-language, effectful, nonterminating programs, while retaining executability of the semantics. We develop important foundations for security analysis with ITrees: two indistinguishability relations, leading to two standard notions of noninterference with adversaries of different strength, along with metatheory libraries for reasoning about each. We demonstrate the utility of our results using a simple imperative language with embedded assembly, along with a compiler into that assembly language.

Cite as

Lucas Silver, Paul He, Ethan Cecchetti, Andrew K. Hirsch, and Steve Zdancewic. Semantics for Noninterference with Interaction Trees. In 37th European Conference on Object-Oriented Programming (ECOOP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 263, pp. 29:1-29:29, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{silver_et_al:LIPIcs.ECOOP.2023.29,
  author =	{Silver, Lucas and He, Paul and Cecchetti, Ethan and Hirsch, Andrew K. and Zdancewic, Steve},
  title =	{{Semantics for Noninterference with Interaction Trees}},
  booktitle =	{37th European Conference on Object-Oriented Programming (ECOOP 2023)},
  pages =	{29:1--29:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-281-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{263},
  editor =	{Ali, Karim and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2023.29},
  URN =		{urn:nbn:de:0030-drops-182227},
  doi =		{10.4230/LIPIcs.ECOOP.2023.29},
  annote =	{Keywords: verification, information-flow, denotational semantics, monads}
}
Document
We know what you're doing! Application detection using thermal data

Authors: Philipp Miedl, Rehan Ahmed, and Lothar Thiele

Published in: LITES, Volume 7, Issue 1 (2021): Special Issue on Embedded System Security. Leibniz Transactions on Embedded Systems, Volume 7, Issue 1


Abstract
Modern mobile and embedded devices have high computing power which allows them to be used for multiple purposes. Therefore, applications with low security restrictions may execute on the same device as applications handling highly sensitive information. In such a setup, a security risk occurs if it is possible that an application uses system characteristics to gather information about another application on the same device.In this work, we present a method to leak sensitive runtime information by just using temperature sensor readings of a mobile device. We employ a Convolutional-Neural-Network, Long Short-Term Memory units and subsequent label sequence processing to identify the sequence of executed applications over time. To test our hypothesis we collect data from two state-of-the-art smartphones and real user usage patterns. We show an extensive evaluation using laboratory data, where we achieve labelling accuracies up to 90% and negligible timing error. Based on our analysis we state that the thermal information can be used to compromise sensitive user data and increase the vulnerability of mobile devices. A study based on data collected outside of the laboratory opens up various future directions for research.

Cite as

Philipp Miedl, Rehan Ahmed, and Lothar Thiele. We know what you're doing! Application detection using thermal data. In LITES, Volume 7, Issue 1 (2021): Special Issue on Embedded System Security. Leibniz Transactions on Embedded Systems, Volume 7, Issue 1, pp. 02:1-02:28, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@Article{miedl_et_al:LITES.7.1.2,
  author =	{Miedl, Philipp and Ahmed, Rehan and Thiele, Lothar},
  title =	{{We know what you're doing! Application detection using thermal data}},
  booktitle =	{LITES, Volume 7, Issue 1 (2021): Special Issue on Embedded System Security},
  pages =	{02:1--02:28},
  journal =	{Leibniz Transactions on Embedded Systems},
  ISSN =	{2199-2002},
  year =	{2021},
  volume =	{7},
  number =	{1},
  editor =	{Miedl, Philipp and Ahmed, Rehan and Thiele, Lothar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LITES.7.1.2},
  doi =		{10.4230/LITES.7.1.2},
  annote =	{Keywords: Thermal Monitoring, Side Channel, Data Leak, Sequence Labelling}
}
Document
Non-Rectangular Convolutions and (Sub-)Cadences with Three Elements

Authors: Mitsuru Funakoshi and Julian Pape-Lange

Published in: LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)


Abstract
The discrete acyclic convolution computes the 2n+1 sums ∑_{i+j=k|(i,j)∈[0,1,2,… ,n]²} a_i b_j in ?(n log n) time. By using suitable offsets and setting some of the variables to zero, this method provides a tool to calculate all non-zero sums ∑_{i+j=k|(i,j)∈ P∩ℤ²} a_i b_j in a rectangle P with perimeter p in ?(p log p) time. This paper extends this geometric interpretation in order to allow arbitrary convex polygons P with k vertices and perimeter p. Also, this extended algorithm only needs ?(k + p(log p)² log k) time. Additionally, this paper presents fast algorithms for counting sub-cadences and cadences with 3 elements using this extended method.

Cite as

Mitsuru Funakoshi and Julian Pape-Lange. Non-Rectangular Convolutions and (Sub-)Cadences with Three Elements. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 30:1-30:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{funakoshi_et_al:LIPIcs.STACS.2020.30,
  author =	{Funakoshi, Mitsuru and Pape-Lange, Julian},
  title =	{{Non-Rectangular Convolutions and (Sub-)Cadences with Three Elements}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{30:1--30:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Paul, Christophe and Bl\"{a}ser, Markus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.30},
  URN =		{urn:nbn:de:0030-drops-118911},
  doi =		{10.4230/LIPIcs.STACS.2020.30},
  annote =	{Keywords: discrete acyclic convolutions, string-cadences, geometric algorithms, number theoretic transforms}
}
Document
Information Distance Revisited

Authors: Bruno Bauwens

Published in: LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)


Abstract
We consider the notion of information distance between two objects x and y introduced by Bennett, Gács, Li, Vitanyi, and Zurek [C. H. Bennett et al., 1998] as the minimal length of a program that computes x from y as well as computing y from x, and study different versions of this notion. In the above paper, it was shown that the prefix version of information distance equals max (K(x|y),K(y|x)) up to additive logarithmic terms. It was claimed by Mahmud [Mahmud, 2009] that this equality holds up to additive O(1)-precision. We show that this claim is false, but does hold if the distance is at least logarithmic. This implies that the original definition provides a metric on strings that are at superlogarithmically separated.

Cite as

Bruno Bauwens. Information Distance Revisited. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 46:1-46:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{bauwens:LIPIcs.STACS.2020.46,
  author =	{Bauwens, Bruno},
  title =	{{Information Distance Revisited}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{46:1--46:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Paul, Christophe and Bl\"{a}ser, Markus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.46},
  URN =		{urn:nbn:de:0030-drops-119071},
  doi =		{10.4230/LIPIcs.STACS.2020.46},
  annote =	{Keywords: Kolmogorov complexity, algorithmic information distance}
}
Document
Connected Search for a Lazy Robber

Authors: Isolde Adler, Christophe Paul, and Dimitrios M. Thilikos

Published in: LIPIcs, Volume 150, 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)


Abstract
The node search game against a lazy (or, respectively, agile) invisible robber has been introduced as a search-game analogue of the treewidth parameter (and, respectively, pathwidth). In the connected variants of the above two games, we additionally demand that, at each moment of the search, the clean territories are connected. The connected search game against an agile and invisible robber has been extensively examined. The monotone variant (where we also demand that the clean territories are progressively increasing) of this game, corresponds to the graph parameter of connected pathwidth. It is known that the price of connectivty to search for an agile robber is bounded by 2, that is the connected pathwidth of a graph is at most twice (plus some constant) its pathwidth. In this paper, we investigate the connected search game against a lazy robber. A lazy robber moves only when the searchers' strategy threatens the location that he currently occupies. We introduce two alternative graph-theoretic formulations of this game, one in terms of connected tree decompositions, and one in terms of (connected) layouts, leading to the graph parameter of connected treewidth. We observe that connected treewidth parameter is closed under contractions and prove that for every k >= 2, the set of contraction obstructions of the class of graphs with connected treewidth at most k is infinite. Our main result is a complete characterization of the obstruction set for k=2. One may observe that, so far, only a few complete obstruction sets are explicitly known for contraction closed graph classes. We finally show that, in contrast to the agile robber game, the price of connectivity is unbounded.

Cite as

Isolde Adler, Christophe Paul, and Dimitrios M. Thilikos. Connected Search for a Lazy Robber. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 150, pp. 7:1-7:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{adler_et_al:LIPIcs.FSTTCS.2019.7,
  author =	{Adler, Isolde and Paul, Christophe and Thilikos, Dimitrios M.},
  title =	{{Connected Search for a Lazy Robber}},
  booktitle =	{39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)},
  pages =	{7:1--7:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-131-3},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{150},
  editor =	{Chattopadhyay, Arkadev and Gastin, Paul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2019.7},
  URN =		{urn:nbn:de:0030-drops-115695},
  doi =		{10.4230/LIPIcs.FSTTCS.2019.7},
  annote =	{Keywords: Graph searching, Tree-decomposition, Obstructions}
}
Document
Two-Way Parikh Automata

Authors: Emmanuel Filiot, Shibashis Guha, and Nicolas Mazzocchi

Published in: LIPIcs, Volume 150, 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)


Abstract
Parikh automata extend automata with counters whose values can only be tested at the end of the computation, with respect to membership into a semi-linear set. Parikh automata have found several applications, for instance in transducer theory, as they enjoy a decidable emptiness problem. In this paper, we study two-way Parikh automata. We show that emptiness becomes undecidable in the non-deterministic case. However, it is PSpace-C when the number of visits to any input position is bounded and the semi-linear set is given as an existential Presburger formula. We also give tight complexity bounds for the inclusion, equivalence and universality problems. Finally, we characterise precisely the complexity of those problems when the semi-linear constraint is given by an arbitrary Presburger formula.

Cite as

Emmanuel Filiot, Shibashis Guha, and Nicolas Mazzocchi. Two-Way Parikh Automata. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 150, pp. 40:1-40:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{filiot_et_al:LIPIcs.FSTTCS.2019.40,
  author =	{Filiot, Emmanuel and Guha, Shibashis and Mazzocchi, Nicolas},
  title =	{{Two-Way Parikh Automata}},
  booktitle =	{39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)},
  pages =	{40:1--40:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-131-3},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{150},
  editor =	{Chattopadhyay, Arkadev and Gastin, Paul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2019.40},
  URN =		{urn:nbn:de:0030-drops-116027},
  doi =		{10.4230/LIPIcs.FSTTCS.2019.40},
  annote =	{Keywords: Parikh automata, two-way automata, Presburger arithmetic}
}
Document
Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations

Authors: Chen Dan, Kristoffer Arnsfelt Hansen, He Jiang, Liwei Wang, and Yuchen Zhou

Published in: LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)


Abstract
Low rank approximation of matrices is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors, patterns, and provides concise representations for the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous. This leads to the problem of low rank approximation of binary matrices. Here we are given a d x n binary matrix A and a small integer k < d. The goal is to find two binary matrices U and V of sizes d x k and k x n respectively, so that the Frobenius norm of A - U V is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: The GF(2) model and the Boolean semiring model. Unlike low rank approximation of a real matrix which can be efficiently solved by Singular Value Decomposition, we show that approximation of a binary matrix is NP-hard, even for k=1. In this paper, our main concern is the problem of Column Subset Selection (CSS), in which the low rank matrix U must be formed by k columns of the data matrix, and we are interested in the approximation ratio achievable by CSS for binary matrices. For the GF(2) model, we show that CSS has approximation ratio bounded by k/2+1+k/(2(2^k-1)) and this is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of U are generated from Boolean formulas operating bitwise on selected columns of the data matrix. We show that the approximation ratio achieved by GCSS is bounded by 2^(k-1)+1, and argue that an exponential dependency on k is seems inherent.

Cite as

Chen Dan, Kristoffer Arnsfelt Hansen, He Jiang, Liwei Wang, and Yuchen Zhou. Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 41:1-41:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{dan_et_al:LIPIcs.MFCS.2018.41,
  author =	{Dan, Chen and Hansen, Kristoffer Arnsfelt and Jiang, He and Wang, Liwei and Zhou, Yuchen},
  title =	{{Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations}},
  booktitle =	{43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)},
  pages =	{41:1--41:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-086-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{117},
  editor =	{Potapov, Igor and Spirakis, Paul and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.41},
  URN =		{urn:nbn:de:0030-drops-96239},
  doi =		{10.4230/LIPIcs.MFCS.2018.41},
  annote =	{Keywords: Approximation Algorithms, Low Rank Approximation, Binary Matrices}
}
Document
The Complexity of Transducer Synthesis from Multi-Sequential Specifications

Authors: Léo Exibard, Emmanuel Filiot, and Ismaël Jecker

Published in: LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)


Abstract
The transducer synthesis problem on finite words asks, given a specification S subseteq I x O, where I and O are sets of finite words, whether there exists an implementation f: I - > O which (1) fulfils the specification, i.e., (i,f(i))in S for all i in I, and (2) can be defined by some input-deterministic (aka sequential) transducer T_f. If such an implementation f exists, the procedure should also output T_f. The realisability problem is the corresponding decision problem. For specifications given by synchronous transducers (which read and write alternately one symbol), this is the finite variant of the classical synthesis problem on omega-words, solved by Büchi and Landweber in 1969, and the realisability problem is known to be ExpTime-c in both finite and omega-word settings. For specifications given by asynchronous transducers (which can write a batch of symbols, or none, in a single step), the realisability problem is known to be undecidable. We consider here the class of multi-sequential specifications, defined as finite unions of sequential transducers over possibly incomparable domains. We provide optimal decision procedures for the realisability problem in both the synchronous and asynchronous setting, showing that it is PSpace-c. Moreover, whenever the specification is realisable, we expose the construction of a sequential transducer that realises it and has a size that is doubly exponential, which we prove to be optimal.

Cite as

Léo Exibard, Emmanuel Filiot, and Ismaël Jecker. The Complexity of Transducer Synthesis from Multi-Sequential Specifications. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 46:1-46:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{exibard_et_al:LIPIcs.MFCS.2018.46,
  author =	{Exibard, L\'{e}o and Filiot, Emmanuel and Jecker, Isma\"{e}l},
  title =	{{The Complexity of Transducer Synthesis from Multi-Sequential Specifications}},
  booktitle =	{43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)},
  pages =	{46:1--46:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-086-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{117},
  editor =	{Potapov, Igor and Spirakis, Paul and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.46},
  URN =		{urn:nbn:de:0030-drops-96286},
  doi =		{10.4230/LIPIcs.MFCS.2018.46},
  annote =	{Keywords: Transducers, Multi-Sequentiality, Synthesis}
}
Document
On Randomized Generation of Slowly Synchronizing Automata

Authors: Costanza Catalano and Raphaël M. Jungers

Published in: LIPIcs, Volume 117, 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)


Abstract
Motivated by the randomized generation of slowly synchronizing automata, we study automata made of permutation letters and a merging letter of rank n-1 . We present a constructive randomized procedure to generate synchronizing automata of that kind with (potentially) large alphabet size based on recent results on primitive sets of matrices. We report numerical results showing that our algorithm finds automata with much larger reset threshold than a mere uniform random generation and we present new families of automata with reset threshold of Omega(n^2/4) . We finally report theoretical results on randomized generation of primitive sets of matrices: a set of permutation matrices with a 0 entry changed into a 1 is primitive and has exponent of O(n log n) with high probability in case of uniform random distribution and the same holds for a random set of binary matrices where each entry is set, independently, equal to 1 with probability p and equal to 0 with probability 1-p , when np-log n - > infty as n - > infty .

Cite as

Costanza Catalano and Raphaël M. Jungers. On Randomized Generation of Slowly Synchronizing Automata. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 117, pp. 48:1-48:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{catalano_et_al:LIPIcs.MFCS.2018.48,
  author =	{Catalano, Costanza and Jungers, Rapha\"{e}l M.},
  title =	{{On Randomized Generation of Slowly Synchronizing Automata}},
  booktitle =	{43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018)},
  pages =	{48:1--48:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-086-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{117},
  editor =	{Potapov, Igor and Spirakis, Paul and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2018.48},
  URN =		{urn:nbn:de:0030-drops-96305},
  doi =		{10.4230/LIPIcs.MFCS.2018.48},
  annote =	{Keywords: Synchronizing automata, random automata, Cern\'{y} conjecture, automata with simple idempotents, primitive sets of matrices}
}
Document
09181 Working Group on Hybridization between R&S, DoE and Optimization

Authors: Chun-Hung Chen, Liu Hong, Paul B. Kantor, David P. Morton, Juta Pichitlamken, and Matthias Seeger

Published in: Dagstuhl Seminar Proceedings, Volume 9181, Sampling-based Optimization in the Presence of Uncertainty (2009)


Abstract
This is the report of the working group on the relation between, or hybrid combination of design experiment optimization and R&S. The rapporteur, Paul Kantor, learned a great deal at the conference which he summarized by sharing the cartoon shown here. ("A student asking the teacher'... may i be excused, my is full" (from a 1986 cartoon by Gary Larson) - omitted here for copyright reasons).

Cite as

Chun-Hung Chen, Liu Hong, Paul B. Kantor, David P. Morton, Juta Pichitlamken, and Matthias Seeger. 09181 Working Group on Hybridization between R&S, DoE and Optimization. In Sampling-based Optimization in the Presence of Uncertainty. Dagstuhl Seminar Proceedings, Volume 9181, pp. 1-14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2009)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:DagSemProc.09181.3,
  author =	{Chen, Chun-Hung and Hong, Liu and Kantor, Paul B. and Morton, David P. and Pichitlamken, Juta and Seeger, Matthias},
  title =	{{09181 Working Group on Hybridization between R\&S, DoE and Optimization}},
  booktitle =	{Sampling-based Optimization in the Presence of Uncertainty},
  pages =	{1--14},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2009},
  volume =	{9181},
  editor =	{J\"{u}rgen Branke and Barry L. Nelson and Warren Buckler Powell and Thomas J. Santner},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.09181.3},
  URN =		{urn:nbn:de:0030-drops-21172},
  doi =		{10.4230/DagSemProc.09181.3},
  annote =	{Keywords: }
}
Document
Tagging Historical Corpora - the problem of spelling variation

Authors: Paul Rayson, Dawn Archer, Alistair Baron, and Nicholas Smith

Published in: Dagstuhl Seminar Proceedings, Volume 6491, Digital Historical Corpora- Architecture, Annotation, and Retrieval (2007)


Abstract
Spelling issues tend to create relatively minor (though still complex) problems for corpus linguistics, information retrieval and natural language processing tasks that use "standard" or modern varieties of English. For example, in corpus annotation, we have to decide how to deal with tokenisation issues such as whether (i) periods represent sentence boundaries or acronyms and (ii) apostrophes represent quote marks or contractions (Grefenstette and Tapanainen, 1994; Grefenstette, 1999). The issue of spelling variation becomes more problematic when utilising corpus linguistic techniques on non-standard varieties of English, not least because variation can be due to differences in spelling habits, transcription or compositing practices, and morpho-syntactic customs, as well as "misspelling". Examples of non-standard varieties include: - Scottish English1 (Anderson et al., forthcoming), and dialects such as Tyneside English2 (Allen et al., forthcoming) - Early Modern English (Archer and Rayson, 2004; Culpeper and Kytö, 2005) - Emerging varieties such as SMS or CMC in weblogs (Ooi et al., 2006) In the Dagstuhl workshop we focussed on historical corpora. Vast quantities of searchable historical material are being created in electronic form through large digitisation initiatives already underway e.g. Open Content Alliance3, Google Book Search4, and Early English Books Online5. Annotation, typically at the part-of-speech (POS) level, is carried out on modern corpora for linguistic analysis, information retrieval and natural language processing tasks such as named entity extraction. Increasingly researchers wish to carry out similar tasks on historical data (Nissim et al, 2004). However, historical data is considered noisy for tasks such as this. The problems faced when applying corpus annotation tools trained on modern language data to historical texts are the motivation for the research described in this paper. Previous research has adopted an approach of adding historical variants to the POS tagger lexicon, for example in TreeTagger annotation of GerManC (Durrell et al, 2006), or "back-dating" the lexicon in the Constraint Grammar Parser of English (ENGCG) when annotating the Helsinki corpus (Kytö and Voutilainen, 1995). Our aim was to develop an historical semantic tagger in order to facilitate similar studies on historical data to those that we had previously been performing on modern data using the USAS semantic analysis system (Rayson et al, 2004). The USAS tool relies on POS tagging as a prerequisite to carrying out semantic disambiguation. Hence we were faced with the task of retraining or back-dating two tools, a POS tagger and a semantic tagger. Our proposed solution incorporates a corpus pre-processor for detecting historical spelling variants and inserting modern equivalents alongside them. This enables retrieval as well as annotation tasks and to some extent avoids the need to retrain each annotation tool that is applied to the corpus. The modern tools can then be applied to the modern spelling equivalents rather than the historical variants, and thereby achieve higher levels of accuracy. The resulting variant detector tool (VARD) employs a number of techniques derived from spell-checking tools as we wished to evaluate their applicability to historical data. The current version of the tool uses known-variant lists, SoundEx, edit distance and letter replacement heuristics to match Early Modern English variants with modern forms. The techniques are combined using a scoring mechanism to enable preferred candidates to be selected using likelihood values. The current known-variant lists and letter replacement rules are manually created. In a cross-language study with English and German texts we found that similar techniques could be used to derive letter replacement heuristics from corpus examples (Pilz et al, forthcoming). Our experiments show that VARD can successfully deal with: - Apostrophes signalling missing letter(s) or sound(s): ’fore ("before"), hee’l ("he will"), - Irregular apostrophe usage: again’st ("against"), whil’st ("whilst") - Contracted forms: ’tis("it is"), thats ("that is"), youle ("you will"), t’anticipate ("to anticipate") - Hyphenated forms: acquain-tance ("acquaintance") - Variation due to different use of graphs: <v>, <u>, <i>, <y>: aboue ("above"), abyde ("abide") - Doubling of vowels and consonants -e.g. <-oo-><-ll>: triviall ("trivial") By direct comparison, variants that are not in the modern lexicon are easy to identify, however, our studies show that a significant portion of variants cannot be discovered this way. Inconsistencies in the use of the genitive, and "then" appearing instead of "than" or vice versa require contextual information to be used in their detection. We will outline our approach to resolving this problem, by the use of contextually-sensitive template rules that contain lexical, grammatical and semantic information. Footnotes 1 http://www.scottishcorpus.ac.uk/ 2 http://www.ncl.ac.uk/necte/ 3 http://www.opencontentalliance.org/ 4 http://books.google.com/ 5 http://eebo.chadwyck.com/home

Cite as

Paul Rayson, Dawn Archer, Alistair Baron, and Nicholas Smith. Tagging Historical Corpora - the problem of spelling variation. In Digital Historical Corpora- Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-2, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2007)


Copy BibTex To Clipboard

@InProceedings{rayson_et_al:DagSemProc.06491.15,
  author =	{Rayson, Paul and Archer, Dawn and Baron, Alistair and Smith, Nicholas},
  title =	{{Tagging Historical Corpora - the problem of spelling variation}},
  booktitle =	{Digital Historical Corpora- Architecture, Annotation, and Retrieval},
  pages =	{1--2},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06491.15},
  URN =		{urn:nbn:de:0030-drops-10553},
  doi =		{10.4230/DagSemProc.06491.15},
  annote =	{Keywords: Corpus annotation, spelling variation, historical variants}
}
  • Refine by Author
  • 2 Filiot, Emmanuel
  • 1 Adler, Isolde
  • 1 Ahmed, Rehan
  • 1 Archer, Dawn
  • 1 Baron, Alistair
  • Show More...

  • Refine by Classification
  • 1 Computing methodologies → Factorization methods
  • 1 Computing methodologies → Number theory algorithms
  • 1 Hardware → Temperature monitoring
  • 1 Mathematics of computing → Combinatorial algorithms
  • 1 Mathematics of computing → Combinatorics
  • Show More...

  • Refine by Keyword
  • 1 Approximation Algorithms
  • 1 Binary Matrices
  • 1 Cerný conjecture
  • 1 Corpus annotation
  • 1 Data Leak
  • Show More...

  • Refine by Type
  • 11 document

  • Refine by Publication Year
  • 3 2018
  • 2 2019
  • 2 2020
  • 1 2007
  • 1 2009
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail