Title: Exercises in Free Syntax. Syntax Definition, Parsing, and Assimilation of Language Conglomerates
Abstract: In modern software development, the use of multiple software languages
within a single application is ubiquitous. Despite this
omnipresent use of language combinations, the principles and
techniques for using languages together are ad hoc and unfriendly to
programmers, and they result in a poor level of integration. We work
towards a principled and generic solution to language extension by
studying the applicability of modular syntax definition, scannerless
parsing, generalized parsing algorithms, and program transformations.

We describe MetaBorg, a method for providing concrete syntax for
domain abstractions to application programmers. Since object-oriented
languages are designed for extensibility and reuse, the language
constructs are often sufficient for expressing domain abstractions at
the semantic level. However, they do not provide the right
abstractions at the syntactic level. The MetaBorg method consists of
embedding domain-specific languages in a general purpose host language
and assimilating the embedded domain code into the surrounding host
code. Instead of extending the implementation of the host language,
the assimilation phase implements domain abstractions in terms of
existing APIs, leaving the host language undisturbed.
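As a rough illustration of the assimilation step only, the Python sketch below rewrites a hand-built AST of embedded, XML-like domain code into host-language statements. Everything in it is hypothetical: the node classes `Elem`, `Text`, and `HostExpr`, the function `assimilate`, and the target `Element`/`addContent` API are illustrative assumptions, and parsing the embedded syntax is not shown. What it shows is that domain code turns into ordinary calls to an existing API, using fresh temporaries, so the host compiler never sees new syntax.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Elem:
    """Embedded domain syntax: an XML element written inside host code."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class Text:
    """Embedded character data."""
    value: str


@dataclass
class HostExpr:
    """Antiquotation: a host-language expression spliced into domain code."""
    source: str


def java_string(s: str) -> str:
    # Render s as a Java string literal, escaping backslashes and quotes
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def assimilate(node, out: list, fresh) -> str:
    """Append host statements for `node` to `out`; return the host expression for it.

    Elements become calls to a hypothetical Element API, so the host
    language needs no new constructs.
    """
    if isinstance(node, Text):
        return java_string(node.value)
    if isinstance(node, HostExpr):
        return node.source
    if isinstance(node, Elem):
        var = f"e{next(fresh)}"  # fresh temporary for this element
        out.append(f"Element {var} = new Element({java_string(node.name)});")
        for child in node.children:
            expr = assimilate(child, out, fresh)  # child statements come first
            out.append(f"{var}.addContent({expr});")
        return var
    raise TypeError(f"not embedded domain code: {node!r}")


stmts = []
root = assimilate(
    Elem("ul", [Elem("li", [Text("Hello, "), HostExpr("userName")])]), stmts, count()
)
print("\n".join(stmts))
```

For the nested list above, the generated statements declare `e0` and `e1`, add the text and `userName` to `e1`, and finally add `e1` to `e0`. The host program only ever contains plain constructor and method calls.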

We present a solution to injection vulnerabilities. Software written
in one language often needs to construct sentences in another
language, such as SQL queries, XML output, or shell command
invocations. This is almost always done using unhygienic string
manipulation. A client can then supply specially crafted input that
causes the constructed sentence to be interpreted in an unintended
way, leading to an injection attack. We describe a more natural style
of programming that yields code that is impervious to injections by
construction. Our approach embeds the grammars of the guest languages
into the grammar of the host language and automatically generates code
that maps the embedded language to host-language constructs that
reconstruct the embedded sentences, adding escaping functions where
appropriate.
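The Python sketch below contrasts unhygienic string concatenation with reconstruction from structure. It assumes the embedded query has already been parsed into tokens, roughly as generated code would hold it; the classes `Kw`, `Ident`, and `StrValue` and the function `reconstruct` are hypothetical. It uses standard SQL quote doubling as the escaping function. A real guest language or SQL dialect may need a different escaping function, which is why escaping is chosen per position in the guest grammar.

```python
from dataclasses import dataclass


@dataclass
class Kw:
    """A keyword, operator, or symbol written literally in the embedded query."""
    text: str


@dataclass
class Ident:
    """An identifier written literally in the embedded query (not host input)."""
    name: str


@dataclass
class StrValue:
    """Antiquotation: a host string spliced into a string-literal position."""
    value: str


def escape_sql_string(s: str) -> str:
    # Standard SQL: a quote inside a string literal is escaped by doubling it
    return "'" + s.replace("'", "''") + "'"


def reconstruct(tokens) -> str:
    """Rebuild the embedded sentence from its parsed structure.

    Host values only ever appear as StrValue nodes, so each is rendered as
    one complete, escaped literal and cannot change the query's structure.
    """
    parts = []
    for tok in tokens:
        if isinstance(tok, Kw):
            parts.append(tok.text)
        elif isinstance(tok, Ident):
            parts.append(tok.name)
        elif isinstance(tok, StrValue):
            parts.append(escape_sql_string(tok.value))
        else:
            raise TypeError(f"unexpected token: {tok!r}")
    return " ".join(parts)


name = "x' OR '1'='1"  # attacker-controlled input
unsafe = "SELECT * FROM users WHERE name = '" + name + "'"
safe = reconstruct([Kw("SELECT"), Kw("*"), Kw("FROM"), Ident("users"),
                    Kw("WHERE"), Ident("name"), Kw("="), StrValue(name)])
print(unsafe)
print(safe)
```

With this input, concatenation yields `... WHERE name = 'x' OR '1'='1'`, which turns the input into an extra `OR` condition. In contrast, `reconstruct` keeps the whole input inside a single escaped literal.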

We study AspectJ as a typical example of a language conglomerate,
i.e. a language composed of a number of separate languages with
different syntactic styles. We show that the combination of the
lexical syntax leads to considerable complexity in the lexical states
to be processed. We show how scannerless parsing elegantly addresses
this. We present the design of a modular, extensible, and formal
definition of the lexical and context-free aspects of the AspectJ
syntax. We introduce grammar mixins, which allow the declarative
definition of keyword policies and the combination of extensions.
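To make context-dependent keyword policies concrete, the sketch below is a small hand-written character-level parser in Python, not SDF. Because there is no separate tokenizer, nothing has to decide up front whether `within` is a keyword. The identifier rule takes the reserved-word set as a parameter, loosely mimicking how a grammar mixin is instantiated with a keyword policy. Several parts are assumptions made for illustration: the keyword sets are small subsets, the simplified `pcd(Name)` pattern syntax is invented, and all function names are hypothetical.

```python
import re

# Hypothetical keyword policies: a small subset of Java keywords, plus a
# subset of AspectJ pointcut designators reserved only inside pointcuts.
JAVA_KEYWORDS = frozenset({"class", "if", "new", "return", "while"})
POINTCUT_KEYWORDS = frozenset({"args", "call", "execution", "target", "within"})

# Greedy matching gives longest match, like a follow restriction on identifiers
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifier(text, pos, reserved):
    """Identifier rule parameterized by a keyword policy (the 'mixin' argument).

    Returns (name, new_pos), or None if there is no identifier at `pos`
    or it is rejected as reserved in this context.
    """
    m = IDENT.match(text, pos)
    if m is None or m.group() in reserved:
        return None
    return m.group(), m.end()


def skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def expect(text, pos, lit):
    if not text.startswith(lit, pos):
        raise SyntaxError(f"expected {lit!r} at offset {pos}")
    return skip_ws(text, pos + len(lit))


def pointcut(text):
    """Parse 'pcd(Name) && pcd(Name) ...' directly from characters."""
    pos, result = skip_ws(text, 0), []
    while True:
        m = IDENT.match(text, pos)
        if m is None or m.group() not in POINTCUT_KEYWORDS:
            raise SyntaxError(f"expected pointcut designator at offset {pos}")
        pos = expect(text, skip_ws(text, m.end()), "(")
        # Inside the parentheses we are back in Java context, so pointcut
        # designator names are ordinary identifiers here.
        got = identifier(text, pos, JAVA_KEYWORDS)
        if got is None:
            raise SyntaxError(f"expected identifier at offset {pos}")
        name, pos = got
        pos = expect(text, skip_ws(text, pos), ")")
        result.append((m.group(), name))
        if pos == len(text):
            return result
        pos = expect(text, pos, "&&")


print(pointcut("call(within) && execution(run)"))
```

In this sketch `within` is accepted as a name inside the parentheses but would be rejected by `identifier` under the combined policy `JAVA_KEYWORDS | POINTCUT_KEYWORDS`. Longest match also keeps `callee` from being rejected as the keyword `call` followed by `ee`.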

We introduce separate compilation of grammars to enable deployment of
languages as plugins to a compiler. Current extensible compilers focus
on source-level extensibility, which requires users to compile the
compiler with a specific configuration of extensions. A compound
parser needs to be generated for every combination. We introduce an
algorithm for parse table composition to support separate compilation
of grammars to parse table components. Parse table components can be
composed (linked) efficiently at runtime, i.e. just before
parsing. For realistic language combination scenarios involving
grammars for real languages, our parse table composition algorithm is
an order of magnitude faster than computation of the parse table for
the combined grammars, making online language composition feasible.
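The abstract does not spell out the composition algorithm, so the following Python sketch is a deliberately simplified stand-in rather than the dissertation's method. Each grammar module is compiled separately into an LR(0) item NFA. "Linking" then only adds the ε-edges that cross module boundaries, followed by a plain subset construction. It handles LR(0) only, with no lookahead. A practical composition algorithm would also avoid re-determinizing states that the link does not affect, and this sketch omits that. The grammars `HOST` and `SQL` (including the `<| ... |>` quotation syntax) and all function names are hypothetical.

```python
from collections import defaultdict


def compile_component(grammar):
    """Separately compile one grammar module into an LR(0) item NFA.

    Items are (production, dot) pairs; a production is (lhs, rhs_tuple).
    `expects` maps each symbol to the items whose dot is before it, so a
    linker can later connect them to productions from other modules.
    """
    prods = [(lhs, tuple(rhs)) for lhs, rhs in grammar]
    by_lhs, expects = defaultdict(list), defaultdict(set)
    moves, eps = {}, defaultdict(set)
    for p in prods:
        by_lhs[p[0]].append(p)
    for p in prods:
        for dot, sym in enumerate(p[1]):
            moves[(p, dot)] = (sym, (p, dot + 1))  # shift `sym`
            expects[sym].add((p, dot))
    for sym, items in expects.items():
        for item in items:  # predict this module's own productions of `sym`
            eps[item].update((q, 0) for q in by_lhs.get(sym, ()))
    return {"moves": moves, "eps": eps, "by_lhs": by_lhs, "expects": expects}


def link(components):
    """Compose components without modifying them: union, then cross-module ε-edges."""
    moves, eps, by_lhs = {}, defaultdict(set), defaultdict(list)
    for c in components:
        moves.update(c["moves"])
        for item, targets in c["eps"].items():
            eps[item] |= targets
        for lhs, ps in c["by_lhs"].items():
            by_lhs[lhs].extend(ps)
    for c in components:
        for sym, items in c["expects"].items():
            for item in items:  # e.g. a host Exp item now also predicts guest Exp rules
                eps[item].update((q, 0) for q in by_lhs.get(sym, ()))
    return moves, eps, by_lhs


def closure(items, eps):
    seen, stack = set(items), list(items)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def determinize(start_items, moves, eps):
    """Subset construction; the resulting DFA states are LR(0) item sets."""
    start = closure(start_items, eps)
    states, edges, work = {start}, {}, [start]
    while work:
        state = work.pop()
        targets = defaultdict(set)
        for item in state:
            if item in moves:
                sym, nxt = moves[item]
                targets[sym].add(nxt)
        for sym, items in targets.items():
            succ = closure(items, eps)
            edges[(state, sym)] = succ
            if succ not in states:
                states.add(succ)
                work.append(succ)
    return start, states, edges


HOST = [("Stm", ["Exp", ";"]), ("Exp", ["id"])]
SQL = [("Exp", ["<|", "Query", "|>"]), ("Query", ["SELECT", "id", "FROM", "id"])]

host, sql = compile_component(HOST), compile_component(SQL)  # compiled once, separately
moves, eps, by_lhs = link([host, sql])                       # linked just before parsing
composed = determinize([(q, 0) for q in by_lhs["Stm"]], moves, eps)
print(len(composed[1]), "LR(0) states")
```

As a sanity check on the design, determinizing the linked components gives exactly the same automaton as compiling `HOST + SQL` as one grammar. The components themselves are left untouched, so they can be reused in other combinations.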
Publication Year: 2003
Publication Date: 2003-11-01
Language: en
Type: dissertation
Access and Citation
Cited By Count: 41