Don't miss our weekly PhD newsletter | Sign up now Don't miss our weekly PhD newsletter | Sign up now

  Programming Language Design and Implementation for IDEs


   College of Engineering and Computer Science

  Dr Fabian Muehlboeck  Applications accepted all year round  Awaiting Funding Decision/Possible External Funding

About the Project

Programming languages typically have a particular syntax and semantics, chosen by their designers, and both are often variations on those of existing similar languages. The basic tasks in creating a programming language are, first, coming up with the relevant designs, and then implementing a compiler and/or interpreter for it. The latter are typically built in stages - a lexer, followed by a parser, followed by a few layers of checking and transforming an internal representation of the program (typically some form of abstract syntax tree), and ultimately, code generation or interpretation. In such a setting, it is fine for any stage to simply abort the whole process if it runs into an error: if a program does not fully parse, there is no need to go through type checking - we can simply abort and ask the programmer to fix their code first before trying again.

The above approach, while beautifully simple, results in a user experience that leaves much to be desired. For one, aborting on every single error makes it impossible to fix many smaller and independent problems in one swoop. For another, certain modern development tools, in particular Integrated Development Environments (IDEs), are expected to provide live information on any given program text, even if the program is incomplete. As such, they require being able to run a partial type check on programs that do not conform to a language's grammar, simply because the program is incomplete. For example, any text that contains the lines:

if(x==y) {

x.

}

(not inside a commented-out block) would not be a valid Java program, and yet, we expect an IDE like Eclipse or IntelliJ to provide a programmer with a list of potential methods that could follow the dot-operator after x, depending on whatever type x might have.

Creating good IDE support is a major engineering effort that requires specializing the code for the particular language at hand, and therefore only a small number of major programming languages enjoy the highest level of such support. Yet, even IDEs created with vast resources fundamentally cannot provide good help in some situations if the language was not designed for it. For example, generic method definitions have essentially the same meaning in Java and C#, yet slightly different syntaxes:

  • Java: public <K,V> Map<K,V> copyMap(Map<K,V> map) { ... }
  • C#: public Dictionary<K,V> copyDict<K,V>(Dictionary<K,V> dict) { ... }

On the level of abstract syntax that programming language researchers and internal compiler code often deal with, these two ways of specifying generic methods are identical. But the order of specifying things matters for partial programs, especially when we make assumptions about how programmers write their code. For example, it seems reasonable to assume that a programmer would write the signature of the function from left to right. This makes a difference when looking at what information an IDE can provide at the point where the programmer writes the return type: in Java, the type variables K and V have already been declared, and thus can be shown and suggested as type variables that are in scope. In C#, a programmer writes the return type before they declare the type arguments, meaning that the IDE would not know whether K and V are supposed to become type variables or are invalid type names that should possibly be completed to valid ones. In part due to this difficulty, C# has adopted a coding convention that type variables always start with a capital "T", and Visual Studio, the main IDE for C#, will make assumptions accordingly.

The point here is that while there is research into automatically generating IDE functionality from more high-level descriptions in the hope of eventually allowing all languages to have good IDE support [1], that research - along with many modern approaches to parser generator - has followed an approach that tries to accept almost any sort of specification and then simply generating best-effort tooling for it, which may or may not fail in many situations.

The goal of this research project is to create testable formal models of ways in which programmers might edit code, and use those models to check formal models of programming language definitions for compatibility with certain kinds of IDE features. For example, is it possible to write a complete program and get correct and as-complete-as-possible partial information about every intermediate step of writing the program? Is it at least possible to write a program following a given model in such that an IDE would have helpful information at every step?

Such models could in particular be used to check specifications for existing language workbenches (e.g. Spoofax [1]) and get some guarantees about the quality of whatever best-effort tooling it creates, and otherwise, new tooling could be built based upon those ideas.

Successful applicants for this project would have at least some background in parsing and formal programming language theory, for example by having taken relevant advanced courses, or by having done projects in those or closely related areas. More information can be found at https://users.cecs.anu.edu.au/~fabian.muehlboeck/phd.html .

Computer Science (8)

Funding Notes

Possible funding via ANU HDR Scholarship or through School of Computing


References

[1] The Spoofax language workbench: rules for declarative specification of languages and IDEs. Lennart C. L. Kats and Eelco Visser. OOPSLA 2010, 444–463. Reno/Tahoe, Nevada, 2010. ACM. URL: https://doi.org/10.1145/1869459.1869497