Don't miss our weekly PhD newsletter | Sign up now Don't miss our weekly PhD newsletter | Sign up now

  Efficient textual computations via combinatorial arrangements


   Dept of Computer Science

This project is no longer listed on FindAPhD.com and may not be available.

Click here to search FindAPhD.com for PhD studentship opportunities
Prof A Gammerman, Dr J Daykin  Applications accepted all year round  Self-Funded PhD Students Only

About the Project

One way of handling the avalanche of digital information, arising for instance in `big data’ contexts, is to arrange it so as to increase the efficiency of computations – a classic example is organizing directories in lexicographical (dictionary) order for fast searching; similarly indexes are lex-ordered for information retrieval procedures.
In addition to the class of lex-orders, alternative ordering strategies are increasingly finding applications in computer science. Gray coded permutation sequences are recursively generated in a minimum change order; the order of binary Gray codes is used for error correction in digital communications. The classic lex-order Burrows-Wheeler Transform (BWT) is a text transformation scheme applied in pre-processing for lossless compression algorithms, and is embedded in HTS bioinformatics alignment technologies: Bowtie, BWA and SOAP. Recently, an alternative BWT has applied V-order rather than lex-order. De Bruijn sequences have deep connections with Lyndon words, and factoring strings of text into lex-ordered Lyndon patterns arises in: music analysis for the enumeration of cyclic musical structures; specialized string sorting problems; succinct suffix-prefix matching of highly periodic strings; and digital geometry, the characterization and recognition of discrete objects. Recent research has modified suffix arrays (SA), a powerful data structure, from a lexicographic-based structure to one of V-order. The SA of a string of text can be used as an index to quickly locate every occurrence of a substring within the string; applications of SA’s include example-based machine translation and computing the BWT.
The aim of the project is to study various ordering methods & combinatorial techniques, and apply them to algorithmics and data structures. Computational efficiency will be evaluated theoretically and/or experimentally in the context of text-based (stringology) applications (some application areas are proposed above). For experimentation, data may be accessed from the UCI Machine Learning Repository, on-line music / biology / corpus data sets, etc.
To discuss specific problems suitable for a PhD project email: [Email Address Removed]
Target conferences for presenting results include: PATTERNS, WABI, PSC, CPM, IWOCA, SPIRE, CUBE, WORDS and CIAA.

Where will I study?

 About the Project