
  Text-based Measures of Information Quality in Online Health Information


   Doctoral College

This project is no longer listed on FindAPhD.com and may not be available.

Dr R Evans, Prof P Ghezzi | No more applications being accepted | Competition Funded PhD Project (Students Worldwide)

About the Project

The quality of health information available on the web has become a major issue as inappropriate information can lead people away from evidence-based healthcare and have serious consequences for public health and healthcare services.

Many studies have sought to assess the quality of health information on the web using instruments such as the Journal of the American Medical Association (JAMA) score or the Health On the Net (HON) criteria. These methods measure information quality (IQ) in terms of the presence of explicit metadata (such as authorship, ownership and currency) or broad textual criteria such as readability, with the primary aim of assessing reliability and trustworthiness.
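As a rough illustration of these two kinds of measure, the Python sketch below counts which metadata categories are signalled in a page's text and computes the standard Flesch Reading Ease score. It is not the JAMA or HON instrument itself: the keyword cues in METADATA_CUES, the function names and the vowel-group syllable estimate are all assumptions made for the example.

```python
import re

# Hypothetical keyword cues for the metadata categories named above.
# Published instruments are applied by human raters; these patterns are
# illustrative assumptions only.
METADATA_CUES = {
    "authorship": re.compile(r"\b(written by|author|reviewed by)\b", re.I),
    "ownership": re.compile(r"\b(copyright|owned by|sponsored by|funded by)\b", re.I),
    "currency": re.compile(r"\b(last updated|last reviewed|published on)\b", re.I),
}


def metadata_score(text):
    """Count how many metadata categories have at least one cue in the text."""
    return sum(1 for pattern in METADATA_CUES.values() if pattern.search(text))


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Scores of this kind are cheap to compute over many pages, but they capture only surface signals, which is the limitation the project described below sets out to address.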

This project, which is a collaboration between the School of Computing, Engineering and Mathematics and the Brighton and Sussex Medical School, aims to identify other useful dimensions of IQ based on a more detailed analysis of the text content of the pages, using techniques of Natural Language Processing (NLP). Such measures might range from relatively superficial analysis of text style or sentiment to deeper ‘understanding’ of the scientific basis of the information provided, particularly with respect to the type of interventions (therapeutic or preventative) presented to the reader.

In preliminary studies, we have explored the trustworthiness of websites of different types (commercial, government organisations etc.) returned by search engines for particular health queries. Our IQ analysis used the JAMA and HON criteria, and distributional properties of the text extracted using the SketchEngine language analysis tools. The present project aims to extend this work to explore more advanced NLP analysis of larger numbers of webpages. The data will then be analysed using statistical methods, including cluster analysis, to see whether IQ can be assessed automatically.

The project will apply corpus-based machine learning to the development of IQ measures, involving the following key areas:

a) Corpus collection - collection and analysis of example web pages to provide insights into potential measures, and training and evaluation data for machine learning;

b) Feature selection and extraction - identifying properties of texts (‘features’) which may inform the information quality of medical texts, and developing techniques (algorithmic or statistical) to extract features from text documents automatically. These might include simple statistical models, such as n-gram distribution, ‘shallow’ semantic approaches such as sentiment analysis, and deeper semantic approaches;

c) Cluster analysis and interpretation - applying statistical techniques to correlate extracted features with an underlying semantic model of IQ;

d) Application - developing example applications of identified measures and undertaking intrinsic and extrinsic (task-oriented) evaluation of their effectiveness.
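Steps (b) and (c) can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not the project's method: it represents each document by its word n-gram relative frequencies, compares documents by cosine similarity, and groups them with average-linkage agglomerative clustering. The function names, the choice of bigrams and the choice of linkage are all hypothetical.

```python
import math
import re
from collections import Counter


def ngram_profile(text, n=2):
    """Relative frequencies of word n-grams (bigrams by default)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = Counter(zip(*(tokens[i:] for i in range(n))))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse n-gram profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def agglomerative(profiles, k):
    """Average-linkage agglomerative clustering down to k clusters.

    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def avg_sim(a, b):
        pairs = [(i, j) for i in a for j in b]
        return sum(cosine(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > max(k, 1):
        # Find the most similar pair of clusters and merge them.
        _, x, y = max(
            (avg_sim(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy corpus (invented sentences, not project data).
docs = [
    "clinical trials show the vaccine is safe and effective",
    "clinical trials show the treatment is safe and effective",
    "buy our miracle cure now with free shipping",
    "buy our miracle pills now with free shipping",
]
profiles = [ngram_profile(d) for d in docs]
groups = agglomerative(profiles, k=2)
```

On this toy corpus the two evidence-style sentences share most of their bigrams, as do the two sales-style sentences, so the procedure separates them into two groups. Whether such groupings track IQ on real webpages is exactly the question that the interpretation in step (c) would have to answer.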

The results of this project could be used by policymakers and stakeholders in health information (health authorities, patients’ associations, etc.) to identify topics at risk of misinformation due to low-quality, unreliable websites ranking high in search engines, by commercial companies to support quality control of their web presence (and other documentation), and by search engine providers to improve the quality of the search results. It will support the development of more advanced tools for accessing health information, providing user-adapted, reliable information to allow users of a wide range of abilities to make informed decisions about their health needs.

The student will interact with scientists and medical students at BSMS working on the evaluation of health IQ and in the Natural Language Technology Group to develop and apply novel approaches to natural language and semantic analysis.

Funding Notes

Each studentship is worth at least £60,000 over three years, subject to satisfactory progress. For UK/EU students this comprises £4,620 per year to cover annual tuition fees and a contribution towards living expenses of £15,480 per year. For suitable students from outside of the UK/EU the funding will be £14,130 per year to cover annual international tuition fees and a contribution towards living expenses of £6,170 per year.