Measuring the Quality and Cultural Biases of User-Generated Web Content
Science Without Borders Project
More than 10 years have passed since Nature published a famous study comparing the quality of the free Wikipedia with that of Encyclopaedia Britannica, arguably the most scholarly of the expert-based encyclopaedias. The study found that Wikipedia's scientific articles contained only slightly more errors than Britannica's, and that the difference was not significant.
However, no major studies have followed, even though user-generated content now represents a significant part of all web content, and despite the success of many other notable examples of user-generated content, such as Stack Overflow, StockWiki, OpenMap and TripAdvisor.
The soundness of user-generated content is still under scrutiny, but soundness is not the only question. How information is presented also matters: content, even when accurate, can suffer from cultural or linguistic prejudice, omissions, subjectivity, bias and the lack of a neutral point of view.
The aim of this project is to systematically analyse how common topics are presented by user-generated sources compared with analogous expert-based sources.
In addition to online encyclopaedias, the project could be extended to other user-generated online repositories, comparing each against its expert-based counterparts. For instance, in the field of computing, the project could compare the quality of the Stack Overflow forum with university textbooks or expert websites covering the same topics.
The project will employ both quantitative and qualitative research methods. The candidate will design novel indicators of information quality, tools for objectivity analysis and various tests to discover information bias, considering not only the text produced but also the network of users who create it.
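As a purely illustrative sketch of what such indicators might look like, the Python fragment below computes one text-based measure and one network-based measure. Everything in it is an assumption made for the example, not part of the project specification: the word list `SUBJECTIVE_WORDS`, the function names `subjectivity_ratio` and `co_editing_degree`, and the representation of edits as (user, article) pairs are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical word list for illustration only; a real study would need
# a validated lexicon rather than this hand-picked set.
SUBJECTIVE_WORDS = {"best", "worst", "obviously", "clearly", "terrible", "amazing"}


def subjectivity_ratio(text):
    """Return the share of tokens found in the assumed subjective word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in SUBJECTIVE_WORDS for token in tokens) / len(tokens)


def co_editing_degree(edits):
    """Return, for each user, the number of distinct users who edited
    at least one of the same articles.

    `edits` is a list of (user, article) pairs (an assumed data format).
    """
    editors_by_article = {}
    for user, article in edits:
        editors_by_article.setdefault(article, set()).add(user)

    # Start every user with an empty neighbour set so that users who
    # never share an article with anyone still appear with degree 0.
    neighbours = {user: set() for user, _ in edits}
    for editors in editors_by_article.values():
        for a, b in combinations(sorted(editors), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {user: len(others) for user, others in neighbours.items()}


if __name__ == "__main__":
    print(subjectivity_ratio("This is clearly the best article"))
    sample_edits = [("ann", "A"), ("bob", "A"), ("cat", "B"), ("ann", "B"), ("dan", "C")]
    print(co_editing_degree(sample_edits))
```

A lexicon count and a co-editing degree are deliberately crude; they only show how an indicator based on the text and an indicator based on the network of contributors could be computed side by side for the same source.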
The main candidate computational techniques are text and data mining, machine learning and social network analysis. The tools developed could have interesting commercial potential in fields such as reputation systems, automatic content quality assessment and computational trust.