Background: Research, including our own, shows that children with single and multiple neurodevelopmental disorders have poorer health, education, and employment outcomes [1]. However, current delays, sometimes of years, before neurodevelopmental conditions are diagnosed and managed cause preventable distress to the child, family, and teachers, as well as lasting educational and social disadvantage. We hypothesise that a risk stratification tool will enable neurodevelopmental multimorbidity to be detected and managed earlier, thereby reducing adversity for affected children and their families.
Aims: To link Scotland-wide health and education data, and to develop and validate a tool that predicts neurodevelopmental multimorbidity in children and adolescents and can be used clinically to support earlier detection, and hence earlier intervention and support. We will undertake individual-level record linkage of several Scotland-wide education and health databases. Education records (exam results, absenteeism, exclusion, additional support needs, and leaver destination) for all pupils attending Scottish schools between 2009 and 2020 will be linked to prescribing data, maternity records, neonatal admissions, child health records, acute and psychiatric hospitalisations, Health Behaviour in School-aged Children survey data, and deaths. The Scottish pupil census holds information on all children attending local authority maintained primary, secondary, and special schools in Scotland, covering 95% of school-aged (4-19 years) children. Data between 2009 and 2020 will comprise records pertaining to over 1 million schoolchildren. We will focus on attention deficit hyperactivity disorder (ADHD), epilepsy, and severe depression (ascertained from prescribing data) and on autism, intellectual disabilities, dyslexia, dyscalculia, and language and speech disorders (ascertained from school records of special educational need). Children with two or more of these conditions will be classified as having neurodevelopmental multimorbidity.
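The classification rule described above (two or more of the listed conditions) can be illustrated with a minimal Python sketch. The condition labels, the pupil identifiers, and the function name `has_multimorbidity` are illustrative assumptions, not field names from any of the linked databases.

```python
from typing import Iterable

# Conditions named in the text. The short labels are hypothetical,
# chosen for illustration only; they are not real database field names.
CONDITIONS = frozenset({
    # ascertained from prescribing data
    "adhd", "epilepsy", "severe_depression",
    # ascertained from school records of special educational need
    "autism", "intellectual_disability", "dyslexia",
    "dyscalculia", "language_speech_disorder",
})


def has_multimorbidity(conditions: Iterable[str]) -> bool:
    """Return True when two or more distinct listed conditions are present."""
    distinct = set(conditions) & CONDITIONS
    return len(distinct) >= 2


# Hypothetical linked records: pupil identifier -> conditions found across sources.
linked = {
    "pupil_a": ["adhd", "dyslexia"],
    "pupil_b": ["epilepsy", "epilepsy"],  # the same condition seen twice counts once
    "pupil_c": [],
}

flags = {pupil: has_multimorbidity(found) for pupil, found in linked.items()}
print(flags)
```

Counting distinct conditions, rather than records, is a design choice in this sketch: it prevents a condition that appears in more than one source from being counted twice.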
After data cleaning, merging, and recoding, we will determine risk factors associated with neurodevelopmental multimorbidity, including maternal medication, maternal antecedents (smoking, age, parity, previous abortions), pregnancy outcomes (birthweight and intrauterine growth restriction, Apgar score, mode of delivery, gestational age), early life hospitalisations (neonatal, acute, psychiatric), early life growth trajectories and development (pre-school cognitive measures), early life injury/trauma (hospitalisations), childhood medication for other chronic conditions, sociodemographic factors and health behaviours, and school progress (absenteeism, exclusion, special educational need, attainment, and unemployment on leaving school).
To develop a risk stratification tool, we will randomly split the data into training, validation, and test datasets. After appropriate transformation and scaling of the data, we will train and fine-tune several classifiers (e.g. logistic regression, linear discriminant analysis, support vector machines (SVM), and random forests) to predict the outcome of neurodevelopmental multimorbidity, using k-fold cross-validation to reduce the risk of overfitting. Each classifier will be evaluated using the confusion matrix to derive estimates of precision (true positives divided by the sum of true and false positives) and recall (true positives divided by the sum of true positives and false negatives). These metrics are preferred to receiver operating characteristic curves when the class being predicted is rare. We will select the appropriate classification threshold by inspecting plots of precision and recall against threshold and precision versus recall curves. Should the individual classifiers prove a mediocre fit, we will explore further development and evaluation using ensemble methods, which often produce better predictions than any single classifier. The preferred model will help clinicians identify children who require further investigation, enabling earlier diagnosis of neurodevelopmental multimorbidity. Analyses will most likely be performed using R and Anaconda Python.
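As a rough illustration of the evaluation step described above, the following Python sketch computes precision and recall from confusion-matrix counts at several candidate thresholds. The labels and scores are made-up toy values, and the helper names `precision_recall` and `threshold_table` are hypothetical; this is a sketch of the definitions given in the text, not the project's actual analysis code.

```python
from typing import List, Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are classed as positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1  # true positive
        elif predicted_positive and label == 0:
            fp += 1  # false positive
        elif not predicted_positive and label == 1:
            fn += 1  # false negative
    # Convention assumed here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def threshold_table(
    y_true: Sequence[int], scores: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Rows of (threshold, precision, recall) for inspection or plotting."""
    return [(t, *precision_recall(y_true, scores, t)) for t in thresholds]


# Toy data: 1 = neurodevelopmental multimorbidity (the rare class), 0 = not.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.40, 0.60, 0.70, 0.90]

for threshold, precision, recall in threshold_table(y_true, scores, [0.25, 0.50, 0.75]):
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

In this toy example, raising the threshold increases precision and lowers recall, which is the trade-off the threshold plots are meant to expose. In practice a library routine such as scikit-learn's `precision_recall_curve` would likely be used rather than hand-written counting.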
Training outcomes: The student will undergo training (via courses and self-learning) in the following: safe researcher training, R programming, statistical methods, data linkage methods, analysing ‘big’ data, machine learning techniques, and additional statistical programming packages (if needed) such as SPSS, Stata, SAS, and Python.