Research interests

Broadly speaking, I am interested in developing and applying statistical and machine learning approaches to describe biological data and uncover novel insights. I have worked with a variety of genome-scale and population-level data sets, in both humans and microbial pathogens.

Machine learning (ML) to improve disease understanding and healthcare

The application of ML to clinical data sets presents numerous challenges, including data handling, cleaning, model construction, interpretation and value extraction. These processes require not only technical expertise but also a deep understanding of the nature of the data itself. At the MRC Laboratory of Molecular Biology and the Cambridge Heart and Lung Research Institute (HLRI), I am working closely with clinicians, statisticians, principal investigators, and funders to use clinical data sets to better understand diseases including cystic fibrosis and bronchiectasis.

/images/ml_plots.png

Genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping

The availability of large-scale whole-genome sequencing (WGS) data sets and matched phenotypic or gene expression data allows for the statistical identification of functionally important genetic variants at unprecedented scale. Extending such techniques to new data types, populations and species will enable better functional annotation of genomes. During my time in the Floto group at the MRC Laboratory of Molecular Biology, I have developed a novel approach for the quantification of bacterial gene expression across strains of a single species, and I am currently integrating this strategy with QTL mapping on matched variant data to identify genomic regions associated with altered gene expression profiles and antibiotic resistance.

/images/gwas_plots.png

Ribosome profiling

Sequencing of ribosome-protected RNA fragments (RiboSeq) is a powerful method for monitoring the translation of cellular RNA into protein at the level of individual codons. Detailed quality control and processing must be carried out on RiboSeq data before biological insights can be extracted from downstream analysis. In the Firth group at the University of Cambridge, I worked with world-leading experts in the field to develop and enhance the value of such data, deriving novel conclusions that have enabled a better understanding of the molecular interactions between viruses and their host cells, as well as the fundamental process of translation in human mitochondria.

/images/riboseq_plots.png

Comparative genomics

Understanding genome function can often be aided by comparing across populations, strains, or species. During my PhD studies with Professor Brendan Loftus at UCD, Dublin, I applied comparative genomic and transcriptomic approaches to better understand gene expression in mycobacteria, an important group of bacterial pathogens. More recently, I have also used comparative methods to describe the longest known reverse-strand gene in an RNA virus.

/images/compgenome_plots.png