Research Associate in Bioinformatics

I am a Postdoctoral Research Associate in the Department of Medicine at the University of Cambridge and in the Laboratory of Molecular Biology, Cambridge. I am interested in developing and applying computational methods for functional genomic annotation using large-scale data sets from bacteria, viruses, and humans.

I completed my PhD at University College Dublin, where I received a fellowship from the Wellcome Trust as part of the Computational Infection Biology programme.

Since graduating, I have worked with Professor Andrew Firth in the Department of Pathology, University of Cambridge, and more recently with Professor Andres Floto in the Laboratory of Molecular Biology and Department of Medicine, University of Cambridge.

I will use this space to discuss genomics, genetics, bioinformatics, data visualisation, machine learning, and related topics. Please feel free to contact me if you would like to discuss any of these topics further.

ChatGPT: the future of writing code?

There has been a lot of discussion recently about OpenAI’s new large language model, ChatGPT. It is essentially a very advanced chatbot, capable of producing sophisticated, human-like responses to user prompts. It is trained on a vast quantity of data gathered from the internet (the underlying model, an autocomplete text generator named GPT-3.5, requires 800GB of memory for training), and it can seemingly accomplish everything from writing poetry to tutoring, storing and manipulating data, or solving crossword puzzles.

Using Docker images with Nextflow

Bioinformatics pipelines are often difficult to reproduce. They typically consist of a mixture of Bash commands and scripts that rely on modules and libraries with long lists of dependencies. The problem of reproducibility has come more sharply into focus as science increasingly relies on analyses of large-scale data sets. In the short history of bioinformatics, it has been common practice to build pipelines as large, indecipherable, and sparsely documented shell scripts.

Introducing codondiffR

codondiffR is an R package for the calculation, visualisation, and comparative analysis of codon usage metrics in user-supplied protein-coding nucleotide sequences. Pre-defined codon usage statistics for reference taxa come from the RefSeq subset of the latest release of the Codon Usage Table Database made by Athey et al. (2017). The mean codon usage frequency difference (MCUFD) metric is calculated as described in Stedman et al. (2013), and linear discriminant analysis is performed using the implementation in the MASS package.
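
To give a feel for what a codon usage comparison involves, here is a minimal Python sketch. It is not codondiffR's code or API. The function names `codon_frequencies` and `mean_abs_difference` are hypothetical, the two sequences are toy examples, and the mean absolute difference across all 64 codons is only one plausible reading of a "mean codon usage frequency difference"; the exact definition in Stedman et al. (2013) and in the package may differ.

```python
from collections import Counter
from itertools import product

# All 64 DNA codons, in a fixed order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(seq):
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Assumption: the sequence starts in frame; trailing bases that do not form a
    complete codon, and codons containing ambiguous bases, are ignored.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("no unambiguous codons found")
    return {c: counts[c] / total for c in CODONS}


def mean_abs_difference(freq_a, freq_b):
    """Mean absolute difference in codon frequency across all 64 codons.

    Assumption: an illustrative stand-in for an MCUFD-style metric, not the
    published formula.
    """
    return sum(abs(freq_a[c] - freq_b[c]) for c in CODONS) / len(CODONS)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    query = codon_frequencies("ATGGCCGCTAAA")
    reference = codon_frequencies("ATGGCCGCCAAG")
    print(mean_abs_difference(query, reference))
```

A real analysis would compare the query against reference tables for many taxa, which is the role the Codon Usage Table Database plays in the package.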