Distinguishing prognostic and predictive biomarkers: an information theoretic approach

K Sechidis, K Papangelou, PD Metcalfe, D Svensson, J Weatherall, G Brown
Bioinformatics, 2018
Motivation
The identification of biomarkers to support decision-making is central to personalized medicine, in both clinical and research scenarios. The challenge can be seen in two halves: identifying predictive markers, which guide the development and use of tailored therapies; and identifying prognostic markers, which guide other aspects of care and clinical trial planning (for example, prognostic markers can serve as covariates for stratification). Mistakenly assuming a biomarker to be predictive when it is in fact largely prognostic (or vice versa) is highly undesirable, and can have financial, ethical and personal consequences. We present a framework for data-driven ranking of biomarkers on their prognostic and predictive strength, using a novel information theoretic method. This approach provides a natural algebra to discuss and quantify the individual predictive and prognostic strength of each biomarker within a self-consistent mathematical framework.
Results
Our contribution is a novel procedure, INFO+, which naturally distinguishes the prognostic versus predictive role of each biomarker and handles higher order interactions. In a comprehensive empirical evaluation, INFO+ outperforms more complex methods, most notably when noise factors dominate and when biomarkers that are merely strongly prognostic are likely to be falsely identified as predictive. Furthermore, we show that our methods can be 1–3 orders of magnitude faster than competitors, making them useful for biomarker discovery in ‘big data’ scenarios. Finally, we apply our methods to identify predictive biomarkers in two real clinical trials, and introduce a new graphical representation that provides greater insight into the prognostic and predictive strength of each biomarker.
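To make the prognostic/predictive distinction concrete, the following is a minimal Python sketch on simulated data. It is not INFO+ and not the authors' R code: the abstract does not give the scoring formulas, so the two scores used here are assumptions chosen for illustration. The prognostic score is taken to be the mutual information I(X;Y) between a biomarker X and the outcome Y, and the predictive score is taken to be the conditional mutual information I(T;Y|X) between treatment T and outcome given the biomarker. All variable names, the simulated trial and the effect sizes are hypothetical.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for integer-coded discrete arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1.0)           # contingency table of counts
    joint /= joint.sum()                    # empirical joint distribution
    px = joint.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def conditional_mutual_information(t, y, x):
    """Plug-in estimate of I(T;Y|X) = sum over x of p(x) * I(T;Y | X=x)."""
    total = 0.0
    for value in np.unique(x):
        mask = x == value
        total += mask.mean() * mutual_information(t[mask], y[mask])
    return total


# Hypothetical randomized trial with three binary biomarkers.
rng = np.random.default_rng(0)
n = 20_000
t = rng.integers(0, 2, n)        # randomized treatment assignment
x_prog = rng.integers(0, 2, n)   # shifts the outcome regardless of treatment
x_pred = rng.integers(0, 2, n)   # flips the direction of the treatment effect
x_noise = rng.integers(0, 2, n)  # unrelated to the outcome

# Assumed outcome model: main effect of x_prog, interaction of t with x_pred.
p = 0.3 + 0.4 * x_prog + 0.2 * (2 * t - 1) * (2 * x_pred - 1)
y = (rng.random(n) < p).astype(int)

markers = {"x_prog": x_prog, "x_pred": x_pred, "x_noise": x_noise}
prognostic = {k: mutual_information(v, y) for k, v in markers.items()}
predictive = {k: conditional_mutual_information(t, y, v) for k, v in markers.items()}

for name in markers:
    print(f"{name:8s} prognostic={prognostic[name]:.4f} predictive={predictive[name]:.4f}")
```

Under these assumptions, x_prog should rank first on the prognostic score and x_pred first on the predictive score, while x_noise scores near zero on both. Plug-in estimates of this kind are biased upwards in small samples, so a practical procedure would need to account for that.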
Availability and implementation
R implementations of the suggested methods are available at https://github.com/sechidis.
Supplementary information
Supplementary data are available at Bioinformatics online.
Oxford University Press