Early prediction of long-term outcome at the pre-disease stage is challenging because the differences between the pre-disease and healthy states are small and fluctuating. We developed a novel method that uses correlation information as features to predict 6-year glycemic status in pre-diabetic individuals, based on proteomic and metabolomic data. We first collected plasma samples from a population of individuals who were pre-diabetic at baseline. After a 6-year follow-up, 45 of these subjects had returned to a normal fasting blood glucose level, another 45 remained pre-diabetic, and the other 45 had developed type 2 diabetes. We then used mass spectrometry to profile the proteins and carnitines in the baseline blood samples collected 6 years earlier.
Next, to integrate these data, we employed the edge biomarker method, which uses correlations between molecules as biomarkers. In detail, this method transforms the molecular expression data into correlation components for each molecular pair, performs feature selection, classifier training, and phenotype prediction on the resulting edge-level data, and finally yields molecular pairs drawn from the proteomics and metabolomics data as biomarkers. High cross-validation accuracy and functional analysis of the selected edge biomarkers suggested their clinical potential. Notably, most of the molecules involved have been reported in previous work to be associated with diabetes. We demonstrated that edge biomarkers derived from proteomics and metabolomics data could effectively predict long-term outcome in a pre-disease population.
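The transformation from molecular expression data to edge-level data can be illustrated with a minimal sketch. It assumes one common formulation, in which the edge feature of a sample for a molecular pair is the product of the two molecules' z-scores, so that averaging over the reference samples recovers the Pearson correlation. The text above does not specify the exact formula, and the function name `edge_features` and its parameters are hypothetical.

```python
import numpy as np
from itertools import combinations


def edge_features(X, ref=None):
    """Turn node-level data into per-sample edge-level features.

    X   : array of shape (n_samples, n_molecules), e.g. proteins and
          carnitines concatenated column-wise.
    ref : optional reference samples used to estimate each molecule's
          mean and standard deviation (assumption: defaults to X itself).

    Returns (E, pairs), where E has shape (n_samples, n_pairs) and
    pairs[k] is the (i, j) molecule index pair for column k of E.
    """
    X = np.asarray(X, dtype=float)
    ref = X if ref is None else np.asarray(ref, dtype=float)

    mu = ref.mean(axis=0)
    sigma = ref.std(axis=0)          # population standard deviation (ddof=0)
    Z = (X - mu) / sigma             # z-score of every molecule per sample

    pairs = list(combinations(range(X.shape[1]), 2))
    i, j = np.array(pairs).T         # index arrays for the two pair members

    # Assumed edge feature: product of z-scores for each molecular pair.
    E = Z[:, i] * Z[:, j]
    return E, pairs


# Illustrative use with synthetic data (not the study's data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
E_demo, pairs_demo = edge_features(X_demo)
```

Under this assumption, the matrix `E_demo` has one column per molecular pair and could be passed to any standard feature selection and classification routine in place of the original expression matrix.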