Motivation: Data-independent acquisition mass spectrometry allows for more comprehensive peptide detection and relative quantification than standard data-dependent approaches. Although it is less prone to missing values, these still occur, and current approaches for handling this so-called missingness have limitations. We hypothesized that non-random missingness is a useful biological measure and demonstrate the importance of analysing missingness for proteomic discovery within a longitudinal study of disease activity.
Results: The magnitude of missingness did not correlate with mean peptide concentration. The magnitude of missingness for each protein strongly correlated between collection time points (baseline, 3 months, 6 months; R = 0.95-0.97, CI = 0.94, 0.97), indicating little time-dependent effect. This allowed for the identification of proteins with outlier levels of missingness that differentiate between patient groups characterized by different patterns of disease activity. The association of these proteins with disease activity was confirmed by machine learning techniques.
Conclusion: Our novel approach complements analyses on complete observations and other missing value strategies in biomarker prediction of disease activity.
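Illustration: The per-protein missingness comparison summarized in the Results can be sketched as below. This is a minimal sketch under stated assumptions, not the method used in the study: the toy data, the function names, the encoding of missing values as None, and the 0.5 outlier threshold are all hypothetical, and the abstract does not state how outlier levels of missingness were defined.

```python
from math import sqrt

def missing_fraction(values):
    """Fraction of samples in which a protein was not quantified (None)."""
    return sum(v is None for v in values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def missingness_gap(group_a, group_b):
    """Per-protein difference in missing fraction between two patient groups."""
    return {p: missing_fraction(group_a[p]) - missing_fraction(group_b[p])
            for p in group_a}

def flag_outliers(gaps, threshold=0.5):
    """Proteins whose absolute missingness gap meets an assumed threshold."""
    return sorted(p for p, g in gaps.items() if abs(g) >= threshold)

# Hypothetical toy data: protein -> intensities per sample, None = missing.
baseline = {"P1": [1.0, None, 2.0, 3.0],
            "P2": [None, None, None, 1.0],
            "P3": [1.0, 2.0, 3.0, 4.0]}
month3 = {"P1": [1.0, None, 2.5, 3.1],
          "P2": [None, None, 1.2, None],
          "P3": [1.0, 2.0, None, 4.0]}

# Step 1: is per-protein missingness stable between time points?
proteins = sorted(baseline)
r_time = pearson_r([missing_fraction(baseline[p]) for p in proteins],
                   [missing_fraction(month3[p]) for p in proteins])

# Step 2: which proteins differ in missingness between two patient groups?
group_a = baseline
group_b = {"P1": [1.0, 2.0, None, 3.0],
           "P2": [1.0, 2.0, 3.0, 4.0],
           "P3": [1.0, 2.0, 3.0, 4.0]}
flagged = flag_outliers(missingness_gap(group_a, group_b))
```

In this sketch, stability of missingness across time points is checked first, since a strong correlation is what justifies treating between-group differences in missingness as a property of the protein rather than of the collection time.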