Penalized joint generalized estimating equations for longitudinal binary data

Research output: Contribution to journal › Article › peer-review


Variable selection and feature extraction are common problems in statistical research. Variable selection in linear models has been fully developed, but it has received relatively little attention for longitudinal data. Because a longitudinal study involves within-subject correlations, the likelihood function of discrete longitudinal responses generally cannot be expressed in analytically closed form, and standard variable selection methods cannot be applied directly. As an alternative, the penalized generalized estimating equation is helpful but is very likely to select the wrong variables if the working correlation matrix is misspecified. In many circumstances, the within-subject correlations are themselves of interest and need to be modeled together with the mean. This is more challenging for longitudinal binary data because the within-subject correlation coefficients are subject to the so-called Fréchet-Hoeffding upper bound. In this paper, we propose SCAD-based and LASSO-based penalized joint generalized estimating equation (PJGEE) methods that simultaneously model the mean and correlations for longitudinal binary data, together with variable selection in the mean model. The estimated correlation coefficients satisfy the upper bound constraints. Simulation studies under different scenarios are conducted to assess the performance of the proposed method. Compared to existing penalized generalized estimating equation (PGEE) methods that specify a working correlation matrix for longitudinal binary data, the proposed PJGEE method works much better in terms of variable selection consistency and parameter estimation accuracy. A real dataset on Clinical Global Impression is analyzed for illustration.
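
To illustrate why the correlation of binary responses is constrained, the following minimal Python sketch computes the Fréchet-Hoeffding bounds on the correlation between two Bernoulli variables with given marginal means. It is not taken from the paper: the function name `binary_corr_bounds` and its arguments are hypothetical, and it only uses the standard fact that max(0, p1 + p2 - 1) <= E[Y1 Y2] <= min(p1, p2).

```python
import math


def binary_corr_bounds(p1, p2):
    """Return (lower, upper) bounds on corr(Y1, Y2) for Bernoulli marginals.

    Hypothetical helper for illustration only. p1 and p2 are the marginal
    success probabilities and must lie strictly between 0 and 1.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("marginal probabilities must be in (0, 1)")
    # Product of the two Bernoulli standard deviations.
    denom = math.sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))
    # E[Y1 * Y2] can be at most min(p1, p2) and at least max(0, p1 + p2 - 1).
    upper = (min(p1, p2) - p1 * p2) / denom
    lower = (max(0.0, p1 + p2 - 1.0) - p1 * p2) / denom
    return lower, upper


if __name__ == "__main__":
    # With unequal means the attainable correlation range is narrower
    # than [-1, 1].
    print(binary_corr_bounds(0.2, 0.5))
```

The range reaches 1 only when the two marginal means are equal, so a working correlation chosen without regard to the means can fall outside the attainable range.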

Bibliographical metadata

Original language: English
Journal: Biometrical journal. Biometrische Zeitschrift
Publication status: Accepted/In press - 5 Jun 2021