On the comparison of survival curves of two groups of chronic kidney disease patients based on progressively censored data

Introduction: Chronic kidney disease (CKD) is the progressive loss of kidney function. The prevalence of every stage of CKD has been rising over time with the growing diabetic, hypertensive and elderly populations, and CKD is becoming a problem of epidemic proportions in India. Objectives: To compare the survival functions of CKD patients at different stages of disease severity, grouped on the basis of gender, diabetes and hypertension. Patients and Methods: Retrospective data on 117 patients with CKD, collected from March 2006 to October 2016, are used. The log-rank, Gehan-Wilcoxon, Tarone-Ware, Peto-Peto and modified Peto-Peto tests, together with tests from the Fleming-Harrington family with different (p, q) values, are applied to test the statistical significance of the difference between two survival functions under different conditions. A parametric test (the likelihood ratio test) is also applied to compare the survival time distributions of the two groups. Results: The Kaplan-Meier method and the survival comparison tests suggest no difference between the survival experiences of female and male patients. However, in the simulation study, a difference between the two groups becomes evident as the sample size increases, consistent with reports that CKD affects more women than men, especially at stage 3. The survival functions of the two groups of CKD patients defined by diabetes, and of those defined by hypertension, differ significantly. Conclusion: On the basis of both the real data and the simulation study, the survival experiences of the two groups of CKD patients defined by diabetes and by hypertension differ significantly. Gender becomes evident as a significant factor only when large samples are generated in the simulation study.


Introduction
Survival analysis comprises a number of statistical methods for data in which the outcome variable is the time until a specified event of interest occurs. In clinical research, the event of interest is defined by a clinical outcome. Survival analysis allows the associations between prognostic factors and clinical outcomes to be examined, and it also helps in predicting an individual's risk of developing a clinical outcome. In chronic kidney disease (CKD), most patients have censored event times because of the end of the pre-specified study period, death, withdrawal from the study or some other competing event. The most important aspect of survival studies is the comparison of the survival times of different groups, and the need to compare survival (failure) time distributions among two or more groups arises constantly in biomedical studies. Rossing et al (1) applied the log-rank test to compare the survival curves corresponding to three levels of albuminuria in insulin-dependent diabetic patients. Joss et al (2) used the Kaplan-Meier method to derive survival curves and applied the log-rank test to assess the statistical significance of the differences between the estimated survival functions of diabetic nephropathy patients and type 2 diabetic patients. Clark et al (3) used the Kaplan-Meier method to estimate and compare the survival of different groups of ovarian cancer patients, with data collected from the Western General Hospital in Edinburgh; they also used the technique to analyze data on lung cancer patients. Chiaranda et al (4) assessed differences in survival among patients with cardiovascular disease using Kaplan-Meier curves. To determine the effect of vein graft intervention on survival times in diabetic patients, Ashfaq et al (5) compared patients with and without diabetes using the log-rank test.
Villar et al (6) applied the Cox proportional hazards model to assess the effect of renal replacement therapy on survival time among three groups of patients, namely patients with type 1 diabetes, patients with type 2 diabetes and non-diabetic patients. Zhao et al (7) applied a generalized log-rank test to assess the statistical significance of the difference between the survival times of two groups. Akbar et al (8) compared the performance of the log-rank and generalized Wilcoxon tests under low and high censoring rates for small and large sample sizes. For small samples, Jurkiewicz and Wycinka (9) compared the log-rank, Gehan-Wilcoxon, Tarone-Ware, Peto-Peto and F-H tests. Hsu et al (10) applied survival analysis to evaluate factors associated with the time to end-stage renal disease and to mortality in CKD populations.
In this study, we estimated and compared the survival functions of two groups of CKD patients using non-parametric methods. Here, the survival time is defined as the time from diagnosis of the current stage of CKD to progression to a higher stage, observed until the end of the study. The data are subject to type I progressive censoring: the event of interest (a change of disease stage) may not be observed for every patient by the end of the study, and the censoring time varies from patient to patient because patients joined the study at different time points. This article compares the survival functions of CKD patients at different stages of disease severity, grouped on the basis of sex, diabetes and hypertension. The Kaplan-Meier method, a non-parametric method, is applied to estimate and compare the survival functions of two or more groups over time; because it accommodates censored observations without assuming a parametric form, it is suitable for progressively censored data. Survival curves have been drawn to examine the difference between the survival functions of the two groups of patients; however, these curves give only a crude idea of the difference. Since survival data include censored observations, special non-parametric tests are required to test the statistical significance of differences between the survival functions of two or more groups. In the present study, the log-rank, Gehan-Wilcoxon, Tarone-Ware, Peto-Peto and modified Peto-Peto tests, together with tests from the Fleming-Harrington family with different (p, q) values, are applied to test the statistical significance of the difference between two survival functions under different conditions. The likelihood ratio test has also been applied to compare the survival time distributions of the two groups after fitting an appropriate distribution. Finally, a simulation study has been carried out to compare the survival time distributions of the two groups; it also helps to overcome any limitations of small-sample/cross-sectional data.
The methods and procedures used in this article are not confined to CKD and can be applied in other biomedical studies that require comparing the survival functions of two or more independent groups.

Implication for health policy/practice/research/medical education
The survival functions of two groups of CKD patients, grouped by diabetes, hypertension and gender, have been compared using different statistical methods, and survival was found to be lower in CKD patients with diabetes and in those with hypertension. Survival was also found to be lower in female CKD patients. It is advised that due care be taken to control diabetes and hypertension through lifestyle changes, dietary modification, physical exercise and proper medication. Steps should also be taken to identify and manage CKD at an early stage.

Objectives
To compare the survival functions of CKD patients at different stages of disease severity, grouped on the basis of gender, diabetes and hypertension.

Study design
In this study, we use a data set comprising information such as the time of visit, stage of disease, gender of the patient, and diabetes and hypertension status for 117 CKD patients.
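Here, the survival time is the time from the initially diagnosed stage of disease to a change to a higher stage, observed until the end of the study period. Let $t_1 < t_2 < \dots < t_D$ denote the distinct times at which a change of stage is observed. Let $r_j$ denote the number of CKD patients who can experience the event just before time $t_j$, that is, the number of patients at risk at time $t_j$, and let $d_j$ denote the number of CKD patients who experience a change of stage at time $t_j$. The Kaplan-Meier (product-limit) estimator of the survival function is then

$$\hat S(t) = \prod_{j:\, t_j \le t}\left(1 - \frac{d_j}{r_j}\right).$$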
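The product-limit estimator above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code (the analysis itself was done in SPSS, Excel and R); the function name `kaplan_meier` and the toy data are hypothetical, and survival times and censoring indicators are assumed to be plain lists.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of S(t).

    times  : follow-up time of each patient (e.g. years)
    events : 1 if a change of stage was observed, 0 if censored
    Returns a list of (t_j, r_j, d_j, S_hat(t_j)), one per distinct event time.
    """
    d_at = Counter(t for t, e in zip(times, events) if e == 1)
    table, surv = [], 1.0
    for t in sorted(d_at):
        r = sum(1 for u in times if u >= t)  # r_j: still at risk just before t_j
        d = d_at[t]                          # d_j: stage changes at t_j
        surv *= 1.0 - d / r                  # multiply by (1 - d_j / r_j)
        table.append((t, r, d, surv))
    return table


# Hypothetical toy data (not the study's 117 patients)
times = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 0, 1, 0, 1]
for t, r, d, s in kaplan_meier(times, events):
    print(f"t = {t}: r = {r}, d = {d}, S = {s:.3f}")
```

In this sketch a patient censored at the same time as an event (here at t = 2) is still counted in the risk set $r_j$ at that time, which is the usual convention for ties between events and censoring.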

Survival comparison tests
These non-parametric tests of hypotheses compare, at each distinct event time, the observed number of events in each group with the number expected under the null hypothesis. A weighted comparison of the observed and expected quantities is preferred over a direct comparison: by assigning different sets of weights, more weight can be placed on certain parts of the survival curve, making the test more sensitive to early, middle or late departures from the null hypothesis. The hypotheses for comparing the survival functions of the two groups are

$$H_0: S_1(t) = S_2(t) \ \text{for all } t \le \tau \quad \text{versus} \quad H_1: S_1(t) \ne S_2(t) \ \text{for some } t \le \tau,$$

where $\tau$ is the largest time at which at least one individual is at risk in both groups. The objective is to make an inference about the survival functions at all time points up to $\tau$. The sample data consist of right-censored observations from both populations. Let $t_1 < t_2 < \dots < t_D$ be the distinct event times in the pooled sample, let $i = 1, 2$ index the groups and $j = 1, 2, \dots, D$ index the event times, and define: $d_{ij}$, the number of individuals in group $i$ who experience the event at time $t_j$; $r_{ij}$, the number of individuals at risk in group $i$ at time $t_j$ (so $r_{1j}$ and $r_{2j}$ are the numbers at risk in groups 1 and 2); $d_j = d_{1j} + d_{2j}$, the total number of individuals in both groups who experience the event at time $t_j$; and $r_j = r_{1j} + r_{2j}$, the total number at risk at time $t_j$ in both groups. Let $W_i(t)$ be a positive weight function such that $W_i(t_j) = 0$ whenever $r_{ij} = 0$.
The test statistic for testing $H_0$ is based on the weighted sum of differences between the group-specific and pooled hazard estimates,

$$Z_i(\tau) = \sum_{j=1}^{D} W_i(t_j)\left(\frac{d_{ij}}{r_{ij}} - \frac{d_j}{r_j}\right).$$

In practice, all the survival comparison tests use a weight function of the form $W_i(t_j) = r_{ij}\,W(t_j)$, where $W(t_j)$ is a common weight assigned to both groups. Substituting this weight gives

$$Z_i(\tau) = \sum_{j=1}^{D} W(t_j)\left(d_{ij} - r_{ij}\frac{d_j}{r_j}\right),$$

a weighted sum of observed minus expected numbers of events in group $i$. The variance of $Z_1(\tau)$ is estimated by

$$\hat\sigma_{11}^2 = \sum_{j=1}^{D} W(t_j)^2\,\frac{r_{1j}}{r_j}\left(1 - \frac{r_{1j}}{r_j}\right)\left(\frac{r_j - d_j}{r_j - 1}\right)d_j,$$

and the test statistic is $Z = Z_1(\tau)/\hat\sigma_{11}$. Under the null hypothesis, $Z$ approximately follows the standard normal distribution for large samples. Equivalently, the test statistic can be expressed as the chi-square statistic $Z^2$ with one degree of freedom, whose observed value is compared with the tabulated value of the chi-square distribution with one degree of freedom (11)(12)(13). Different choices of the weight function $W(t_j)$ give the different comparison tests described below.
According to Fleming et al (14), Lee (15) and Buyske et al (16), the log-rank test is most powerful when the hazards of the groups are proportional throughout the follow-up period. Klein et al (17) showed that the log-rank test may fail to detect differences between the groups that arise either early or late in the study period. Tarone and Ware (18) showed that the Gehan-Wilcoxon and Tarone-Ware tests may be more powerful than the log-rank test when the hazard ratio is not constant (see also Pepe and Fleming (19)). When the proportional hazards condition is not satisfied, the Peto-Peto test is also better than the log-rank test, as shown by Kleinbaum and Klein (20). When the underlying assumptions of the Gehan-Wilcoxon and Peto-Peto tests are not satisfied, the Peto-Peto test is more efficient than the Gehan-Wilcoxon test. The Fleming-Harrington (F-H) tests provide more flexibility in choosing weights and are designed for situations in which the hazard functions of the groups cross, as shown by Pepe and Fleming (21). The Gehan-Wilcoxon test may give misleading results when the censoring patterns differ between the samples.
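The statistic $Z$ and its chi-square form described in this section can be sketched directly from the risk table. The following is a minimal Python illustration, not the software used in the study; the function name `weighted_two_sample_test` and its `weight_fn` hook (which receives the pooled lists of $r_j$ and $d_j$ and returns the weights $W(t_j)$) are hypothetical. With no weight function it uses $W(t_j) = 1$, the log-rank test.

```python
import math


def weighted_two_sample_test(times, events, groups, weight_fn=None):
    """Weighted comparison of two survival functions (log-rank family).

    groups    : 1 or 2 for each patient
    weight_fn : hypothetical hook mapping pooled lists (r, d) to weights W(t_j);
                None gives W(t_j) = 1, i.e. the log-rank test
    Returns (Z, Z**2, two-sided p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    r1, r, d1, d = [], [], [], []
    for t in event_times:
        r.append(sum(1 for u in times if u >= t))
        r1.append(sum(1 for u, g in zip(times, groups) if u >= t and g == 1))
        d.append(sum(1 for u, e in zip(times, events) if u == t and e == 1))
        d1.append(sum(1 for u, e, g in zip(times, events, groups)
                      if u == t and e == 1 and g == 1))
    w = weight_fn(r, d) if weight_fn else [1.0] * len(r)
    num = var = 0.0
    for wj, r1j, rj, d1j, dj in zip(w, r1, r, d1, d):
        num += wj * (d1j - r1j * dj / rj)  # observed minus expected in group 1
        if rj > 1:  # the term vanishes when only one patient is at risk
            var += wj**2 * (r1j / rj) * (1 - r1j / rj) * (rj - dj) / (rj - 1) * dj
    z = num / math.sqrt(var)
    return z, z * z, math.erfc(abs(z) / math.sqrt(2))  # p = P(chi2_1 > Z^2)


# Hypothetical toy data: group 1 changes stage earlier than group 2
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 2, 2, 2]
z, chi2, p = weighted_two_sample_test(times, events, groups)
print(f"Z = {z:.3f}, chi-square = {chi2:.3f}, p = {p:.4f}")
```

Here $Z$ is positive because group 1 experiences more events than expected under $H_0$; the two-sided p-value from $Z$ equals the upper-tail chi-square probability of $Z^2$ with one degree of freedom, so the two forms of the test agree.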

Log-rank test
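The log-rank test assumes that the hazard functions of the two populations are proportional. Its weight function is $W(t_j) = 1$ for all $j$, so every event time receives the same weight, and the test statistic is

$$Z = \frac{\sum_{j=1}^{D}\left(d_{1j} - r_{1j}\dfrac{d_j}{r_j}\right)}{\sqrt{\sum_{j=1}^{D}\dfrac{r_{1j}}{r_j}\left(1 - \dfrac{r_{1j}}{r_j}\right)\left(\dfrac{r_j - d_j}{r_j - 1}\right)d_j}}.$$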

Gehan test
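The Gehan (generalized Wilcoxon) test uses the weight function $W(t_j) = r_j$, the total number of individuals at risk at $t_j$, so that early event times, when the risk set is large, receive more weight. The test statistic is obtained by substituting this weight into the general statistic $Z$.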

Tarone-Ware test
In 1977, Tarone and Ware suggested a class of tests with weight function $W(t_j) = f(r_j)$ for all $j$, where $f$ is a fixed function. In particular, they took $f(r) = \sqrt{r}$, which gives more weight to time points at which the number at risk is large, though less than the Gehan weight $r_j$. The test statistic is obtained by substituting $W(t_j) = \sqrt{r_j}$ into the general statistic $Z$.

Peto-Peto test
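The Peto-Peto test (21) can be regarded as a censored-data version of the Mann-Whitney-Wilcoxon test. Its weight function is $W(t_j) = \tilde S(t_j)$, where

$$\tilde S(t) = \prod_{j:\, t_j \le t}\left(1 - \frac{d_j}{r_j + 1}\right),$$

and the test statistic is obtained by substituting this weight into the general statistic $Z$. The estimate $\tilde S(t)$ is close to the pooled product-limit (Kaplan-Meier) estimator.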

Modified Peto-Peto test
Anderson et al (22) suggested a modification of the Peto-Peto weight function, namely

$$W(t_j) = \tilde S(t_j)\,\frac{r_j}{r_j + 1}.$$

In both the Peto-Peto and the modified Peto-Peto tests, the weights are functions of the combined survival experience of the pooled sample. The test statistic is obtained by substituting this weight into the general statistic $Z$.
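To make the differences between the weighting schemes concrete, the weights of the five tests described above can be computed from the pooled numbers at risk $r_j$ and events $d_j$. This is a minimal Python sketch under the definitions given in this section; the function name `comparison_test_weights` and the toy risk table are hypothetical. The resulting weights are the $W(t_j)$ that enter the general statistic $Z$.

```python
import math


def comparison_test_weights(r, d):
    """Weights W(t_j) of five tests, from pooled numbers at risk r_j and events d_j."""
    s_tilde, peto = 1.0, []
    for rj, dj in zip(r, d):
        s_tilde *= 1.0 - dj / (rj + 1.0)  # Peto-Peto estimate S~(t_j), t_j included
        peto.append(s_tilde)
    return {
        "log-rank": [1.0 for _ in r],
        "Gehan": [float(rj) for rj in r],
        "Tarone-Ware": [math.sqrt(rj) for rj in r],
        "Peto-Peto": peto,
        "modified Peto-Peto": [s * rj / (rj + 1.0) for s, rj in zip(peto, r)],
    }


# Pooled risk table of a hypothetical sample with one event at each of six times
w = comparison_test_weights([6, 5, 4, 3, 2, 1], [1, 1, 1, 1, 1, 1])
for name, values in w.items():
    print(name, [round(v, 3) for v in values])
```

All weights except the log-rank ones decrease as the risk set shrinks, so the Gehan, Tarone-Ware and Peto-Peto type tests put relatively more emphasis on early differences; the square root in the Tarone-Ware weight places it between the log-rank and Gehan weights.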

Fleming-Harrington test family
Fleming and Harrington proposed a class of tests with weight function

$$W_{p,q}(t_j) = \hat S(t_{j-1})^{p}\left[1 - \hat S(t_{j-1})\right]^{q}, \qquad p, q \ge 0.$$

The weight is a function of the survival estimate at the previous event time, so it requires the value of the survival function just before the comparison time. The test statistic is obtained by substituting $W_{p,q}(t_j)$ into the general statistic $Z$, where $\hat S(t)$ is the Kaplan-Meier estimator of the pooled survival function,

$$\hat S(t) = \prod_{j:\, t_j \le t}\left(1 - \frac{d_j}{r_j}\right), \qquad \hat S(t_0) = 1.$$

Different values of $p$ and $q$ assign weight to different regions of the curve, and several well-known tests are special cases: the log-rank test corresponds to $p = q = 0$, and $p = 1$, $q = 0$ gives a Mann-Whitney-Wilcoxon type test close to the Peto-Peto test. Taking $p > 0$ and $q = 0$ gives most weight to early departures, whereas $p = 0$ and $q > 0$ gives most weight to late departures. An appropriate choice of $p$ and $q$ yields a test that is powerful against hazard differences in the desired region.
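The role of $p$ and $q$ can be seen by computing the Fleming-Harrington weights on a small pooled risk table. The following Python sketch assumes the weight definition above, with $\hat S(t_{j-1})$ taken as the pooled Kaplan-Meier estimate just before $t_j$; the function name `fleming_harrington_weights` and the toy table are hypothetical.

```python
def fleming_harrington_weights(r, d, p, q):
    """W(t_j) = S_hat(t_{j-1})**p * (1 - S_hat(t_{j-1}))**q, S_hat = pooled Kaplan-Meier."""
    weights, s_prev = [], 1.0  # S_hat just before the first event time is 1
    for rj, dj in zip(r, d):
        weights.append(s_prev**p * (1.0 - s_prev) ** q)
        s_prev *= 1.0 - dj / rj  # advance to S_hat(t_j) for the next event time
    return weights


r, d = [6, 5, 4, 3, 2, 1], [1, 1, 1, 1, 1, 1]  # hypothetical pooled risk table
for p, q in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    w = fleming_harrington_weights(r, d, p, q)
    print(f"p={p}, q={q}:", [round(x, 3) for x in w])
```

With $p = q = 0$ every weight is 1 (the log-rank test; Python evaluates `0.0 ** 0` as `1.0`, matching the usual convention). The weights decrease over time for $p = 1, q = 0$ (early departures) and increase for $p = 0, q = 1$ (late departures), while $p = q = 1$ emphasizes the middle of the follow-up period.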

Likelihood ratio test
For each group of CKD patients, an appropriate survival time distribution is selected by fitting distributions such as the exponential, lognormal, gamma and Weibull distributions and choosing the one with the minimum Akaike information criterion (AIC) value. The likelihood ratio test (a parametric test) is then applied to compare the survival time distributions of the two groups of CKD patients defined by each grouping variable: sex, diabetes and hypertension.
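As an illustration of the likelihood ratio test with censored data, the sketch below assumes that both groups follow an exponential distribution, chosen only because its maximum likelihood estimate has the closed form $\hat\lambda = d/T$ ($d$ events, $T$ total follow-up time), giving a maximized log-likelihood of $d\log(d/T) - d$. This is an assumption for the example, not the study's choice: the AIC-selected lognormal, gamma or Weibull fits require numerical maximization, and with two-parameter distributions the degrees of freedom equal the number of parameters constrained to be equal. The function names and data are hypothetical.

```python
import math


def exp_max_loglik(times, events):
    """Maximized censored exponential log-likelihood, d*log(d/T) - d, at lambda = d/T."""
    d, total = sum(events), sum(times)
    return 0.0 if d == 0 else d * math.log(d / total) - d


def exponential_lrt(times1, events1, times2, events2):
    """LRT of H0: lambda_1 = lambda_2 (common exponential rate), 1 degree of freedom."""
    l0 = exp_max_loglik(times1 + times2, events1 + events2)  # one common rate
    l1 = exp_max_loglik(times1, events1) + exp_max_loglik(times2, events2)
    stat = max(2.0 * (l1 - l0), 0.0)  # guard against tiny negative round-off
    return stat, math.erfc(math.sqrt(stat / 2.0))  # p = P(chi2_1 > stat)


# Hypothetical toy data (years to stage change; event 0 = censored)
t1, e1 = [1.2, 2.5, 3.1, 4.0, 6.3], [1, 1, 1, 0, 1]
t2, e2 = [3.4, 5.8, 7.2, 8.9, 9.5], [1, 1, 0, 1, 0]
stat, p = exponential_lrt(t1, e1, t2, e2)
print(f"LR chi-square = {stat:.3f}, p = {p:.4f}")
```

The statistic is twice the gain in maximized log-likelihood from allowing each group its own rate instead of one common rate; under the null hypothesis it is approximately chi-square with one degree of freedom.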

Simulation studies
A simulation study has been carried out to compare the survival time distributions of the two groups using the likelihood ratio test. Samples of sizes 50, 100, 200 and 500 per group are generated from the selected distribution, using the parameter estimates obtained from the original data. The purpose of the simulation study is to validate the earlier results and to overcome any limitations associated with small-sample/cross-sectional data.
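The simulation design can be sketched as follows. The sketch assumes exponential survival times with hypothetical rates (the study simulated from each group's AIC-selected distribution with the fitted parameters), draws uncensored samples (the source does not describe censoring in the simulation), and draws a single pair of samples per size, since the number of replicates is not stated. The function names are hypothetical.

```python
import math
import random


def lrt_p_equal_exponential(x, y):
    """p-value of the LRT of equal exponential rates for two uncensored samples."""
    def max_loglik(s):
        n = len(s)
        return n * math.log(n / sum(s)) - n

    stat = max(2.0 * (max_loglik(x) + max_loglik(y) - max_loglik(x + y)), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


def simulate(rate1, rate2, sizes=(50, 100, 200, 500), seed=2021):
    """One simulated sample pair per size; returns {n: p-value}."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    result = {}
    for n in sizes:
        x = [rng.expovariate(rate1) for _ in range(n)]
        y = [rng.expovariate(rate2) for _ in range(n)]
        result[n] = lrt_p_equal_exponential(x, y)
    return result


# Hypothetical rates per year, not the estimates reported in Table 8
for n, p in simulate(0.12, 0.15).items():
    print(f"n = {n} per group: p = {p:.4f}")
```

When the true distributions differ by a fixed amount, the test's power grows with the sample size, so the p-value tends to shrink as n increases; this is the pattern the simulation study uses to judge whether a difference not detected in the real data would emerge in larger samples.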

Statistical analysis
The statistical software SPSS (IBM SPSS Statistics version 25.0), Excel (2013) and R (version 4.0.3) were used for calculation and analysis. As the sample size was considered large enough, the null distributions of the non-parametric test statistics were approximated by the standard normal distribution (Z-test). As an alternative to the Z-test, the equivalent chi-square statistic with one degree of freedom (the square of Z) was used. For all tests, p-values were computed and are reported in the tables.

Results
In this study, we use a data set comprising information such as the time of visit, stage of disease, sex of the patient, and diabetes and hypertension status for 117 CKD patients. For the grouping variable sex, the numbers of uncensored cases among females and males are 49 and 44, and the numbers of censored cases are 18 and 6, respectively. For the grouping variable diabetes, 47 of the 55 non-diabetic patients and 46 of the 62 diabetic patients are uncensored. Similarly, for the grouping variable hypertension, the numbers of uncensored cases among non-hypertensive and hypertensive patients are 83 and 10, and the numbers of censored cases are 20 and 4, respectively. These figures are shown in Table 1. The median survival times for female and male CKD patients are 8.170 years and 7.470 years, respectively. The estimates of the median survival time, with their standard errors, for the groups based on sex, diabetes and hypertension are presented in Table 2.
Survival curves, obtained by plotting the Kaplan-Meier estimate of the survival function against time, are shown in Figures 1-3 for female and male patients, non-diabetic and diabetic patients, and non-hypertensive and hypertensive patients, respectively. The results of the survival comparison tests for the groups based on sex, diabetes and hypertension are presented in Tables 3-5, respectively. The most appropriate distribution for the survival time of each group of CKD patients is selected, on the basis of the AIC (Akaike information criterion) value and the density curve of the fitted distribution, by fitting lognormal, normal, gamma, Weibull and exponential distributions to the survival time. The AIC values of the fitted distributions are shown in Table 6. The survival functions of the groups based on sex, diabetes and hypertension, together with the pooled survival function, for different time periods are shown in Table 7. The histograms and the theoretical probability curves of the fitted distributions for the different groups are shown in Figure 4A-F.

Comparison of survival functions of two groups based on grouping variable of gender
The selected distribution, with the estimated parameter value(s), the AIC value, the chi-square statistic of the likelihood ratio test and the corresponding p-value for each grouping variable, is shown in Table 8.
The values of the chi-square statistic of the likelihood ratio test for the selected distribution, with the corresponding p-values, obtained from the simulation study for different sample sizes and for the groups based on sex, diabetes and hypertension, are shown in Tables 9-11.
Survival curves drawn with the K-M method (Figure 1) suggest that there is no difference between the survival experiences of female and male CKD patients. In addition, all the comparison tests, namely the log-rank, Gehan, Tarone-Ware, Peto-Peto, modified Peto-Peto and Fleming-Harrington tests, conclude that there is no significant difference between the survival experiences of male and female CKD patients (Table 3). The likelihood ratio test also supports the finding that there is no significant difference between the survival time distributions of female and male CKD patients (Table 8). However, a difference between the two groups becomes evident as the sample size increases in the simulation study (Table 9). Carrero et al (23) found that the epidemiology of CKD also differs by sex: CKD affects more women than men, especially at stage 3.
Survival curves of the non-diabetic and diabetic groups (Figure 2) suggest a difference between the survival experiences of the two groups. The results of all the survival comparison tests except the Gehan test (Table 4) also indicate that the survival functions of the two groups of CKD patients differ significantly. The Gehan test is not appropriate in this case: as noted earlier, it may give misleading results when the censoring patterns of the samples differ, as they do here (8 of 55 non-diabetic versus 16 of 62 diabetic patients are censored). The likelihood ratio test also indicates that the two groups of CKD patients based on diabetes have significantly different survival time distributions (Table 8), and its p-value decreases as the sample size increases in the simulation study (Table 10). Thus, the simulation study also supports the finding that the survival time distributions of the two groups differ significantly. Survival curves for the non-hypertensive and hypertensive groups of CKD patients (Figure 3) likewise reveal a difference between the survival experiences of the two groups. All the survival comparison tests except the Gehan test (Table 5) indicate that the survival functions of the two groups differ significantly, with p-values less than 0.05; again, the Gehan test is not appropriate in this case. The likelihood ratio test also indicates that the survival time distributions of the two groups differ significantly, and the difference becomes more evident as the sample size increases in the simulation study (Table 11).

Conclusion
Both the real data set and the simulation study show that the survival experiences of the two groups of CKD patients defined by diabetes, and of those defined by hypertension, differ significantly. Sex, however, emerges as a significant factor only when large samples are generated in the simulation study. Care is needed when choosing a method or test for comparing the survival curves of two groups, and when deciding on the sample size.

Limitations of the study
The data set considered in this study is small. The data were collected from CKD patients in Delhi and its surrounding areas, where general awareness about health is quite high and good medical facilities are available.

Author's contribution
SK is the single author of the paper.

Conflicts of interest
The author declares no conflict of interest.

Ethical issues
The research followed the tenets of the Declaration of Helsinki. Accordingly, written informed consent was obtained from all participants before any intervention. Ethical issues (including plagiarism, data fabrication and double publication) have been completely observed by the author.

Funding/Support
No financial support has been received during this study from any funding agency, organization or any pharmaceutical company.