Survival times are data that measure the time to a certain event, such as death, failure, response, relapse, divorce or the development of a given disease. A survival time has two components that must be unambiguously defined: a starting point, and an endpoint that is reached either when the event of interest occurs or when follow-up ends. Survival data may include survival time, response to a given treatment, and patient characteristics related to survival, response and the development of disease.
These data can be derived from clinical and epidemiologic studies of humans who have acute or chronic disease. Unlike other statistical methods such as logistic regression, survival analysis takes both censoring and time into account. Censoring occurs when the exact event time of a participant is not observed, for example because the patient is lost to follow-up or has not experienced the event by the end of the study. There are three censoring schemes: right censoring, when the participant has still not experienced the event (for example, is still alive) when follow-up ends; left censoring, when the participant experienced the event of interest before the study began; and interval censoring, when the only information is that the event occurred within a given interval.
In the analysis of time-to-event data, censored observations contribute to the number at risk up to the time at which the participant is no longer followed. One advantage is that the length of follow-up does not have to be the same for everyone.
Observations can have different follow-up times, and the analysis takes that into account. In a survival analysis, participants are followed from a defined starting point, and the time until the event of interest occurs is recorded. Usually the study ends before all participants have experienced the event, so the outcome of the remaining participants is unknown.
The outcome of participants who drop out of the study is also unknown. In all these cases the follow-up time is recorded as censored data. The Kaplan-Meier method is named after two statisticians, Edward L. Kaplan and Paul Meier, who collaborated on a paper on how to deal with time-to-event data. Kaplan-Meier curves and estimates have since become a standard way of analyzing survival data in cohort studies.
The Kaplan-Meier (KM) estimate is a non-parametric estimate of the survival function that is commonly used to describe the survivorship of a study population and to compare two study populations. It is one of the most widely used statistical methods for measuring the probability that patients survive for a certain period of time after treatment.
It is also an intuitive graphical approach. In clinical or community trials, the effect of an intervention is assessed by measuring the number of participants who survive over a period of time after that intervention.
The KM estimate is a simple way of determining survival over time, even when subjects are censored or followed for different lengths of time. The KM curve displays the events, the censored observations and the survival probability. In epidemiology it is used to analyze time-to-event data and to compare two groups of subjects. The curve shows the fraction of patients who have not yet experienced a specified event, such as death, over a given period of time.
This can be calculated for two groups of patients or subjects, together with the statistical significance of the difference in their survival. Below is an example of a Kaplan-Meier survival curve:
The tick marks on the curve indicate censoring and the curve moves down when the event of interest occurs. The product-limit formula estimates the fraction of organisms or physical devices surviving beyond any age t, even when some of the items are not observed to die or fail, and the sample is rather small.
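Written out, the product-limit estimate takes the following form. The symbols are notation assumed here for illustration and are not defined in the text above: $t_i$ are the distinct times at which events occur, $d_i$ is the number of events at $t_i$, and $n_i$ is the number of subjects still at risk just before $t_i$.

```latex
\hat{S}(t) \;=\; \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Each factor is the estimated probability of surviving past $t_i$ for a subject who was at risk just before it. Censored subjects never add a factor of their own; they only leave the risk set, which lowers $n_i$ for later event times.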
The estimate is built from successive probabilities: the probability of surviving each interval is multiplied by the probabilities computed for all earlier intervals to give the final estimate. For example, the probability that a sub-fertile woman is still event-free three months after laparoscopy and hydrotubation is the probability of surviving the first month, multiplied by the probability of surviving the second month given that she survived the first, multiplied by the probability of surviving the third month given that she survived the first two.
The second and third probabilities are conditional probabilities. In survival analysis, intervals are defined by failures. For example, the probability of surviving intervals A and B is equal to the probability of surviving interval A multiplied by the probability of surviving interval B given survival through interval A.
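The chain of conditional probabilities described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the output of any statistical package: the function name kaplan_meier and the six observations are made up for the example, and a status of 1 is assumed to mean an event and 0 a censored observation.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    Returns one row (time, at_risk, deaths, survival) per distinct event time.
    events[i] is 1 if the event occurred at times[i] and 0 if it was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect every observation recorded at this time.
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Conditional probability of surviving this event time,
            # multiplied by everything computed so far.
            survival *= (at_risk - deaths) / at_risk
            rows.append((t, at_risk, deaths, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= deaths + censored
    return rows


# Hypothetical follow-up times in months (not taken from the article's tables).
example_times = [2, 3, 5, 5, 8, 10]
example_events = [1, 0, 1, 1, 0, 1]
km_rows = kaplan_meier(example_times, example_events)
for row in km_rows:
    print(row)
```

Note that the censored subjects at 3 and 8 months do not produce rows of their own. They only reduce the number at risk for the event times that follow.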
For each interval, the survival probability is calculated as the number of participants surviving the interval divided by the number at risk at its start.
Three assumptions are used in this analysis. Firstly, it is assumed that at any time, participants who are censored have the same survival prospects as those who continue to be followed. Secondly, it is assumed that the survival probabilities are the same for participants recruited early and late in the study. Thirdly, it is assumed that the event occurs at the time specified.
A limitation of the Kaplan-Meier estimate is that it cannot be used for multivariate analysis, as it studies the effect of only one factor at a time.
The log-rank test is used to compare two or more groups. Its null hypothesis states that the populations do not differ in the probability of an event at any time point.
The log-rank test is the most commonly used statistical test for comparing the survival functions of two or more groups. These groups can be treatment and control groups, or different treatment groups in a clinical trial. Because it is purely a significance test, the log-rank test cannot provide an estimate of the size of the difference between the groups or a related confidence interval.
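To make the test concrete, here is a hedged sketch of a two-group log-rank calculation in plain Python. It is not the SPSS or R implementation. The function name logrank_test and the toy data are assumptions for illustration. At each event time it compares the observed deaths in group A with the number expected if both groups shared one survival curve, and refers the resulting statistic to a chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi_square, p_value) on 1 df."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still under observation just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Toy data: every subject has an event; group A fails earlier than group B.
chi_square, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(chi_square, p_value)
```

As the text says, the p-value only indicates whether the curves differ. It says nothing about how large the difference is.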
The tables below contain fictitious data generated with the SPSS software. Table 1 contains the data for the treatment group only, while table 2 contains the data for both groups.
The first group in the second table is the treatment group and the second is the control group. Each group comprises ten participants who were followed for a period of 24 months. The participants in the treatment and control groups were given Drug A and placebo respectively, and they are labelled alphabetically: A, B, C, …, T.
The data will be used to determine the Kaplan-Meier (product-limit) estimates for both the control and the treatment groups.
From the curve above, the number of events (deaths) in the treatment group (those given Drug A) is 6, while in the control group (those given placebo) it is 7. The numbers of censored participants in the treatment and control groups are 4 and 3 respectively. The curve takes a step down when a participant dies, and the tick marks on the curve indicate censoring, that is, when participants were lost to follow-up or dropped out of the study.
In the treatment group, Subject D died at 2 months. Subject A also died at 6 months, therefore the PLI is: 0. Subjects B, Q and H were censored at 7, 8 and 14 months respectively. Subject F died at 19 months, the estimate will be: 0. Subject L died at 20 months, the PLI will be 0. The next subject in the group, which is subject K, was censored at 22 months while subject N, the last subject in the group died at 24 months and that is the last month of the study.
The product limit estimate will be 0. Subject O was censored at 11 months. Subject T was censored at 15 months. Note: censored participants are assumed to be those who were lost to follow-up or dropped out during the 24-month study.
The curves for two different groups of participants can be compared, for example the survival pattern of participants on a treatment with that of a control group. The gaps between the curves can be read in a vertical or a horizontal direction.
A vertical gap means that at a specific point in time one group had a greater fraction of participants surviving, while a horizontal gap means that it took longer for one group to experience a certain fraction of deaths.
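The two kinds of gap can be read off numerically as well as by eye. The sketch below assumes each curve is stored as a list of (time, survival) steps. The helper names survival_at and time_to_reach and both curves are hypothetical values chosen for illustration, not the article's data.

```python
def survival_at(curve, t):
    """Value of the step curve at time t; the curve starts at 1.0."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
        else:
            break
    return s


def time_to_reach(curve, level):
    """First time the curve is at or below `level`, or None if it never is."""
    for time, surv in curve:
        if surv <= level:
            return time
    return None


# Hypothetical step curves: (month, cumulative survival after the drop).
treatment = [(3, 0.9), (8, 0.7), (15, 0.5), (21, 0.3)]
control = [(2, 0.8), (5, 0.6), (9, 0.4), (14, 0.2)]

# Vertical gap: difference in survival at a fixed time (month 12).
vertical_gap = survival_at(treatment, 12) - survival_at(control, 12)

# Horizontal gap: extra months before survival falls to 50% or lower.
horizontal_gap = time_to_reach(treatment, 0.5) - time_to_reach(control, 0.5)

print(round(vertical_gap, 2), horizontal_gap)
```

With these made-up curves the treatment curve sits 0.3 higher at month 12, and it takes 6 months longer to fall to 50% survival or below.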
Now the two groups in figure 3 will be compared in terms of their survival curves. The table below generated from the SPSS software will be used to test the hypothesis.
Table 2 indicates that all the three p-values are greater than 0. Therefore, statistically, the survival curves of the treatment and control groups do not differ.
Survival curves here mean the population or the true survival curves. After so much theory and explanations on KM analysis, we shall move into the creation and interpretation of the KM curve. There are six subjects in each group for ease of understanding.
The serial time and the status at the serial time are given in the table below. A status of 1 means that the event occurred, and 0 means that the subject was censored. The objective is to find the cumulative probability of survival and to find out whether there is any significant difference in survival between the groups. As discussed earlier, the basic elements required for the analysis are (1) the serial time, (2) the status at the serial time and (3) the group to which the subject belongs. The data are entered in a table and sorted by ascending serial time, beginning with the shortest time in each group.
Notice that each group has one censored subject. In the male group the subject is censored at the end of the trial, and in the other group the subject is censored within the study timeline.
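The data layout described above can be sketched as follows. The article's own table is not reproduced here, so the twelve records are hypothetical. They were chosen only to match the description: six subjects per group, one censored subject in each, at the end for the male group and mid-study for the female group.

```python
# Hypothetical records: (group, serial_time, status); status 1 = event, 0 = censored.
records = [
    ("male", 4, 1), ("female", 3, 0), ("male", 6, 0), ("female", 1, 1),
    ("male", 1, 1), ("female", 6, 1), ("male", 3, 1), ("female", 4, 1),
    ("male", 5, 1), ("female", 2, 1), ("male", 2, 1), ("female", 5, 1),
]

# Sort by group first, then by ascending serial time within each group.
ordered = sorted(records, key=lambda r: (r[0], r[1]))

for group, serial_time, status in ordered:
    print(group, serial_time, "event" if status == 1 else "censored")
```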
Step 1: The packages used for the analysis are survival and survminer. Use install.
Step 2: The next step is to load the dataset and examine its structure. The data used for this analysis are the same as shown above. They are saved as a csv file, which is imported into R for the analysis.
Step 3: We are now ready to create the survival object using the function Surv of the survival package. The survival object is basically a compiled version of the serial time and the status.
Step 4: The next step is to fit the Kaplan-Meier curves. To do this we fit the survival function with the survival object and the group of interest, using the survfit function of the survival package. The survival object created in the previous step is modelled as a function of the group considered in the analysis. The table below is the table output of the survival analysis.
Step 5: It is now time to plot the KM curve. An argument that adds the p-value of a log-rank test to the plot (pval = TRUE in ggsurvplot from survminer) is very useful, because it gives an idea of whether the groups are significantly different.
In table 2, it can be seen that the last subject of the female group has no cumulative probability of survival assigned to it, and that there is a sudden drop in the probability for the third subject.
In the other group, by contrast, the last subject has a probability associated with it, and the fall in probability is smaller than in the female group.
This is because in the female group one subject was censored in the middle of the study, after the second event. Fewer subjects were then at risk, so the probability fell more steeply at the next event, and no subject was left at the end to keep the probability above zero. In the male group the censored subject comes only at the end, so the probability does not reach zero.
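The effect of where the censoring falls can be checked with a small calculation. The sketch below uses exact fractions and hypothetical data that only mimic the pattern described above (one subject censored after the second event in the female group, one censored at the very end in the male group). The function name km_curve is an assumption, and the numbers are not the article's table.

```python
from fractions import Fraction


def km_curve(observations):
    """observations: list of (time, status), status 1 = event, 0 = censored.

    Returns [(time, cumulative survival)] at each event, as exact fractions.
    Subjects are processed one at a time; at tied times events come first.
    """
    at_risk = len(observations)
    survival = Fraction(1)
    curve = []
    for time, status in sorted(observations, key=lambda o: (o[0], -o[1])):
        if status == 1:
            survival *= Fraction(at_risk - 1, at_risk)
            curve.append((time, survival))
        # Whether the subject died or was censored, it leaves the risk set.
        at_risk -= 1
    return curve


# Hypothetical groups of six subjects each.
female = [(1, 1), (2, 1), (3, 0), (4, 1), (5, 1), (6, 1)]  # censored mid-study
male = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 0)]    # censored at the end

female_curve = km_curve(female)
male_curve = km_curve(male)
print([str(s) for _, s in female_curve])
print([str(s) for _, s in male_curve])
```

In this toy example both groups stand at 2/3 after the second event. The female curve then falls by 2/9 at its next event because only three subjects remain at risk, whereas the male curve falls by 1/6. The female curve ends at 0 and the male curve ends at 1/6.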
I know this is a little confusing, but worry not: it will become clearer in what follows. Look at the KM curve in the figure. The survival duration of a subject is represented by the length of the horizontal lines along the X-axis of serial times. The occurrence of the event terminates the interval. The vertical lines mark the event of interest, and the vertical distances between the horizontal lines are important because they show the change in the cumulative probability of survival, read off the Y-axis.
The steepness of the curve is determined by the survival durations. Looking at the censored subjects, the one subject censored in the female group materially reduced the cumulative survival in the following intervals. The terminally censored subject in the male group, in contrast, did not change the survival probability, and its interval was not terminated by an event.
The table above shows what happens behind the production of the KM curve. When the table is cross-referenced with the KM curve, it is evident that intervals and their probabilities are constructed only for events of interest and not for censored subjects.
Because an event ends one interval and begins another, there is one more interval than there are events. The table also explains the way the curves end. In the male group, the curve ends without creating another interval below. The cumulative probability of surviving this long is determined by the last horizontal, sixth interval and is 0.
In the other group, the curve drops to zero after the fifth interval, so that the horizontal line of the sixth interval lies on the X-axis. Looking at the probabilities of survival, it can be a little confusing that there are two of them: (1) the cumulative probability and (2) the interval probability. The cumulative probability is the probability of surviving up to the beginning of the interval and throughout it. This is graphed along the Y-axis of the curve.
The interval survival rate is the probability of surviving past the interval, given survival up to its start. Censoring affects survival rates. Censored observations that coincide with an event are usually considered to fall immediately after the event.
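The tie convention in the last sentence changes the numbers, which a short sketch can show. The function km_survival, its events_first flag and the three observations are all hypothetical. The flag exists only to contrast the usual convention with the alternative.

```python
from fractions import Fraction


def km_survival(observations, events_first=True):
    """Final product-limit estimate for (time, status) pairs.

    status 1 = event, 0 = censored. With events_first=True, a subject
    censored at the same time as an event is treated as censored just
    after it, so it still counts as at risk for that event.
    """
    sign = -1 if events_first else 1
    ordered = sorted(observations, key=lambda o: (o[0], sign * o[1]))
    at_risk = len(ordered)
    survival = Fraction(1)
    for _, status in ordered:
        if status == 1:
            survival *= Fraction(at_risk - 1, at_risk)
        at_risk -= 1
    return survival


# One event and one censored observation tied at time 5, one censored at 7.
tied = [(5, 0), (5, 1), (7, 0)]
usual = km_survival(tied)                            # censored subject still at risk
alternative = km_survival(tied, events_first=False)  # censored subject removed first
print(usual, alternative)
```

Under the usual convention the event at time 5 is shared among three subjects at risk, giving 2/3. If the tied censored subject were removed first, only two would be at risk and the estimate would drop to 1/2.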