- R - Data Reshaping
- R - Packages
- R - Data Frames
- R - Factors
- R - Arrays
- R - Matrices
- R - Lists
- R - Vectors
- R - Strings
- R - Functions
- R - Loops
- R - Decision Making
- R - Operators
- R - Variables
- R - Data Types
- R - Basic Syntax
- R - Environment Setup
- R - Overview
- R - Home
R Data Interfaces
- R - Database
- R - Web Data
- R - JSON Files
- R - XML Files
- R - Binary Files
- R - Excel Files
- R - CSV Files
R Charts & Graphs
R Statistics Examples
- R - Chi Square Tests
- R - Survival Analysis
- R - Random Forest
- R - Decision Tree
- R - Nonlinear Least Square
- R - Time Series Analysis
- R - Analysis of Covariance
- R - Poisson Regression
- R - Binomial Distribution
- R - Normal Distribution
- R - Logistic Regression
- R - Multiple Regression
- R - Linear Regression
- R - Mean, Median & Mode
R Useful Resources
Selected Reading
- Who is Who
- Computer Glossary
- HR Interview Questions
- Effective Resume Writing
- Questions and Answers
- UPSC IAS Exams Notes
R - Survival Analysis
Survival analysis deals with predicting the time when a specific event is going to occur. It is also known as failure time analysis or analysis of time to death. For example predicting the number of days a person with cancer will survive or predicting the time when a mechanical system is going to fail.
The R package named survival is used to carry out survival analysis. This package contains the function Surv() which takes the input data as a R formula and creates a survival object among the chosen variables for analysis. Then we use the function survfit() to create a plot for the analysis.
Install Package
install.packages("survival")
Syntax
The basic syntax for creating survival analysis in R is −
Surv(time,event) survfit(formula)
Following is the description of the parameters used −
time is the follow up time until the event occurs.
event indicates the status of occurrence of the expected event.
formula is the relationship between the predictor variables.
Example
We will consider the data set named "pbc" present in the survival packages installed above. It describes the survival data points about people affected with primary bipary cirrhosis (PBC) of the pver. Among the many columns present in the data set we are primarily concerned with the fields "time" and "status". Time represents the number of days between registration of the patient and earper of the event between the patient receiving a pver transplant or death of the patient.
# Load the pbrary. pbrary("survival") # Print first few rows. print(head(pbc))
When we execute the above code, it produces the following result and chart −
id time status trt age sex ascites hepato spiders edema bip chol 1 1 400 2 1 58.76523 f 1 1 1 1.0 14.5 261 2 2 4500 0 1 56.44627 f 0 1 1 0.0 1.1 302 3 3 1012 2 1 70.07255 m 0 0 0 0.5 1.4 176 4 4 1925 2 1 54.74059 f 0 1 1 0.5 1.8 244 5 5 1504 1 2 38.10541 f 0 1 1 0.0 3.4 279 6 6 2503 2 2 66.25873 f 0 1 0 0.0 0.8 248 albumin copper alk.phos ast trig platelet protime stage 1 2.60 156 1718.0 137.95 172 190 12.2 4 2 4.14 54 7394.8 113.52 88 221 10.6 3 3 3.48 210 516.0 96.10 55 151 12.0 4 4 2.54 64 6121.8 60.63 92 183 10.3 4 5 3.53 143 671.0 113.15 72 136 10.9 3 6 3.98 50 944.0 93.00 63 NA 11.0 3
From the above data we are considering time and status for our analysis.
Applying Surv() and survfit() Function
Now we proceed to apply the Surv() function to the above data set and create a plot that will show the trend.
# Load the pbrary. pbrary("survival") # Create the survival object. survfit(Surv(pbc$time,pbc$status == 2)~1) # Give the chart file a name. png(file = "survival.png") # Plot the graph. plot(survfit(Surv(pbc$time,pbc$status == 2)~1)) # Save the file. dev.off()
When we execute the above code, it produces the following result and chart −
Call: survfit(formula = Surv(pbc$time, pbc$status == 2) ~ 1) n events median 0.95LCL 0.95UCL 418 161 3395 3090 3853
The trend in the above graph helps us predicting the probabipty of survival at the end of a certain number of days.
Advertisements