This dataset, surgerydat_df, is a data frame of simulated surgery procedures performed at multiple hospitals. It includes information on patients, their survival times, and hospital-specific risk characteristics.
Usage
data(surgerydat_df)
Format
A data frame with 32,529 observations and 9 variables:
- entrytime
Numeric vector indicating the patient’s entry time into the study (in days)
- survtime
Numeric vector indicating survival time (in days)
- censorid
Numeric indicator of censoring status
- unit
Numeric vector identifying the hospital unit (1–45)
- exptheta
Numeric vector indicating the true failure rate of the hospital
- psival
Numeric vector indicating the hospital’s patient arrival rate (\(\psi\))
- age
Numeric vector indicating the patient’s age (in years)
- sex
Factor with 2 levels indicating patient sex
- BMI
Numeric vector indicating the patient’s body mass index
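As a quick sanity check of the format listed above, the following minimal sketch compares a data frame's columns against the nine documented variable names. The helper missing_cols and the toy row are hypothetical and exist only for illustration. The toy values are made up, and the sex level labels are a guess based on the coefficient name sexmale.

```r
# Variable names as documented in the Format section
expected_cols <- c("entrytime", "survtime", "censorid", "unit",
                   "exptheta", "psival", "age", "sex", "BMI")

# Hypothetical helper: documented column names that are absent from df
missing_cols <- function(df) setdiff(expected_cols, names(df))

# Made-up single-row data frame with the documented layout (not real data)
toy <- data.frame(
  entrytime = 5, survtime = 30, censorid = 1, unit = 1,
  exptheta = 1.2, psival = 0.5, age = 60,
  sex = factor("male", levels = c("female", "male")),  # level labels assumed
  BMI = 27
)

missing_cols(toy)         # no documented column is missing
missing_cols(toy[, -1])   # dropping the first column reports "entrytime"
```

With the package installed, running data(surgerydat_df) followed by missing_cols(surgerydat_df) should return an empty character vector if the shipped data matches this page.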
Details
The dataset comprises data from 45 simulated hospitals with patient arrivals occurring within the first 400 days after the start of the study. Patient survival times were determined using a risk-adjusted Cox proportional hazards model with coefficients: age = 0.003, BMI = 0.02, and sexmale = 0.2, along with an exponential baseline hazard rate \(h_0(t, \lambda = 0.01) e^\mu\). Hospital-specific hazard rate increases were sampled from a normal distribution:
$$\theta \sim N(\log(1), sd = 0.4)$$
This means that the average failure rate of hospitals in the dataset is the baseline (\(\theta = 0\)), with some hospitals experiencing higher or lower rates. The true failure rate is given in the variable exptheta. Patient arrival rates (\(\psi\)) differ across hospitals:
Hospitals 1–5 & 16–20: 0.5 patients per day (small hospitals)
Hospitals 6–10 & 21–25: 1 patient per day (medium hospitals)
Hospitals 11–15 & 26–30: 1.5 patients per day (large hospitals)
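The data-generating process described above can be sketched roughly as follows. This is not the code used to create the dataset; it is a minimal illustration under several assumptions: arrivals follow a Poisson process with rate \(\psi\), the hazard for a patient is \(\lambda \exp(\theta + \beta^\top x)\) (one reading of the \(e^\mu\) term), and the covariate distributions for age, BMI and sex are invented, since this page does not state them. The helper simulate_hospital is hypothetical, and censoring is not modelled.

```r
set.seed(1)

# Coefficients and baseline rate as stated in the Details text
beta   <- c(age = 0.003, BMI = 0.02, sexmale = 0.2)
lambda <- 0.01

# Hypothetical helper: simulate one hospital over a 400-day entry window.
# Assumes Poisson arrivals and made-up covariate distributions.
simulate_hospital <- function(unit, psi, theta, horizon = 400) {
  n         <- rpois(1, psi * horizon)       # number of arrivals in the window
  entrytime <- sort(runif(n, 0, horizon))    # arrival times given the count
  age  <- rnorm(n, mean = 60, sd = 10)       # assumption, not from the source
  BMI  <- rnorm(n, mean = 27, sd = 4)        # assumption, not from the source
  male <- rbinom(n, 1, 0.5)                  # assumption, not from the source
  # Linear predictor: hospital effect plus risk-adjustment terms
  lp <- theta + beta[["age"]] * age + beta[["BMI"]] * BMI +
    beta[["sexmale"]] * male
  # Exponential baseline hazard, so survival times are exponential
  survtime <- rexp(n, rate = lambda * exp(lp))
  data.frame(entrytime, survtime, unit, theta, psival = psi, age,
             sex = factor(male, levels = 0:1, labels = c("female", "male")),
             BMI)
}

# One small hospital (0.5 patients per day) with theta ~ N(log(1), sd = 0.4)
theta <- rnorm(1, mean = log(1), sd = 0.4)
sim   <- simulate_hospital(unit = 1, psi = 0.5, theta = theta)
head(sim)
```

Repeating the call over a vector of hospital identifiers with the arrival rates listed above, and stacking the results with rbind, would give a data frame with a broadly similar layout.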
The dataset name has been kept as 'surgerydat_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the healthmotionR package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.