Simulated Surgery Procedures Data — surgerydat

This dataset, surgerydat_df, is a data frame of simulated surgery procedures performed at multiple hospitals. It records each patient's entry time, survival time, and covariates, together with the arrival rate and true failure rate of the hospital where the patient was treated.

Usage

data(surgerydat_df)

Format

A data frame with 32,529 observations and 9 variables:

entrytime: Numeric vector indicating the patient’s entry time into the study (in days)
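survtime: Numeric vector indicating the time from study entry until failure or censoring (in days)
censorid: Numeric indicator of censoring status (0 = right-censored, 1 = failure observed)
unit: Numeric vector identifying the hospital unit (1–45)
exptheta: Numeric vector indicating the true failure rate of the hospital, expressed as the multiplier $e^\theta$ on the baseline hazard
psival: Numeric vector indicating the hospital’s patient arrival rate ($\psi$, in patients per day)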
age: Numeric vector indicating the patient’s age (in years)
sex: Factor with 2 levels indicating patient sex
BMI: Numeric vector indicating the patient’s body mass index
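
As a minimal sketch of how these columns fit together, the helper below computes a per-hospital patient count, number of observed failures, total follow-up time, and crude failure rate. The function name summarise_units and the toy data frame are hypothetical, and the code assumes that censorid equals 1 for an observed failure. The call on the real data only runs if the healthmotionR package is installed.

```r
# Hypothetical helper: per-hospital summary using only base R.
# ASSUMPTION: censorid == 1 marks an observed failure, 0 a censored record.
summarise_units <- function(df) {
  units <- sort(unique(df$unit))
  grp <- factor(df$unit, levels = units)
  n_patients <- as.vector(table(grp))
  n_events <- as.vector(tapply(df$censorid, grp, sum))
  followup <- as.vector(tapply(df$survtime, grp, sum))
  data.frame(
    unit = units,
    n_patients = n_patients,
    n_events = n_events,
    followup = followup,
    crude_rate = n_events / followup  # failures per patient-day
  )
}

# Toy data with the same column names, invented for illustration only.
toy <- data.frame(
  unit     = c(1, 1, 2, 2, 2),
  survtime = c(10, 30, 5, 15, 20),
  censorid = c(1, 0, 1, 1, 0)
)
toy_summary <- summarise_units(toy)
print(toy_summary)

# Apply the same helper to the packaged data when it is available.
if (requireNamespace("healthmotionR", quietly = TRUE)) {
  utils::data("surgerydat_df", package = "healthmotionR")
  print(head(summarise_units(surgerydat_df)))
}
```

The crude rate ignores the patient covariates, so it is only a rough first look and not a risk-adjusted comparison between hospitals.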

Source

Data taken from the success package version 1.1.1.

Details

The dataset comprises data from 45 simulated hospitals, with patient arrivals occurring within the first 400 days after the start of the study. Patient survival times were generated from a risk-adjusted Cox proportional hazards model with coefficients age = 0.003, BMI = 0.02, and sexmale = 0.2, together with an exponential baseline hazard rate $h_0(t, \lambda = 0.01) e^\mu$, where $\mu$ is the hospital's increase in log hazard (written $\theta$ below). These hospital-specific increases were sampled from a normal distribution:

$$\theta \sim N(\log(1),\ \mathrm{sd} = 0.4)$$

Because $\log(1) = 0$, the average hospital has $\theta = 0$ and therefore fails at the baseline rate, while individual hospitals have higher or lower rates. The true failure rate of each hospital is given in the variable exptheta. Patient arrival rates ($\psi$) differ across hospitals:

Hospitals 1–5 & 16–20: 0.5 patients per day (small hospitals)
Hospitals 6–10 & 21–25: 1 patient per day (medium hospitals)
Hospitals 11–15 & 26–30: 1.5 patients per day (large hospitals)
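
The generating process described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by the success package. It covers only hospitals 1–30, whose arrival rates are listed above. The covariate distributions for age, BMI, and sex, the level name "female", the use of uncentered covariates, and the helper name simulate_unit are all invented for the example. No censoring is applied, so censorid is omitted. Each patient's hazard is taken to be $0.01 \exp(0.003\,\text{age} + 0.02\,\text{BMI} + 0.2\,\text{sexmale} + \theta)$.

```r
set.seed(1)

n_units <- 30   # hospitals with a documented arrival rate
horizon <- 400  # arrivals occur in the first 400 days
lambda0 <- 0.01 # exponential baseline hazard rate
beta <- c(age = 0.003, BMI = 0.02, sexmale = 0.2)

# Arrival rates: 0.5, 1, 1.5 patients per day in blocks of five hospitals.
psi <- rep(rep(c(0.5, 1, 1.5), each = 5), times = 2)

# Hospital log hazard ratios, theta ~ N(log(1), sd = 0.4).
theta <- rnorm(n_units, mean = log(1), sd = 0.4)

# Hypothetical helper: simulate all patients of one hospital.
simulate_unit <- function(unit) {
  # Poisson process on [0, horizon]: Poisson count, then uniform times.
  n <- rpois(1, psi[unit] * horizon)
  entrytime <- sort(runif(n, 0, horizon))
  # ASSUMPTION: these covariate distributions are made up for illustration.
  age <- rnorm(n, mean = 70, sd = 10)
  BMI <- rnorm(n, mean = 27, sd = 4)
  male <- rbinom(n, 1, 0.5)
  # Linear predictor of the Cox model plus the hospital effect.
  lp <- beta[["age"]] * age + beta[["BMI"]] * BMI +
    beta[["sexmale"]] * male + theta[unit]
  # A constant hazard gives exponentially distributed survival times.
  survtime <- rexp(n, rate = lambda0 * exp(lp))
  data.frame(
    entrytime = entrytime,
    survtime = survtime,
    unit = rep(unit, n),
    exptheta = rep(exp(theta[unit]), n),
    psival = rep(psi[unit], n),
    age = age,
    sex = factor(male, levels = 0:1, labels = c("female", "male")),
    BMI = BMI
  )
}

sim <- do.call(rbind, lapply(seq_len(n_units), simulate_unit))
str(sim)
```

Drawing a Poisson count and then placing the arrivals uniformly on the interval is a standard way to simulate a homogeneous Poisson process, and it avoids accumulating inter-arrival times past the 400-day horizon.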

The dataset is named 'surgerydat_df' to avoid confusion with other datasets in the R ecosystem and to mark it as part of the healthmotionR package. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.