When randomised controlled trials are not feasible, researchers often employ observational study designs to evaluate the impact of an intervention. Change is usually investigated using statistical analysis that compares preintervention to postintervention data. Typically, statistical methods range from simple group comparisons that ignore temporal trends to more sophisticated interrupted time-series (ITS) analyses [

Group comparison is generally considered unreliable as it can be influenced by secular trends which are often too subtle to detect by data inspection alone [

When the outcome under consideration is a binary event, modelling of the time-series usually involves logistic (logarithm of the odds) regression to ensure that the parameters of the model are mathematically sound. Linear regression of a binary variable may result in predicted probabilities greater than 1 or less than 0. Logistic regression avoids this by finding the logarithm of the odds of the binary event (logit). Despite being mathematically sound, any change in the logit represents a change in the

The aim of this study was to attempt to rectify these issues by proposing a novel modelling and model-fitting method which describes change patterns in the time-series of a random binary event without bias to the point of the intervention. The proposed method uses piecewise linear sections and finds the best combination of these sections using a systematic procedure that addresses three important limitations of commonly used change detection techniques:

Enforcing a specific model to fit the data: an ideal modelling technique must allow multiple models to be tested to determine which is the best description of the time-series.

Modelling the logarithm of the odds of a binary variable: directly modelling the binary variable rather than the logarithm of its odds allows for any temporal variation/change to be expressed as that of the probability of the random event.

Bias to the point of the studied intervention: an ideal modelling technique will allow an inflection point between two or more temporal segments at time-points separate to the intervention. This is a significant limitation of the classic ITS technique [

The proposed modelling method is applied to a retrospective study investigating the trends in patient mortality following fragility neck of femur fractures at a Level 1 Major Trauma Centre over a period of six years.

We have previously published a modelling technique employing segmented least-squares linear regression to fit a set of progressively more complex models to the time-series of outcome measures in a large retrospective study [

Adjoining linear segments to model the time-dependence of a variable are known as “splines” and have received considerable attention in scientific literature [

In this study, we utilise the same set of progressively more complex segmented linear regression models but employ maximum likelihood regression rather than least-squares regression and discuss its advantages when modelling binary variables.

We model the binary dependent variable _{i}, _{i}) of the binary data,

(i) “Plateau”: a simple average value

This simplest of models assumes that the probability of the event remains unchanged over the study period and uses the average value of the time-series of events to represent its probability.

(ii) “Line”: a single straight line of non-zero gradient

This model determines two parameters (

(iii) “Line-plateau/plateau-line”: a straight line joined to a plateau or a plateau joined to a straight line.

Line-plateau:

where

Plateau-line:

This model joins a linear section to a plateau at a knot to model the temporal variation of probability. The plateau can precede or follow the linear section. The model is described by three parameters, two of which are the parameters of the straight line (_{j} that corresponds to the knot. Constraints must be placed to ensure that the model yields values

(iv) “Line-line”: a straight line joined to another straight line

This model fits two straight line sections which are joined at a knot to model probability. Both sections can have non-zero slopes and are joined at the time instant _{j}, thus requiring a total of four parameters to describe it. Similarly, constraints must be placed to ensure that values
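The four model forms described above can be sketched as plain functions of time. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names (`p0`, `intercept`, `slope`, `knot`) and the helper `is_valid_probability_model` are hypothetical, and the constraint check relies only on the fact that a piecewise linear curve takes its extreme values at the start, the knot, or the end of the study period.

```python
def plateau(t, p0):
    """Model (i): constant probability p0 over the whole period."""
    return p0


def line(t, intercept, slope):
    """Model (ii): a single straight line."""
    return intercept + slope * t


def line_plateau(t, intercept, slope, knot):
    """Model (iii), line-plateau: straight line up to the knot, constant afterwards."""
    return intercept + slope * min(t, knot)


def plateau_line(t, intercept, slope, knot):
    """Model (iii), plateau-line: constant up to the knot, straight line afterwards."""
    return intercept + slope * max(t, knot)


def line_line(t, intercept, slope1, slope2, knot):
    """Model (iv): two straight lines that meet at the knot."""
    if t <= knot:
        return intercept + slope1 * t
    return intercept + slope1 * knot + slope2 * (t - knot)


def is_valid_probability_model(model, params, t_start, t_end, knot=None):
    """Check that the modelled probability stays within [0, 1].

    Only the start, the end and (if present) the knot are tested, because a
    piecewise linear function cannot exceed its values at those points.
    """
    check_points = [t_start, t_end] + ([knot] if knot is not None else [])
    return all(0.0 <= model(t, *params) <= 1.0 for t in check_points)
```

For example, `is_valid_probability_model(line, (0.05, -0.0001), 0, 1995)` returns `False`, because that line would fall below zero before the end of a 1995-day period.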

The parameters (intercepts, slopes, and plateaus) of the proposed set of models are derived to maximise the likelihood of the experimental data and, as such, their values are maximum likelihood estimates (MLE) [

More specifically, we seek values of the parameters _{1}, _{2}, ^{th} binary event and _{i}.
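As a hedged illustration of the quantity being maximised, the sketch below evaluates the standard Bernoulli log-likelihood of a series of binary events given a modelled probability for each event. It assumes that this standard form is the likelihood intended here; the function names `bernoulli_log_likelihood` and `plateau_mle` are introduced for illustration only.

```python
import math


def bernoulli_log_likelihood(events, probabilities):
    """Sum of log(p_i) where the event occurred and log(1 - p_i) where it did not.

    Returns -inf if any probability lies outside [0, 1] or assigns zero
    likelihood to an observed outcome.
    """
    total = 0.0
    for y, p in zip(events, probabilities):
        if p < 0.0 or p > 1.0:
            return float("-inf")  # outside the admissible range
        likelihood_i = p if y == 1 else 1.0 - p
        if likelihood_i == 0.0:
            return float("-inf")  # observed outcome deemed impossible
        total += math.log(likelihood_i)
    return total


def plateau_mle(events):
    """For a constant probability, the maximum likelihood estimate is the mean."""
    return sum(events) / len(events)


# Hypothetical toy series: 2 events in 4 observations
toy_events = [1, 0, 0, 1]
p_hat = plateau_mle(toy_events)
ll_at_mle = bernoulli_log_likelihood(toy_events, [p_hat] * len(toy_events))
```

Any other constant probability gives a smaller log-likelihood for the same series, which is what makes the average value the maximum likelihood estimate for the plateau model.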

Practically, we minimise the negative logarithm of the likelihood (equation (

We used the

Finding the MLEs of the parameters in models (iii) and (iv) involves selecting the knot that corresponds to the largest among the maximised likelihoods obtained for

A detailed description of the constrained optimisation procedure that we used, or its background, is beyond the scope of this work as these are well documented and featured in most public domain programming languages [
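The knot selection step can be sketched as follows. This is not the constrained optimisation procedure used in the study: for the sake of a self-contained example it substitutes a brute-force grid search over the start and plateau probabilities, which keeps every modelled probability strictly between 0 and 1 by construction. The data are synthetic, and all names (`fit_line_plateau`, `line_plateau_probs`, `grid`, `knots`) are hypothetical.

```python
import math


def log_likelihood(events, probs):
    """Bernoulli log-likelihood of binary events given per-event probabilities."""
    return sum(math.log(p if y else 1.0 - p) for y, p in zip(events, probs))


def line_plateau_probs(n, p_start, p_plateau, knot):
    """Probability moves linearly from p_start (index 0) to p_plateau (index knot),
    then stays constant."""
    slope = (p_plateau - p_start) / knot
    return [p_start + slope * min(i, knot) for i in range(n)]


def fit_line_plateau(events, knots, grid):
    """For every candidate knot, search the grid for the best parameters and
    keep the knot whose maximised log-likelihood is the largest."""
    best = None
    for knot in knots:
        for p_start in grid:
            for p_plateau in grid:
                probs = line_plateau_probs(len(events), p_start, p_plateau, knot)
                ll = log_likelihood(events, probs)
                if best is None or ll > best[0]:
                    best = (ll, knot, p_start, p_plateau)
    return best


# Synthetic series (not study data): event rate 1/3 early on, 1/20 later
events = [1 if i % 3 == 0 else 0 for i in range(120)]
events += [1 if i % 20 == 0 else 0 for i in range(120)]

grid = [k / 20 for k in range(1, 20)]   # candidate probabilities 0.05 .. 0.95
knots = range(20, 240, 20)              # candidate knot positions

best_ll, best_knot, best_p_start, best_p_plateau = fit_line_plateau(events, knots, grid)
```

A grid search scales poorly and is shown only because it makes the profile over candidate knots explicit; a constrained optimiser would replace the two inner loops.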

Once all four models are fitted, the one that best describes how the time-series changes over the study period must be selected.

More complex models (those with more parameters and/or more segments) expectedly fit the data better than those with fewer parameters, yielding larger likelihood values [ Model selection is therefore based on the Akaike information criterion (AIC), AIC_{q} = 2k_{q} - 2 ln(L_{q}), where k_{q} is the number of parameters used by the q^{th} model, and L_{q} is the maximised likelihood of the q^{th} model. It is readily deduced that k_{(i)} = 1 (plateau), k_{(ii)} = 2 (single line), k_{(iii)} = 3 (line-plateau or plateau-line), and k_{(iv)} = 4 (twin line). The AIC is a compromise between goodness of fit and simplicity and is a widely accepted tool in model selection [

Although the model with the smallest AIC prevails, the other models need not be discarded. They are compared to the best-fitting model by noting their relative likelihood _{(q)} which is obtained as per Keith and Allison [

The relative likelihood _{(q)} < 0.05 for all alternative models (

Finally, the best model is used to describe the time-series whereby it is possible to detect change and reveal secular trends.
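The selection step can be illustrated with a short sketch. It assumes the usual definitions, AIC = 2k - 2 ln(L) and a relative likelihood of exp((AIC_min - AIC_q)/2) for each alternative model; the maximised log-likelihood values below are invented for illustration and are not results from this study.

```python
import math


def aic(max_log_likelihood, n_params):
    """Akaike information criterion from a maximised log-likelihood."""
    return 2 * n_params - 2 * max_log_likelihood


def relative_likelihoods(aic_by_model):
    """Relative likelihood of each model with respect to the smallest AIC."""
    best = min(aic_by_model.values())
    return {name: math.exp((best - value) / 2) for name, value in aic_by_model.items()}


# Hypothetical fits: (maximised log-likelihood, number of parameters)
fits = {
    "plateau": (-560.0, 1),
    "line": (-556.5, 2),
    "line-plateau": (-556.0, 3),
    "line-line": (-554.0, 4),
}

aic_by_model = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best_model = min(aic_by_model, key=aic_by_model.get)
rel = relative_likelihoods(aic_by_model)
```

With these made-up numbers the line-line model has the smallest AIC, while the single line remains a plausible alternative because its relative likelihood is well above 0.05.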

This modelling technique was applied to a time-series of data from a Level 1 Major Trauma Centre in the United Kingdom. As part of a retrospective study, patient survival data were collected from April 2011 to September 2016 for patients sustaining fragility fractures of the proximal femur. In July 2015, on the 1551^{st} day (2179^{th} fracture) of the study, a dedicated hip fracture unit (HFU) was introduced within the trust. Results from this study, including an evaluation of the effectiveness of the introduction of the HFU using segmented least-squares linear regression (without the adaptation for binary variables), have been previously published [

We applied the modelling technique to three time-series: 30-day, 120-day, and 365-day patient mortality. Specifically, our data consisted of two sets of 2851 binary values for 30-day and 120-day mortality and a set of 2494 binary values for 365-day mortality over a period of 1995 days (365-day mortality was monitored up to 12^{th} January 2016 resulting in fewer postintervention data points).

We used basic statistical tests to compare patient mortality before the intervention (pre-HFU) to that following the intervention (post-HFU). Since the pre-HFU and post-HFU mortality data are unpaired and categorical, we used Fisher’s exact test for this purpose. Subsequently, we compared the conclusions drawn from these basic statistical tests with those drawn using our proposed modelling technique, to assess the potential benefits of the technique.
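For readers unfamiliar with the group comparison, a two-sided Fisher’s exact test on a 2×2 table can be sketched with the standard library alone. The function name and the example counts are hypothetical (the study's raw counts are not reproduced here), and the two-sided p-value uses the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be pre/post intervention and columns died/survived.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / n_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: deaths and survivors before and after an intervention
p_example = fisher_exact_two_sided(12, 188, 4, 196)
```

The test compares only the two pooled proportions, so it carries no information about when, or how gradually, any change occurred.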

Scatter diagrams of binary event series are much less informative when compared to scatter diagrams of continuous variables when investigating time-dependent trends. Expectedly, with data values being grouped at

Scatter diagram of the time-series of 30-day mortality. Dashed vertical line is the onset of the HFU. Mortality data values are shown at either

By fitting the set of four piecewise linear models to each time-series, it is possible to discern trends. These are shown in Figures

Modelling of the time-series of 30-day mortality. Solid red line is the best model. Solid black lines are the other models. Dashed vertical line is the onset of the HFU. Data values are not shown as they are points at either

Modelling of the time-series of 120-day mortality. Solid red line is the best model. Solid black lines are the other models. Dashed vertical line is the onset of the HFU. Data values are not shown as they are points at either

Modelling of the time-series of 365-day mortality. Solid red line is the best model. Solid black lines are the other models. Dashed vertical line is the onset of the HFU. Data values are not shown as they are points at either

Using Fisher’s exact test, we found a significant reduction in average 30-day mortality from 5.47% pre-HFU to 3.13% post-HFU (

The best model to describe the time-series is the line-line (iv) model (Figure

Using Fisher’s exact test, we found a non-significant drop in 120-day mortality from 12.68% pre-HFU to 10.13% post-HFU (

The best model to describe the time-series is the plateau-line (iii) model. Relative to it, the plateau model is 0.013 times as likely, the line model 0.2144 times as likely, and the line-line model 0.7011 times as likely.

Using Fisher’s exact test, we found a small and non-significant reduction in 365-day mortality from 21.46% pre-HFU to 20.57% post-HFU (

The best model to describe the time-series is the line (ii) model. Relative to it, the plateau model is 0.185 times as likely, the line-plateau model 0.6269 times as likely, and the line-line model 0.7098 times as likely.
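The relative likelihoods reported above for the 120-day and 365-day series can be screened against the 0.05 threshold used in model selection. The sketch below assumes that an alternative model is retained as plausible when its relative likelihood is at least 0.05; the dictionary layout and the function name are illustrative only.

```python
THRESHOLD = 0.05  # alternatives below this relative likelihood are treated as unlikely

# Relative likelihoods of the alternative models, as reported in the text
reported = {
    "120-day (best: plateau-line)": {"plateau": 0.013, "line": 0.2144, "line-line": 0.7011},
    "365-day (best: line)": {"plateau": 0.185, "line-plateau": 0.6269, "line-line": 0.7098},
}


def plausible_alternatives(relative_likelihood, threshold=THRESHOLD):
    """Names of alternative models that cannot be excluded at the threshold."""
    return sorted(name for name, value in relative_likelihood.items() if value >= threshold)


screened = {series: plausible_alternatives(values) for series, values in reported.items()}
```

Under this assumption, only the plateau model is excluded for 120-day mortality, whereas none of the alternatives can be excluded for 365-day mortality.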

Using a novel technique for modelling binary variables in retrospective time-series, this study demonstrates the advantage of piecewise linear sections in conveying meaningful information. We employed the presented technique to model change in hip fracture patient outcomes to evaluate the effectiveness of introducing a dedicated HFU.

Following pre- and post-intervention group comparison, we inferred that there was a significant reduction in average 30-day mortality from 5.47% pre-HFU to 3.13% post-HFU (^{th} day reaching a near-zero value at the end of the study period. The models help explain that the difference found by group comparison via statistical testing was not an immediate consequence of the HFU but the result of a gradual decline which was nonetheless accelerated about a year after the HFU. Bearing in mind that models are not necessarily exclusive, the single line model is a close second-best model. It can therefore be concluded that 30-day mortality did not stay unchanged (

Pre- and postintervention group comparison inferred that there was a nonsignificant drop in 120-day mortality from 12.68% pre-HFU to 10.13% post-HFU (

Pre- and postintervention group comparison inferred that there was a small and nonsignificant reduction of average 365-day mortality from 21.46% pre-HFU to 20.57% post-HFU (

The current study demonstrates how temporal analysis using the proposed modelling method can elucidate the outcomes of group comparisons which are known to be unreliable especially when the data spans a long period. Importantly, by modelling the entire time-series without bias toward the point of intervention, the proposed modelling method offers an unbiased picture of the temporal evolution of the outcome measures and provides a valuable tool in the retrospective assessment of interventions. It allows delayed or anticipatory effects that may be connected to the intervention to be revealed without extra computation [

We have previously published the use of segmented linear regression and demonstrated its application to hip fracture patient outcomes [

In this study, we developed the segmented modelling technique further and tailored it to binary variables by using MLE. This exhibits the following advantages:

Least-squares linear regression can yield best-fit lines that predict unrealistic probabilities greater than 1 or smaller than 0. By using MLE with constraints on the model values, we avoid this possibility.
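The first advantage can be illustrated with a small synthetic example (not study data): an ordinary least-squares line fitted to a binary series whose events are concentrated early in the period predicts a negative "probability" at the end of the period. The helper `least_squares_line` is a hypothetical name for the textbook closed-form fit.

```python
def least_squares_line(times, values):
    """Closed-form ordinary least-squares fit; returns (intercept, slope)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


# Synthetic binary series: 10 events followed by 30 non-events
times = list(range(40))
events = [1] * 10 + [0] * 30

intercept, slope = least_squares_line(times, events)
prediction_at_end = intercept + slope * times[-1]  # falls below zero for this series
```

A likelihood-based fit cannot return such a line, since a probability outside the interval from 0 to 1 has no valid Bernoulli likelihood and the constraints exclude it.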

When using F-tests it is necessary to ensure normality of residuals, though this is impossible when studying binary variables. By using AIC instead of F-tests, we overcome this hurdle.

Previously, we employed F-tests to determine the best model that was significantly better than any other model. However, it is pragmatic to conclude that more than one model may be a good descriptor of the time-series. The technique presented in this paper allows us to exclude unlikely models (

First, the set of proposed models is limited to two adjoining linear segments and as such may be unable to track more complex change over a long time-period. To address this, the method could be extended to include more adjoining linear segments, but this needs to be undertaken with caution to prevent overcomplicating a simple and meaningful approach to modelling trends.

Second, the method does not always yield certain “yes/no” answers for determining the effectiveness of an intervention; more than one model can be deemed “acceptable” (

Finally, application of the method requires some dedicated programming as most statistical packages do not allow users to fit more than one linear section.

The proposed sequence of models ranges from a single plateau to more complex forms, including a twin line that can track more complex temporal change. The method can be extended to include higher order models with three or more segments to accommodate yet more complex change. Moreover, disjointed segments can be allowed to model sudden change [

The proposed segmented linear regression modelling technique can be used to detect trends in time-series of binary variables in retrospective studies. This can be used to evaluate the effectiveness of healthcare interventions and to highlight secular trends.

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

We gratefully acknowledge the support of the SPRINT charity who provided funding towards the publication cost of the article.