Department of Public and Environmental Health, Hawassa University, Ethiopia

Centre for International Health, University of Bergen, Norway

Abstract

Background

Malaria transmission is complex and is believed to be associated with local climate changes. However, simple attempts to extrapolate malaria incidence rates from averaged regional meteorological conditions have proven unsuccessful. Therefore, the objective of this study was to determine if variations in specific meteorological factors are able to consistently predict

Methods

Retrospective data from 42 locations were collected including

Results

Of 35 models, five were discarded because of the significant value of Ljung-Box Q statistics. Past

Conclusions

This study describes

Background

Over 100 million people worldwide are affected by malaria and

The possible association of changes in temperatures to variations in malaria epidemiology is merited by the well-defined biological effects on life-cycle stages of the

Many researchers, therefore, have proposed developing improved tools to forecast malaria epidemics by using variations in regional temperatures. These efforts have resulted in the medical literature using vastly inconsistent terminology to describe malaria risks, and to distinguish between long-term forecasts, early warning and early detection of epidemics.

Long-term epidemic forecasting is usually based on climate forecasting, and relies on such datasets as the El Niño Southern Oscillation indices to predict epidemic risk months in advance over large geographical areas. Such a forecast allows time for the population to prepare for a possible epidemic in the upcoming malaria season.

Malaria epidemic early warning is based on surveying transmission risks to predict timing of an increase based on abnormal rainfall or temperatures. Often, such risks are influenced by population vulnerability, such as history of low rates of malaria transmission. Such predictions of malaria epidemics can provide lead times of weeks to months.

The long-term and early warning approaches should, however, be distinguished from epidemic early detection, which involves noting the beginning of an unusual epidemic. This surveillance approach is limited in that it offers little lead time (days to weeks) for preparation and implementation of preventive measures. When used effectively, it can still prevent sickness and death.

The aim of this study was to examine whether the spatio-temporal distributions of surface temperature and rainfall are useful predictors of changes in malaria incidence, as a malaria epidemic early warning strategy. This evaluation was based on an assumption that the link between climate and occurrence of malaria is constant and similar for different regional settings.

Incorporating prediction and forecasting approaches, however, calls for sound understanding of the complex factors involved in malaria transmission. It has been suggested that the major driving force of malaria transmission is climate

Nonetheless, the impact of climate on malaria transmission has yet to be firmly established. Thus, there exists a need to consider local variations in climates in order to fully understand the relationship between climate and malaria transmission

In addition to the incorporation of climatic causes, some researchers have suggested building models that consider non-climatic factors such as land use, population movement, immunity, topography, parasite genotypes, vector composition, drug resistance, vector control measures and availability of healthcare services

Methods

Data inputs and inclusion criteria

A total of 42 locations in the southern region of Ethiopia were examined for data on varying serial length of

Microscopically-confirmed

**Coordinates of the malaria affected locations of interest in south Ethiopia**. A map of Ethiopia has been sub-divided into administrative regions that include the Southern Nations and Nationalities People's Region where we conducted this study.

Data source

A health centre provides basic curative and preventive health services for a population of about 25,000 people. Each health centre is staffed by nurses and health officers, and by trained laboratory technicians. The institutions routinely performed thick and thin blood film examinations for malaria parasites. Rapid diagnostic tests for malaria were not used. Each month, all health institutions reported suspected malaria cases and confirmed

Microscopically-confirmed

The meteorological data used for analyses were obtained from the Southern Branch office of the National Meteorological Agency of Ethiopia. This agency operates over 200 meteorological stations, with records spanning 15 to over 50 years. From the year 1970 onward, the proportion of missing data is low

Missing data handling

The Box-Jenkins method

Assumptions

1. The underlying data of malaria transmission was assumed to be stochastic, whereby local variations and other unmeasured causes play important roles. Others have reported local variations in the association between climate and malaria incidence

2. The quality of data obtained through routine reporting in developing countries may be questionable, mainly because of under-reporting. However, the data sets were assumed to hold the basic elements of malaria transmission like trend, seasonality or monthly variations, which could suffice for modelling exercise

3. The meteorological station correctly captures climate data within a 10 km radius [Southern Branch office of National Meteorological Agency of Ethiopia, personal communication], and this matches the service area coverage of the corresponding health centre. This assumption does not apply to the two district hospitals, since the service area coverage of a district hospital extends beyond the 10 km radius

4. In Ethiopia, malaria transmission is largely unstable

Scope

This paper sought to unveil the local variations in the predictive power of lagged effects of the number of past

Data processing and analysis

SPSS version 17.0 Expert Modeler (Chicago, IL, USA) was used to automatically determine the best-fitting model. Malaria incidence was the dependent variable, and all available climatic variables were fed into the model as predictors. The Expert Modeler keeps a predictor series in the model only if it is significant. The resultant model was checked for consistency by entering the model criteria set and the significant predictors identified by the Expert Modeler into custom ARIMA models, and several logical combinations of criteria were considered in search of better models. The best-fitting model built by the Expert Modeler was subsequently used. For the locations of Cheleklektu and Buee, a constant value of 1 was added to the dependent series to enable log transformation. Outliers were detected automatically and modelled accordingly; trimming was therefore not performed. The same procedure was followed for all data sets.

Goodness of fit

The R-squared measurement was used as an indicator of goodness of fit for the models when no differencing was applied. The R-squared coefficient of determination gives the proportion of variance of the dependent variable explained by the model. The stationary R-squared was used instead whenever the Expert Modeler applied differencing. The stationary R-squared captures trend or seasonality, which is the basis for differencing. The stationary R-squared and the ordinary R-squared values were the same when the data were not transformed in any way. If the series was log transformed without differencing, the stationary R-squared would overestimate the ordinary R-squared; for the square root transformation, it would underestimate it.
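As an illustration of the ordinary R-squared described above, the following minimal sketch computes it from observed and fitted values. The function name and the example numbers are hypothetical and are not taken from the study data; the stationary R-squared reported by SPSS is not reproduced here.

```python
def r_squared(observed, fitted):
    """Ordinary R-squared: 1 - SSE / SST, with SST taken about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between observed and fitted values.
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Total sum of squares about the mean of the observed series.
    sst = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - sse / sst


# Hypothetical monthly counts, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, [1.0, 2.0, 3.0, 4.0]))  # perfect fit
print(r_squared(observed, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean: negative
```

Because the statistic is defined relative to a mean-only baseline, it becomes negative whenever the fitted values are further from the data than the series mean is.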

Diagnostic statistics

The Ljung-Box Q statistic, also known as the modified Box-Pierce statistic, was used to indicate whether the model was correctly specified. A significance value below 0.05 was taken to indicate structure in the observed series that was not accounted for by the model; models with a significant value were therefore discarded.
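The statistic itself can be sketched in a few lines. The code below is a minimal illustration using the textbook formula Q = n(n + 2) Σ r_k² / (n − k), not the SPSS implementation; the function names and the example residuals are hypothetical. To obtain a significance value, Q is compared with a chi-square distribution whose degrees of freedom are usually taken as the number of lags minus the number of estimated ARMA parameters.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    total = sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign, so they are clearly not white noise.
residuals = [1.0, -1.0] * 5
q = ljung_box_q(residuals, max_lag=1)
# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
print(q, q > 3.841)
```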

The residual autocorrelation function was expected to agree with the white noise assumption. White noise, the most common model of noise in time series analysis, is a stationary time series or a stationary random process with zero autocorrelation. In other words, in white noise the values $X(t_1)$ and $X(t_2)$ taken at different moments $t_1$ and $t_2$ of time are not correlated; that is, the correlation coefficient $r(X(t_1), X(t_2))$ is equal to zero. The SPSS 17.0 forecasting menu provides autocorrelations that provides

The model

Since meteorological variables were used as predictors, addition of the Transfer Function (TF) model to the basic univariate ARIMA model was considered. Whenever the Expert Modeler dropped the predictor series, the model was found to take on the univariate ARIMA form.

ARIMA orders

In ARIMA (

Transfer function orders

The seasonal orders were built using the same strategy as that for the ARIMA orders.

Delay

Setting a delay is known to cause the predictor's influence to be delayed by the number of intervals specified. For instance, a delay of 4 implies the value of the predictor at time
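The effect of a delay combined with numerator orders can be sketched as a weighted sum of past predictor values. The code below is a simplified illustration under stated assumptions: the function name, the weights and the rainfall figures are hypothetical, and plain positive weights are used instead of the sign convention and estimation procedure of the SPSS transfer function model.

```python
def delayed_predictor_effect(predictor, weights, delay):
    """Return sum_k weights[k] * predictor[t - delay - k] for each time t.

    weights[k] is the weight of numerator order k. Entries are None where
    the series does not yet have enough history.
    """
    effect = []
    for t in range(len(predictor)):
        oldest_index = t - delay - (len(weights) - 1)
        if oldest_index < 0:
            effect.append(None)
        else:
            effect.append(
                sum(w * predictor[t - delay - k] for k, w in enumerate(weights))
            )
    return effect


# Hypothetical monthly rainfall values (mm), for illustration only.
rainfall = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

# Numerator order 0 with a delay of 2: only the value two months back contributes.
print(delayed_predictor_effect(rainfall, weights=[0.5], delay=2))

# Numerator orders 0, 1 and 2 with no delay: the current month and the two previous months.
print(delayed_predictor_effect(rainfall, weights=[1.0, 1.0, 1.0], delay=0))
```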

See Additional file

**Details of the model**. Formulae and main features of Transfer Function and univariate ARIMA models.

Data transformation

The ARIMA model is an analysis in the temporal domain applied to stationary data series. The presence of outliers, random walk, drift, trend, or changing variance in a series can result in nonstationarity. A series is stationary when both its mean and its variance remain constant over time. To achieve this, variance-stabilizing transformations, such as natural log (LN) and square root (SQR), and detrending by differencing were used when necessary. In addition, the Expert Modeler was set to detect outliers (if any) and model them automatically.
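The transformations named above can be sketched as follows. This is a minimal illustration with hypothetical function names and example values; a seasonal period of 12 is assumed here because the data are monthly.

```python
import math


def log_transform(series, offset=0.0):
    """Natural log (LN) transformation; offset can be set to 1 for series containing zeros."""
    return [math.log(x + offset) for x in series]


def sqrt_transform(series):
    """Square root (SQR) transformation."""
    return [math.sqrt(x) for x in series]


def difference(series, lag=1):
    """Differencing: lag=1 is non-seasonal, lag=12 is seasonal for monthly data."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly case counts, for illustration only.
cases = [0, 3, 8, 15]
print(log_transform(cases, offset=1.0))
print(sqrt_transform(cases))
print(difference(cases))
```

The `offset=1.0` call mirrors the constant of 1 that was added to the Cheleklektu and Buee series to enable log transformation.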

Results

Model inclusion and exclusion

Data from 35 locations were analysed using Time Series modelling. Models for five locations, including those built for the two hospital locations, were discarded because the Ljung-Box Q diagnostic statistic was significant.

Data description

We analysed 210 659 microscopically-confirmed

The pattern of meteorological variables and

**Sequence charts**. Sequence charts for each of the 35 locations examined, mean meteorological conditions of 23 and 14 locations. The separate sheets in the Excel file are labeled by the name of the locations corresponding to the data. Data displayed include altitude, available meteorological variable(s) and

Past

Of 30 models, 21 were based on lagged effect of incidence data alone (17 locations) or coupled with meteorological predictors (4 locations). Among those 21 models, 16 had a non-seasonal AR order of 1 (13 locations) or 2 (3 locations). Three locations had both seasonal and non-seasonal AR orders of 1. Two locations had only a seasonal AR order of 1. Non-seasonal and seasonal first order differencing was used for five and three locations, respectively. Five locations had a non-seasonal MA order of range 1-6, and there was no seasonal MA order. Seasonal ARIMA orders were specified for six locations of altitude 1742 m or higher, constituting one-third of the locations above this altitude (Additional file

**Tables S1 to S4: Time series models to predict Plasmodium falciparum malaria incidence at different locations in south Ethiopia**. The 35 locations were divided among four tables for ease of presentation. All tables included data on location, altitude, available data used, model structure, goodness of fit, significant variables and model description, serial length and average incidence per month.

Meteorological data

Rainfall data were available from all locations; however, rainfall was a significant predictor for only four of them (altitude: 1182, 1431, 1618 and 2054 m). A delay of 2 months was significant for two of these. A delay of 2 months with a numerator TF order of 0 refers to a 2-month lagged effect (Additional file ^{th }and 3^{rd }lagged months (delay 2) from the series mean. One model specified numerator TF orders of 2, 1 and 0 without setting a delay; that is, rainfall data of the last two consecutive months coupled with the current one were used to predict incidence (Additional file

Minimum and maximum temperatures were available for 17 of the locations. Minimum temperature was found to be a significant predictor in five locations. Delays of 2, 4 and 5 months with numerator TF order 0 predicted incidence in three locations (altitudes: 2582, 1220 and 2331 m). Of those, first order non-seasonal differencing was required for the location with the lowest altitude (1220 m). Incidence (two locations) and maximum temperature (one location) were included in the models (Additional file

Maximum temperature at a lag of 4 months coupled with the deviations of a lag of 5 and 6 months from the series mean predicted incidence at an altitude of 1221 m (Additional file

Only three locations had data available on relative humidity, but none proved significant.

Goodness of fit of models

Except for one model which produced a negative value, the range of the R-squared was 16-97%. Of 30 models, 20 had values greater than 50% and seven had values exceeding 85%. The range for models with any of the seasonal ARIMA orders was 60-97%. The models were reasonably good for explaining the total variations of the data sets. According to the Spearman's rho correlation coefficient, there was no significant correlation between the R-squared values and the serial length (r = 0.29) or the average incidence per month (r = -0.01).
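For readers unfamiliar with Spearman's rho, the sketch below shows how the coefficient is obtained: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is taken. The function names and the example values are hypothetical; the study data are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical R-squared values and serial lengths (months), for illustration only.
print(spearman_rho([0.2, 0.5, 0.7, 0.9], [36, 48, 60, 120]))
```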

Model similarities and variations

Incidence was predicted fairly well by its lagged values in most locations. Models of seven locations shared the same form, ARIMA (1, 0, 0) (0, 0, 0) with no transformation. The other incidence models, however, applied different forms of transformation (LN, SQR or differencing) or incorporated different meteorological variables. Some models did not contain incidence at any AR or MA orders. Meanwhile, meteorological variables were significant predictors for only seven of the locations, with no apparent pattern in relation to altitude. For two of the data sets, the Expert Modeler found no significant predictor with reasonable goodness of fit statistics. For five of the data sets, the model did not meet the criterion of the diagnostic statistics. In summary, the variations outweighed the similarities among the models built for different locations for the given incidence and meteorological data.
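An ARIMA (1, 0, 0) model with no seasonal terms predicts each value from the previous one, y_t = c + φ·y_{t-1} + error. The sketch below fits c and φ by ordinary least squares and produces forecasts; this is a simplification chosen for illustration and is not the estimation procedure used by SPSS. The function names and the example series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimates (c, phi) for y_t = c + phi * y_{t-1} + error."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((a - mean_prev) ** 2 for a in previous)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(previous, current))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion forward for the requested number of steps."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical noise-free series generated by y_t = 2 + 0.5 * y_{t-1}.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
print(c, phi)
print(forecast_ar1(series[-1], c, phi, steps=2))
```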

Mean meteorological conditions

It was not possible to engage all (thirty) data sets to evaluate the utility of taking mean meteorological conditions for prediction because aggregates of

Discussion

Statistical modelling is used to understand and predict multifactorial events; as such, reproducibility, biological plausibility and robustness govern the applicability and effectiveness of each resultant model. Malaria transmission is one such complex event, as many underlying causes have been associated with its frequency and duration, including regional factors. The Malaria Early Warning System (MEWS) has been established to enable reliable predictions of

The Ljung-Box Q provided the diagnostic statistic used to check for structure in the observed series that was not accounted for by the model. Five models were discarded because this statistic was significant, but the underlying reasons were not immediately clear. The only two data sets that came from hospitals may not have matched the station-specific climate data properly, since the catchment area of those hospitals was wider than that of the meteorological stations. Thus, it remains to be seen whether linking hospital data with a wider catchment to station-specific meteorological data would benefit evaluations of the proposed meteorology-malaria link.

Malaria transmission is known to be associated with gametocyte prevalence in a population

As has been shown by others

Considering mean conditions or aggregated data might disguise real effects

Conclusions

This study shows that models of climate-malaria link varied from place to place, and one model could not fit all locations. In several locations, it was found that past

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

EL conceived the study, collated, analysed and interpreted the data, and prepared the draft manuscript. BL conceived the study, guided the analysis, interpreted the data and helped to draft the manuscript. Both authors have read and approved the submitted version of the manuscript.

Acknowledgements

We thank the Southern Nations and Nationalities Peoples Regional Health Bureau for kindly providing the retrospective malaria morbidity data.