Objective and main findings of the Marie Curie IEF SMPHS Project


The statistical monitoring of public-health or biological data over time is a scientific area closely related to statistical process control (SPC), whose primary tool is the control chart. In public-health surveillance (or in the monitoring of biological processes), the data are often discrete (count data) and are assumed to follow a specific discrete probability distribution. Data of this kind usually contain an excessive number of zeros and, consequently, the standard attributes control charts are not valid for them. Moreover, the distribution of the data cannot be well approximated by the Normal distribution. Thus, new improved schemes are needed to deal effectively with health-related and/or biological outcomes without leading to incorrect assessments. The general objective of the two-year SMPHS project was (i) to develop improved statistical monitoring techniques that can be effectively used to detect changes in health-related and biological processes and (ii) to assess how violations of the main assumptions affect the performance of the proposed schemes, and to provide effective solutions.

During the first year of the SMPHS project, one-sided CUSUM-type control charts for monitoring zero-inflated Binomial processes (ZIB-CUSUM charts) were proposed and studied. Zero-inflated distributions are discrete probability models that take into account the excess of zero values in the data. The proposed charts are suitable for detecting assignable causes that lead to process deterioration (upper-sided ZIB-CUSUM chart) or to process improvement (lower-sided ZIB-CUSUM chart). A general model for the statistical design of the proposed control charts was given, and the exact performance of the proposed schemes was evaluated by means of an appropriate Markov chain technique. Programs in Scilab (http://www.scicoslab.org/) and in R (http://www.r-project.org/) that implement the statistical design and evaluate the performance of the one-sided ZIB-CUSUM charts have been developed. Extensive numerical comparisons with competing control charts for zero-inflated Binomial (ZIB) processes, namely the ZIB-Shewhart and the ZIB-EWMA charts, revealed that the ZIB-CUSUM chart performs best in detecting a specific shift in the process parameters.
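The Markov chain idea mentioned above can be illustrated with a minimal sketch. It assumes a textbook upper-sided count CUSUM, C_t = max(0, C_{t-1} + X_t - k), with integer reference value k and integer decision limit h, applied to zero-inflated Binomial counts. The function names (`zib_pmf`, `cusum_arl`) and the parameter values are hypothetical, and the statistic is not necessarily the one used in the project's ZIB-CUSUM charts.

```python
from math import comb

def zib_pmf(x, n, p, phi):
    """P(X = x) for a zero-inflated binomial: extra probability mass phi at zero."""
    base = comb(n, x) * p**x * (1 - p)**(n - x)
    return phi + (1 - phi) * base if x == 0 else (1 - phi) * base

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x

def cusum_arl(n, p, phi, k, h):
    """Zero-state average run length of C_t = max(0, C_{t-1} + X_t - k).

    The chart signals when C_t > h. Because k and h are integers (an
    assumption of this sketch), C_t only takes values in {0, ..., h}
    before a signal, so the Markov chain is exact.
    """
    states = range(h + 1)
    # q[i][j]: probability of moving from state i to non-signalling state j
    q = [[0.0] * (h + 1) for _ in states]
    for i in states:
        for x in range(n + 1):
            j = max(0, i + x - k)
            if j <= h:
                q[i][j] += zib_pmf(x, n, p, phi)
    # The ARL vector solves (I - Q) arl = 1; entry 0 is the start-at-zero ARL
    a = [[(1.0 if i == j else 0.0) - q[i][j] for j in states] for i in states]
    return solve(a, [1.0] * (h + 1))[0]
```

Calling `cusum_arl(20, 0.05, 0.7, 1, 4)` and `cusum_arl(20, 0.10, 0.7, 1, 4)` (illustrative values) gives the run length before and after an increase in the success probability. The second value is the smaller one, since larger counts push the statistic over the limit sooner.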

Next, the performance of the upper-sided Shewhart-type charts for zero-inflated Poisson (ZIP) and zero-inflated Binomial (ZIB) processes was studied in the case of estimated parameters and compared with the known-parameters case. Substantial differences in the performance of the schemes between the two cases were noticed and, consequently, charts designed under the known-parameters assumption must not be used when that assumption is violated. To assist practitioners, practical guidelines were provided on the size of the preliminary samples that have to be collected for estimating the process parameters and on the value of the design parameter for each chart. Programs written in R that evaluate the performance of the ZIP- and ZIB-Shewhart charts and implement their statistical design in the estimated-parameters case are also available.
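To make the estimated-parameters setting concrete, the following sketch estimates the two ZIP parameters from a preliminary sample by the method of moments and then computes an upper control limit from the fitted distribution. The function names, the choice of moment estimators and the false alarm probability `alpha` are assumptions made for illustration; the project's own R programs may proceed differently.

```python
from math import exp, factorial

def zip_pmf(x, lam, phi):
    """P(X = x) for a zero-inflated Poisson with rate lam and inflation phi."""
    base = exp(-lam) * lam**x / factorial(x)
    return phi + (1 - phi) * base if x == 0 else (1 - phi) * base

def zip_moment_estimates(sample):
    """Moment estimators of (lam, phi).

    Uses E[X] = (1 - phi) * lam and E[X^2] / E[X] = 1 + lam.
    Assumes a non-zero sample mean; phi_hat can turn out negative
    when the sample shows no excess of zeros.
    """
    m = len(sample)
    mean = sum(sample) / m
    second = sum(v * v for v in sample) / m
    lam_hat = second / mean - 1
    phi_hat = 1 - mean / lam_hat
    return lam_hat, phi_hat

def upper_limit(lam, phi, alpha):
    """Smallest integer u such that P(X > u) <= alpha under ZIP(lam, phi)."""
    u, cdf = 0, zip_pmf(0, lam, phi)
    while 1 - cdf > alpha:
        u += 1
        cdf += zip_pmf(u, lam, phi)
    return u
```

In use, one would call `zip_moment_estimates` on the preliminary sample and pass the result to `upper_limit`. Repeating this over many simulated preliminary samples shows how much the limit, and hence the false alarm rate, varies with the sample size.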

Apart from CUSUM-type control charts, a new EWMA-type scheme was proposed and studied. The proposed scheme is suitable for count data and its main characteristic is that all of its design parameters are integer-valued. Consequently, its theoretical properties can be evaluated exactly via an appropriate Markov chain technique and no approximation is necessary. Even though it can be used for monitoring any kind of attributes data, an extensive numerical study was conducted for Poisson observations, a very common assumption in practice. The results showed that, for small and moderate shifts in the rate of the Poisson distribution, this new scheme outperforms several competing schemes. The improvement is clear in the case of downward shifts; this means that it can be used for detecting process improvement, usually after a specific corrective action has been applied (e.g., vaccination of an infected population, or replacement of the equipment in a manufacturing environment). Codes in R are available for evaluating the performance, as well as for the statistical design, of the proposed integer-valued EWMA-type chart (abbr. IN-EWMA).

In the second year of the SMPHS project, the aim was to examine the robustness of the improved schemes (mainly the CUSUM- and EWMA-type ones) under several non-standard situations. First, a more general inflated probability model was defined as an appropriate parametric model for non-typical count data. The zero-inflated probability models were thus generalized to take into account a possible excess not only of zeros but also of non-zero values. Starting from the standard Poisson model, two models were studied: a general one (r-First Inflated Poisson distribution, FIPr) and a special case of it (Geometrically Inflated Poisson distribution of order r, GIPr). For each model, three estimation procedures were provided (moment estimation, maximum likelihood estimation, EM algorithm).
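Of the three estimation procedures listed above, the EM algorithm is the easiest to illustrate on the ordinary zero-inflated Poisson model, where only the value zero is inflated. The sketch below uses the standard EM iteration for that simpler model; it is not the FIPr or GIPr estimation procedure of the project, and the function name, starting values and iteration count are arbitrary choices.

```python
from math import exp

def zip_em(sample, lam=1.0, phi=0.5, iters=200):
    """EM estimates of (lam, phi) for a zero-inflated Poisson sample.

    Latent variable: whether an observed zero is a 'structural' zero
    (from the inflation component) or an ordinary Poisson zero.
    """
    n = len(sample)
    n_zero = sum(1 for v in sample if v == 0)
    total = sum(sample)
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is structural
        w = phi / (phi + (1 - phi) * exp(-lam))
        z = n_zero * w  # expected number of structural zeros
        # M-step: update the inflation weight and the Poisson rate
        phi = z / n
        lam = total / (n - z)
    return lam, phi
```

At convergence the estimates satisfy the usual ZIP likelihood equations: the fitted mean (1 - phi) * lam equals the sample mean, and the fitted probability of zero equals the observed proportion of zeros.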

The usefulness of the FIPr and GIPr models was demonstrated by fitting them to real data. The analysis showed that these models are useful in applied research and in the modeling of various types of count data with different dispersion levels (e.g., over- and underdispersed). Moreover, the GIPr model has one parameter (r) that controls the number of inflated values, a second parameter that controls the inflation on the first r+1 values {0,1,...,r}, and a third parameter that coincides with the rate parameter of the ordinary Poisson distribution. Even though only the case of a general inflated Poisson distribution was considered, the proposed model serves as a general mechanism for inflating the values of any other discrete distribution (e.g., binomial, negative binomial, etc.) and thus offers a broad class of parametric models that can be used for describing various types of count data.

With a more general model available, the next step in the research was to develop appropriate control charts for monitoring FIPr and GIPr processes, since the ordinary schemes, whether designed for zero-inflated processes or not, do not take into account the overdispersion in the data and the inflation mechanism. Apart from the standard one-sided Shewhart-type charts, control charts with runs rules and two CUSUM-type schemes were developed. All these schemes can be used for monitoring a process with an excessive number of zero and non-zero values; they are upper-sided and able to detect increases in the mean number of counts. Numerical comparisons between the various schemes revealed that the upper-sided CUSUM chart based on the successive likelihood ratios has the best performance in detecting small and moderate shifts in the process average. The usefulness and applicability of the techniques were further demonstrated in the monitoring of the monthly number of poliomyelitis cases in the USA, from January 1973 to November 1983 (https://datamarket.com/data/set/22u4/monthly-us-polio-cases). Almost all of the research works related to this part of the project were developed in collaboration with Assistant Prof. Petros Maravelakis from the University of Piraeus, Piraeus, Greece.
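A CUSUM chart based on successive likelihood ratios accumulates the log-ratio of the out-of-control to the in-control probability of each observation. The minimal sketch below does this for the plain zero-inflated Poisson model, not for the FIPr or GIPr models of the project; the function names, the shift from `lam0` to `lam1` and the decision limit `h` are illustrative assumptions.

```python
from math import exp, factorial, log

def zip_pmf(x, lam, phi):
    """P(X = x) for a zero-inflated Poisson with rate lam and inflation phi."""
    base = exp(-lam) * lam**x / factorial(x)
    return phi + (1 - phi) * base if x == 0 else (1 - phi) * base

def lr_cusum(data, lam0, lam1, phi, h):
    """Upper-sided likelihood-ratio CUSUM for a rate shift lam0 -> lam1.

    Returns (signal_time, path): signal_time is the 1-based index of the
    first observation at which the statistic exceeds h, or None if the
    chart never signals; path holds the statistic up to that point.
    """
    c, path = 0.0, []
    for t, x in enumerate(data, start=1):
        increment = log(zip_pmf(x, lam1, phi) / zip_pmf(x, lam0, phi))
        c = max(0.0, c + increment)  # reset at zero keeps the chart one-sided
        path.append(c)
        if c > h:
            return t, path
    return None, path
```

With `lam1 > lam0`, zeros and small counts contribute negative increments and keep the statistic at zero, while a run of large counts drives it above `h`. In practice `h` would be chosen to give a target in-control average run length.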

The final part of the SMPHS project was dedicated to examining the robustness of the improved control schemes when the available observations have a specific correlation structure. For that purpose, we considered four different integer-valued time-series models, two with finite range (first-order integer-valued autoregressive binomial and beta-binomial) and two with infinite range (first-order integer-valued autoregressive zero-inflated Poisson and first-order integer-valued autoregressive conditional heteroskedasticity zero-inflated Poisson). These models are used for describing a count data process with an excessive number of zeros and/or overdispersion. This part of the project was developed in collaboration with Prof. Christian Weiss from the Helmut Schmidt University, Hamburg, Germany. It was not difficult to verify that the ordinary schemes cannot be used effectively, because of an increased false alarm rate. Consequently, proper adjustments were necessary: we modified the values of the design parameters for the Shewhart- and CUSUM-type schemes, and provided practical guidelines. The effect of overdispersion and/or zero-inflation on the performance of the schemes was also examined, and the results showed that both the Shewhart-type and the CUSUM-type schemes are heavily affected, especially when overdispersion and zero-inflation increase. From a practical point of view, our research showed that practitioners should first be very careful in selecting the appropriate model and then, based on the selected model, in designing the control scheme they intend to use, in order to avoid excessive false alarm rates and poor performance in detecting out-of-control situations. For the selection of the appropriate model in particular, we employed classical model selection criteria, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as well as Pearson's chi-square goodness-of-fit test.
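The kind of correlation structure considered here can be illustrated by simulating a first-order integer-valued autoregressive process with zero-inflated Poisson innovations, X_t = alpha ∘ X_{t-1} + e_t, where ∘ denotes binomial thinning. This is a sketch under the usual textbook definition of such a model; the function names, parameter values and seed are hypothetical, and the exact parametrization used in the project may differ.

```python
import random
from math import exp

def rpois(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rzip(rng, lam, phi):
    """Zero-inflated Poisson draw: zero with probability phi, else Poisson(lam)."""
    return 0 if rng.random() < phi else rpois(rng, lam)

def thin(rng, x, alpha):
    """Binomial thinning: each of the x units survives with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def simulate_zip_inar1(length, alpha, lam, phi, seed=0):
    """Simulate X_t = thin(X_{t-1}) + ZIP innovation, starting from X_0 = 0."""
    rng = random.Random(seed)
    x, path = 0, []
    for _ in range(length):
        x = thin(rng, x, alpha) + rzip(rng, lam, phi)
        path.append(x)
    return path

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den
```

For this model the stationary mean is (1 - phi) * lam / (1 - alpha) and the lag-1 autocorrelation equals alpha, so a long simulated path can be used to check that a chart designed for independent counts would see serially dependent data.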

The usefulness and the practical implementation of the proposed CUSUM schemes (as the most efficient technique) were further demonstrated via two real-data examples. For the zero-inflated correlated Poisson counts, we used the monthly numbers of submissions to animal health laboratories from a region in New Zealand. The data refer to the submissions of animals that experienced sudden death during the period January 2003 to December 2009. Furthermore, the CUSUM charts for monitoring correlated binomial counts with extra-binomial variation were applied to a data set from the area of economics: the monthly data on price stability in the Euro area between 2000 and 2013. In 2013, the Euro area consisted of n = 17 European countries and the aim was to monitor the (monthly) number of countries in the group of 17 that show price stability, i.e., an inflation rate below 2%. The data have been published by the European Union's statistical office (Eurostat) and can be found at http://appsso.eurostat.ec.europa.eu/nui/%20show.do?wai=true&dataset=prc_hicp_manr. Even though this application is not from the area of public-health surveillance, it serves as a general, practical example for monitoring processes with similar characteristics from any other area of applied research. In both cases, the new, adjusted control schemes were found to be better than the conventional schemes for monitoring processes of this kind, since they take into account the excess of zero values and/or the inherent overdispersion in the data.

A table is provided below with the studied techniques (i.e., control charts) and the purpose for which they have been developed. More technical details can be found in the corresponding published papers and/or in the corresponding unpublished material (e.g., working papers, conference proceedings, talks/slides). A complete list of all the published and unpublished work during the SMPHS project is provided at the end. Note also that, for all the techniques mentioned below, codes in R are available at https://sites.google.com/site/arakitz/rcodes. These codes can be used directly by practitioners for designing the schemes according to their needs, as well as for evaluating their performance. Moreover, they can be used for reproducing all the numerical results that appear in the papers and the unpublished reading material.

| Techniques | Paper | Unpublished work |
|---|---|---|
| CUSUM charts for zero-inflated binomial processes | See [P2] | See [C2] |
| Shewhart charts for zero-inflated Poisson and zero-inflated binomial processes with estimated parameters | See [P1] | See [CP1] |
| Integer-valued EWMA-type chart | See [P3] | See [C4] |
| Control charts with runs rules for general inflated Poisson processes | See [P4, P5] | |
| CUSUM charts for general inflated Poisson processes | Submitted | See [C7] |
| CUSUM charts for correlated zero-inflated Poisson counts | Submitted | See [C8] |
| CUSUM charts for correlated binomial counts with overdispersion | Submitted | See [C9] |

Summary

The SMPHS project lasted 24 months and, during this period, several new improved statistical techniques for process monitoring were developed and studied. Also, a new general inflated probability distribution was proposed as a possible model for non-typical count data, and its theoretical properties were studied. The considered models and techniques can be used for monitoring processes with special characteristics, such as an excessive number of zero and non-zero values, correlated observations and unknown parameters. All these features are frequently met in real practical problems. Practitioners from every area of process monitoring (manufacturing, public-health surveillance, financial surveillance, etc.) now have a collection of tools, accompanied by the necessary technical details and the relevant programming codes, which can be directly modified and used according to their needs. The multidisciplinary interest in the techniques developed as part of the project is further enhanced by the various real applications that arise from different areas of applied research. Finally, in an era where data and information are collected at every moment and several characteristics are monitored to prevent unpleasant situations or to understand the continuous changes in our world, these simple but efficient techniques can be used as part of an integrated monitoring system that detects undesirable and uncommon situations accurately, quickly and efficiently.

Partners



Grant Agreement number : 328037
Project acronym : SMPHS
Project title : Improved Statistical Monitoring Procedures for Attributes with Applications in Public Health Surveillance
Funding Scheme : FP7-MC-IEF
Project start date : 01/09/2013
Project end date : 31/08/2015

Person in charge of scientific aspects:
Prof. Philippe Castagliola
E-mail: philippe.castagliola@univ-nantes.fr

Researcher:
Dr. Athanasios Rakitzis
E-mail: athanasios.rakitzis@univ-nantes.fr

Laboratory involved : IRCCyN

IRCCyN (Institut de Recherche en Communications et Cybernétique de Nantes) is a Joint Research Unit of CNRS (Centre National de la Recherche Scientifique), UMR CNRS 6597.

Research at IRCCyN aims not only at producing new knowledge but also has a strong technological orientation, in the sense that it focuses on the development of tools and methods that bring solutions to concrete problems raised by economic and social partners.

Due in part to their collaboration with companies, members of IRCCyN also contribute to patent filings and to the creation of start-ups and spin-offs.

The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme (FP7/2007-2013) under REA grant agreement n° 328037. The content of this webpage reflects only the author's views. The European Union is not liable for any use that may be made of the information contained therein.

List of scientific (peer reviewed) publications in connection with the project


Original Publications in International Peer-Reviewed Journals
[P1] Rakitzis, AC, Castagliola, P (2014). The effect of parameter estimation on the performance of one-sided Shewhart control charts for zero-inflated processes. Communications in Statistics - Theory and Methods.

[P2] Rakitzis, AC, Maravelakis, PE, Castagliola, P (2015). CUSUM Control Charts for the Monitoring of Zero-Inflated Binomial Processes. Quality and Reliability Engineering International, DOI: 10.1002/qre.1764.

[P3] Rakitzis, AC, Castagliola, P, Maravelakis, PE (2015). A new memory-type monitoring technique for count data. Computers & Industrial Engineering, 85, 235-247.

[P4] Rakitzis, AC, Castagliola, P, Maravelakis, PE (2015). A two-parameter general inflated Poisson distribution: Properties and applications. Statistical Methodology (under revision).

[P5] Rakitzis, AC, Castagliola, P, Maravelakis, PE (2015). On the modeling and monitoring of general inflated Poisson processes. Quality and Reliability Engineering International (under minor revision).

Invited and Peer-Reviewed Oral Presentations to Internationally Established Conferences
[C1] Rakitzis AC. Control charts for zero-inflated processes with estimated parameters. 3rd Stochastic Modeling Techniques and Data Analysis International Conference, Lisbon, Portugal, 11-14 June 2014 (Invited).

[C2] Rakitzis AC. On the statistical design of one-sided CUSUM charts for zero-inflated binomial processes. Joint Statistical Meetings, Boston, USA, 2-7 August 2014 (Invited).

[C3] Rakitzis AC. Comparative study of control charts for zero-inflated binomial processes. 9th International Conference on Availability, Reliability and Security, Fribourg, Switzerland, 8-12 September 2014 (Invited).

[C4] Rakitzis AC. A new control charting technique for monitoring Poisson observations. 14th Annual Conference of the European Network of Business and Industrial Statistics, Linz, Austria, 21-25 September 2014.

[C5] Rakitzis AC. A new memory-type control chart for count data. 26th Annual Conference of the Greek Statistical Institute, Athens, Greece, 15-18 April 2015 (Invited).

[C6] Rakitzis AC. An EWMA-type chart for monitoring Poisson counts. 14th Workshop on Quality Improvement Methods, Dortmund, Germany, 5-6 June 2015 (Invited).

[C7] Rakitzis AC. Control charts for correlated Poisson observations with zero inflation. 16th Conference of the Applied Stochastic Models and Data Analysis International Society, Piraeus, Greece, 30 June - 4 July 2015 (Invited).

[C8] Rakitzis AC. Controlling processes of generally inflated Poisson counts. 4th International Symposium on Statistical Process Monitoring, Padua, Italy, 7-9 July 2015 (Invited).

[C9] Rakitzis AC. New control charts with memory for the monitoring of correlated counts with finite range. 60th World Statistics Congress - ISI2015, Rio de Janeiro, Brazil, 26-31 July 2015 (Invited).

Publications in International Peer-Reviewed Conference Proceedings

[CP1] Rakitzis AC, Castagliola P. Control Charts for Zero-Inflated Processes with Estimated Parameters. 3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon, Portugal, 631-641.

[CP2] Rakitzis AC, Maravelakis PE, Castagliola P. A Comparative Study of Control Charts for Zero-Inflated Binomial Processes. In Proceedings of the 9th International Conference on Availability, Reliability and Security (ARES), pp.420-425, 8-12 Sept. 2014, DOI: 10.1109/ARES.2014.63.