Small Business Resources, Business Advice and Forms from AllBusiness.com

A statistical process control approach to business activity monitoring.

By Jiang, Wei,Au, Tom,Tsui, Kwok-Leung
Publication: IIE Transactions
Date: Thursday, March 1 2007

1. Introduction

Activity monitoring is an automated process that applies operational intelligence and application integration technologies to alert individuals to changes in complex systems that may require action (see Fawcett and Provost (1999)). It has been widely implemented in business and information industries to audit business processes and business process management systems to reduce revenue loss and enhance efficient resource allocation. For example, service companies use activity

monitoring in a continuous, iterative effort to fine tune operations ranging from claims processing to inventory replenishment. Government agencies now use it to identify and thwart money laundering schemes and bioterrorist attacks. Other examples of activity monitoring include churn detection, credit card or insurance fraud detection, computer intrusion detection, network performance monitoring, and monitoring of news stories (Fawcett and Provost, 1997; DuMouchel and Schonlau, 1998), where thousands or even millions of time series streams have to be monitored simultaneously.

Although activity monitoring has recently begun to receive significant attention from information industries, similar problems and solutions were identified and developed in manufacturing industries, a long time ago, where they are referred to as Statistical Process Control (SPC). SPC methods classify the root causes of process variations as either a common cause or a special cause, and its basic objective is to quickly detect the occurrence of special cause variation (or process shifts) so that the process can be investigated and corrective action taken before the quality deteriorates and defective units are produced. SPC techniques are routinely used for on-line process control and monitoring and they are highly successful in manufacturing applications (Woodall et al., 1997; Montgomery, 2001). Montgomery and Woodall (1997) provide a comprehensive panel discussion on SPC and multivariate methods are reviewed by Hayter and Tsui (1994) and Mason et al. (1997). Recently, Jiang et al. (2003) successfully applied univariate and multivariate control chart techniques to monitor the stability of market segmentations.

Although the principles of SPC can be applied to service industry tasks such as business process monitoring, little research has been done on the application of SPC methods to the monitoring of customer activities so that appropriate marketing campaigns and service customizations can be developed. This paper develops a SPC framework for activity monitoring to allow business planning and forecasting in telecommunications industries. The proposed SPC approach monitors customer profile evolution through a set of state space equations to capture dynamic changes in profiles and detects abnormal events that deviate significantly from a customer's historical profile based on statistical testing principles.

It is shown that the success of the proposed framework critically depends on efficient and robust profiling algorithms to quantify each customer's business trend and volatility. Due to various business and operational reasons, outliers and change points tend to be widely spread in customer databases. To model and track thousands of diversified customer behaviors, it is important to develop simple and unified robust modeling tools, which can accommodate different behavior patterns including business changes, structural breakdowns, as well as unnecessary operational errors. In this paper, in addition to some well-known exponential smoothing methods, we propose a dummy change-point model based on Bayesian Model Averaging (BMA) to estimate customer profiles robustly. Both the simulation and industrial examples from the telecommunications sector show that the dummy change-point model is more resistant to business changes and observational outliers than other smoothing methods when estimating customer profiles.

The aim of this paper is to evaluate SPC-based activity monitoring methods that make use of different profiling algorithms to detect abnormal events such as customer churns and frauds in service industries. The rest of this paper is organized as follows. Section 2 presents an overview of the Business Activity Monitoring (BAM) concept and applications. Section 3 proposes the concept of a three-stage SPC-based activity monitoring framework which consists of customer profiling, monitoring/updating, and event diagnosis. Section 4 introduces a dynamic process model for the implementation of this activity monitoring framework. Section 5 discusses several well-known smoothing and outlier resistant methods for time series profiling. A dummy change-point model based on BMA is proposed to accommodate frequent change points and outliers. Section 6 compares the performance of the profiling algorithms in a simulation example and demonstrates the application of this activity monitoring framework to telecommunications industries. Conclusions are drawn in Section 7 and future research directions are highlighted.

2. BAM

The service sector is currently highly competitive in that it offers numerous competing services and products. For example, in the battlefield between service providers, customer churn is critical for many service companies including telephone service providers, credit card providers, banking, internet service providers, and cable service operators, etc. Customer churn refers to the tendency of customers to cease doing business with a company in a given time period. The annual churn rate in the wireless telecommunications industry is reported to range from 23-46% (Anon, 2000; Anon, 2002) while that in the internet service industry is 21-63% (Anon, 2001; Kolko and Gordon, 2002). Tremendous revenue loss thus occurs due to this customer churn and consequently considerable research effort has been focused on how to predict churn activities before and after their occurrences (see e.g., Lu (2002) and Neslin et al. (2004)).

Many service companies analyze corporate performance by looking at customer activities in a rear view mirror: crunching historical data to link results with causes, develop new promotions and resolve inefficiencies and problems after the fact. To maintain business competitiveness, it is crucial for these companies to develop BAM modules which help track customer profiles and continually refine business models based on feedback that comes directly from knowledge of operational events. By analyzing business activities such as churns in real time, companies can make better decisions, more quickly address problem areas, and reposition the organization to take full advantage of emerging opportunities.

Real-time BAM usually begins with profile modeling which is sometimes difficult due to the complexity of business activities and profile recording processes. For example, in a typical telecommunications company, many customer records of telephone usage exhibit diversified business trends with serial correlations over time and involve frequent business changes brought about by technology innovations, product substitutions, and various financial/accounting adjustments. In addition, the business recording process may often be interrupted by erroneous observations, e.g., flat stretches, bursts of activity, and outliers, which make smoothing and profiling techniques a necessity to aggregate huge transactional data sets, reduce data storage space, and speed up information retrieval from databases.

For illustration, Fig. 1 presents nine representative examples of monthly customer telephone usage patterns from October 2001 to September 2003 in a telecommunications company. Although detailed daily telephone usage is available for each customer, it is always more meaningful to store and analyze aggregated monthly usage as shown in Fig. 1. Moving average and a linear regression fits of the data are also plotted with the observations to act as a reference. The three graphs on the top panel from left to right represent customers whose usage remains approximately constant (or locally constant), constant contaminated by outliers, and constant with behavior changes. Similarly, the second panel represents customers whose usage exhibits trend/local trend with/without outliers and change-point contaminations. The bottom panel shows several more complicated patterns where customers may be new, or undergoing significant changes recently, and/or of high volatility with mixtures of change points and outliers.

The contaminated observations in the database may contain important business information for a company. While outliers may be caused by fraud, recording or billing errors, change points often correspond to marketing campaigns, churn activities, and/or even consolidation of internal customer accounts. If these outliers and change points can be detected promptly, the company will benefit from reduced business errors, improved customer retention, and improved fraud detection. Moreover, detecting these events is also crucial to adapt process models to robustly capture short-term and/or long-term business trends.

[FIGURE 1 OMITTED]

The complexity of customer profiles results in critical challenges to BAM in both model estimation and adaptation, especially when thousands of customer streams are monitored simultaneously. Fortunately, as discussed in the Introduction, SPC has proven to be a great success in manufacturing industries leading to continuous quality and productivity improvements. In the next section, we will introduce the concepts and techniques behind applying SPC methods to activity monitoring.

3. SPC-based activity monitoring

SPC techniques are routinely used for on-line process control and monitoring in manufacturing industries to achieve process stability through variation reduction. The basic idea in SPC is a binary view of the state of a process as developed by Shewhart (1931), i.e., either it is running satisfactorily or it is failed. Common-cause variations refer to variations inherent in the process that are impossible or hard to eliminate. In other words, "the future behavior can be predicted within probability limits determined by the common-cause system" (Box and Kramer, 1992). Special cause variations refer to "something special, not part of the system of common causes" (Deming, 1982). The objective of SPC methods is to isolate the special cause of variations from the common-cause model, look for and then eliminate the cause of the special cause variations.

Conventional SPC methods employ a two-stage implementation procedure to monitor process parameters. In phase I which is called the retrospective stage, an in-control process model is obtained with anomalous observations being identified and eliminated. The process model can either be a time series model or in another functional form. Once the process model is built, in phase II which is referred to as the prospective stage, it is deployed to monitor abnormal events that drive deviations from the in-control process model so that the root causes of these events can be removed. While phase II is an on-line procedure, phase I is usually carried out off-line with many iterations and strongly relies on intuition and intervention by humans. Although many techniques have been developed for phase II monitoring based on hypothesis testing and statistical change-point theory, little work has been done for phase I modeling in the presence of process change points and outliers (Boyles, 2000; Montgomery, 2001).

Most SPC methods assume a single utilization of this two-stage procedure. That is, the in-control process model is assumed to remain unchanged unless abnormal events happen. However, unlike typical manufacturing processes, most business processes are usually time variant due to either reactive factors such as the state of the economy or proactive activities such as promotion campaigns, holiday effects, etc. For example, in the telecommunications industry, customer usage generally follows either a local constant pattern or a local linear trend pattern partially due to the customers' relatively stable behavior and/or short sampling intervals. Moreover, customer usage patterns are often contaminated by outliers and abrupt change points. These unusual observations, without appropriate handling, will severely bias the in-control process model and render it useless for future activity monitoring.

Analogous to SPC, we here propose a three-stage activity monitoring framework to trigger and identify significant business changes for customer relationship management. The proposed activity monitoring framework starts from a definition of business events, which can be customer churn activities, frauds, purchasing pattern changes, or even data processing errors in the data warehouse. Similar to SPC applications, in phase I a customer profile is first estimated from individual customer historical information. Since thousands of customers are involved in the model computation, supervised learning is effectively impossible since most historical data are contaminated by outliers and change points. Therefore, a universally robust profile estimation method is necessary for off-line modeling without excessive human intervention. Second, when customer profiles are created, in phase II an on-line algorithm is developed to update customer profiles and monitor any significant departures from the anticipated profile models. Customer monitoring is somewhat analogous to statistical hypothesis testing and updating algorithms are required to adapt to potential minor process changes if there are any. Finally, if any special events are found in a customer profile, diagnosis and reprofiling are called for in phase III to identify significant events so that prompt marketing actions can be taken. Generally this subset of customers will be small and a careful investigation can be conducted individually and explicitly. Figure 2 summaries this three-stage activity monitoring procedure where several iterations may be necessary to define the significance of events and appropriate measurements after phase III.

[FIGURE 2 OMITTED]

4. Profile modeling and on-line profile monitoring

To implement the activity monitoring procedure for business planning and forecasting in telecommunications, a baseline profile model is required to describe and predict customer behavior. Following the dynamic linear model in West and Harrison (1997), we assume that the usage time series of customer i, {[X.sub.t](i)}, exhibits a local linear trend/growth model. For simplicity, the customer index i is ignored in the following development. Denote as [D.sub.t] the total information available up to time t and [[theta].sub.t] = ([[mu].sub.t], [[beta].sub.t])' the profile vector of local level [[mu].sub.t] and growth [[beta].sub.t] at time t, [M.sub.t] = ([m.sub.t], [b.sub.t])' and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

the conditional mean and variance of profile [[theta].sub.t] given [D.sub.t], respectively. The local trend baseline model consists of the following three equations:

Observation equation: [X.sub.t] = [[mu].sub.t] + [[epsilon].sub.t], (1)

System equation:

[[mu].sub.t] = [[mu].sub.t-1] + [[beta].sub.t-1] + [[omega].sub.t1],

[[beta].sub.t] = [[beta].sub.t-1] + [[omega].sub.t2], (2)

Initial information: ([[theta].sub.0] | [D.sub.0]) ~ N([M.sub.0], [C.sub.0]), (3)

where [[epsilon].sub.t] is the observation error following a normal distribution N(0, [v.sub.t]), [[omega].sub.t] = ([[omega].sub.t1], [[omega].sub.t2])' is the system transition error following a bivariate normal distribution N(0, [W.sub.t]) with system variance:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The initial information needs to be learnt off-line for each customer at the timeframe under consideration.

When [v.sub.t] and [W.sub.t] are known, West and Harrison (1997) provide a set of sequential analyses for the one-step-ahead forecast of [X.sub.t]. Denote:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

as the system transition matrix. The recursive forecast procedure follows the following three steps:

Step 1. Prior for [[theta].sub.t]: ([[theta].sub.t] | [D.sub.t-1]) ~ N([A.sub.t], [R.sub.t]) where [A.sub.t] = ([m.sub.t-1] + [b.sub.t-1], [b.sub.t-1])' and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

with [r.sub.t1] = [c.sub.t-1,1] + 2[c.sub.t-1,3] + [c.sub.t-1,2] + [w.sub.t1], [r.sub.t2] = [c.sub.t-1,2] + [w.sub.t2], [r.sub.t3] = [c.sub.t-1,2] + [c.sub.t-1,3] + [w.sub.t3].

Step 2. One-step forecast: ([X.sub.t] | [D.sub.t-1]) ~ N([f.sub.t], [s.sub.t]) where [f.sub.t] = [m.sub.t-1] + [b.sub.t-1] and [s.sub.t] = [r.sub.t1] + [v.sub.t].

Step 3. Information updating and Posterior for [[theta].sub.t]: [D.sub.t] = {[X.sub.t], [D.sub.t-1]} and ([[theta].sub.t] | [D.sub.t]) ~ N([M.sub.t], [C.sub.t]) where [c.sub.t1] = [[lambda].sub.t1][v.sub.t], [c.sub.t2] = [r.sub.t2] - [[lambda].sub.t2][r.sub.t3], [c.sub.t3] = [[lambda].sub.t2][v.sub.t]. The mean vector follows [M.sub.t] = [L.sub.2][M.sub.t-1] + [[LAMBDA].sub.t][e.sub.t], where [[LAMBDA].sub.t] = ([[lambda].sub.t1], [[lambda].sub.t2])' = ([r.sub.t1]/[s.sub.t], [r.sub.t2]/[s.sub.t])' is the adaptive vector for the forecast error [e.sub.t] = [X.sub.t] - [f.sub.t].

It can be shown that the updating equations lead to an alternative representation of a second-order difference equation of the observations in terms of forecast errors:

[X.sub.t] - 2[X.sub.t-1] + [X.sub.t-1] = [e.sub.t] - [[v].sub.t1][e.sub.t-1] + [[v].sub.t2][e.sub.t-2],

where [[v].sub.t1] = 2 - [[lambda].sub.t-1,1] - [[lambda].sub.t-1,2] and [[v].sub.t2] = 1 - [[lambda].sub.t-2,1].

Theoretically, when a new observation [X.sub.t] becomes available on-line, the profile model (l)-(3) should be validated and updated accordingly. If the profile model is adequate, the new observation should be within a specified range of one-step forecast [f.sub.t]:

|[X.sub.t] - [f.sub.t]| < K[square root of [s.sub.t]], (4)

where K is some positive constant to maintain an approximate type I error of the test, e.g., K = 2 corresponds to a confidence probability of around 95%. This is analogous to the Shewhart chart in traditional SPC applications and is used in the numerical examples later. West and Harrison (1986) adopt a different test based on Bayesian factors, which is very similar to a CUSUM chart (Page, 1954).

If no signal is issued, customer usage is considered stable/predictable under the baseline model at time t. However, when Equation (4) is violated, a significant departure from the customer baseline model is suspected and a signal is issued requesting further investigation. Generally the signaled subset will be relatively small and individual diagnosis can be used to identify the cause of the signal. Once the root cause of the change is identified, corrective action can be taken to retain the customer. It is important to point out that any additional information [I.sub.t] of the diagnosis, e.g., promotion, contract renewal, etc., can be easily incorporated into [D.sub.t] by updating [D.sub.t] = {[X.sub.t], [I.sub.t], [D.sub.t-1]} (West and Harrison, 1997).

In practice, since the system variance [W.sub.t] is unknown, West and Harrison (1997) suggest to approximate the updating equation by a single discount model with [W.sub.t] = [L.sub.2][C.sub.t-1][L'.sub.2](1 - [delta])/[delta] in the limit of t, where [delta] is the discount factor (0 < [delta] [less than or equal to] 1). In the adaptation, the limiting values of [[LAMBDA].sub.t], [C.sub.t], and [R.sub.t], are:

s = 1/[[delta].sup.2], [[lambda].sub.1] = 1 - [[delta].sup.2], [[lambda].sub.2] = (1 - [delta])[.sup.2],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

In this case, the profile updating equations become:

[m.sub.t] = [m.sub.t-1] + [b.sub.t-1] + [[lambda].sub.1][e.sub.t],

[b.sub.t] = [b.sub.t-1] + [[lambda].sub.2][e.sub.t],

[e.sub.t] = [X.sub.t] - [m.sub.t-1] - [b.sub.t-1]. (5)

Generally, a rule-of-thumb selection for [delta] is 0.7-0.9 (Montgomery, 2001). Consequently [[lambda].sub.1] [much greater than] [[lambda].sub.2], i.e., the limiting adaptation is much greater for the level [[mu].sub.t] than for the trend [[beta].sub.t]. This guarantees the stability of the updating equations. In general, a Bayesian double-discount model can provide more flexibility to discount the level and growth separately when using Equation (5) with different [[lambda].sub.1] and [[lambda].sub.2] such that 0 < [[lambda].sub.1] < 1 and 0 [less than or equal to] [[lambda].sub.2] [less than or equal to] [[lambda].sub.1](1 - [square root of (1 - [[lambda].sub.1])]).

If [v.sub.t] is unknown and large relative to the system variance [W.sub.t], West and Harrison (1997) suggest an ad hoc form of the limiting forecast variance when t [right arrow] [infinity] assuming that the variance remains constant. The limiting update of the forecast variance is an Exponentially Weighted Moving Average (EWMA) of the squared errors:

[s.sub.t+1] = [s.sub.t] + [[lambda].sub.3]([e.sub.t.sup.2] - [s.sub.t]). (6)

Based on these discussions, to simplify the application of our SPC method to thousands of customers in the telecommunications example, we define a three-element profile tube for each customer as follows:

{[X.sub.t]} [right arrow] [P.sub.t] = [[m.sub.t], [b.sub.t], [s.sub.t]]'. (7)

This profile refers to the current state at time t of the customer. The updating equations, Equations (5) and (6), can then be recursively used on-line when new observations become available. Minor process changes are expected to be effectively captured by the Bayesian forecasting model with significant deviations being detected through Equation (4). If other trend models are considered, the profile can be defined accordingly (West and Harrison, 1997).

It is important to note that the recurrence equations, Equations (5) and (6), have to stem from an appropriate estimate of the baseline profile obtained off-line. Although there has been considerable discussion on the performance of various SPC monitoring methods, profiling/estimation and signal diagnosis can not be readily adapted from the SPC literature due to the dynamic nature of the business process that includes frequent outliers and change points. As shown in Boyles (2000), inadequate initial profile estimates can unavoidably degrade the performance of SPC methods and prohibit their application in practice. In the next section, we will discuss several statistical algorithms for profile estimation.

5. Off-line profiling algorithms

The proposed three-phase activity monitoring framework attempts to generalize traditional SPC methods to business applications. The success of the SPC-based activity monitoring scheme crucially relies on a robust and efficient estimation of a customer's profile when their data contains outliers and change points. In this section, besides moving average and linear regression models, we will discuss in detail several profiling algorithms including exponential smoothing that can be used in customer activity monitoring. A dummy change-point model is introduced which is shown to be resistant to both change points and outliers.

5.1. Least-square-error methods

Profiling algorithms start from a set of time series data such as those shown in Fig. 1 to derive an initial estimate of the current state information (level, trend, and forecast variance). Since thousands of customer data streams are monitored at the same time, it is necessary to develop universal profile estimation algorithms that possess both statistical and computational efficiency. Many smoothing techniques such as moving average and Least-Square-Error (LSE) methods have been extensively used in engineering and statistical research for time series smoothing and profiling.

Given a time series{[X.sub.k]}[.sub.k=1.sup.n], moving average or finite impulse response filtering of window p is defined as [[alpha].sub.n-k] = [[summation].sub.i=0.sup.p-1] [X.sub.n-k-i] for any time period n - k. It simply uses the past p observations to estimate the current state. Figure 1 shows the moving average of window p = 3 (quarter) of the nine telecommunications examples. When a linear trend occurs as shown in graph 4, the moving average fails to capture the trend which results in significant errors in the state estimation.

To accommodate the linear trend, LSE methods have been proposed which solve optimization problems for a specified additive model. Assuming a linear trend model of the time series{[X.sub.k]}, the LSE estimation can be derived from:

[^.[theta].sub.n] = arg [min.[[theta].sub.n]] [n-1.summation over (k=0)] [[X.sub.n-k] - ([[mu].sub.n] - [[beta].sub.n]k)][.sup.2],

where [[theta].sub.n] = ([[mu].sub.n], [[beta].sub.n])' is the time series profile, [[mu].sub.n] is the customer level estimation at time n and [[beta].sub.n] is the global trend of the time series. Given the estimation [^.[theta].sub.n], the k-step-ahead forecast is[^.X.sub.n+k] = [^.[mu].sub.n] + k[^.[beta].sub.n]. This linear trend model considers all past observations equally, which is not adequate when the underlying observations exhibit serial correlations or process parameters that change significantly over time as discussed in Equations (1) and (2). For example, this model underestimates the current state in graph 6 of Fig. 1.

To capture dynamic changes in a process, Brown (1962) develops an exponentially weighted least-square estimation,

[^.[theta].sub.n] = arg [min.[[theta].sub.n]] [n-1.summation over (k=0)] [w.sub.k][[X.sub.n-k] - ([[mu].sub.n] - [[beta].sub.n]k)][.sup.2], (8)

where [w.sub.k] = (1 - [lambda])[.sup.k] (0 [less than or equal to] k [less than or equal to] n - 1) and [lambda] is a positive constant with 0 [less than or equal to] [lambda] < 1. Theoretically, the weighted linear regression can be thought of as a linear regression with uneven variance 1/[w.sub.k]. Specifically, this estimation is called [L.sub.2] exponential smoothing.

When a local constant model is assumed, i.e., [[beta].sub.n] = 0, Equation (8) reduces to:

[^.[mu].sub.n] = arg [min.[[mu].sub.n]] [n-1.summation over (k=0)] [w.sub.k]([X.sub.n-k] - [[mu].sub.i])[.sup.2],

which has a recursive solution:

[^.[mu].sub.n] = [n-1.summation over (k=0)] (1 - [lambda])[.sup.k][X.sub.n-k]/[[gamma].sub.n]([lambda])

= [[[gamma].sub.n-1]([lambda])(1 - [lambda])[^.[mu].sub.n-1] + [X.sub.n]]/[[gamma].sub.n]([lambda]),

with [[gamma].sub.n]([lambda]) = [[summation].sub.k=0.sup.n-1] (1 - [lambda])[.sup.k]. In the limit n [right arrow] [infinity], [[gamma].sub.n]([lambda]) [right arrow] 1/[lambda]. The mean estimate is equivalent to the EWMA statistic [^.[mu].sub.n] = (1 - [lambda])[^.[mu].sub.n-1] + [lambda][X.sub.n] which has been shown to have an optimal performance when the underlying process follows an autoregressive integrated moving average ARIMA(0,1,1) model (Box et al., 1994). If [lambda] = 0, the EWMA statistic reduces to the mean or average of [X.sub.k] (1 [less than or equal to] k [less than or equal to] n) and every observation receives the same weight in profile estimation, which fails to capture process dynamics. If [lambda] > 0, the most recent observations receive the highest weights and consequently the EWMA statistic is able to effectively adapt to slow and moderate process changes.

When a linear trend is present, the EWMA statistic is always slow to capture the trend. West and Harrison (1997) show that appropriate solutions of Equation (8) can be related via Equation (5) of the limiting forecast of single-discount models, which apply double-exponential smoothing to the observations. In fact, this is the optimal predictor of an ARIMA(0,2,2) process and is able to capture slow process changes (both level and trend). Once the estimation of [[theta].sub.n] is obtained, the forecast error variance can be estimated through:

[^.[sigma].sub.n.sup.2] = [n-1.summation over (k=0)] [w.sub.k][[X.sub.n-k] - ([^.[mu].sub.n] - [^.[beta].sub.n]k)][.sup.2]/(n - 2), (9)

where (n - 2) should be replaced by (n - 1) if [[beta].sub.n] = 0. Equations (8) and (9) can serve as the initial profile estimate in the recursive equations to on-line update customer profiles.

5.2. Least-absolute-deviation methods

Unfortunately, most exponential smoothing techniques are generally sensitive to outliers in the time series profiling and respond slowly to large abrupt changes. Due to frequent outliers in business process databases, outlier resistant methods are very attractive to obtain a robust estimation of the customer profile. Many robust techniques have been developed in the statistics literature since the 1970s (Huber, 1981). An important class of robust statistics is derived by replacing squared-loss functions with other types of score functions such as Huber's function, Tukey's bi-square function, etc. For example, when replacing the squared-loss function by absolute deviations in Equation (8), a robust exponential smoothing statistic can be constructed from the weighted least-absolute-deviation (LAD) optimization:

[^.[theta].sub.n] = arg [min.[[theta].sub.n]] [n-1.summation over (k=0)] [w.sub.k]|[X.sub.n-k] - ([[mu].sub.n] - [[beta].sub.n]k)|. (10)

The LAD or [L.sub.1] regression has a long history (it was proposed prior to least-squares methods) but due to its computational complexity, it was sparingly used until recently. Just as [L.sub.2] regression minimizes the sum of squares of the residuals, [L.sub.1] regression minimizes the sum of absolute values of the residuals.

By analogy with [L.sub.2] exponential smoothing, this is referred as [L.sub.1] exponential smoothing in this paper. [L.sub.1] regression is the maximum-likelihood estimate when the errors are distributed as a double exponential (Laplace), and is the logical objective in some situations. One reason for the increased interest in [L.sub.1] regression is that it is robust with respect to heavy-tailed distributions. It is, however, susceptible to high leverage points, and has an asymptotic efficiency of about 64% for the Gaussian distribution case.

If [w.sub.k] [equivalent to] 1, [L.sub.1] exponential smoothing is essentially the LAD estimate (Huber, 1981). Once [^.[mu].sub.n] and [^.[beta].sub.n] are obtained, a robust estimation of the standard deviation can be derived as:

[^.[sigma].sub.n] = 1.4826 [median.[0[less than or equal to]k[less than or equal to]n-1]]{[square root of [w.sub.k]]|[X.sub.n-k] - ([^.[mu].sub.n] - [^.[beta].sub.n] k)|}, (11)

where the coefficient factor of 1.4826 adjusts the scale for maximum efficiency when the data sample comes from a Gaussian distribution.

It is easy to see that the optimization problem is equivalent to solving the following Linear Programming (LP) problem:

[min.[[theta].sub.n]] [n-1.summation over (k=0)] [w.sub.k]([e.sub.k.sup.+] + [e.sub.k.sup.-]),

subject to

([[mu].sub.n] - [[beta].sub.n]k) + [e.sub.k.sup.+] - [e.sub.k.sup.-] = [X.sub.n-k], 0 [less than or equal to] k [less than or equal to] n - 1, [e.sub.k.sup.+], [e.sub.k.sup.-] [greater than or equal to] 0.

However, it is not necessary to solve the LP problem explicitly in practice. Cipra (1992) develops simple algorithms to calculate [L.sub.1] exponential smoothing statistics. For other complicated trends, the Barrodale-Roberts algorithm, which is a specialized LP algorithm, can be used to solve the LP problem (Barrodale and Roberts, 1973).

When [[beta].sub.n] = 0, [L.sub.1] exponential smoothing has a close relationship with other robust estimators. If [lambda] = 0, i.e., [w.sub.k] [equivalent to] 1, it reduces to the median of {[X.sub.k]}, which is a well-known robust estimator of the mean of {[X.sub.k]} (1 [less than or equal to] k [less than or equal to] n). To capture recent process changes, [lambda] > 0 is always used, i.e., a higher weight is given to recent observations. Although there is a large literature on the efficiency of [L.sub.2] exponential smoothing methods, especially when [[beta].sub.n] = 0 in the local constant model, there has been little research on the statistical performance of [L.sub.1] exponential smoothing statistics.

Although the above exponential smoothing methods provide dynamic models to describe time-varying business processes, they generally respond slowly to abrupt and/or large process changes. When customer profiles significantly change in the past, either [L.sub.2] or [L.sub.1] exponential smoothing methods fail to adapt to dynamic process changes quickly (Yashchin, 1995). This calls for a robust change-point model which can accommodate data changes in the profile.

5.3. Dummy change-point model

Intervention analysis is always required to incorporate change-point information in time series analysis when the data have significant changes. A common response to process changes is increased uncertainty about the future, through uncertainty in existing model parameters. West and Harrison (1997) provide a comprehensive treatment of various feedforward and feedback intervention modes given knowledge of specified changes.

When no process change information is available, however, there is no formal research on model development to accommodate potential changes. We now propose a dummy change-point model for profile estimation based on BMA (Hoeting et al., 1999). Raftery et al. (1997) apply BMA to subset selection in linear regression analysis. The proposed dummy change-point model is essentially a combination of different local forecasts (Clemen, 1989). Unlike the model intervention methods, the dummy change-point model does not assume process change information.

For time series{[X.sub.k]}[.sub.k=1.sup.n], the dummy change-point model assumes that a change point occurs at time p with probability [v.sub.p] = [w.sub.p]/[[gamma].sub.n](1 [less than or equal to] p [less than or equal to] n). To minimize the requirement for process assumptions, assume that the process before time p may not interact with the process after time p. This implies that the current profile is only related to the last p observations. Moreover, assume that the process after each change-point model follows a Gaussian linear model, i.e., the time series [X.sub.t] has a mixed Gaussian distribution [[summation].sub.p=1.sup.n] [v.sub.p] x N([[mu].sub.t](p), [[sigma].sub.t.sup.2](p)), where [[mu].sub.t](p) and [[sigma].sub.t.sup.2](p) are the mean and variance of the pth change-point model at time t. It is easy to see that the mean and variance of [X.sub.t] can be derived as follows:

E([X.sub.t]) = [n.summation over (p=1)] [v.sub.p][[mu].sub.t](p), (12)

and

Var([X.sub.t]) = [n.summation over (p=1)] [v.sub.p][[[mu].sub.t.sup.2](p) + [[sigma].sub.t.sup.2](p)] - [[n.summation over (p=1)] [v.sub.p][[mu].sub.t](p)][.sup.2], (13)

i.e., the current profile is a weighted combination of profiles generated from each change-point model based on the dummy probability. A schematic framework of this model is shown in Fig. 3.

[FIGURE 3 OMITTED]

When no outliers are present, an LSE estimation of the pth change-point linear model can be obtained as

[^.[theta]](p) = arg [min.[theta]] [p-1.summation over (k=0)] [[X.sub.n-k] - ([mu] - [beta]k)][.sup.2]. (14)

Specifically, if a local constant model is appropriate, i.e., [beta] = 0, this reduces to [^.[mu]](p) = 1/p [[summation].sub.k=0.sup.p-1] [X.sub.n-k], which is the mean of the last p observations. The mean and variance of [X.sub.t] are then [^.[mu].sub.t](p) = [^.[mu]](p) - [^.[beta]](p)t and [^.[sigma].sub.t.sup.2](p) = [[summation].sub.k=0.sup.p-1] [[X.sub.n-k] - ([^.[mu]](p) - [^.[beta]](p)k)][.sup.2]/(p - 2).

If outliers are a concern in the data, the above LSE estimation can

be analogously replaced by the LAD method. For example, [mu](p) = median([X.sub.t-p+1], [X.sub.t-p+2],..., [X.sub.t]) if [beta] = 0. However, as is shown in the following numerical examples, the [L.sub.2] dummy change-point model is quite robust against outliers and it is generally not necessary to apply a LAD to Equation (14).

The proposed method looks similar to exponential smoothing methods since it assigns exponential weights to n change-point models. For example, when [beta] = 0, the level estimation is also a weighted average of past observations. However, observations do not receive weights exponentially in the dummy change-point model. Generally, with the same discount/smoothing factor, as shown in Fig. 4, the dummy change-point model puts more weight on recent observations than the exponential smoothing methods. The emphasis on recent observations makes the proposed methods very attractive to estimate recent customer profiles.

6. Performance comparison

There is an extensive literature on the efficiency of forecasting/smoothing algorithms (Box et al., 1994). However, to our knowledge, there has been little research on comparing profiling performances when the underlying process contains outliers and/or change points. In this section, we compare the performance of the aforementioned profiling algorithms and their application to customer activity monitoring. Both prediction errors of the profiling algorithms and associated activity monitoring performance are compared using a simulation example and a telecommunications example.

[FIGURE 4 OMITTED]

6.1. Synthetic example

We now illustrate the performance comparison first with the following simulation example. The underlying process is assumed to follow Equations (1)-(3) with [v.sub.t] = 1 and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The simulation example is constructed here so that profiling efficiency and event detection probabilities can be evaluated with the available process knowledge.

This underlying process is assumed to be potentially interrupted by additive outliers represented as:

[[epsilon].sub.t] = (1 - [[phi].sub.0]) x N(0, 1) + [[phi].sub.0] x N([[delta].sub.0], 1),

and structural changes, which are characterized by additive outliers on [[mu].sub.t] and [[beta].sub.t] respectively, i.e.,

[[omega].sub.t1] = (1 - [[phi].sub.1]) x N(0, [w.sub.t1]) + [[phi].sub.1] x N([[delta].sub.1], [w.sub.t1]),

[[omega].sub.t2] = (1 - [[phi].sub.2]) x N(0, [w.sub.t2]) + [[phi].sub.2] x N([[delta].sub.2], [w.sub.t2]),

where [[phi].sub.i] denotes the outlier occurrence rate and [[delta].sub.i] denotes the outlier/event size (i = 0, 1, 2). Note that an additive outlier on [[mu].sub.t] and [[beta].sub.t] represents a step change in the process mean and slope values respectively. The outlier occurrence rate is assumed to be 0.05, and the sizes are assumed to be ten for observation outliers and three for the mean and slope changes respectively. These simulation scenarios are not intended to represent any particular processes but simply provide as a benchmark for illustration purposes.

6.1.1. Comparison of the mean-squared forecast error

Since robust and efficient estimation of profiles is essential to activity detection based on customer profiles, we now apply the four profiling algorithms: (i) [L.sub.2] exponential smoothing; (ii) [L.sub.1] exponential smoothing; (iii) dummy change-point model with [L.sub.2] linear regression; and (iv) dummy change-point model with [L.sub.1] linear regression, to estimate and predict profiles. Note that this estimation procedure is usually performed off-line and may be time consuming.

One thousand simulated processes are generated from the above synthetic model with 21 observations in each sequence. The first 20 observations are used to estimate the profile and a prediction is obtained from the four profiling algorithms to compare against the 21st observation. It is assumed that there are no changes at the 20th observation so that any special signals at the succeeding observations are interesting. Similarly, the 21st observation is also assumed to be free of outliers in this comparison. The one-step-ahead Mean-Squared Forecast Errors (MSFEs) are shown in Fig. 5(a-d) which includes the cases with no change (Fig. 5(a), cases with outliers of size ten (Fig. 5(b)), cases with step mean changes of size three (Fig. 5(c), and cases with slope changes of size three Fig. 5(d)). The smoothing constant [lambda] for the four profiling methods is chosen from 0.05-0.35.

It is easy to see that the [L.sub.2] exponential smoothing method with [lambda] [approximately equal to] 0.25 is nearly optimal in terms of MSFE when there are no changes in the underlying process. This agrees with the expectation since double-exponential smoothing models are optimal for predicting the dynamic model (1)-(3) in the time limit t as shown in Section 4. On the other hand, the [L.sub.1] exponential smoothing method seems to improve and the two dummy change-point models perform slightly worse when [lambda] gets bigger.

When the underlying process contains 5% outliers of size ten, the [L.sub.2] exponential smoothing method deteriorates significantly and becomes less comparable to the dummy change-point models if [lambda] is small. As expected, the [L.sub.1] exponential smoothing method is resistant to outliers but is still less efficient than the other methods unless [lambda] is close to 0.3. Similarly, when there are step mean changes or slope changes in the underlying process, the dummy change-point models tend to be the most-efficient predictors among the four profiling algorithms. Note that slope changes are always more prominent than step mean changes when estimating profiles of the underlying process. It is interesting to note that the two dummy change-point models often give similar MSFE values when the same discount parameter is used. This clearly demonstrates robustness against various outliers and change points and implies the sufficiency of using [L.sub.2] linear regression in the dummy change-point models.

[FIGURE 5 OMITTED]

6.1.2. Monitoring performance

Accurate profiling algorithms enable the efficient prediction of future profile state and acute future event detection. Once the profile estimation has been established, a recursive forecast procedure can be used to update profile estimation and track erroneous changes in the profile caused by events such as churns, frauds, etc. Although there is an intrinsic relationship between forecasting and event detection in general, it is not clear how the forecast accuracy of the four profiling methods affects their event detection capabilities.

When the four profiling methods are used to estimate process states, Fig. 6(a) presents the one-step event detection probability when the event size is ten. The historical process is now assumed to have been disturbed by all three types of events independently with a 5% of chance each one occurring. An outlier/event is now artificially introduced at the 21st observation to test the detection capability of the four profiling methods. Note that both an outlier and a mean shift have the same impact on the process when they first occur, whereas a slope change has an impact on the process mean following the occurrence. When no change happens at the 21st observation, the misidentification rate of the false positive probability is reported in Fig. 6(b).

[FIGURE 6 OMITTED]

It is clear that the detection rate of the four methods improves when [lambda] gets bigger. This is apparently due to the fact that large [lambda] values are helpful to short-term forecasting when historical observations involve unusual changes. Both the dummy change-point method and [L.sub.1] exponential smoothing method have better event detection probabilities than the [L.sub.2] exponential smoothing method when the discount parameter is small. All methods have a detection rate higher than 95% when [lambda] is larger than 0.2. However, both the [L.sub.2] and the [L.sub.1] exponential smoothing methods have much higher false alarm probabilities than the dummy change-point models when no change actually happens. The dummy change-point model has a similar false alarm rate as the [L.sub.2] exponential smoothing method but is more sensitive to new events.

6.2. Industrial example

The data obtained from a telecommunications company contains 2 years worth of customer monthly usage aggregated from daily activities as shown in Fig. 1. An interesting business problem is to predict customer usage in the following months and identify significant departures from the prediction. The moving average and exponential smoothing techniques are widely used for business forecasting. Here we applied the four profiling algorithms to predict customer usage levels. The forecast is then compared with new incoming data to evaluate the prediction errors. In this study, 1000 customers were randomly selected from the database and their usage was rescaled to [-1, 1] for comparison purposes.

6.2.1. Comparison of MSFE values

Figure 7 shows one-step-ahead MSFE values for the four methods when the smoothing constant is chosen to be from 0.05-0.35. The [L.sub.2] exponential smoothing method always has a lower MSFE value than the [L.sub.1] exponential smoothing method for any given smoothing factor. This is reasonable since the [L.sub.2] exponential smoothing method optimizes the mean-squared errors in model fitting. In addition, they both show a downward trend when [lambda] gets bigger, which implies that a strong linear trend or nonstationarity is present with slight higher-order variations. On the other hand, the two dummy change-point methods again perform very similarly. They both outperform the [L.sub.2] exponential smoothing method when [lambda] [less than or equal to] 0.3. It is important to note that the dummy change-point methods have a robust MSFE performance against [lambda] in this example. This turns out to be a very important feature in practice since we do not need to care too much about parameter selection when profiling each customer.

[FIGURE 7 OMITTED]

[FIGURE 8 OMITTED]

[FIGURE 9 OMITTED]

[FIGURE 10 OMITTED]

[FIGURE 11 OMITTED]

6.2.2. Monitoring performance

As discussed in the simulated example, small MSFE values do not necessarily imply a better SPC monitoring performance. To test how the profiling algorithms can be used for activity monitoring, we implemented our SPC framework to the sampled data to track customer profiles and detect abnormal events. For simplicity, [lambda] was chosen to be 0.2 and the control limits were set at two standard errors for all methods. The first 20 observations were used to initialize the profile updating.

Figures 8-11 present the application of SPC activity monitoring to customer profiles initialized by the four profiling algorithms. It can be seen that the [L.sub.1] exponential smoothing method often has looser control limits than the [L.sub.2] exponential smoothing method when no outliers are present (graphs 1 and 4 in Figs. 8 and 9) since the [L.sub.1] method has a lower statistical efficiency. However, when outliers occur, the [L.sub.1] exponential smoothing method has tighter control limits than the [L.sub.2] exponential method (graphs 2 and 5 in Figs. 8 and 9). When there are change points, the two methods are similar (graphs 3 and 6 in Figs. 8 and 9). On the other hand, the two dummy change-point models often have tighter control limits than the exponential smoothing methods when no outliers exist because of the low MSFE values (graphs 1 and 4 in Figs. 10 and 11). As expected, they often adapt to process changes very quickly (graph 6 in Figs. 10 and 11).

In general, when new data comes into the system, the SPC method with [L.sub.2] exponential smoothing initialization signals 56 abnormal customers out of 1000 customers. The SPC method based on [L.sub.1] exponential smoothing initialization signals 21 customers. The two dummy change-point models generate 78 and 80 signals respectively. Further investigation shows that the abnormal customer lists from the two dummy change-point models include 49 and 52 customers, respectively, of those in the list using [L.sub.2] exponential smoothing initialization and all those in the list using [L.sub.1] exponential smoothing initialization. Examining the 29 customers that are in the list of dummy change-point models but not in the list of exponential smoothing models, they all show somewhat abnormal changes. The seven customers who were not signaled by the dummy change-point model show a very recent change, but it is difficult to determine whether these are outliers or change points.

7. Concluding remarks

This paper proposes a SPC-based activity monitoring framework based on linear dynamic models for customer profiling and monitoring. The framework consists of a reliable profiling method to extract customer state information. Exponential smoothing methods including robust methods are discussed to provide an initial estimate of customer profiles. A dummy change-point model is also proposed to accommodate frequent change points. The development of these methods aims at reducing or removing effects of outliers and change points encountered in customer historical data. The simulated and real-life industrial examples show that the proposed activity monitoring framework is very effective in both estimating customer profiles and detecting abnormal business changes.

Although this paper only discusses applications in telecommunications industries, the proposed framework is generic and potentially suitable for other stream-related data mining applications involving large amounts of real-time information such as credit card fraud detection, computer network management and web mining. For these real-time applications, it is important to note that profiling algorithms are usually done off-line and only profiles are estimated and stored for on-line activity monitoring purposes. Off-line profiling methods not only save storage space but also speed up the on-line profile updating and activity tracking. In industry, scalability of the off-line algorithms may be sometimes not as important as the on-line updating algorithms given that the on-line algorithms can be done in real time or pseudo-real-time. Moreover, the profile modeling and updating system based on the state space model of Equations (1)-(3) can be easily extended to other more sophisticated Bayesian forecasting models if higher variations are expected.

The current paper discusses only profiling algorithms under the SPC framework. Further studies are ongoing to investigate algorithms for identifying the root causes of abnormal customer behaviors using other data mining techniques such as pattern matching and clustering (Hastie et al, 2001). Statistical methods for signal diagnosis will be pursued in another paper.

Acknowledgements

The authors would like to thank the referee whose suggestions have helped improve the quality of this paper. This work was supported by National Science Foundation under grants DMI-0200224 and IIS-0542881.

References

Anon (2000) They love me, they love me not. Wireless Review, 17(21), 38-42.

Anon (2001) What the cost of customer churn means to you. Network World, 18(46), 43.

Anon (2002) Standing by your carrier. Telephony Online website, http://currentissue.telophonyonline.com/. Accessed March 18, 2002.

Barrodale, I. and Roberts, F.D.K. (1973) An improved algorithm for discrete LI linear approximations. SIAM Journal of Numerical Analysis, 10, 839-848.

Box, G.E.P. and Kramer, T. (1992) Statistical process monitoring and feedback adjustment--a discussion. Technometrics, 34, 251-285.

Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994), Time Series Analysis: Forecasting and Control, Prentice Hall, Englewood Cliffs, NJ.

Boyles, R.A. (2000) Phase I analysis of autocorrelated processes. Journal of Quality Technology, 32, 395-409.

Brown, R.G. (1962) Smoothing, Forecasting and Prediction of Discrete Time Series, Prentice-Hall, Englewood, Cliffs, NJ.

Cipra, T. (1992) Robust exponential smoothing. Journal of Forecasting, 11, 57-69.

Clemen, R.T. (1989) Combining forecasts: a review and annotated bibliography. International Journal of Forecasting, 5, 559-583.

Deming, W.E. (1982) Quality, productivity and competitive position. Working Paper, MIT Center for Advanced Engineering Study, MIT, Cambridge, MA.

DuMouchel, W. and Schonlau, M. (1998) A fast computer intrusion detection algorithm based on hypothesis testing of command transition probabilities, in Proceedings of KDD-98, pp. 189-193.

Fawcett, T. and Provost, F. (1997) Adaptive fraud detection. Data Mining and Knowledge Discovery, 1, 291-316.

Fawcett, T. and Provost, F. (1999) Activity monitoring: noticing interesting changes in behavior, in Proceedings of KDD-99, pp. 53-62.

Hastie, T., Tibshirani, R. and Friedman, J. (2001) Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer-Verlag.

Hayter, A.J. and Tsui, K.-L. (1994) Identification and quantification in multivariate quality control problems. Journal of Quality Technology, 26, 197-208.

Hoeting, J.A., Madigan, D., Raftery, A.E. and Volinsky, C.T. (1999) Bayesian model averaging: a tutorial. Statistical Science, 14, 382-401.

Huber, P.J. (1981) Robust Statistics, Wiley, New York, NY.

Jiang, W., Au, T., Tsui, K.-L. and Xie, M. (2003) A multivariate control chart for poisson data. (submitted).

Kolko, J. and Gordon, J. (2002) AOL isn't hemorrhaging subscribers. Consumer Technographics North America Brief, Forrester Research.

Lu, J. (2002) Predicting customer churn in the telecommunications industry an application of survival analysis modeling using SAS, in SAS User Group International (SUGI27) Online Proceedings, available at http://www.sas.com/usergroups/sugi/proceeedings/. Accessed

Mason, R.L., Champ, C.W., Tracy, N.D., Wierda, S.J. and Young, J.C. (1997) Assessment of multivariate process control techniques. Journal of Quality Technology, 29, 140-143.

Montgomery, D.C. (2001) Introduction to Statistical Quality Control, 4th edn., Wiley, New York, NY.

Montgomery, D.C. and Woodall, W.H. (1997) A discussion on statistically-based process monitoring and control. Journal of Quality Technology, 29, 121-162.

Neslin, S.A., Gupta, S., Kamakura, W., Lu, J. and Mason, C. (2004) Defection detection: improving predictive accuracy of customer churn models. Technical report, Dartmouth Unversity, Hanover, NH 03755.

Page, E.S. (1954) Continuous inspection schemes. Biometrika, 41, 100-115.

Raftery, A.E., Madigan, D. and Hoeting, J.A. (1997) Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92, 179-191.

Shewhart, W.A. (1931) Economic Control of Quality of Manufactured Product. Van Nostrand, New York, NY.

West, M. and Harrison, J. (1986) Monitoring and adaptation in Bayesian forecasting models. Journal of the American Statistical Association, 81, 741-750.

West, M. and Harrison, J. (1997) Bayesian Forecasting and Dynamic Models, 2nd edn., Springer-Verlag, New York, NY.

Woodall, W.H., Tsui, K.-L. and Tucker, G.R. (1997) A review of statistical and fuzzy quality control charts based on categorical data, in Frontiers in Statistical Quality Control, Volume 5, Lenz, H.-J. and Wilrich, P. (eds.), Physica-Verlag, Heideiberg, Germany, pp. 83-89.

Yashchin, E. (1995) Estimating the current mean of a process subject to abrupt changes. Technometrics, 37, 311-323.

Biographies

Wei Jiang is an Assistant Professor in the Department of Systems Engineering and Engineering Management at Stevens Institute of Technology. He obtained his Ph.D degree from the Hong Kong University of Science and Technology in 2000 and B.Sc and M.Sc degrees from Xi'an Jiaotong University in 1989 and 1992, respectively. He worked at AT & T Labs before joining Stevens. His current research interests include statistical methods for quality improvement, data mining and enterprise intelligence. He is an Associate Editor of Journal of Statistical Computation and Simulation and is currently serving as a council member for the data mining section and Secretary/Treasurer of quality, statistics, and reliability section at the Institute of Operations Research and Management Sciences (INFORMS). He is a recipient of the NSF CAREER Award in 2006.

Tom Au is currently a Senior Technical Consultant at AT & T Labs, with more than 20 years of experience in applying and supervising projects in statistical modeling, data mining, and forecasting and information technology. He obtained his Ph.D in statistics from Yale University in 1984, and B.S. in Mathematics from the Chinese University of Hong Kong in 1980. His current interests include sales forecasting, churn prediction, time series clustering, data quality and integration. He is currently serving as an adviser to the data mining section of the Institute of Operations Research and Management Sciences. He also served as the chair and vice-chair of the data mining section in 2005 and 2004 respectively.

Kwok-Leung Tsui is a Professor in the School of Industrial and Systems Engineering at the Georgia Institute of Technology. He has a B.Sc. in Chemistry and an M.Ph. in Mathematics both from the Chinese University of Hong Kong, and a Ph.D. in Statistics from the University of Wisconsin at Madison. He worked in the quality assurance center at AT & T Bell Laboratories before joining Georgia Tech in 1990. He is an elected fellow of the American Statistical Association and was a recipient of the 1992 NSF Young Investigator Award. He was the elected President and Vice President of the American Statistical Association Atlanta Chapter in 1992-1993, the chair of the INFORMS quality, statistics, and reliability section in 2000, and the founding chair of the INFORMS data mining section, in 2004. He is also a US representative on the ISO Technical Committee on Statistical Methods (TC 69). He is currently a Departmental Editor of IIE Transactions and on the editorial boards of the Journal of Quality Technology and the International Journal of Reliability, Quality, and Safety Engineering.

WEI JIANG (1), TOM AU (2) and KWOK-LEUNG TSUI (3)

(1) Department of Systems, Engineering & Engineering Management, Stevens Institute of Technology, Hoboken, NJ 07030, USA

E-mail: wjiang@stevens.edu

(2) AT & T, Morristown, NJ 07960, USA

E-mail: sau@att.com

(3) School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

E-mail: ktsui@isye.gatech.edu

Received July 2004 and accepted February 2006