JRC 2006

Planning Information.

 

Tentative Invited Sessions:

 

Theme

Organizer

Empowering Non-Statisticians with Statistical Thinking and Statistical Tools

What does it mean to be an "Empowering Statistician"?

Angie Patterson, GE Global Research

Abstract:

The expectation to empower non-statisticians with statistical thinking, methods and tools is enough to put statisticians outside of their comfort zone. After all... how do you "deliver" empowerment? And when/where did we (statisticians) get trained to do this? Through a case study at General Electric, we'll discuss a model for empowerment, the benefits, and the ongoing role of the statistician. 

 

 

Can Statisticians Be Effective in Educating Others in Statistical Thinking?

 

Bill Parr, University of Tennessee

 

Abstract:


Long ago, W. Edwards Deming cautioned us that the need was for large numbers of statistically literate managers, not for a radical increase in the numbers of people with advanced degrees in statistics. We have seen, in the last two decades, a major (if partial) move in this direction with Six Sigma. We examine trends in the actual activity of professional statisticians in industry and government, how these have been affected by the Six Sigma movement, and look at recommendations on how to deal with resistance, how to effectively educate others in the use of statistical thinking, and on what must change to raise the level of statistical thinking in industry and government.

 

    ________________________________________________

 

Accepting Shewhart’s Challenge – Developing Statistically-Minded Leaders

 

Ronald D.  Snee

Principal, Process and Organizational Excellence

Tunnell Consulting

900 East Eighth Avenue

King of Prussia, PA 19406-1324

Snee@TunnellConsulting.com

 

Abstract

 

Almost seven decades ago Walter Shewhart challenged us with the admonishment that “The long-range contribution of statistics depends not so much on getting a lot of highly trained statisticians into industry as it does in creating a statistically minded generation of physicists, chemists, engineers and others who will in any way have a hand in developing and directing the productive processes of tomorrow.” The importance, benefits and implementation of such action are the subject of this presentation. Attention is focused on how to develop statistically minded leaders. “Leader” is interpreted broadly to mean anyone who is working to improve how the organization runs its business. Needs of the customer (the leaders of our organizations) are addressed. Benefits and fears on the part of non-statisticians and statisticians alike are discussed. It is argued that one effective method for enabling leaders to make greater use of statistical thinking is to promote the widespread use of the DMAIC approach (Define, Measure, Analyze, Improve, and Control) to process improvement and problem solving.

 

 

Gwen Stimely, Minitab, Inc.

(gstimely@minitab.com)

 

Statistics in Asia: Panel Discussion

 

As Asia becomes a major player in the world economy, how is the statistical profession adapting? A panel of statisticians working in industry and academe will describe their experiences in China, India, Singapore, Taiwan and other countries in the region. At the end, the panel will open up to questions from the audience.

 

Panelists:
 

Veronica Czitrom, PhD

Fellow of the American Statistical Association

Statistical Training and Consulting

 

T N Goh, PhD

Director, Quality and Innovation Research Center

Professor, Industrial & Systems Engineering Department

National University of Singapore

 

Dennis Lin, PhD

University Distinguished Professor

Pennsylvania State University

 

Bovas Abraham, PhD

President, International Society for Business and Industrial Statistics

University of Waterloo

Canada

 

Ai-Chu Wu, PhD

Statistical Consulting

 

 

Veronica Czitrom, Statistical Training and Consulting

(czitrom@hotmail.com)

 

TN Goh, Director, Quality and Innovation Research Center

Professor, Industrial & Systems Engineering Department

National University of Singapore

 

(isegohtn@nus.edu.sg)

 

Statistics in European Business and Industry

(schedule this session for the 8th or the 9th)

 

The session aims to give an overview of the increasing impact of statistical methods on business and industry in Europe, and the diverse and interesting scientific research this inspires. These developments have resulted in the establishment of the European Network for Business and Industrial Statistics (ENBIS) in 2000. This network aims to provide a forum for the interaction between statistical researchers, consultants and practitioners. Fabrizio Ruggeri, president of ENBIS, will very briefly introduce the network.

 

Presentations:

 

On the Reliability of Repairable Systems: Methods and Applications

Fabrizio Ruggeri

CNR-IMATI, Milan, Italy

President of the European Network for Business and Industrial Statistics (ENBIS)

 

Repairable systems subject to minimal repair are those systems whose reliability is the same just before a failure and after the corresponding  repair. Failures of such systems are often described by means of non-homogeneous Poisson processes (NHPP). We present some results and illustrate them in some case studies.

 

Hypothesis generation in improvement projects

Jeroen de Mast

Institute for Business and Industrial Statistics of the University of Amsterdam, Netherlands.

 

In quality improvement projects — such as Six Sigma projects — an exploratory phase can be discerned, during which possible causes, influence factors or variation sources are identified. In a later, confirmatory phase the effects of these possible causes are experimentally verified. Whereas the confirmatory phase is well understood, in both the statistical sciences and philosophy of science, the exploratory phase is poorly understood. This paper aims to provide a framework for the type of reasoning in the exploratory phase by reviewing relevant theories in philosophy of science, artificial intelligence and medical diagnosis. Data-driven, explanation-driven (or abductive) and coherence-driven discovery will be discussed. Furthermore, the presentation provides a classification and description of approaches that could be followed for the identification of possible causes. Finally, the theory and practice of exploratory data analysis will be briefly reviewed.

 

Optimal two-level split-plot designs

Peter Goos (in joint work with J.M. Lucas)

Antwerp University, Belgium.

 

Split-plot designs are very often used in industrial experimentation, both advertently and inadvertently. The present paper focuses on the optimal design of two-level split-plot designs for the estimation of main effects. In doing so, the attention is not restricted to regular split-plot designs, but non-regular split-plot designs with odd and heterogeneous whole plot sizes are studied too. Simple strategies for outperforming completely randomized designs in terms of A-, D-, G- and V-efficiency are presented, and conditions for optimally arranging the runs of two-level designs in split-plot designs with given numbers and (even and odd) sizes of whole plots are given.

 

Jeroen de Mast, IBIS UvA

(jdemast@science.uva.nl)

 

New Advancements in Variation Modeling, Analysis and Control for Complex Systems


Paper 1:
Title:  "Building Direct Influence Graph for Manufacturing Processes with Complex Topologies"

 

By Li Zeng and Shiyu Zhou, University of Wisconsin

Presenter: Dr. Shiyu Zhou
Assistant Professor
University of Wisconsin-Madison
Department of Industrial Engineering
449 Mechanical Engineering Building
1513 University Avenue
Madison, WI 53706-1572
Phone: (608) 262-9534
Fax:     (608) 262-8454
E-mail: szhou@engr.wisc.edu

Abstract:
This paper presents an iterative model building methodology to identify the underlying interaction among operation units in complex manufacturing processes through the integration of advanced statistical techniques in graphical models and engineering insights to manufacturing processes. This technique lays a foundation for effective quality control of processes with complex topologies.

Paper 2:

Title: Bayesian Spatial Model for Form Error Assessment using Multiple Coordinate Sensor Data

Haifeng Xia, Yu Ding, Jyhwen Wang,  Texas A&M University

Presenter: Dr. Yu Ding
Assistant Professor
Department of Industrial Engineering
Texas A&M University
3131 TAMU
College Station, TX 77843-3131
Voice: (979)845-5448
Fax  : (979)847-9005
yuding@iemail.tamu.edu

Abstract:
We present a Bayesian spatial model for assessing form errors using coordinate sensor data. The ability to simultaneously characterize systematic and random form errors is essential to developing reliable conformance checking methods. The resulting Bayesian solution can provide better estimates of form errors and their uncertainty than conventional methods.


Paper 3:

Title: Sensor System Reliability Analysis for Manufacturing Variation Control
 
Presenter: Dr. Yong Chen
Assistant Professor
Department of Mechanical and Industrial Engineering
2403 Seamans Center for the Engineering Arts and Sciences
The University of Iowa
Iowa City, Iowa 52242-1527
Tel: 319-335-6106 (O), 319-688-9254(H), 319-400-0035 (M)
Email: yongchen@engineering.uiowa.edu

Abstract
Variation source identification using sensor systems is essential for achieving manufacturing quality and productivity improvement. Sensor failure may result in misdetections and false alarms, leading to inferior manufacturing quality and unexpected downtime. The objective of the research is to develop a systematic methodology for sensor system reliability analysis and optimization.

 

Jianjun (Jan) Shi
Professor       
Industrial and Operations Engineering Dept.
University of Michigan            
Ann Arbor, MI 48109-2117

Tel:  734-763-5321
Fax: 734-764-3451
e-mail: shihang@umich.edu

http://www-personal.engin.umich.edu/~shihang/index.html

 

Computer Experiments and Model Validation in Engineering Applications

 

Input Uncertainty and Potential-to-Validate:

Sampling Plans for Monte Carlo Assessment

 

Max D. Morris, Iowa State University

Leslie M. Moore, Los Alamos National Laboratory

Michael D. McKay, Los Alamos National Laboratory

 

Abstract:  In complex settings, validation of mechanistic computer models is often difficult because appropriate input values are not precisely known.  Input uncertainty limits the degree to which models can be realistically validated.  The most optimistic pre-assessment of model validity, or “potential-to-validate,” is closely associated with ideas and indices used in probabilistic sensitivity and uncertainty analysis.  This talk will review a relevant nonparametric sampling-based approach to sensitivity/uncertainty analysis of computer models, and discuss recent work in input sampling plans that support the assessment of potential-to-validate.

 

 

Error budget for the validation of physics-based predictive models

Roger Ghanem, University of Southern California

John Red-Horse, Sandia National Laboratories, Albuquerque, NM.

Alireza Doostan, Johns Hopkins University, Baltimore, MD.

 

 

The significant recent growth in computing resources has ushered in the new field of prediction science where the objective has evolved from solving a system of governing equations, to approximating reality.  

 

This new research program encompasses the emerging fields of model validation and uncertainty quantification. Essentially, one wishes to rely on available knowledge to predict the future evolution of a design quantity in a useful manner. Available knowledge is typically in the form of physical laws and experimental evidence. Stochastic representations of both measurements and physical models present a possible avenue for combining the two pieces of knowledge in an integral manner. Benefits from such a representation include the ability to allocate resources, in a rational manner, between experimental and numerical efforts, as well as the ability to better mitigate risk in the design process. A more tangible benefit will be a reduction in the cost of production as reliance on full scale tests is significantly shifted to confident reliance on predictive models. Another tangible benefit will be the ability to meet design criteria with preset confidence.

 

Recent research on the Polynomial Chaos approach to stochastic computational mechanics has enabled us to develop the concept of an error budget that permits the rational allotment of prediction error to experimental, computational, statistical, and modeling sources.  This error budget is the natural evolution of error estimation (originally developed for computational science) to the realm of prediction science.

 

This talk will review the Polynomial Chaos Expansion (PCE) approach to the analysis of stochastic systems.  Particular attention will be given to the assumptions underlying this approach as well as to the computational challenges facing its implementation.   Methods and algorithms will be detailed for addressing both these issues, thus opening the way to the application of PCE to very large scale engineering systems.  The path from stochastic predictions to model validation will also be outlined.

 

Using Computer Experiments and Understanding the Effect of Model Uncertainty in Engineering Design under Uncertainty

 

Wei Chen, Northwestern University

 

The effectiveness of using Computer Aided Engineering (CAE) tools to support design decisions is often hindered by the enormous computational costs of complex analysis models, especially when uncertainty is considered. Approximations of analysis models, also known as “metamodels,” built upon computer experiments are widely used for design concept exploration and optimization. However, most existing approaches to metamodeling have been developed for deterministic optimization and are not applicable to design under uncertainty. In this talk, recent developments in using computer experiments and metamodels for engineering design under uncertainty are discussed. An efficient algorithm for constructing optimal designs of computer experiments and techniques for probabilistic sensitivity analysis (PSA) and uncertainty analysis (UA) via the use of metamodels are presented. We also present a methodology developed within a Bayesian framework for quantifying the impact of interpolation uncertainty due to the use of metamodels in robust design. The Bayesian prediction interval approach provides a simple, intuitively appealing tool for distinguishing the best design alternative and conducting more efficient computer experiments in robust design.

 

 

Wei Chen, Northwestern University

(weichen@northwestern.edu)

 

Robust parameter design and Variation Reduction

 

  1. C. F. Jeff Wu, School of Industrial and Systems Engineering, Georgia Institute of Technology

 

Title: Improving calibration systems through designed experiments

 

Abstract: Taguchi (1987) advocates the use of designed experiments to improve measurement and calibration systems. In this paper we study some statistical aspects of the problem. An appropriate performance measure is derived that provides us with a deeper insight into Taguchi's signal-to-noise ratio. Two different modeling approaches, namely, performance measure modeling and response modeling are considered. The proposed approaches are illustrated and compared using an experiment on drive shaft imbalance. (Joint work with Arden Miller and Tirthankar Dasgupta)

 

 

  2. Daniel D. Frey, Mechanical Engineering and Engineering Systems, Massachusetts Institute of Technology

 

Title: Adaptive OFAT Applied to Robust Parameter Design

 

Abstract: Previous investigations have explored the performance of adaptive OFAT ("one factor at a time") experimentation, establishing that the method exploits main effects with high probability and also tends to exploit two-factor interactions when they are large. The current study applies these results to robust parameter design. A simple method is proposed in which resolution III factorial designs are used for an outer array of noise factors and adaptive OFAT is used for exploring control factors. This approach exploits control-by-noise interactions with high probability and also tends to exploit control-by-control-by-noise interactions when they are large. Model-based assessments and case studies suggest that this approach provides substantially more improvement than alternatives with similar run size, including crossed resolution III arrays and combined arrays with minimum J-aberration. The approach also provides advantages in flexibility, use of prior knowledge, and costs due to control factor changes.

 

  3. Judy Jin, Industrial and Operations Engineering, University of Michigan

 

Title: Variance Component Decomposition and Diagnosis for Batch Manufacturing Processes using ANOVA

 

Abstract:

In batch manufacturing processes, the total process variation is generally decomposed into batch-by-batch variation and within-batch variation. Since different variation components may be caused by different sources, separation, testing and estimation of each variance component are essential to process improvement. Most previous SPC research has emphasized reducing variation due to assignable causes by implementing control charts for process monitoring. In contrast, this talk aims to analyze and reduce inherent natural process variation by applying the ANOVA method. The key issue in using the ANOVA method is how to develop appropriate statistical models for all variation components of interest. The paper provides a generic framework for decomposition of three typical variation components in batch manufacturing processes. For the purpose of diagnosing root causes of variation, the corresponding linear contrasts are defined to represent the possible site variation patterns, and the statistical nested effect models are developed accordingly. We show that the use of a full factor decomposition model can expedite the determination of the number of nested effect models and the model structure. Finally, an example is given of variation reduction in the screening conductive gridline printing process for solar battery fabrication.

 

 

Roshan Vengazhiyil, Georgia Institute of Technology

(roshan@isye.gatech.edu)

 

Bayesian Statistical Models in Pattern Recognition

 

Bayesian Robust analysis for Microarray large data sets

 

Speaker:

Pulak Gosh: Georgia State University, Statistics Department

 

The parametric Gaussian distribution is widely used for identifying differentially expressed genes in microarray studies. However, this assumption may not hold for various reasons. For example, if the data contain outliers or skewness, a test based on the Gaussian distribution is not robust. We develop a robust Bayesian model for testing differentially expressed genes.

 

____________________________________________

 

Bayesian Mixture model for healthcare expenditure

 

Munkin Murat: University of Tennessee, Department of Economics

 

Abstract:

 

This paper proposes a model to identify the pure treatment effect of private medical insurance on healthcare expenditure. We address two well-known challenges in dealing with expenditure data in our analysis. First, the insurance status is a choice variable and, therefore, modeling of endogeneity is necessary. Secondly, expenditure data are usually characterized by skewed distributions. We develop a finite mixture model with an unknown number of components to fit such expenditure data patterns. The model is applied to analyzing the effect of health insurance type on overall health care expenditure.

 

 

 

Regularized Mahalanobis distance-based clustering with Genetic algorithm

 

Speaker: James Wicker: University of Tennessee, Physics Department

 

Abstract: We propose a fast Genetic-Algorithm based clustering method that can separate complex structures efficiently. Although it is based on the Genetic K-Means algorithm and the Hyperellipsoidal Clustering (HEC) algorithm, we demonstrate that this algorithm can separate more complex structures than the Genetic K-Means algorithm and converge faster than other HEC algorithms. Performance of the algorithm is tested on simulated and real data.

 

_________________________________________________

 

A multi-stage steering algorithm for Clustering large data sets

 

Aruma Buddana, SOMS Department, University of Tennessee

 

Clustering analysis is a well-known data mining technique that divides a large dataset into meaningful subclasses and thus extracts hidden patterns among the objects. But real-time spatial data may not have any of this information available. The shape of the clusters can be arbitrary (spherical, linear, ellipsoidal, or elongated), and the clusters can contain as many as 100,000 points or as few as 10 points at a given time.

An automated clustering algorithm may not be sufficient to cluster this type of data. An iterative clustering algorithm with the capability of visual steering may be a good approach. We propose a new iterative algorithm that combines automated clustering methods such as Bayesian clustering, detection of multivariate outliers, and visual clustering. Simulated data from a plasma experiment and real astronomical data are used to test the performance of the algorithm.

 

 

 

Halima Bensmail, University of Tennessee

(bensmail@utk.edu)

 

Bayesian Advances in Experimental Design and Analysis

 

Speakers:

Bradley Jones - SAS Institute

 

How Bayesian Thinking Can Help in Designing Experiments

 

The suitability of any given experimental design depends on the complexity of the model that proves to be adequate to describe the system being studied. In factor screening studies, the researcher is not sure which or even how many of the possible factors are driving the responses of interest. Given this level of uncertainty, giving much consideration to model complexity may seem premature. Yet choosing a design without any thought for how one is going to fit the data one obtains is ill advised. What to do?

 

The Bayesian paradigm provides a structure for designing experiments when there is a multiplicity of possible models to consider. This talk reviews some of the research applying Bayesian thinking to the choice of designs. Topics include the initial choice of a screening design, augmentation of designs, and design for models that are nonlinear in the parameters.

 

-----------------------------------------------------------

 

Roshan Joseph Vengazhiyil - Georgia Institute of Technology

 

Title: Design and Analysis of Experiments Using Functionally Induced Priors
 
Specifying a prior distribution for the large number of parameters in the statistical model is a critical step in a Bayesian approach to the design and analysis of experiments. We show that the prior distribution can be induced from a functional prior on the underlying transfer function. The functionally induced prior requires the specification of only a few hyper-parameters and therefore can be easily implemented in practice. The prior incorporates the well-known principles such as effect hierarchy and effect heredity, which helps to resolve the aliasing problems in fractional designs almost automatically. The usefulness of the approach is demonstrated through the analysis of some experiments. We also propose a new class of design criteria and establish their connections with the minimum aberration criterion.

 

-------------------------------------------------------------------

 

Roselinde Kessels (Catholic University of Leuven, Belgium),

Bradley Jones (SAS Institute, US),

Hans Nyquist (University of Stockholm, Sweden),

Peter Goos (University of Antwerp, Belgium),

Martina Vandebroek (Catholic University of Leuven, Belgium)

 

Title: Bayesian optimal design of choice experiments

Choice experiments are widely used in marketing to measure how the attributes of a product or service jointly affect consumer preferences. In a choice experiment, a product or service is represented by a combination of attribute levels called a profile. Respondents then choose one from a group of profiles called a choice set. The study design is a specified number of choice sets submitted to each respondent. Their preferences provide the basis for estimating the importance of each attribute. The knack of designing an efficient choice experiment involves selecting the choice sets that result in high-quality estimates. Recently, Kessels, Goos and Vandebroek (2006) developed a way to produce Bayesian G- and V-optimal designs for the multinomial logit model. These designs allow for precise response predictions, which is the goal of choice experiments. The authors showed that the G- and V-optimality criteria outperform the D- and A-optimality criteria in terms of prediction capabilities. However, their G- and V-optimal design algorithm is computationally intensive, which is a barrier to its use in practice. In this talk, we compare the relative efficiencies of the designs created using various optimality criteria and introduce ways to speed up the calculation of the Bayesian G- and V-optimal designs.

 

Robert Mee, University of Tennessee

(rmee@utk.edu)

 

Reliability Analysis

 

 

Reliability: the other dimension of quality
Luis A. Escobar
Louisiana State University

luis@lsu.edu
 
Abstract
 

During the past twenty years, manufacturing industries have gone through a revolution in the use of statistical methods for product quality. Tools for process monitoring and, particularly, experimental design are much more commonly used today to maintain and improve product quality. A natural extension of the revolution in product quality is to turn the focus to product reliability, which is defined as quality over time. This has given rise to programs like Design for Six Sigma. This talk discusses the relationship between engineering quality and reliability, and outlines the role of statistics and statisticians in the field of reliability. A brief introduction to the statistical tools used in engineering reliability is provided, and some predictions for the future of statistics in engineering reliability are made.

 

Bayesian approach on software reliability growth model
Dong Ho Park
Hallym University, Korea

dhpark@sun.hallym.ac.kr
 
Abstract
 

While a software system operates, it does not undergo degradation as a hardware system does. Instead, the software system stops operating because of faults latent in the system; thus, software reliability can be improved by removing faults during the testing phase. Software reliability is defined as the probability that no failure occurs during a mission period of a given length. Fault detection and debugging are essential to improving software reliability. In this talk, we discuss a new software reliability growth model that extends the one by Kimura, M., Toyota, T. and Yamada, S. (1999), Economic Analysis of Software Release Problems with Warranty Cost and Reliability Requirement, Reliability Engineering & System Safety, vol. 66, pp. 49-55, and apply the Bayesian method to determine the optimal software release time while minimizing the expected total software cost. Under this growth model, we assume that the intensity function is a mixture of reliability growth and constant reliability after the software is released to the user at the end of the testing phase. To apply the Bayesian approach, we treat three parameters (the initial number of faults in the software, the fault detection rate, and the weighted factor) as random variables and assign appropriate prior distributions. Based on this approach, we propose a Bayesian method to determine the best possible software release time and compare it with the non-Bayesian method.

 
 
 Using data mining tools of decision trees in quality and reliability applications: brief example on modern engineered wood
Timothy M. Young
Tennessee Forest Products Center, University of Tennessee, tmyoung1@utk
 
Abstract

 
We provide guidance and warnings for using the important data mining tools of decision trees (DT) in quality and reliability applications. A recently developed DT called GUIDE (Generalized, Unbiased, Interaction Detection and Estimation) is discussed. GUIDE, modified with ANCOVA (Analysis of Covariance) modeling, is compared to multiple linear regression approaches for assessing and improving reliability. A small case study in the international manufacture of modern engineered wood products is presented to illustrate the usefulness of GUIDE and DT.

 

 

 

Planning of Accelerated Degradation Tests Considering

Robust Design of Manufacturing Quality Parameters

 

Lingyan Ruan and Jye-Chyi Lu

 

School of Industrial and Systems Engineering,

Georgia Institute of Technology, Atlanta, GA 30332

 

Abstract

 

Typical experimental designs for accelerated life or degradation tests assume that products come from the same manufacturing condition. In searching for the best combination of controllable manufacturing variables, product reliability is as important as quality characteristics, especially for electronic or semiconductor devices. The literature on experimental designs that consider both quality and reliability metrics is scarce. This presentation proposes a framework for designing accelerated degradation tests to select controllable manufacturing variables that lead to the longest product percentile lifetime, minimized variance of lifetime estimates, and the least sensitivity to environmental noise factors that create variation in product reliability.


 

 

Organizer: Frank Guess, University of Tennessee

(fguess@utk.edu)

Chair: Stuart Hunter,

Professor Emeritus, Princeton University

 

 

Measurement Studies

 

Must be June 7

 

The Comparison of Two Measurement Devices

Joseph G. Voelkel

John D. Hromi Center for Quality and Applied Statistics

Rochester Institute of Technology

 

Abstract

 

Measurement devices sometimes have no reference standards with which they may be compared, and in these cases they are often compared to each other. This frequently occurs when a new type of device is built and is to be compared to the current best device.

       Frequently-used methods of comparison include regression, correlation, or the so-called Bland-Altman plotting method. We review some of these, including any shortcomings they may have. We also compare our problem to the Gage R&R studies that are commonly performed in industry.

       Under standard assumptions, we illustrate that the problem is non-identifiable when each device can only make one measurement on each unit. In the case where multiple measurements can be made for each device, we show how the devices may be compared by a sequence of likelihood-ratio tests. An example based on two devices that are used to measure intra-ocular pressure of the human eye is used to illustrate the technique.

       These methods and many of the results we present, while not new, do not appear to be commonly used.

 

 

The Use of Factor Relationship Diagrams to Illustrate Implications of Restrictions on Randomization in the Unit Structure in Industrial Experimentation

 

By Cheryl R Hild

The University of Tennessee

 

Industrial experiments are often run in split plot mode for reasons of expediency. Often, the experimenter is unaware of restrictions on the error structure, especially when these restrictions are created by non-manipulated sources of variation that are confounded with factor combinations. The Factor Relationship Diagram (FRD) is a method of displaying the relationship between the manipulated and unmanipulated factors prior to actually running a DOE. Answers to specific engineering questions, driven by the FRDs for various potential experimental strategies, can help the experimenter select the appropriate strategy for the given situation. The degree of certainty in the answers to these questions provides a mechanism for increasing faith in conclusions concerning whole plot effects, regardless of the availability of a formal statistical test.

 

This paper examines a philosophical motivation and methodology for using Factor Relationship Diagrams as a proactive method for developing questions that, when answered prior to experimentation, develop understanding of those sources of variation contributing to the lack of precision caused by restrictions on randomization in the error structure. This understanding can then lead to the selection of an appropriate experiment that mitigates the risks associated with drawing conclusions regarding the whole plot factors. An example is discussed based on the confounding of fixed factors in the measurement process not explicitly identified in the experimental strategy.

A Bayesian Analysis of Interval-Censored Failure Time Data with Measurement Error

 

Sarah Michalak

Michael Hamada

Nicolas Hengartner

 

Statistical Sciences Group, Los Alamos National Laboratory

 

Presenter:  Sarah Michalak

Statistical Sciences Group

Los Alamos National Laboratory

PO Box 1663, MS F600

Los Alamos, NM  87545

Tel:  505-667-2625

Email:  michalak@lanl.gov

 

Measurement error may lead to interval-censored failure data where the interval endpoints are not known exactly. We consider data with this characteristic that were collected during an experiment assessing the susceptibility of a memory device to soft errors resulting from cosmic-ray-induced neutrons. (A soft error is a transient error, i.e., a bit flip, that causes no permanent damage to the memory device.) We use a Weibull model and take a Bayesian approach to the analysis.
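For readers unfamiliar with interval censoring, the following minimal sketch shows the standard Weibull log-likelihood when each failure is only known to lie in an interval (left, right]. It assumes the endpoints are known exactly, so it does not reproduce the measurement-error treatment described in the abstract; the function names and the shape/scale parameterization are assumptions made for illustration.

```python
import math

def weibull_cdf(t, shape, scale):
    """Weibull CDF F(t) = 1 - exp(-(t / scale) ** shape) for t > 0, else 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def interval_censored_loglik(intervals, shape, scale):
    """Sum of log[F(right) - F(left)] over the observed (left, right] intervals.

    Use right = math.inf for a unit that had not failed by time `left`.
    """
    total = 0.0
    for left, right in intervals:
        prob = weibull_cdf(right, shape, scale) - weibull_cdf(left, shape, scale)
        total += math.log(prob)
    return total

# Hypothetical inspection intervals (hours); the last unit is right-censored.
example = [(0.0, 10.0), (10.0, 20.0), (20.0, math.inf)]
loglik = interval_censored_loglik(example, shape=1.5, scale=15.0)
```

In a Bayesian analysis, this log-likelihood would be added to a log-prior on the shape and scale to give the unnormalized log-posterior.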

 

Joseph Voelkel, Rochester Institute of Technology

(jgvcqa@rit.edu)

 

Design of Experiments for Discrete Event Simulation

 

Controlled Sequential Factorial Design for Simulation Factor Screening
        Hua Shen and Hong Wan,  Purdue University


Abstract:

 

We propose controlled sequential factorial design for discrete-event simulation factor screening. It combines a sequential hypothesis-testing procedure with the traditional factorial design to control the Type I error and power for each factor under heterogeneous variance conditions. The method requires minimal assumptions and demonstrates robust performance under different system conditions.

 

 

An Adaptive Method for Factor Screening for Simulation Experiments
Bruce Ankenman, Northwestern University

Russell Cheng and Sue Lewis, University of Southampton


Abstract:

The sequential method is based on an orthogonal array, but assumes that factor effects have a known direction. The rows of the orthogonal array are run in a strategic order to allow for group factor screening of these factors. An interior point quadratic programming technique is used to get constrained estimates of the factor effects and quickly eliminate any groups of factors with null effects, allowing all factor effects to be estimated before completing the orthogonal array. The method will be applied to screening for both location and dispersion effects and will be compared with competing methods such as sequential bifurcation.

 

DOE for Fitting Forward and Inverse Simulation Metamodels

Russell Barton, Penn State University

 

Abstract:

Simulation models predict system performance as a function of one or more design variables.  These models generally operate in a sense that is opposite to the design objective:  given desired performance, identify appropriate values for the design variables.  In many cases there are multiple simulation outputs of interest, and the possibility exists for determining an explicit inverse map that would provide design variable values to produce (approximately) the desired output performance.  Experiment design strategies for this problem will be presented.

 

Bruce Ankenman, Northwestern University

(ankenman@northwestern.edu)

 

Statistics and Information Technology

 

Mixture Modeling with Spatial Components for Active Delay Tomography

Earl Lawrence, Statistics Group, Los Alamos National Lab

 

The field of active network tomography is concerned with the estimation of link-level performance measures, e.g., delay distributions for packets on a link, based on measured end-to-end performance of injected traffic, e.g., total path delay for a probe packet. One area of continuing research is the choice of appropriate distributional forms for estimating delay, as many standard parametric distributions are inappropriate. In particular, most parametric distributions are inadequate for modeling the tail behavior of delay distributions. This talk will explore the use of mixture modeling to overcome this limitation. Mixture models provide a flexible tool for capturing overall shape and tail behavior. Further, we will consider the modeling of spatial correlations in order to account for traffic similarity on neighboring links, a problem much ignored in the current literature. Applications to real and simulated data will be considered.

 

The Characteristics of Voice over IP Traffic

Bowei Xi, Department of Statistics, Purdue University

 

Voice over Internet Protocol (VoIP) is a new and fast-developing technology. Voice data, traditionally carried by the "public switched telephone network", are transmitted along with other applications on the IP network. An empirical study of VoIP data collected from the Global Crossing network is presented. Several key factors that play a critical role in traffic engineering are examined: the multiplexed packet process, call arrivals and duration distributions, and silence suppression. They exhibit distinctly different characteristics from those of traditional telephone call traffic.

 

Local-Vote Decision Fusion for Target Detection in Wireless Sensor Networks

Natallia Katenka, Department of Statistics, The University of Michigan

 

In this talk, we examine the problem of target detection by a wireless sensor network. Sensors acquire measurements emitted from the target that are corrupted by noise and initially make individual decisions about the presence/absence of the target. We propose the Local-Vote Decision Fusion algorithm, in which sensors first correct their decisions using decisions of neighboring sensors, and then make a collective decision as a network. We show that, for a fixed system false alarm rate, this local correction achieves a significantly higher target detection rate. We examine both distance-based and nearest-neighbor-based versions of the algorithm for grid and random sensor deployments. Further, an explicit formula that approximates the decision threshold for a given false alarm rate is derived, using limit theorems for random fields.

 

This is joint work with Liza Levina and George Michailidis.
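The two-stage idea in the abstract above (local correction, then a network-level decision) can be sketched roughly as follows. The majority rule, the tie-breaking toward "no target", the 4-neighbor grid, and the threshold value are all illustrative assumptions rather than details taken from the talk.

```python
def grid_neighbors(rows, cols):
    """Map each grid sensor (r, c) to its up/down/left/right neighbors."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(i, j) for i, j in cand
                            if 0 <= i < rows and 0 <= j < cols]
    return nbrs

def local_vote_fusion(decisions, neighbors, threshold):
    """Correct each 0/1 decision by majority vote, then fuse.

    Each sensor votes together with its neighbors; a strict majority of
    positives is required (ties go to 0). The network declares a target
    when the number of corrected positives reaches `threshold`.
    """
    corrected = {}
    for sensor, own in decisions.items():
        votes = [own] + [decisions[n] for n in neighbors[sensor]]
        corrected[sensor] = 1 if 2 * sum(votes) > len(votes) else 0
    return corrected, sum(corrected.values()) >= threshold

# Hypothetical 3x3 deployment: a target near the top-left corner plus one
# isolated false alarm at (2, 2).
neighbors = grid_neighbors(3, 3)
decisions = {s: 0 for s in neighbors}
for s in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]:
    decisions[s] = 1
corrected, detected = local_vote_fusion(decisions, neighbors, threshold=3)
```

In this toy layout the isolated positive at (2, 2) is voted down while the cluster of four positives survives, which is the intuition behind correcting decisions locally before fusing them.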

 

George Michailidis, University of Michigan

(gmichail@umich.edu)

 

Simulation of Supply Chains

Speaker:

Kenneth Gilbert, University of Tennessee

Abstract:

This session is an experiential simulation of a multi-stage supply chain and a tutorial on ARIMA models of supply chains.  It does not assume any prior knowledge of supply chain models and requires only a rudimentary understanding of time series models.

In the simulation, the participants will play the roles of managers in a supply chain. Each will manage an inventory by placing orders with an upstream supplier while filling orders for a downstream customer. The simulation will illustrate the dynamics of multistage supply chains. Then we will demonstrate how Autoregressive Integrated Moving Average (ARIMA) models can be used to model these dynamics. Specifically, if the customer demand can be characterized as an ARIMA time series, then for a rather general class of ordering policies, the ARIMA time series of the orders and the inventories at each of the upstream stages can be derived. These models can be used to predict the performance of the supply chain and to derive optimal ordering policies.
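A small simulation can give a feel for the dynamics described above. The sketch below is only an illustration under stated assumptions: demand follows an AR(1) process, and each stage uses an order-up-to policy with a naive forecast (the latest demand), so that each order equals the latest demand plus (lead time + 1) times the change in demand. The policy, parameter values, and function names are hypothetical and are not taken from the session materials.

```python
import random
import statistics

def ar1_demand(n, mean, phi, sigma, seed=0):
    """Generate n periods of AR(1) demand around `mean`."""
    rng = random.Random(seed)
    demand = [mean]
    for _ in range(n - 1):
        demand.append(mean + phi * (demand[-1] - mean) + rng.gauss(0.0, sigma))
    return demand

def stage_orders(demand, lead_time):
    """Orders placed upstream under an order-up-to policy, naive forecast.

    The order-up-to level is (lead_time + 1) * latest demand, so each order
    replaces the demand just filled and adjusts for the change in level.
    Negative orders are allowed here to keep the sketch linear.
    """
    return [cur + (lead_time + 1) * (cur - prev)
            for prev, cur in zip(demand, demand[1:])]

demand = ar1_demand(2000, mean=100.0, phi=0.7, sigma=5.0)
orders = stage_orders(demand, lead_time=2)
amplification = statistics.variance(orders) / statistics.variance(demand[1:])
```

Feeding `orders` back into `stage_orders` gives the order stream one stage further upstream, and comparing variances stage by stage shows how variability can grow as it moves up the chain.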

 

 

Kenneth Gilbert, University of Tennessee

(kgilber1@utk.edu)

 

Six Sigma: What is Missing?

Speakers

1. Roger Hoerl, GE Global Research
Title:  What is Missing in Six Sigma, and What Should We Do About It?


Abstract:

Six Sigma has been a tremendously successful improvement initiative for close to 20 years now. Despite its unparalleled success, however, it has its limitations like any other initiative. In this talk we will attempt to separate the hype from the facts, and better understand what Six Sigma is and is not. Based on this analysis, we plan to identify improvement opportunities, and suggest ways in which these opportunities might be captured. In summary, it will be argued that Six Sigma was never designed to be an overall quality management system, and even within the arena of project-by-project improvement, it has a "sweet spot", outside of which it generally is not the best option.

 

2. Doug Zahn, Statistical Consultant and Coach
Title:  Six Sigma: What's Missing?  Applying it to Ourselves!

Six Sigma is a complex system, including teaching, research, design of experiments, data analysis, consulting, and administration. At the heart of this system is the encounter: a purposeful meeting of a Six Sigma professional with another person (colleague, student, client, supervisor, supplier, customer, or member of staff). An encounter is a process consisting of five steps: preparing, beginning, working, ending, and reviewing. There is variation in this process, as not all encounters are effective. To understand this variation and systematically reduce the number of ineffective encounters, gather primary data on the event by videotaping it. Analyze the data by using three lenses (interpersonal, intrapersonal, and technical) to learn how to identify and recover from the breakdowns that naturally occur in encounters. I will give you an opportunity to learn how to apply this process to one of your current tough problems by using a videotape of an actual consultation.

 

 
Discussant: Bill Parr, University of Tennessee

Extensive floor discussion

 

William Parr, University of Tennessee

(wparr@utk.edu)

 

Computational Techniques for Statistical Inference

 

Talk 1.

 

Improvements on the ROC Curve: Skill Plots for Forecast Evaluation.

 

Dr. William M. Briggs

 

Weill Cornell Medical College, 525 E. 68th, Box 46, New York, NY 10021

wib2004@med.cornell.edu

 

We start by reviewing the ROC curve, a standard method in the literature for evaluating a diagnosis or forecast. Next, the skill score and skill test of Briggs and Ruppert (2005) are introduced and advantages of this new technique are discussed.

 

With this background, we apply the skill score to simple discrimination problems with a single variable. In this context, we prove that the skill-maximizing decision rule for problems such as classifying patients with a disease coincides with Bayes Rule for optimal classification. This same separation rule is also indicated as optimal by ROC curve analysis.

 

Finally, we address the question of inference for this optimal point, called x_max, and construct two types of confidence intervals. The first interval is a likelihood ratio interval based on inverting the skill test mentioned above. A second interval based on bootstrapping a logistic regression model is also introduced, and a small coverage study is performed to evaluate the precision of the estimated optimal cutoff.



Talk 2.

 

Dr. Matthew Tom

 

Emmanuel College, Dept. of Mathematics and Statistics, Boston, Massachusetts

 

A Test for Two Poisson Processes in the Presence of Background Events

 

Testing whether the means of two Poisson random variables have a fixed ratio lambda is a well-known and solved problem. The model has sufficient statistics and conditional inference is possible. If instead each Poisson mean is a mixture of a signal parameter and a known background noise parameter, then we lose sufficiency, and testing the hypothesis that the signal parameters are equal becomes more difficult. In this talk, we will look at different exact tests we can use to compare signal parameters despite the background noise. As an example, we will look at an application from cosmic ray particle physics.
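The "well-known and solved" background-free case mentioned above rests on a standard fact: if X1 and X2 are independent Poisson counts with means mu1 and mu2, then conditional on the total n = X1 + X2, X1 is Binomial(n, mu1 / (mu1 + mu2)). The sketch below turns that into an exact two-sided conditional test of mu1 / mu2 = ratio. The function names and the particular two-sided rule (summing outcomes no more probable than the observed one) are illustrative choices, and the sketch does not handle the background-noise setting that the talk addresses.

```python
import math

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_ratio_test(x1, x2, ratio=1.0):
    """Exact conditional two-sided p-value for H0: mu1 / mu2 = ratio.

    Conditional on n = x1 + x2, x1 is Binomial(n, ratio / (1 + ratio))
    under H0. The p-value sums the probabilities of all outcomes that are
    no more likely than the observed one.
    """
    n = x1 + x2
    p0 = ratio / (1.0 + ratio)
    observed = binom_pmf(x1, n, p0)
    tail = sum(binom_pmf(k, n, p0) for k in range(n + 1)
               if binom_pmf(k, n, p0) <= observed * (1 + 1e-9))
    return min(1.0, tail)

p_balanced = poisson_ratio_test(5, 5)    # counts agree with ratio 1
p_extreme = poisson_ratio_test(10, 0)    # all events in the first process
```

Once a known background rate is added to each mean, the total count is no longer sufficient in this simple way, which is why the conditional shortcut above stops working.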

 

Talk 3.

 

Russell Zaretzki

 

Department of Statistics, University of Tennessee, Knoxville,

rzaretzk@utk.edu

 

A Parametric Bootstrap Likelihood Ratio Statistic for Time-Censored Data with Applications in Reliability

 

Building on the work of Jeng, Lahiri and Meeker (2005), we consider bootstrap-based likelihood ratio inference for time-censored data. A simulation study based on data from a Weibull distribution under Type I censoring computes finite sample coverage probabilities for bootstrap-based inference of a modified signed root statistic. The results are contrasted with the better performing methods discussed by Jeng, Lahiri and Meeker, such as the ordinary bootstrap signed root statistic and the bootstrap-t.

 

Heuristic explanations are given for why the modified bootstrap may outperform the ordinary bootstrap.

 

Russell Zaretzki, University of Tennessee

(rzaretzk@utk.edu)

Panel Discussion:

Role of Optimal Design for Quality Improvement in the 21st  Century

Chair: Robert Mee, University of Tennessee, Knoxville

·         Introduction                 5 Minutes     
Robert Mee
University of Tennessee

·         Commentary              15 Minutes  
Chris Nachtsheim

Curtis L. Carlson Professor of Operations and Management Science

Chair, Operations and Management Science Department

Carlson School of Management

University of Minnesota

·         Commentary              15 Minutes     
 
G. Geoffrey Vining

Professor and Head
Department of Statistics              
Virginia Tech   

·         Commentary              15 Minutes     
 
Jeff Wu

Georgia Institute of Technology

School of Industrial and Systems Engineering

·         Commentary              15 Minutes      
Brad Jones
JMP Developer
SAS Institute Inc.

·         Commentary              15 Minutes
Dennis Lin
University Distinguished Professor

Pennsylvania State University

·         Floor discussion       10 Minutes

Ramón V. León, University of Tennessee

(rleon@utk.edu)

Statistical Methods for Analysis of Microarray Data

 

Speaker:

 

Al Bartolucci, Ph.D.
Department of Biostatistics,
University of Alabama at Birmingham


Illustrating the Usefulness of a Mixture Model for Analysis of Microarray Gene Expression Data

 

Al Bartolucci (1), David B. Allison (1), Sejong Bae (2), Karan P. Singh (2)

 

1. Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294-0022

2. Department of Biostatistics, School of Public Health, University of North Texas Health Science Center at Fort Worth, Fort Worth, TX 76107-2699

 

Abstract:

 

There is no doubt that the analysis of microarray data remains a challenge as one wishes to investigate the possibility of expressive genes in a sample of thousands of such genes. Naturally the issue of multiplicity arises as one examines the significance of large numbers of genes. Recently, one of the coauthors, DBA, and colleagues developed a mixture model approach to this very problem with successful application to a mouse data model. In this particular setting one circumvents the false positive issue using a mixture distribution of the p-values. Simultaneously one addresses several issues such as 1) whether there is any statistically significant evidence in any of the genes, 2) what the best estimate is of the number of genes in which there is a true difference in gene expression, 3) whether there is a threshold criterion above which genes should be investigated further, and 4) what the possible proportion of false negatives is among those genes declared “not interesting”.

 

This paper investigates this procedure further and illustrates its usefulness and relevance in the current work on microarray data analysis.
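To make the idea of a mixture distribution of p-values concrete, the sketch below uses one simple form: a uniform component for genes with no true difference and a single beta component for the rest. This two-component form, the parameter values, and the function names are assumptions for illustration; the parameterization used in the work described here may differ.

```python
import math

def beta_pdf(p, r, s):
    """Density of a Beta(r, s) distribution at p, for 0 < p < 1."""
    const = math.gamma(r + s) / (math.gamma(r) * math.gamma(s))
    return const * p ** (r - 1) * (1 - p) ** (s - 1)

def mixture_pdf(p, null_weight, r, s):
    """Uniform(0, 1) for null genes mixed with Beta(r, s) for the others."""
    return null_weight * 1.0 + (1.0 - null_weight) * beta_pdf(p, r, s)

def prob_null(p, null_weight, r, s):
    """Posterior probability that a gene with p-value p is a null gene."""
    return null_weight / mixture_pdf(p, null_weight, r, s)

# Hypothetical fitted values: 80% null genes, non-null p-values piled near 0.
small = prob_null(0.001, null_weight=0.8, r=0.5, s=3.0)
large = prob_null(0.5, null_weight=0.8, r=0.5, s=3.0)
```

Under such a model, one minus the null weight estimates the proportion of genes with a true difference, and `prob_null` gives a gene-level quantity that can be thresholded when deciding which genes to investigate further.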

 

 


Speaker:

Yoonkyung Lee, Ph.D.
Department of Statistics,
Ohio State University

 

A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data

 

Abstract:

Cancer diagnosis or prognosis based on gene expression profiles has been studied as a potentially more accurate means of predicting disease status than standard methods based on histological observations. The presence of a much larger number of genes than samples poses a challenge in building reliable and interpretable classification schemes. This talk will present a sparse solution approach for simultaneous gene selection and classification via component penalization of Support Vector Machines. The proposed method selects relevant genes in a principled way by taking into account their joint effects, remedying the limitation of common approaches that filter genes marginally. A real data analysis will be given to illustrate the method, and related issues will be discussed.

 

______________________________________________



Speaker:

David B. Dahl, Ph.D.
Department of Statistics,
Texas A&M University

 

Using Clustering to Enhance Hypothesis Testing

 

Both multiple hypothesis testing and clustering have been the subjects of extensive research for genomic and other high dimensional data, yet they have traditionally been treated separately.  We propose a hybrid statistical methodology that uses clustering information to increase testing sensitivity.  A test for an object that uses data from all objects clustered with it will be more sensitive than one that uses data from this object in isolation.  While the true clustering is unknown, there is increased power if the clustering can be estimated relatively well.  We first consider a simplified setting which compares the power of the standard Z-test to the power of a test using an estimated cluster.  Theoretical results show that if the cluster is estimated sufficiently well, the new procedure is more powerful.  In the setting of gene expression data, we develop a model-based analysis using a carefully formulated conjugate Dirichlet process mixture model.  The model is able to borrow strength from objects likely to be clustered.  Simulations reveal this new method performs substantially better than its peers. The proposed model is illustrated on a large microarray dataset.
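The power comparison in the simplified setting above can be illustrated with a back-of-the-envelope calculation. The sketch assumes a one-sided Z-test with known unit variance, and an idealized cluster whose members truly share the same mean and are correctly identified, so that pooling simply multiplies the sample size. These assumptions, the effect size, and the function name are illustrative only.

```python
from statistics import NormalDist

def z_test_power(effect, n_obs, alpha=0.05):
    """Power of a one-sided Z-test with unit variance.

    effect is the true mean shift in standard-deviation units and n_obs is
    the number of observations averaged in the test statistic.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)
    return 1.0 - std_normal.cdf(critical - effect * n_obs ** 0.5)

# One object with 4 replicates versus a correctly identified cluster of
# 5 such objects pooled together (20 observations).
single = z_test_power(0.5, n_obs=4)
pooled = z_test_power(0.5, n_obs=4 * 5)
```

The gain shown here is an upper bound of sorts: when the cluster is estimated with error, objects with different means get pooled and some of the advantage is lost.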

 

 

 

____________________________________________________


Speaker:

Arnold Saxton, Ph.D.
University of Tennessee-Knoxville and Oak Ridge National Lab

 

Statistical Tools are Needed for Microarray Expression and Co-expression Information

 

Arnold M. Saxton, Brynn H. Voy and Michael A. Langston
Genome Science & Technology Program,
University of Tennessee and

Oak Ridge National Lab
Knoxville, TN  37996


Abstract:

Microarray technology provides a measure of the activity of virtually all genes in an organism's genome, for example the estimated 25,000 genes in the mouse genome. Typical experiments produce millions of observations, and a complex statistical process has evolved to extract meaning from the data. The technology is noisy, with CVs of 100%, and correction for background noise, removal of outliers, and loess correction of scanned intensity readings are essential. Following this "normalization", standard statistical models can be used to identify treatment differences (differential expression), and correction for multiple testing (25,000 tests!) is clearly needed.

We will briefly describe the statistical procedures we have developed for differential expression. We will then discuss our current interests in extracting coexpression information to study multivariate biological pathways activated in response to a treatment or condition. Note that one array measures simultaneous activities of ~22,000 genes, and several arrays then allow a 22,000 by 22,000 correlation matrix to be estimated. We have used graph algorithms to identify "cliques", groups of genes that are strongly and completely inter-correlated. These cliques then must be compared among experimental treatments. The many statistical problems that arise will be illustrated.
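The step from a correlation matrix to cliques can be sketched on a toy scale as follows. The sketch thresholds absolute Pearson correlations to build a graph and lists maximal cliques with the basic Bron-Kerbosch recursion. The cutoff, the use of absolute correlation, the gene names, and the data are all hypothetical, and this is not the authors' implementation, which would need far more efficient algorithms at the 22,000-gene scale.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_graph(profiles, cutoff):
    """Connect two genes when |correlation| across arrays is >= cutoff."""
    adj = {gene: set() for gene in profiles}
    for g, h in combinations(profiles, 2):
        if abs(pearson(profiles[g], profiles[h])) >= cutoff:
            adj[g].add(h)
            adj[h].add(g)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(sorted(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found

# Hypothetical expression profiles over four arrays.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [1.1, 2.1, 2.9, 4.2],
    "g4": [4.0, 1.0, 3.0, 2.0],
}
cliques = maximal_cliques(correlation_graph(profiles, cutoff=0.9))
```

Here g1, g2, and g3 are mutually correlated above the cutoff and form one clique, while g4 is left on its own.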



 

 

Speaker:


Don Kulasiri, Ph.D.
Centre for Advanced Computational Solutions (C-fACS)
Lincoln University, New Zealand

A Review of Evolving Clustering Methods for Microarray Data Analysis

 

Abstract:

 

Microarray data analysis involves various statistical and computational methods including principal component analysis, k-means clustering, neural networks, and self-organising maps. We review these methods within the context of microarray data analysis. An application of evolving clustering methods based on neural networks is discussed.

 

 

Karan Singh

(ksingh@hsc.unt.edu)

Karan P. Singh, Ph.D.

Professor and Chair

Department of Biostatistics and Epidemiology

School of Public Health, CBS 343

University of North Texas Health Science Center at Fort Worth

3400 Camp Bowie Blvd.

Fort Worth, TX 76107, USA

Phone: (817) 735-0490

Fax #: (817) 735-2314

E-mail Address: ksingh@hsc.unt.edu  

 

Technometrics Session.

 

1)             Speaker: Mu Zhu

                                Department of Statistics and Actuarial Science

                                University of Waterloo

                                Waterloo, ON N2L 3G1

                                Canada

                                m3zhu@uwaterloo.ca

 

Title:       LAGO: A Computationally Efficient Approach for Statistical

                Detection

 

                Co-authors:           Wanhua Su

                                                Department of Statistics and Actuarial Science

                                                University of Waterloo

                                                Waterloo, ON N2L 3G1

                                                Canada

 

                                                Hugh A. Chipman

                                                Department of Mathematics and Statistics

                                                Acadia University, Wolfville, NS B4P 2R6

                                                Canada

                                               

                Abstract: We study a general class of statistical detection problems where the underlying objective is to detect items belonging to a rare class from a very large database. We propose a computationally efficient method to achieve this goal. Our method consists of two steps. In the first step, we estimate the density function of the rare class alone with an adaptive bandwidth kernel density estimator. The adaptive choice of the bandwidth is inspired by the ancient Chinese board game known today as Go. In the second step, we adjust this density locally depending on the density of the background class nearby. We show that the amount of adjustment needed in the second step is approximately equal to the adaptive bandwidth from the first step, which gives us additional computational savings. We name the resulting method LAGO for “locally adjusted Go-kernel density estimator.” We then apply LAGO to a real drug discovery data set and compare its performance with a number of existing and popular methods.

 

_________________________________________________________

 

2)             Speaker: David Mease

                                Department of Marketing and Decision Sciences

                                San Jose State University

                                San Jose, CA 95192-0069

                                mease d@cob.sjsu.edu

 

Title:       Latin Hyper-Rectangle Sampling for Computer Experiments

               

                Co-author:             Derek Bingham

                                                Department of Statistics and Actuarial Science

                                                Simon Fraser University

                                                Burnaby, BC, CANADA V5A 1S6

                                        dbingham@cs.sfu.ca

                         

                Abstract: Latin hypercube sampling is a popular method for evaluating the expectation of functions in computer experiments. However, when the expectation of interest is taken with respect to a non-uniform distribution, the usual transformation to the probability space can cause relatively smooth functions to become extremely variable in areas of low probability. Consequently, the equal probability cells inherent in hypercube methods often tend to sample an insufficient proportion of the total points in these areas. In this talk we introduce Latin hyper-rectangle sampling to address this problem. Latin hyper-rectangle sampling is a generalization of Latin hypercube sampling which allows for non-equal cell probabilities. A number of examples are given illustrating the improvement of the proposed methodology over Latin hypercube sampling with respect to the variance of the resulting estimators. Extensions to orthogonal-array-based Latin hypercube sampling, stratified Latin hypercube sampling and scrambled nets are also described.
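For reference, ordinary Latin hypercube sampling on the unit cube, the baseline that the abstract above generalizes, can be sketched as follows. The function name and seed handling are illustrative. The sketch uses equal-probability cells only and does not implement the Latin hyper-rectangle extension proposed in the talk.

```python
import random

def latin_hypercube(n_points, n_dims, seed=0):
    """Draw n_points from [0, 1)^n_dims with one point per cell per axis.

    Each axis is cut into n_points equal cells. For every dimension the
    cells are randomly permuted and one uniform draw is taken inside each.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        cells = list(range(n_points))
        rng.shuffle(cells)
        columns.append([(cell + rng.random()) / n_points for cell in cells])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

sample = latin_hypercube(n_points=5, n_dims=2)
# Cell index occupied by each point along the first axis.
cells_axis0 = sorted(int(pt[0] * 5) for pt in sample)
```

A hyper-rectangle variant would replace the equal-width cell boundaries with boundaries chosen to give unequal cell probabilities, which is the generalization the abstract describes.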

 

 

 

 

Randy Sitter <sitter@cs.sfu.ca>

Technometrics Editor