Feb 12, 2018
 

In the hype and excitement surrounding artificial intelligence and big data, most of us miss out on critical aspects of collecting, processing, handling and analyzing data. It's important for data science practitioners to understand these critical aspects and add a human touch to big data. What are these aspects? [...]

Why is it important to add a human touch to big data? was published on SAS Voices by Jay Paulson

Nov 5, 2016
 

Galit Shmueli, National Tsing Hua University’s Distinguished Professor of Service Science, will be visiting the SAS campus this month for an interview for an Analytically Speaking webcast.

Her research interests span a number of topics, most notably her acclaimed work To Explain or Predict, as well as research on statistical strategy, bio-surveillance, online auctions, count data models, quality control and more.

In the Analytically Speaking interview, we’ll focus on her Explain or Predict work as well as her research on Information Quality and Behavioral Big Data, which was the basis of her plenary talk at the Stu Hunter conference earlier this year. I'll also ask about her books and teaching.

Galit has authored and co-authored many books, two of which — just out this year — include some JMP. First is Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro, with co-authors Peter C. Bruce, Nitin R. Patel, and Mia Stephens of JMP. This first edition release coincides with the third edition release of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner, with the first two co-authors listed above. As Michael Rappa says so well in the foreword of the JMP Pro version of the book, “Learning analytics is ultimately about doing things to and with data to generate insights. Mastering one's dexterity with powerful statistical tools is a necessary and critical step in the learning process.”

The second book is Information Quality: The Potential of Data and Analytics to Generate Knowledge, which Galit co-authored with Professor Ron S. Kenett, CEO and founder of KPA and research professor at the University of Turin in Italy (you may recognize Ron and KPA colleagues as guest bloggers on the JMP Blog on the topic of QbD). As David Hand notes in his foreword, the book explains that “the same data may be high quality for one purpose and low quality for another, and that the adequacy of an analysis depends on the data and the goal, as well as depending on other less obvious aspects, such as the accessibility, completeness, and confidentiality of the data.”

Both Ron and Galit will be plenary speakers at Discovery Summit Prague in March. You can download a chapter from their book, which discusses information quality support with JMP and features an add-in for Information Quality, both written by Ian Cox of JMP. You can see a short demo of JMP support for information quality during the Analytically Speaking webcast on Nov. 16.

Whether your analysis is seeking to explain some phenomena and/or to make useful predictions, you will want to hear Galit’s thoughtful perspective on the tensions between these two goals, as well as what Galit has to say on other topics up for discussion. Join us! If Nov. 16 doesn’t suit your schedule, you can always view the archived version when convenient.

tags: Analytically Speaking, Analytics, Books, Discovery Summit, Statistics

The post To explain or predict with Galit Shmueli appeared first on JMP Blog.

Nov 2, 2016
 

Ronald Snee and Roger Hoerl have written a book called Strategies for Formulations Development. It is intended to help scientists and engineers be successful in creating formulations quickly and efficiently.

The following tip is from this new book, which focuses on providing the essential information needed to successfully conduct formulation studies in the chemical, biotech and pharmaceutical industries:

Although most journal articles present mixture experiments and models that only involve the formulation components, most real applications also involve process variables, such as temperature, pressure, flow rate and so on. How should we modify our experimental and modeling strategies in this case? A key consideration is whether the formulation components and process variables interact. If there is no interaction, then an additive model, fitting the mixture and process effects independently, can be used:

c(x, z) = f(x) + g(z),     (1)

where f(x) is the mixture model and g(z) is the process variable model. Independent designs could also be used. However, in our experience, there is typically interaction between mixture and process variables. What should we do in this case? Such interaction is typically modeled by replacing the additive model in Equation 1 with a multiplicative model:

c(x, z) = f(x)*g(z)     (2)

Note that this multiplicative model is actually non-linear in the parameters. Most authors, including Cornell (2002), therefore suggest multiplying out the individual terms in f(x) and g(z) from Equation 2, creating a linear hybrid model. However, this tends to be a large model, since the number of terms in the linearized version of c(x,z) will be the number in f(x) times the number in g(z). In Cornell’s (2002) famous fish patty experiment, there were three mixture variables (7 terms) and three process variables (8 terms), but the linearized c(x,z) had 7*8 = 56 terms, requiring a 56-run hybrid design.

Recent research by Snee et al. (2016) has shown that by considering hybrid models that are non-linear in the parameters, the number of terms required, and therefore the size of designs required, can be significantly reduced, often on the order of 50%. For example, if we fit Equation 2 directly as a non-linear model, then the number of terms to estimate is the number in f(x) plus the number in g(z); 7 + 8 = 15 in the fish patty case. Snee et al. (2016) showed using real data that this approach can often provide reasonable models, allowing the use of much smaller fractional hybrid designs. We therefore recommended an overall sequential strategy involving initial use of fractional designs and non-linear models, but with the option of moving to linearized models if necessary.
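To make the term counting concrete: if, for illustration, f(x) is the special-cubic mixture model in the three components and g(z) is a full two-level model in the three process variables (the forms that give the 7 and 8 terms cited above), then

\[
f(x) = \sum_{i=1}^{3}\beta_i x_i + \sum_{i<j}\beta_{ij}x_i x_j + \beta_{123}x_1 x_2 x_3 \quad (3+3+1 = 7 \text{ terms}),
\]
\[
g(z) = \alpha_0 + \sum_{k=1}^{3}\alpha_k z_k + \sum_{k<l}\alpha_{kl}z_k z_l + \alpha_{123}z_1 z_2 z_3 \quad (1+3+3+1 = 8 \text{ terms}),
\]

so multiplying out f(x)*g(z) yields 7 × 8 = 56 coefficients to estimate, while fitting Equation 2 directly as a non-linear model requires only 7 + 8 = 15.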


Nov 1, 2016
 

When designing an experiment, a common diagnostic is the statistical power of effects. Bradley Jones has written a number of blog posts on this very topic. In essence, what is the probability that we can detect non-negligible effects given a specified model? Of course, a set of assumptions and specifications is needed in order to do this, such as effect sizes, the error variance, and the significance level of the tests. I encourage you to read some of those previous blog posts if you’re unfamiliar with the topic.

If our response is continuous, and we are assuming a linear regression model, we can use results from the Power Analysis outline under Design Evaluation. However, what if our response is based on pass/fail data, where we are planning to do 10 trials at each experimental run? For this response, we can fit a logistic regression model, but we cannot use the results in the Design Evaluation outline. Nevertheless, we’re still interested in the power...

What to do?
We could do a literature review in the hope of finding a power formula that applies (and do so again for each specific case that comes up in the future). But it is more straightforward to run a Monte Carlo simulation. To do so, we need to be able to generate responses according to a specified logistic regression model. For each generated response, we fit the model and, for each effect, check whether the p-value falls below a chosen threshold (say, 0.05). This has been possible in previous versions of JMP using JSL, but it requires a certain level of comfort with scripting, in particular with scripting formulas and extracting information from JMP reports. Also, you need to find the time to write the script. In JMP Pro 13, you can now perform Monte Carlo simulations with just a few mouse clicks.
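To make the procedure concrete, here is a rough Python sketch of the same Monte Carlo idea (not JSL, and not JMP's implementation). The eight-run design, coefficients and trial count below are illustrative stand-ins rather than the exact settings used later in this post:

```python
# Monte Carlo power for a binomial (logistic regression) DOE analysis -- a sketch.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2016)

# An 8-run, 4-factor two-level design (here a 2^(4-1) fraction with D = ABC), coded -1/+1.
base = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])
design = np.column_stack([base, base[:, 0] * base[:, 1] * base[:, 2]])

beta = np.array([0.5, 1.0, 0.8, 0.5, 0.0])     # intercept, A, B, C, D (D inactive)
X = sm.add_constant(design)                    # model matrix with intercept
n_trials, n_sim, alpha = 10, 2500, 0.05
p_success = 1.0 / (1.0 + np.exp(-X @ beta))    # logistic model for P(success) at each run

collected = []
for _ in range(n_sim):
    successes = rng.binomial(n_trials, p_success)          # simulate 10 trials per run
    endog = np.column_stack([successes, n_trials - successes])
    try:
        fit = sm.GLM(endog, X, family=sm.families.Binomial()).fit()
        collected.append(fit.pvalues)                      # one p-value per effect
    except Exception:
        continue                                           # skip rare degenerate fits

pvals = np.array(collected)
power = (pvals < alpha).mean(axis=0)                       # empirical rejection rate per term
for name, pw in zip(["Intercept", "A", "B", "C", "D"], power):
    print(f"{name}: {pw:.3f}")
```

The rejection rate for D, the inactive factor, should hover near the 0.05 significance level, while the rates for the active factors estimate their power. The rest of this post shows how to get the same kind of result in JMP Pro 13 without writing any code.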

That sounds awesome
The first time I saw the new one-click simulate, I was ecstatic, thinking of the possible uses with designed experiments. A key element needed to use the one-click simulate feature is a column containing a formula with a random component. If you read my previous blog post on the revamped Simulate Responses in DOE, then you know we already have a way to generate such a formula without having to write it ourselves.

1. Create the Design, and then Make Table with Simulate Responses checked
In this example, we have four factors (A-D), and plan an eight-run experiment. I’ll assume that you’re comfortable using the Custom Designer, but if not, you can read about the Custom Design platform here. This example can essentially be set up the same way as an example in our documentation.

Before you click the Make Table button, you need to make sure that Simulate Responses has been selected from the hotspot at the top of the Custom Design platform.

[Screenshot: Simulate Responses option selected in the Custom Design hotspot menu]

2. Set up the simulation
Once the data table is created, we now have to set up our simulation via the Simulate Responses dialog described previously. Under Distribution, we select Binomial and set N to 10 (i.e., 10 trials for each row of the design). Here, I’ve chosen a variety of coefficients for A-D, with factor D having a coefficient of 0 (i.e., that factor is inactive). The Simulate Responses dialog I will use is:

[Screenshot: Simulate Responses dialog with a Binomial distribution, N = 10 and the chosen coefficients]

Clicking the Apply button, we get a Y Simulated column simulating the number of successes out of 10 trials, and a column indicating the number of trials (which is used in Fit Model). For modeling purposes, I copied the Y Simulated column into Y.

[Screenshot: data table with the Y, Y Simulated and trials columns]

If we look at the formula for Y Simulated, we see that it can generate a response vector based on the model given in the Simulate Responses dialog.

[Screenshot: formula stored in the Y Simulated column]

3. Fit the Model
Now that we have a formula for simulating responses, we need to set up the modeling for the simulated responses. In this case, we want to collect p-values for the effects from repeated logistic regression analyses on the simulated responses. We first need to do the analysis for a single response. If we launch the Fit Model platform, we can add the number of trials to the response role (Y), and change the Personality to Generalized Linear Model with a Binomial Distribution. My Fit Model launch looks like this:

[Screenshot: Fit Model launch window]

Click the Run button to fit the model. The section of the report that we’re interested in is the Parameter Estimates outline.

[Screenshot: Parameter Estimates outline from the fitted model]

For the initial simulated response, A, B, and C were found to be active, and D was not (which is correct). Of course, this is just for a single simulation. We could keep simulating new response vectors and tracking these p-values for each effect, or we could use one-click simulate and let it do this for us.

4. Run the simulation
The column we’re interested in for this blog post is Prob>ChiSq. We right-click on that column to bring up the menu, and (if you have JMP Pro) we see an option for Simulate at the bottom, just above Bootstrap.

[Screenshot: Simulate option in the right-click menu for the Prob>ChiSq column]

The dialog that pops up has a choice for Column to Switch Out and a choice for Column to Switch In. For our simulations, instead of using the response Y, we want to use Y Simulated, as it contains the formula with the Random Binomial. (Instead of using Y when we first ran Fit Model, we could have used Y Simulated and switched it out with itself.) The Number of Samples refers to how many times to simulate a response. Here I’ve left it at 2500.

[Screenshot: Simulate dialog with Column to Switch Out, Column to Switch In and Number of Samples]

Now we just click OK and let it run. After a short wait, we’re presented with a data table containing a column for each effect from the Fit Model dialog (as well as a simulation ID, SimID) and 2501 rows – the first is the original fit and is marked as excluded, while each remaining row corresponds to the results from one of our 2500 simulated responses. The values are the p-values for each effect from Fit Model.

[Screenshot: table of simulated p-values for each effect]

The one-click Simulate has also pre-populated the data table with a distribution script, and, because it recognizes the results are p-values, another script called Power Analysis. Running the Power Analysis script provides a distribution of the p-values for each effect, as well as a summary titled Simulated Power with the rejection rate at different levels of alpha. For example, for the effect of factor B, the null hypothesis of no effect was rejected at alpha = 0.05 in 2103 of the 2500 simulations, a rejection rate (empirical power) of about 84%.

[Screenshot: Power Analysis report with Simulated Power tables]

I typically right-click on one of the Simulated Power tables and select Make Combined Data Table. This produces a data table with the rejection rates for each term at the four different alpha levels, which makes it easier to view the results in Graph Builder, such as the results for alpha = 0.05.

[Screenshot: Graph Builder view of rejection rates at alpha = 0.05]

Now we can see that we have high power to detect the effects of A and B (recall that they had the largest coefficients), while C and the intercept are around 50%. Since D was inactive in our simulation, its rejection rate is around 0.05, as we would expect. We may be concerned about only 50% power for C, assuming its specified effect size is correct. Because these simulations are so easy to run, it’s simple to go back to Simulate Responses and change the number of trials for each row of the design before running another one-click simulation. Likewise, we could create a larger design to see how that affects the power. We could even try modeling using generalized regression with a binomial distribution.

Final thoughts
To me, a key aspect of this new feature is that it allows you to go through different “what if” scenarios with ease. This is especially true if you are in a screening situation, where it’s not unusual to use model selection techniques when analyzing data. Now you can have empirical power calculations that match the analysis you plan to use, which can help alert you to pitfalls that may arise during analysis. While this was possible prior to JMP 13, I typically didn’t find the time to create a custom formula each time I was considering a design. In the short time I’ve been using the one-click simulate, the ease with which I can create the formula and run the simulations has led me to insights I would not have gleaned otherwise, and it has become an important tool in my toolbox.

tags: Design of Experiments (DOE), JMP 13, Power Analysis, Simulate Responses, Statistics

The post Empirical power calculations for designed experiments with 1-click simulate in JMP 13 appeared first on JMP Blog.

Oct 24, 2016
 

The Simulate Responses feature in the various design of experiments (DOE) platforms has always been a useful tool for generating a set of responses according to a specified model. I frequently use the simulated responses in Fit Model (or another appropriate platform) as a way to check that the model is being fit as expected. Prior to JMP 13, Simulate Responses had limitations:

  • Simulation was limited to linear regression models with normal errors.
  • The ability to simulate responses was tied to the DOE window and the Simulate Responses window. If you closed either window, you would have to make a new data table to simulate responses again.
  • If you wanted to run a Monte Carlo simulation study using simulated responses (that is, simulating a large number of responses from the specified model and collecting results), there was no easy way to do so using the simulated responses from the DOE platform.

Simulate Responses in JMP 13

The look and feel of the Simulate Responses dialog remains the same in JMP 13. But to address the limitations I mentioned above, some new features have been added. That's the focus of the rest of this post.

Different distributions for the response

There are times when the response from an experiment is not continuous, but is instead pass/fail or count data. In JMP 13, in addition to a linear regression model with normal errors, you now also have the ability to simulate responses that follow a Binomial or Poisson distribution.

[Screenshot: distribution choices (Normal, Binomial, Poisson) in the Simulate Responses dialog]
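Conceptually, a simulated response is just a linear predictor pushed through the chosen distribution. Here is a minimal Python sketch of that idea (the factor layout, coefficients and link functions below are illustrative assumptions, not the formula JMP generates):

```python
# Simulating one response vector from a specified model -- a rough sketch, not JSL.
import numpy as np

rng = np.random.default_rng(13)

def simulate_response(X, beta, dist="normal", sigma=1.0, n_trials=10):
    """X: model matrix (first column = intercept); beta: coefficients."""
    eta = X @ beta                              # linear predictor
    if dist == "normal":
        return eta + rng.normal(0.0, sigma, size=len(eta))
    if dist == "binomial":                      # pass/fail counts out of n_trials
        p = 1.0 / (1.0 + np.exp(-eta))          # logistic link
        return rng.binomial(n_trials, p)
    if dist == "poisson":                       # count data
        return rng.poisson(np.exp(eta))         # log link
    raise ValueError(dist)

# 12-run illustration: intercept plus four coded factors at +/-1
X = np.column_stack([np.ones(12), rng.choice([-1, 1], size=(12, 4))])
beta = np.array([1.0, 0.5, -0.5, 0.3, 0.0])
print(simulate_response(X, beta, dist="poisson"))
```

Each call produces a new realization, which is analogous to what clicking Apply does with the stored formula described below.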

Relaunching the Simulate Responses dialog

As discussed above, because Simulate Responses was tied to the DOE platform, there was no easy way to relaunch the Simulate Responses dialog once either window was closed. In JMP 13, Simulate Responses is tied to the data table. A table script, called DOE Simulate, relaunches the simulation dialog.

[Screenshot: DOE Simulate table script in the data table panel]

Automatic formula creation

In my view, the most powerful new aspect of the revamped Simulate Responses can be easy to miss. I’ll demonstrate with a simple example, a 12-run Custom Design with four continuous factors (X1-X4). When you select Simulate Responses from the hotspot in a DOE platform and you create a data table, you now end up with two columns for each response that have the same values initially. The second one, Y Simulated (where Y is the name of the response), gets updated each time the Apply button is clicked in the dialog. Why the need for two columns?

[Screenshot: data table with the Y and Y Simulated columns]

If you right-click on the column name, you see that Y Simulated is actually a Formula Column. The column Y has no formula; it is simply filled in when the data table is created, the idea being that you enter your own response values there when you collect data.

[Screenshot: right-click menu showing that Y Simulated is a Formula Column]

If you examine the formula, you see the responses are generated based on the model specified in the Simulate Responses dialog:

[Screenshot: formula generated from the model specified in the Simulate Responses dialog]

This means that you can also simulate responses by clicking the Apply button within the formula editor. Note that this formula is automatically created by the Simulate Responses dialog. Suppose you change the coefficients and distribution, like this:

[Screenshot: Simulate Responses dialog with changed coefficients and distribution]

When you click the Apply button, your Y Simulated column now has count data.

[Screenshot: Y Simulated column now containing count data]

The script updates the underlying formula to reflect the new model:

[Screenshot: updated formula reflecting the new model]

Simulation studies

Why am I so excited about this aspect of Simulate Responses? Simulation studies.
For users comfortable with JSL, the automatic formula requires only a call to Eval Formula to simulate a new response; the desired information can then be gathered via scripting. What’s more, if you have JMP Pro 13, this formula allows you to perform one-click Monte Carlo simulations (akin to the one-click bootstrap).

Stay tuned for a blog post by Ryan Parker explaining how to use this new Simulate feature. In a future blog post, I’ll show just how easy it is to perform empirical power calculations using the new Simulate Responses with one-click Simulate.

tags: Analytics, Design of Experiments (DOE), JMP 13, Simulate Responses, Statistics

The post Simulate Responses is revamped to be more useful in JMP 13 appeared first on JMP Blog.

Oct 22, 2016
 

In our previous blog post, we wrote about using designed experiments to develop analytic methods. This post continues the discussion of analytic methods and shows how a new type of experimental design, the Definitive Screening Design[1] (DSD), can be used to assess and improve analytic methods.

We begin with a quick review of analytic methods and a brief summary of the experiment described in that previous blog post, and then show what is learned by using a DSD.

What are analytic methods?

Analytic methods are used to carry out essential product and process measurements. Such measurement systems are critical in the pharmaceutical industry, where understanding of the process monitoring and control requirements is important for developing sound analytic methods. The typical requirements for evaluating analytic methods include:

  • Precision: This requirement ensures that method variability is only a small proportion of the specification range (upper specification limit – lower specification limit).
  • Selectivity: This determines which impurities to monitor at each production step and calls for designing methods that adequately discriminate the relative proportions of each impurity.
  • Sensitivity: This relates to the need for methods that accurately reflect changes in CQAs that are important relative to the specification limits, which is essential for effective process control.

QbD implementation in the development of analytic methods is typically a four-stage process addressing both design and control of the methods[2]. The stages are:

  1. Method Design Intent: Identify and specify the analytical method performance.
  2. Method Design Selection: Select the method work conditions to achieve the design intent.
  3. Method Control Definition: Establish and define appropriate controls for the components with the largest contributions to performance variability.
  4. Method Control Validation: Demonstrate acceptable method performance with robust and effective controls.

We continue here the discussion of how to use statistically designed experiments to achieve robustness, which we began in our previous blog post.

A case study in HPLC development

The case study we presented in our previous post concerns the development of a High Performance Liquid Chromatography (HPLC) method[3]. The specific system consists of an Agilent 1050, with a variable-wavelength UV detector and a model 3396-A integrator. Table 1 lists the factors and their levels used in the designed experiments of this case study. The original experimental array was a 2^(7-4) fractional factorial experiment with three center points (see Table 2). The levels "-1" and "1" correspond to the lower and upper levels listed in Table 1, and "0" corresponds to the nominal level. The lower and upper levels are chosen to reflect variation that might naturally occur about the nominal setting during regular operation.

Table 1. Factors and levels in HPLC experiments.

 

Table 2. Original Fractional Factorial experimental array for HPLC experiment.

The fractional factorial experiment (Table 2) consists of 11 runs that combine the design factor levels in a balanced set of combinations, including three center points.

What do we learn from the fractional factorial experiment?

In our previous post, we analyzed the data from the factorial experiment and found that the experiment provided answers to several important questions:

  • How sensitive is the method to natural variation in the input settings?
  • Which inputs have the largest effect on the outputs from the method?
  • Are there different inputs that dominate the sensitivity of different responses?
  • Is the variation transmitted from factor variation large relative to natural run-to-run variation?

All of the above answers relate to our ability to assess the effects of factor variation when the factors are at their nominal settings. However, they do not address the possibility of improving robustness by moving the nominal setting to one that is less sensitive to factor variation.

Robustness and Nonlinearity

Robustness has a close link to nonlinearity. We saw this feature in the previous blog post.  There the initial analysis of the factorial experiment showed clear lack-of-fit, which the team attributed to the "gradient" factor.  We used a model with a quadratic term for gradient and found that situating the nominal value near the "valley" of the resulting curve could effectively reduce the amount of transmitted variation. Thus, the added quadratic term gave valuable information about where to set the gradient to achieve a robust method.

The presence of interactions is another form of nonlinearity that has consequences for method robustness. Two factors have an interaction effect on a response when the slope of either factor's effect depends on the setting of the other factor.  In a robustness experiment, the slope is a direct reflection of method sensitivity. So when there is an interaction, we can typically set the nominal level of one of the factors to a level that moderates the slope of the second factor, thereby reducing its contribution to transmitted variation. Exploiting interactions in this manner is a basic tool in the quality engineering experiments of Genichi Taguchi[4].

How can we plan the experiment for improving robustness?

The fractional factorial experiment that we analyzed in the previous post was effective for estimating linear effects of the factors – and this was sufficient for assessing robustness.  However, to improve robustness, we need a design that is large enough to let us estimate both linear and nonlinear effects.  The natural first step is to consider estimating "second order effects", which include pure quadratic effects like the one for gradient in our earlier post and two-factor interactions.

There are three ways we can think about enlarging the experiment to estimate additional terms in a model of the analytic method’s performance. Specifically, we can use a design that is appropriate for estimating:

  1. All two-factor interactions and pure quadratic effects.
  2. All two-factor interactions but no pure quadratics.
  3. All pure quadratics but no interactions.

Effective designs exist for option 1, like the central composite and Box-Behnken designs.  Similarly, the two-factor interactions can be estimated from two-level fractional factorial designs (option 2). The main drawback to both of these choices is that they require too many experimental runs. With K factors in the experiment, there are K main effects, K pure quadratics and K(K-1)/2 two-factor interactions. We also need to estimate the overall mean, so we need at least 1+K(K+1)/2 runs to estimate all the main effects and two-factor interactions. If K is small, this may be perfectly feasible. However, with K=7, as in the HPLC experiment, that adds up to at least 29 runs (and at least 36 to also estimate the pure quadratics). These experiments are about three times as large as the fractional factorial design analyzed in the previous blog post  and would be too expensive to implement.
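Spelling out the count for K = 7:

\[
1 + K + \frac{K(K-1)}{2} = 1 + \frac{K(K+1)}{2} = 1 + \frac{7 \cdot 8}{2} = 29 ,
\]

and adding the K = 7 pure quadratic terms brings the total to at least 36 runs.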

Here we consider option 3, designs to estimate all the pure quadratics but no interactions. A very useful class of experimental designs for this purpose is the class of Definitive Screening Designs (DSDs). We show in the next section how to use a DSD for studying and improving the robustness of an analytic method.

Applying a Definitive Screening Design

A Definitive Screening Design (DSD) for K factors requires 2K+1 runs if K is even and 2K+3 if K is odd (to ensure main effect orthogonality). The design needs to estimate 2K+1 regression terms, so this is at or near the minimum number of runs needed. In such a design, all factors are run at three levels in a factorial arrangement; main effects are orthogonal and free of aliasing (partial or full) with quadratic effects and two-factor interaction effects; and no quadratic or two-way interaction effect is fully aliased with another quadratic or two-way interaction effect. With a DSD we can estimate all linear and quadratic main effects. Further, if some factors prove to have negligible effects, we may be able to estimate some two-factor interactions. The HPLC study had seven factors, so a DSD requires 17 experimental runs (see Table 3). For robustness studies, it is important to estimate the magnitude of run-to-run variation. The DSD in this application has two degrees of freedom for error, so no additional runs are needed. Were K even, it would be advisable to add at least two runs to permit estimation of error. A simple way to do this is to add more center points to the design.
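For the seven-factor HPLC study, the bookkeeping is:

\[
\text{runs} = 2K + 3 = 17, \qquad \text{parameters} = 1 + K + K = 2K + 1 = 15, \qquad \text{error df} = 17 - 15 = 2 .
\]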

Table 3: Definitive Screening Design with seven factors

What do we learn from analyzing the DSD experimental data?

As in the previous blog post, we will illustrate the analysis by looking at the results for the peakHeight response in the HPLC application. Throughout, we divide the actual peakHeights by 1000 for ease of presentation. We proceeded to fit a model to the DSD experimental data that includes all main effects and pure quadratic effects. The analysis shows that the only significant quadratic effect is the one for Gradient. In addition to the quadratic effect for Gradient, we decided to keep four linear main effects in the model: Gradient, Column Temperature, Detection Wavelength and Triethylamine Percentage. In Figure 1, we show parameter estimates from fitting this reduced model to the peakHeight responses. All terms are statistically significant, the adjusted R² is 87%, and the run-to-run variation has an estimated standard deviation of 2.585.

Figure 1. Parameter estimates of peakHeight with quadratic model.

We show a Profiler for the reduced quadratic model in Figure 2, below.

Figure 2. Profiler of HPLC experiments with reduced quadratic model derived from DSD experiments.

Finding a robust solution

In order to improve robustness, we need to identify nonlinear effects. Here the only nonlinear effect is for gradient. Figure 2 shows us that the quadratic response curve for gradient reaches a minimum quite close to the nominal value (0 in the coded units of Figure 2). Consequently, setting the nominal level of Gradient to that level is a good choice for robustness. The other factors can also be kept at their nominal settings. They have only minor quadratic effects, so moving them to other settings will have no effect on method robustness.

We can assess the level of variation, as in the previous post, by assigning normal distributions to the input factors. As in that post, we use the default option in JMP, which assigns to each input a normal distribution with a standard deviation of 0.4 (in coded units). Figure 3 shows the results of this simulation. The standard deviation of peakHeight associated with variation in the factor levels is 2.697, very similar in magnitude to the SD for run-to-run variation from the experimental data. The estimate of the overall method SD is then 3.736 (the square root of 2.697² + 2.585²).
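Spelling out that calculation:

\[
\sqrt{2.697^2 + 2.585^2} = \sqrt{7.274 + 6.682} = \sqrt{13.956} \approx 3.736 .
\]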

Figure 3. Profiler of peakHeight at nominal levels with added noise.

It is instructive to compare the results from analyzing the DSD to those from analyzing the fractional factorial in the previous blog post. Both experiments ended with the conclusion that gradient has a nonlinear effect on peakHeight, and that setting gradient close to its planned nominal level is a good choice for robustness of the analytic method. The fractional factorial was not able to identify gradient as the interesting factor; this happened only after substantial discussion by the experimental team. And even then, there was concern that the decision to attribute all the nonlinearity to the gradient might be completely off the mark. The DSD, on the other hand, with just a few more runs, was able to support a firm conclusion that gradient is the only factor that has a nonlinear effect. There was no need for debate and assumptions; the issue could be determined from the experimental data.

The DSD and the fractional factorial are both able to assess variance from factor uncertainty and both agree that the three factors with the most important contributions are gradient, column temperature and detection wavelength. The DSD identified a fourth factor, the percent of Triethylamine, as playing a significant role.

The DSD, by estimating all the pure quadratic effects, was also able to fully confirm that there would be no direct gain in method robustness by shifting any of the factors to different nominal values. Improvement might still be possible due to two-factor interactions; but as we pointed out, only a much larger experiment could detect those interactions.

Can we still improve method robustness?

The DSD has shown us that changing nominal levels is not a solution. An alternative is to institute tighter control on the process parameters, thereby limiting their natural variation. Moreover, the DSD helps us prioritize the choice of which factors to control.  Figures 1 and 3 show us that the strongest linear effect is due to the column temperature. They also show that the strong and nonlinear effect of gradient may be contributing some of the most extreme high values of peakHeight. Thus these two variables appear to be the primary candidates for enhanced control.  Figures 4 and 5 use the simulator option with the profiler to see the effect of reducing the natural spread of each of these factors, in turn, by a factor of 2. With enhanced control of the column temperature, the SD related to factor uncertainty drops from 2.697 to 2.550. Reducing the variation of the gradient leads to a much more substantial improvement. The SD drops by about 40%, to 1.667.

Figure 4. Profiler of peakHeight at nominal levels with added noise and enhanced control of the column temperature.

 

Figure 5. Profiler of peakHeight at nominal levels with added noise and enhanced control of the gradient.

Summary

Experiments on robustness are an important stage in the development of an analytic method. These experiments intentionally vary process factors that cannot be perfectly controlled about their nominal value. Experiments that are geared to fitting a linear regression model are useful for assessing robustness, but have limited value for improving robustness.

We have shown here how to exploit nonlinear effects to achieve more robust analytic methods.  The Definitive Screening Design can be especially useful for such experiments. For a minimal experimental cost, it provides enough data to estimate curvature with respect to each input factor. When curvature is present, we have seen how to exploit it to improve robustness. When curvature has been exploited, we have seen how to use the experimental results to achieve further improvements via tighter control of one or more input factors.

Notes

[1] Jones, B. and Nachtsheim, C. J. (2011), “A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects,” Journal of Quality Technology, 43, 1-15.

[2] Borman, P., Nethercote, P., Chatfield, M., Thompson, D., Truman, K. (2007), Pharmaceutical Technology. http://pharmtech.findpharma.com/pharmtech/Peer-Reviewed+Research/The-Application-of-Quality-by-Design-to-Analytical/ArticleStandard/Article/detail/463580

[3] Romero R., Gasquez, D., Sanshez, M., Rodriguez, L. and Bagur, M.  (2002), A geometric approach to robustness testing in analytical HPLC, LCGC North America, 20, pp. 72-80, www.chromatographyonline.com.

[4] Steinberg, D.M., Bursztyn, D. (1994). Dispersion effects in robust-design experiments with noise factors, Journal of Quality Technology, 26, 12-20.

 

About the Authors

This blog post is brought to you by members of the KPA Group: Ron Kenett and David Steinberg. Read the whole QbD Column series.

Ron Kenett

David Steinberg

tags: Analytics, Definitive Screening Design, Design of Experiments (DOE), QbD, Statistics, The QbD Column

The post The QbD Column: Applying QbD to make analytic methods robust appeared first on JMP Blog.

Oct 14, 2016
 

Development of measurement or analytic methods parallels the development of drug products. Understanding of the process monitoring and control requirements drives the performance criteria for analytical methods, including the process critical quality attributes (CQAs) and specification limits. CQAs are identified by uncovering the characteristics of a drug substance that must be controlled to ensure safety or efficacy. The typical requirements of analytic methods include:

  • Precision: This requirement ensures that method variability is only a small proportion of the specification range (upper specification limit – lower specification limit). This is also called Gage Repeatability and Reproducibility (GR&R).
  • Selectivity: This determines which impurities to monitor at each production step and calls for designing methods that adequately discriminate the relative proportions of each impurity.
  • Sensitivity: To achieve effective process control, this requires methods that accurately reflect changes in CQAs that are important relative to the specification limits.

These criteria establish the reliability of methods for use in routine operations. This has implications for analysis time, acceptable solvents and available equipment. To develop an analytic method with QbD principles, the method’s performance criteria must be understood, as well as the desired operational intent of the eventual end-user. Limited understanding of a method can lead to poor technology transfer from the laboratory into use in commercial manufacturing facilities or from an existing facility to a new one. Failed transfers often require significant additional resources to remedy the causes of the failure, usually at a time when there is considerable pressure to move ahead with the launch of a new product. Applying Quality by Design (QbD) to analytic methods aims to prevent such problems.

QbD implementation in the development of analytic methods is typically a four-stage process, addressing both design and control of the methods[1]. The stages are:

  1. Method Design Intent: Identify and specify the analytical method performance.
  2. Method Design Selection: Select the method work conditions to achieve the design intent.
  3. Method Control Definition: Establish and define appropriate controls for the components with the largest contributions to performance variability.
  4. Method Control Validation: Demonstrate acceptable method performance with robust and effective controls.

Testing robustness of analytical methods involves evaluating the influence of small changes in the operating conditions[2]. Ruggedness testing identifies the degree of reproducibility of test results obtained by the analysis of the same sample under various normal test conditions such as different laboratories, analysts, and instruments. In the following case study, we focus on the use of experiments to assess and improve robustness.

A case study in HPLC development

The case study presented here is from the development of a High Performance Liquid Chromatography (HPLC) method[3]. It is a typical example of testing the robustness of analytical methods. The specific system consists of an Agilent 1050, with a variable-wavelength UV detector and a model 3396-A integrator.

The goal of the robustness study is to find out whether deviations from the nominal operating conditions affect the results. Table 1 lists the factors and their levels used in this case study. The experimental array is a 2^(7-4) fractional factorial experiment with three center points (see Table 2). The levels "-1" and "1" correspond to the lower and upper levels listed in Table 1, and "0" corresponds to the nominal level. The lower and upper levels are chosen to reflect variation that might naturally occur about the nominal setting during regular operation. For examples of QbD applications of fractional factorial experiments to formulation and drug product development, see the second and third blog posts in this series.

Table 1. Factors and levels in the HPLC experiment.

 

Table 2. Experimental array of the HPLC experiment.

The experimental array consists of 11 experimental runs that combine the design factor levels in a balanced set of combinations, plus three center points.
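For readers who want to reproduce the layout outside JMP, here is a small Python sketch that builds an eight-run 2^(7-4) fraction plus three center points in coded units. It assumes one standard choice of generators (D = AB, E = AC, F = BC, G = ABC); the published study may have used a different, equivalent fraction:

```python
# Build a 2^(7-4) fractional factorial plus three center points -- a sketch.
from itertools import product
import pandas as pd

runs = []
for a, b, c in product((-1, 1), repeat=3):          # the 8 base combinations
    runs.append({"A": a, "B": b, "C": c,
                 "D": a * b, "E": a * c, "F": b * c, "G": a * b * c})
design = pd.DataFrame(runs)
center = pd.DataFrame([{f: 0 for f in design.columns} for _ in range(3)])  # center points
design = pd.concat([design, center], ignore_index=True)
print(design)                                       # 11 runs at coded levels -1/0/+1
```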

What we can learn from the HPLC experiments

In analyzing the HPLC experiment, we have the following goals:

  1. Find the expected prediction variance of the method's measurements at recommended setups (the measurement uncertainty).
  2. Identify the best settings of the experimental factors to achieve acceptable performance.
  3. Determine the factors that impact the performance of the method on one or more responses.
  4. Assess the impact of variability in the experimental factors on the measured responses.
  5. Make the HPLC method robust by exploiting nonlinearity in the factor effects to achieve performance that is not sensitive to changes about nominal levels.

If we consider only the eight experimental runs of the 2^(7-4) fractional factorial, without the center points, we get an average prediction variance of 0.417 and 100% efficiency for fitting a first-order model. This is due to the balanced property of the design (see Figure 1, left). The design in Table 2, with three center points, reduces prediction uncertainty near the center of the region and has a lower average prediction variance of 0.38. However, the center points don't contribute to estimating slopes, as seen in the lower efficiency for fitting the first-order model (see Figure 1, right).

Figure 1. Design diagnostics for the 2^(7-4) fractional factorial without (left) and with (right) three center points.

 

Figure 2. Prediction variance profile without (top) and with (bottom) center points.

The JMP Prediction Variance Profile in Figure 2 shows the ratio of the prediction variance to the error variance, also called the relative variance of prediction, at various factor level combinations. Relative variance is minimized at the center of the design. Adding three center points reduces prediction variance by 25%, from 0.12 to 0.09. This is an advantage derived by adding experimental runs at the center points. Another advantage that we will see later is that the center points permit us to assess nonlinear effects, or lack-of-fit for the linear regression model. A third advantage is that the center points give us a model-free estimate of the extent of natural variation in the system.
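In matrix terms (with X the model matrix of the assumed first-order model, σ² the error variance, and x the vector of model terms at a candidate factor setting), the relative prediction variance being plotted is

\[
\frac{\operatorname{Var}[\hat{y}(x)]}{\sigma^{2}} = x^{\mathsf{T}} \left(X^{\mathsf{T}} X\right)^{-1} x ,
\]

and adding runs (center points included) can never increase this quadratic form at any setting, which is why the profile drops most visibly near the center, where the added runs sit.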

At each factor level combination, the experiments produced five responses: 1) Area of chromatogram at peak (peakArea), 2) Height of chromatogram at peak (peakHeight), 3) Minimum retention time adjusted to standard (tRmin),  4) Unadjusted minimum retention time (unad tRmin) and 5) Chromatogram resolution (Res).

Our first concern in analyzing the data is to identify proper models linking factors and responses.

What do we learn from analyzing the data from the fractional factorial experiment?

Linear regression models are the simplest models to consider. They represent changes in responses between two levels of the factors; in our case these are the levels labeled “-1” and “+1”. Since we also have three center points, at levels labeled “0”, we can also assess nonlinear effects. We do so, as in our second blog post, by adding a synthetic indicator variable designed to assess lack-of-fit (LOF) that is equal to “1” at the center points and “0” elsewhere. The JMP Effect Summary report for all five responses, with linear effects for all seven factors and the LOF indicator, is presented in Figure 3.

Figure 3. Effect Summary report of seven factors and LOF indicator on five responses.

The Effect Summary table lists the model effects across the full set of five responses, sorted by ascending p-values. The LogWorth for each effect is defined as -log10(p-value), which adjusts p-values to provide an appropriate scale for graphics. A LogWorth that exceeds 2 is significant at the 0.01 level because -log10(0.01)=2. The report includes a bar graph of the LogWorth with dashed vertical lines at integer values and a blue reference line at 2. The displayed p-values correspond to the significance test displayed in the Effect Tests table of the model report. The report in Figure 3 shows that, overall, four factors and LOF are significant at the 0.01 level (Col Temp, Gradient, Buf pH and Dim Perc) and Buf Conc, Det Wave and Trie Perc are non-significant. From the experimental plan in Table 2, one can estimate the main effects of the seven factors and the LOF indicator on the five responses with a linear model.

Figure 4 presents parameter estimates for peakHeight with an adjusted R² of 93%, a very good fit.

The peakHeight response is most sensitive to variations in Col Temp, Det Wave and Gradient.

Figure 4. Parameter estimates of peakHeight of seven factors and LOF indicator with linear model. For improved readability, peakHeights have been divided by 1000.

 We observe a statistically significant difference between the predicted value at the center point of the experimental design and the three measurements actually performed there (via the LOF variable).

Figure 5 displays a profiler plot showing the linear effects of each factor on all five responses. The plot is very helpful in highlighting which conditions might affect the HPLC method. We see that Col Temp and Gradient, the two most important factors, affect several different responses. Buf pH, Buf Conc and Dim Perc have especially strong effects on the retention responses, but weak effects on the other CQAs. The fitted models describe the retention responses and peakHeight well, but not peakArea or Res, which is reflected in the wide confidence bands for those CQAs and in the high p-values for the overall model F-tests in the Analysis of Variance section of the model output.

Figure 5. Profiler of HPLC experiments with linear model.

What should we do about the nonlinearity? Our analysis found a significant effect of the LOF indicator, which points to a nonlinear effect that is not accounted for in the profiler of Figure 5. The center points we added to the two-level fractional factorial design let us detect the nonlinearity, but they don’t provide enough information to determine what causes it – any one of the seven factors (and possibly several of them) could be responsible for the nonlinear effect on peakHeight. In our next blog post, we will discuss some design options to address the problem. For now, we show what we achieved with the current experiment.

After much brainstorming, the HPLC team decided that it was very likely that Gradient was the factor causing the nonlinearity. This important assumption, based only on process knowledge, is crucial to all our subsequent conclusions. We proceeded to fit a model to the original experimental data that includes a quadratic effect for Gradient. The team also decided to retain only the factors with the strongest main effects for each response; for peakHeight, those factors were Gradient, Column Temperature and Detection Wavelength. In Figure 6, we show parameter estimates from fitting this reduced model to the peakHeight responses. With this model, all terms are significant, with an adjusted R² of 89%. The root mean squared error, which estimates run-to-run variation at the same settings of the factors, is 1.754, slightly less than 1% of the magnitude of peakHeight itself (after dividing peakHeight by 1000).

Figure 6. Parameter estimates of peakHeight with quadratic model. For improved readability, peakHeights have been divided by 1000.

We show a Profiler for the reduced quadratic model in Figure 7.

Figure 7. Profiler of HPLC experiment with reduced quadratic model for peakHeight. For improved readability, peakHeights have been divided by 1000.

Finding a robust solution

One of the main goals of the experiment was to assess the robustness of the system to variation in the input factors. We explore this question by introducing normal noise to the factors in the reduced quadratic model. For each of these factors, we assumed a standard deviation of 0.4 (in the coded units), which is the default option in JMP. This reflects a view that the experimental settings are about 2.5 SD from the nominal level, so they represent rather extreme deviations that might be encountered in practice.

Figure 8 presents the effect of noise on peakHeight for a setup at the center point, which was initially identified as the nominal setting. We can compute the SD of the simulated outcomes by saving them to a table and using the Distribution platform in JMP. The SD turns out to be 2.397, larger than the run-to-run SD of 1.754 that we computed earlier. The overall SD associated with the analytic system involves both of these components. To combine them, we first square them, then add them (because variances are additive, not SDs) and then take a square root to return to the original measurement scale. The resulting combined SD is 2.970, so the anticipated variation in factor settings leads to an SD about 70% larger than the one from run-to-run variation alone. The overall SD is less than 1.5% of the typical values of peakHeight, and that was considered acceptable for this process.
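Spelling out the combination:

\[
\sqrt{2.397^2 + 1.754^2} = \sqrt{5.746 + 3.077} = \sqrt{8.822} \approx 2.970 .
\]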

Is our nominal solution a good one for robustness?

Figure 8 is very helpful in answering this question. The important factor here is Gradient, through its non-linear relationship to peakHeight. The “valley” of that relationship is near the nominal choice of 0. Our simulation of factor variation generates values of Gradient that cover the range from -1 to 1. When those values are in the valley, they transmit very little variation to peakHeight. By contrast, when they are near the extremes, there is substantial variation in peakHeight. So the fact that the bottom of the valley is close to the nominal setting assures us that the transmitted variation will be about as small as possible. We can test this feature by shifting the nominal value of Gradient. When the nominal is -0.5, the simulator shows that the SD from factor variation increases to 4.282, almost 80% more than for the nominal setting at 0.

The dependence of peakHeight on Col Temp and on Det Wave is linear. So regardless of how we choose the nominal settings of these factors, they will transmit the same degree of variation to the peakHeight output. The experiment lets us assess how they affect robustness, but does not provide any opportunity to exploit the results to improve robustness.

Figure 8. Prediction Profiler of peakHeight when the factors are at their nominal settings and the natural SD is 0.4. For improved readability, peakHeights have been divided by 1000.

Going back to the original questions

In reviewing the questions originally posed, we can now provide the following answers:

1. What is the expected prediction variance of the method's measurements at the recommended setup (the measurement uncertainty)?

Answer: We looked at this question most closely for peakHeight, where we found that the SD is 2970 in the original units (2.970 after dividing peakHeight by 1000), with roughly equal contributions from run-to-run variation and from variation in the measurement process factors.

2. What setup of the experimental factors will achieve acceptable performance?

Answer: With all factors at their nominal settings (coded value 0), the SD of 2970 is less than 1.5% of the size of the values measured, which is an acceptable level of variation in this application.

3. What are the factors that impact the performance of the method in one or more responses?

Answer: The three factors with the highest impact on the method’s performance are the gradient profile, column temperature and detection wavelength.

4. Can we make the setup of the experimental factors robust in order to achieve performance that is not sensitive to changes in factor levels?

Answer: We saw that we can improve robustness for peakHeight by setting the gradient to its coded level of 0 (the nominal level in the experiment).  That setting helps us to take advantage of the non-linear effect of gradient and reduce transmitted variation.

5. Can we assess the impact of variability in the experimental factors on the analytical method?

Answer: As we noted earlier, the natural variability of the input factors is responsible for slightly more than half the variation in peakHeight.

Wrap-Up

In reviewing the questions originally posed, we first fit a linear regression model. After realizing that there was an unaccounted-for nonlinear effect, we used a reduced quadratic model and found that it fits the data well. By inducing variability in the factors of the reduced quadratic model (gradient profile, column temperature and detection wavelength), we could estimate the variability due to the method and assess the robustness of the recommended setup.

The team’s assumption that gradient is responsible for the non-linearity is clearly important here.  If other factors also have non-linear effects, there could be consequences for how to best improve the robustness of the method. We will explore this issue further in our next blog post.

Notes

[1] Borman, P., Nethercote, P., Chatfield, M., Thompson, D., Truman, K. (2007), Pharmaceutical Technology. http://pharmtech.findpharma.com/pharmtech/Peer-Reviewed+Research/The-Application-of-Quality-by-Design-to-Analytical/ArticleStandard/Article/detail/463580

[2] Kenett, R.S, and Kenett, D.A (2008), Quality by Design Applications in Biosimilar Technological Products, Accreditation and Quality Assurance, Springer Verlag, Vol. 13, No 12, pp. 681-690.

[3] Romero R., Gasquez, D., Sanshez, M., Rodriguez, L. and Bagur, M.  (2002), A geometric approach to robustness testing in analytical HPLC, LCGC North America, 20, pp. 72-80, www.chromatographyonline.com.

 

About the Authors

This blog post is brought to you by members of the KPA Group: Ron Kenett and David Steinberg.

Ron Kenett

David Steinberg

tags: Analytics, Design of Experiments (DOE), Life Sciences, QbD, Statistics, The QbD Column

The post The QbD Column: Is QbD applicable for developing analytic methods? appeared first on JMP Blog.

Oct 6, 2016
 

As data analysts, we all try to do the right thing. When there is a choice of statistical distributions to be used for a given application, it’s a natural inclination to try to find the “best” one.

But beware...

Fishing for the best distribution can lead you into a trap. Just because one option appears to be best doesn’t mean that it’s correct! For example, consider this data set:

[Histogram of the example data set]

What is the best distribution we can use to describe this data? JMP can help us answer this question. From the Distribution platform, we can choose to fit a number of common distributions to the data: Normal, Weibull, Gamma, Exponential, and others. To fit all possible continuous distributions to this data in JMP, go to the red triangle hotspot for this variable in the Distribution report, and choose “Continuous Fit > All”. Here is the result:

[Compare Distributions report ranking 11 fitted distributions by AICc, from Gamma (best) to Exponential (worst)]

JMP has compared 11 potential distributions for this data, and ranked them from best (Gamma) to worst (Exponential). The metric used to perform the ranking is the corrected Akaike Information Criterion (AICc). Lower values of AICc indicate better fit, and so the Gamma distribution is the winner here.

Here’s the catch

This data set was generated by drawing a random sample of size 50 from a population that is normally distributed with a mean of 50 and a standard deviation of 10. The Normal distribution is the correct answer by definition, but our fishing expedition gave us a misleading result.

How often is there a mismatch like this? One way we can approach this question is through simulation. I wrote a small JMP script to draw samples of various sizes from a normally distributed population. I investigated sample sizes of 5, 10, 20, 30, 50, 75, 100, 250, and 500 observations; for each of these, I drew 1,000 independent samples and had JMP compute the fit for all possible continuous distributions. Last, for each sample I recorded the name of the best-fitting distribution, as measured by AICc. (JSL script available in the JMP File Exchange).
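For readers without JMP, here is a rough Python analogue of that simulation (it uses scipy's maximum-likelihood fits, a smaller candidate set than JMP's 11 distributions, and fewer samples per size for speed; the details are illustrative, not a reproduction of the JSL script):

```python
# How often does AICc pick "Normal" when the data really are normal? -- a sketch.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
candidates = {"Normal": stats.norm, "Gamma": stats.gamma, "Weibull": stats.weibull_min,
              "Lognormal": stats.lognorm, "Exponential": stats.expon}

def aicc(dist, data):
    try:
        params = dist.fit(data)                       # maximum-likelihood fit
    except Exception:
        return np.inf
    k, n = len(params), len(data)                     # parameters estimated, sample size
    loglik = np.sum(dist.logpdf(data, *params))
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

n_rep = 200                                           # the post used 1,000 samples per size
for n in (5, 10, 20, 50, 100, 250, 500):
    wins = 0
    for _ in range(n_rep):
        data = rng.normal(50, 10, size=n)             # true population: Normal(50, 10)
        best = min(candidates, key=lambda name: aicc(candidates[name], data))
        wins += (best == "Normal")
    print(f"n = {n:3d}: Normal chosen {100 * wins / n_rep:.0f}% of the time")
```

The author's actual results, produced with the JSL script across all of JMP's continuous distributions, are summarized below.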

The results were quite surprising!

[Graph: percentage of samples for which each distribution was chosen as best, by sample size]

  • Remember, the correct answer in each case is “Normal”. If our fishing expedition was yielding good results across the board, the line for the Normal distribution should be high and flat, hovering near 100%.
  • Instead, the wrong distribution was chosen with disturbing frequency. For sample sizes under 50, the Normal distribution was not even the most commonly chosen. That honor belongs to the Weibull distribution.
  • For a sample size of 5 observations from a Normal distribution, the correct identification was not made a single time out of 1,000 samples.
  • If you want to have at least a 50% chance of correctly identifying normally distributed data by this method, you’ll need more than 100 observations!
  • Even at a sample size of 500 observations, the likelihood of the normal distribution being correctly called the best is only about 80%.

The moral of the story

When comparing the fit of different distributions to a data set, don’t assume that the distribution with the smallest AICc is the correct one. Relative magnitudes of the AICc statistics are what counts. A rule of thumb (used elsewhere in JMP) is that models whose values of AICc are within 10 units of the “best” one are roughly equivalent.* In our first example above, the Gamma distribution is nominally the best, but its AICc is only 0.2 units lower than that of the Normal distribution. There is not good statistical evidence to choose the Gamma over the Normal.
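For reference, the corrected criterion for a model with k estimated parameters, fit by maximum likelihood (maximized likelihood L) to n observations, is

\[
\mathrm{AICc} = -2\ln L + 2k + \frac{2k(k+1)}{n-k-1},
\]

so a difference of 0.2, as in the example above, is far inside the roughly-equivalent range.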

More generally, as a best practice it is wise to consider only distributions that make sense in the context of the problem. Your own knowledge and expertise are usually the best guides. Don’t choose an exotic distribution that has a slightly better fit over one that makes sense and has a proven track record in your field of work.

*This rule is used to compare models built in the Generalized Regression personality of the Fit Model platform in JMP Pro. See Burnham, K.P. and Anderson, D.R. (2002), Model Selection And Multimodel Inference: A Practical Information Theoretic Approach. Springer, New York.

tags: Distribution, Statistics, Tips and Tricks

The post Is that the best (distribution) you've got? appeared first on JMP Blog.

Sep 28, 2016
 

I recently read a very interesting article describing how analytics is being used to detect cheating/copying/re-use in crossword puzzle creation, in some of the major news publications. This inspired me to try my hand at creating a totally new & unique crossword puzzle ... of course using SAS software! :) My grandmother […]

The post A statistical crossword puzzle to exercise your brain appeared first on SAS Learning Post.

Sep 16, 2016
 

The design of experiments (DOE) capabilities of JMP are world-class. You can choose from many designs, such as custom, definitive screening, classical, space-filling, choice and covering arrays. But how do you decide which design to use?

A Fraction of Design Space plot comparing three designs

It used to be a time-consuming process to compare two designs for an experiment: for example, a definitive screening design versus a traditional fractional factorial screening design, or a one-factor-at-a-time experiment versus an optimal design produced by Custom Design. But not anymore, with the new Compare Designs platform in JMP 13.

“Now it’s easy to compare two or three designs. The platform does side-by-side comparisons, and you can see which design is better for which diagnostic,” says Ryan Lekivetz, Sr. Research Statistician Developer who worked on the new platform.

Many of the diagnostics in Compare Designs include a color dashboard that helps you quickly determine which design is better. In addition to comparing two or three different designs, you can also see the impact of varying the number of runs on the same type of design – allowing you to perform a trade-off analysis in terms of experimental budget and design diagnostics.

The Power Analysis diagnostic in Compare Designs uses color to help you see which design is better

Like many new capabilities in JMP 13, Compare Designs was driven by customer requests. And the feedback has been very positive.

Christine Anderson-Cook, a research scientist and JMP user at Los Alamos National Laboratory (LANL), demonstrates the new platform in a recent Analytically Speaking episode and has high praise for it: “JMP has long been leading the way with its experiment design construction and assessment tools, but the new Compare Designs platform takes choosing the right design for the goals of your specific experiment to a new level.” Christine has also written an article on the new platform for the upcoming JMP Foreword magazine.

Ryan notes that Compare Designs could be useful as a tool to understand and show the value of modern experiment design. “Statisticians often get the results of an experiment that they didn’t design. With this platform, they can compare the design that was used to another design, and they can see how much was lost by not using the better design,” he explains.

Working on DOE in JMP is fun for Ryan. “You can apply DOE techniques to almost anything, like cooking hard-boiled eggs, making iced tea and dyeing toy cars,” he says, referring to a few of the experiments he has blogged about.

And he loves the challenge of helping customers with designing experiments. “I like hearing about the tough problems where people don’t think they can create a design and then showing them how they can do it, thanks to the flexibility of custom designs,” Ryan says.

You can talk to Ryan about your own experiment design problems at Discovery Summit 2016 in a few weeks. He’ll be around at “Meet the Developers” and will present a breakout session on the Simulate Responses option in JMP 13.

To learn more about what’s coming in JMP 13, stop by the preview page on our website. There, you can sign up to watch a live stream of JMP chief architect John Sall’s tour of JMP 13 on Sept. 21, as well as watch short videos about JMP 13 and JMP Pro 13.

tags: Compare Designs, Design of Experiments (DOE), JMP 13, Statistics

The post JMP 13 Preview: Compare Designs for picking the best design for an experiment appeared first on JMP Blog.