November 7, 2016
 

In a recent blog post, I discussed control limits and specification limits, where they come from and what they are. This post goes into more detail about specification limits.

We will use Process Capability in JMP to generate a capability analysis. We will then use this analysis to make decisions about our process. You can read more about Process Capability in a post by my colleague Laura Lancaster.

Example

In an earlier post on generating control limits using Control Chart Builder, we saw that stability should always be checked before performing a capability analysis. In that post, I introduced an example involving a printing process. Let’s review it.

Variations in the printing process can cause distortion in the line, including skew, thickness, and length problems. For our purposes, we are considering the length of the line. The line is considered good if it has a printed length of 16 cm +/- 0.02 cm. Any longer, and the sentence may run off the page. Any shorter, and there would be a lot of wasted space on the page. For every print run, the first and last books are taken for measurement. The line lengths are measured on a specified page in the middle of each book.

Analysis

We determined in the previous blog post on control charts that the process was stable. Now that we know the process is stable, we can determine whether our process is capable. The capability of the process is defined by how well the process produces product that is within specification. In our example, the line is considered good if it has a printed length of 16 cm +/- 0.02 cm. Even though we knew this and defined this prior to creating our control charts, this information was not used in our control chart analysis. This information defines our specification limits. These values are not calculated; given the size of the page, we know that a good line has the length stated above.

We can use our same data to perform the capability analysis. To generate the analysis, go to Analyze->Quality and Process->Process Capability. Select Length as the Y, Process. In the Column Roles section, select Length. Open the Process Subgrouping section by clicking on the triangle next to Process Subgrouping. Select Run in the Select Columns section and click Nest Subgroup ID Column.  In the Within-Subgroup Variation Statistic section, choose Average of Ranges. Since we used ranges for our control chart, let’s use ranges here as well.

Process Capability dialog

Click OK. You are presented with a dialog that allows you to define the spec limits. You can either select a data table that contains your spec limits, or you can enter your spec limits directly. We will enter them directly. Enter 15.98 (16 - 0.02) for the LSL, 16 for the Target, and 16.02 (16 + 0.02) for the USL. Click OK.

Spec Limits dialog
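For readers who prefer scripting, here is a minimal JSL sketch of the same launch. It assumes the data table is already open and the column is named Length; the spec-limit syntax follows the Process Capability platform's scripting interface, but check a saved script from your own JMP version for the exact argument names.

dt = Current Data Table();
dt << Process Capability(
   Process Variables( :Length ),
   Spec Limits( :Length( LSL( 15.98 ), Target( 16 ), USL( 16.02 ) ) )
);
// The nesting of Run as a subgroup ID and the Average of Ranges choice described
// above are omitted in this sketch; they can be set in the launch dialog.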

Goal Plot

You are first presented with a Goal Plot.

Goal Plot

The y-axis is the standard deviation standardized to spec. This shows the variability in the data. We notice that the point falls well above the red triangle, indicating high variability. The x-axis is the Mean Shift Standardized to Spec. Since the point occurs near the apex of the triangle, we note that the process is close to target (a small amount below target). The area under the red triangle denotes a non-conformance rate of 0.0027 or better (this corresponds to Ppk = 1), if the distribution is normal. We see that our process has a much higher non-conformance rate. The Ppk slider can be adjusted so that the triangle denotes different non-conformance rates.
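As a quick aside on where 0.0027 comes from: for a normally distributed process centered on target with Ppk = 1, each spec limit sits three standard deviations from the mean. A one-line JSL check (Normal Distribution is JMP's standard normal CDF):

Show( 2 * Normal Distribution( -3 ) );  // ~0.0027, the fraction outside +/- 3 sigma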

Box Plot

The Capability Box Plot shows that not all of our data meets the specification limits. The box plot is very wide and spans a width much larger than the green standardized specification limits. We also notice that the process is close to the target value, but a little on the lower side. We see this because the solid black line that appears inside the box plot is slightly to the left of the solid green line, which denotes the standardized target value.

capabilityboxplot

Individual Detail Reports

To look at this process in more detail, select Individual Detail Reports from the red triangle next to Process Capability. The histogram suggests that the data is normal. If you wanted to perform an actual goodness of fit test for normality, you could use the Distribution platform.

Individual Detail Report

The blue density curve for Within Sigma falls pretty close to the black dotted density curve for Overall Sigma. This indicates that the process is stable, which we already showed via Control Chart Builder. In the nonconformance report, we see that the total percent outside is 67.5, so 67.5% of our measurements do not meet the specification limits.
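For reference, Ppk and the expected percent out of spec are tied to the summary statistics in a simple way. The JSL sketch below shows the relationship; the mean and sigma values are placeholders for illustration, not the values from this report.

LSL = 15.98;  USL = 16.02;
xbar = 16.005;  sigma = 0.03;   // hypothetical summary statistics
Ppk = Min( (USL - xbar) / (3 * sigma), (xbar - LSL) / (3 * sigma) );
pctOut = 100 * ( Normal Distribution( (LSL - xbar) / sigma )
                 + 1 - Normal Distribution( (USL - xbar) / sigma ) );
Show( Ppk, pctOut );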

Further Investigations

To further investigate this process, click on the red triangle next to Process Capability and select Out of Spec Values->Color Out of Spec Values.

capabilitydata

We see in the data table that less than half of the observations met the specification limits. The number of observations that fell above the specification limit is about the same as the number that fell below it. This problem needs further investigation. Perhaps we need to take a closer look at how the spec limits were determined. Perhaps there are other variables that we did not take into account. Remember from the description of the process that measurements were taken on the first and last book of each run. In our analysis, we only used the variables Length and Run. We did not take into account the book, which may also be a source of variation that we need to account for.

Conclusions

The printing process is stable, but not capable. So while we can predict what the process is going to do, we can’t consistently produce lines of the appropriate length (between 15.98 and 16.02 cm).  This process needs further investigation.

References

JMP Software: Statistical Process Control Course Notes

tags: Capability, Process Capability, quality, Spec Limits, Specification Limits

The post Creating a capability analysis in JMP using your specification limits appeared first on JMP Blog.

November 7, 2016
 

Most enterprises employ multiple analytical models in their business intelligence applications and decision-making processes. These analytical models include descriptive analytics that help the organization understand what has happened and what is happening now, predictive analytics that determine the probability of what will happen next, and prescriptive analytics that focus on […]

The post Why analytical models are better with better data appeared first on The Data Roundtable.

November 7, 2016
 

Rotation matrices are used in computer graphics and in statistical analyses. A rotation matrix is especially easy to implement in a matrix language such as the SAS Interactive Matrix Language (SAS/IML). This article shows how to implement three-dimensional rotation matrices and use them to rotate a 3-D point cloud.

Define a 3-D rotation matrix

In three dimensions there are three canonical rotation matrices:

  • The matrix Rx(α) rotates points counterclockwise by the angle α about the X axis. Equivalently, the rotation occurs in the (y, z) plane.
  • The matrix Ry(α) rotates points counterclockwise by the angle α in the (x, z) plane.
  • The matrix Rz(α) rotates points counterclockwise by the angle α in the (x, y) plane.

Each of the following SAS/IML functions returns a rotation matrix. The RotPlane function takes an angle and a pair of integers. It returns the rotation matrix that corresponds to a counterclockwise rotation in the (xi, xj) plane. The Rot3D function has a simpler calling syntax. You specify an axis (X, Y, or Z) to get the rotation matrix in the plane that is perpendicular to the specified axis:

proc iml;
/* Rotate a vector by a counterclockwise angle in a coordinate plane. 
        [ 1    0       0   ]
   Rx = [ 0  cos(a) -sin(a)]        ==> Rotate in the (y,z)-plane
        [ 0  sin(a)  cos(a)]
 
        [ cos(a)  0   -sin(a)]
   Ry = [   0     1      0   ]      ==> Rotate in the (x,z)-plane
        [ sin(a)  0    cos(a)]
 
        [ cos(a) -sin(a) 0]
   Rz = [ sin(a)  cos(a) 0]         ==> Rotate in the (x,y)-plane
        [   0       0    1]
*/
start RotPlane(a, i, j);
   R = I(3);  
   c = cos(a); s = sin(a);
   R[i,i] = c;  R[i,j] = -s;
   R[j,i] = s;  R[j,j] =  c;
   return R;
finish;
 
start Rot3D(a, axis);   /* rotation in plane perpendicular to axis */
   if upcase(axis)="X" then       
      return RotPlane(a, 2, 3);
   else if upcase(axis)="Y" then
      return RotPlane(a, 1, 3);
   else if upcase(axis)="Z" then
      return RotPlane(a, 1, 2);
   else return I(3);
finish;
store module=(RotPlane Rot3D);
quit;

NOTE: Some sources define rotation matrices by leaving the object still and rotating a camera (or observer). This is mathematically equivalent to rotating the object in the opposite direction, so if you prefer a camera-based rotation matrix, use the definitions above but specify the angle -α. Note also that some authors change the sign for the Ry matrix; the sign depends on whether you are rotating about the positive or negative Y axis.




Applying rotations to data

Every rotation is a composition of rotations in coordinate planes. You can compute a composition by using matrix multiplication. Let's see how rotations work by defining and rotating some 3-D data. The following SAS DATA step defines a point at the origin and 10 points along a unit vector in each coordinate direction:

data MyData;               /* define points on coordinate axes */
x = 0; y = 0; z = 0; Axis="O"; output;    /* origin */
Axis = "X";
do x = 0.1 to 1 by 0.1;    /* points along unit vector in x direction */
   output;
end;
x = 0; Axis = "Y";
do y = 0.1 to 1 by 0.1;    /* points along unit vector in y direction */
   output;
end;
y = 0; Axis = "Z";
do z = 0.1 to 1 by 0.1;    /* points along unit vector in z direction */
   output;
end;
run;
 
proc sgscatter data=Mydata;
matrix X Y Z;
run;

If you use PROC SGSCATTER to visualize the data, the results (not shown) are not very enlightening. Because the data are aligned with the coordinate directions, the projection of the 3-D data onto the coordinate planes always projects 10 points onto the origin. The projected data does not look very three-dimensional.

However, you can slightly rotate the data to obtain nondegenerate projections onto the coordinate planes. The following computations form a matrix P which represents a rotation of the data by -π/6 in one coordinate plane followed by a rotation by -π/3 in another coordinate plane:

proc iml;
/* choose any 3D projection matrix as product of rotations */
load module=Rot3D;
pi = constant('pi');
Rz = Rot3D(-pi/6, "Z");    /* rotation matrix for (x,y) plane */
Rx = Rot3D(-pi/3, "X");    /* rotation matrix for (y,z) plane */ 
Ry = Rot3D(    0, "Y");    /* rotation matrix for (x,z) plane */
P = Rx*Ry*Rz;              /* cumulative rotation */
print P;
A rotation matrix is a product of canonical rotations

For a column vector, v, the rotated vector is P*v. However, the data in the SAS data set is in row vectors, so use the transposed matrix to rotate all observations with a single multiplication, as follows:

use MyData;
read all var {x y z} into M;
read all var "Axis";
close;
RDat = M * P`;                    /* rotated data */

Yes, that's it. That one line rotates the entire set of 3-D data. You can confirm the rotation by plotting the projection of the data onto the first two coordinates:

title "Rotation and Projection of Data";
Px = RDat[,1]; Py = RDat[,2]; 
call scatter(Px, Py) group=Axis 
   option="markerattrs=(size=12 symbol=CircleFilled)";
Scatter plot of rotated and projected 3-D data

Alternatively, you can write the rotated data to a SAS data set. You can add reference axes to the plot if you write the columns of the P` matrix to the same SAS data set. The columns are the rotated unit vectors in the coordinate directions, so plotting those coordinates by using the VECTOR statement adds reference axes:

create RotData from RDat[colname={"Px" "Py" "Pz"}];
append from RDat;
close;
 
A = P`;          /* rotation of X, Y, and Z unit vectors */
create Axes from A[colname={"Ax" "Ay" "Az"}];  append from A; close;
Labels = "X":"Z";
create AxisLabels var "Labels";  append; close;
QUIT;
 
data RotData;    /* merge all data sets */
merge MyData RotData Axes AxisLabels;
run;
 
proc sgplot data=RotData;
   vector x=Ax y=Ay / lineattrs=(thickness=3) datalabel=Labels;
   scatter x=Px y=Py / group=Axis markerattrs=(size=12 symbol=CircleFilled);
   xaxis offsetmax=0.1; yaxis offsetmax=0.1; 
run;
Scatter plot of rotated and projected 3-D data

All the data points are visible in this projection of the (rotated) data onto a plane. The use of the VECTOR statement to add coordinate axes is not necessary, but I think it's a nice touch.

Visualizing clouds of 3-D data

This article is about rotation matrices, and I showed how to use matrices to rotate a 3-D cloud of observations. However, I don't want to give the impression that you have to use matrix operations to plot 3-D data! SAS has several "automatic" 3-D visualization methods that are more convenient and do not require that you program rotation matrices.

I also want to mention that Sanjay Matange created a 3-D scatter plot macro that uses ODS graphics to visualize a 3-D point cloud. Sanjay also uses rotation matrices, but because he uses the DATA step and PROC FCMP, his implementation is longer and less intuitive than the equivalent operations in the SAS/IML matrix language. In his blog he says that his macro "is provided for illustration purposes."

In summary, the SAS/IML language makes it convenient to define and use rotation matrices. An application is rotating a 3-D cloud of points. In my next blog post I will present a more interesting visualization example.

tags: Matrix Computations, Statistical Graphics

The post Rotation matrices and 3-D data appeared first on The DO Loop.

November 7, 2016
 

As a developer, I try to make sure all the most-used features of JMP are where you can find them easily. That’s why the options to change the statistics shown in the crosstab are at the top of Categorical’s red triangle menu. That’s why the new “Redo” and “Save Script” buttons are always in the same spot. But there are always some neat features that I wish I could make more prominent. This blog series will show off a few of them, starting with Aligned Responses.

Many surveys have questions about a person’s attitudes toward different ideas, or their satisfaction with different attributes of a good or service. Most of these are rated on an ordinal scale (e.g., 1-5, where 1 is “Poor” and 5 is “Excellent”, or 1-5, where 1 is “Strongly Disagree” and 5 is “Strongly Agree”). Sensory analysis often uses a “Just About Right” (JAR) scale, where negative numbers indicate “Not enough” of a particular flavor (saltiness, sweetness, etc.), positive numbers indicate “Too much” of the flavor, and values near 0 are “Just About Right”.

All of these cases have two features in common: 1) The values have an inherent order, and 2) they are all measured on the same scale. You can use Aligned Responses in the Categorical platform to get a streamlined comparison for many of these columns at once.

Measures of self-perception

As an example, let’s take a look at a famous survey about attitudes and social activities called the “Bowling Alone” data. This data set has a lot of questions that would be good candidates for Aligned Responses. Let’s take a look at a series of questions that measures how people feel about themselves:

  • I am the kind of person who knows what I want to accomplish in life and how to achieve it (ACOMPLSH)
  • My friends and neighbors often come to me for advice about products and brands (ADVICE)
  • I have better taste than most people (BETTASTE)
  • I don't like to take chances (CHANCES)
  • I would do better than average in a fist fight (FISTFGT).

People were asked to rate how much they agreed with each of these statements on a scale of 1 (Definitely Disagree) to 6 (Definitely Agree). The way the questions are worded, you can think of these as measures of how people feel about themselves: Do they see themselves as competent? Someone who is respected by others? A resource for friends and neighbors?

It’s interesting to compare these different measures of self-perception. Aligned Responses is tailored for this type of analysis.

Exploring the data

On the Analyze menu, go to Consumer Studies and choose Categorical. By default, the first tab you see is the “Simple” tab (you may see a different one if you’ve set your Preferences). Clicking on the “Related” tab, we see that there are several types of responses JMP considers related, including Aligned Responses, Repeated Measures, and Rater Agreement. They all give the same basic Crosstab and Share Charts, but they report different statistics. See the JMP Consumer Research book for more details.

In our case, Aligned Responses is sufficient for what we need.

  • Using Ctrl-Click in the columns pane, select the five columns for the report.
  • Click Aligned Responses to add the crosstab definition.
  • Click OK to see the report. (A JSL sketch of this launch follows below.)
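Here is a minimal JSL sketch of the launch. The role names follow the Categorical platform's scripting interface; treat them as approximate, and use Save Script from the red triangle menu to see the exact syntax for your JMP version.

Categorical(
   Aligned Responses( :ACOMPLSH, :ADVICE, :BETTASTE, :CHANCES, :FISTFGT )
);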

jmp_aligned_responses_1

jmp_aligned_responses_2
The crosstabs from the Related tab look different from the crosstabs created by other tabs in the Categorical platform. Instead of having separate tables for each response column, they are all placed into one. There is a row for each response, and each value is given a column. The values are shared across all the responses, because Aligned Responses assumes every response column uses the same scale.

A centered Likert share chart appears below the crosstab and has the same layout. There is a bar for each row in the table. For each row, the mean is considered the "center" of the response, and the bar is shifted left or right, depending on whether there are more responses at the higher levels or the lower levels. The colors for the values across the top of the table match the colors in each section of the bar. You can easily see that more people agree with “I am the kind of person who knows what I want to accomplish in life and how to achieve it (ACOMPLSH)” than with any of the other statements. Most people are not confident about their ability to perform better than average in a fist fight (FISTFGT). The bar for CHANCES is shifted to the right, and the bright red bar is wider than the others, showing that more people “Definitely Agree” that they don’t like to take chances than they “Definitely Agree” with the other statements.

Refining the analysis

It’s often important to see how different categories of people responded to different questions, i.e., whether one variable can serve as a predictor for the others. Let’s change the report slightly. Instead of looking at everything as a response, let’s use one of the questions, “I have better taste than most people” (BETTASTE), as a proxy for a person’s self-esteem. Instead of assigning it as part of the Aligned Responses, we’ll make it an X Column.

jmp_aligned_responses_3
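Scripted, the only change from the earlier sketch is moving BETTASTE out of the response role and into the X role (again, a hedged sketch; check a saved script for the exact role name in your version):

Categorical(
   X( :BETTASTE ),
   Aligned Responses( :ACOMPLSH, :ADVICE, :CHANCES, :FISTFGT )
);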

Now when we click OK, we get a series of crosstabs: one for each of the questions designated as a response, with a row for each value of BETTASTE within each table. The tables are stacked on top of each other, creating a long crosstab that can be difficult to read. Sometimes I turn the crosstab off using the option in the Categorical platform’s red triangle menu.

jmp_aligned_responses_4

The Share chart clearly shows what might be difficult to see in the numbers: Each question has its own mean measuring how much, on average, respondents agreed with the statement, but if we examine each of the questions based on how much respondents agreed with “I have better taste than most people”, we see that the bars shift to the right as we move down the rows within each sub-table.

jmp_aligned_responses_5
People who responded “Definitely Agree” to the BETTASTE question were more likely to “Definitely Agree” that “I am the kind of person who knows what I want to accomplish in life and how to achieve it” and they were more likely to “Definitely Agree” with “I would do better than average in a fist fight”.

The only question in this group that appears to be unrelated to BETTASTE is the statement “I don’t like to take chances,” since the bars in that table do not shift with the answer for BETTASTE.

Give it a try!

Aligned Responses is a convenient way to explore surveys that have many questions with the same ordinal coding (such as “Strongly Disagree” to “Strongly Agree” or “Poor” to “Excellent” ratings on a product or service). The Share chart, in particular, is a quick and easy visual that can help you spot patterns in your data that might be hidden in a large table of counts and percentages.

tags: Aligned Responses, Categorical, consumer and market research, Consumer Research, Survey research

The post JMP Categorical features you never knew about: Aligned Responses appeared first on JMP Blog.

November 5, 2016
 

Galit Shmueli, National Tsing Hua University’s Distinguished Professor of Service Science, will be visiting the SAS campus this month for an interview for an Analytically Speaking webcast.

Her research interests span a number of interesting topics, most notably her acclaimed research, To Explain or Predict, as well as noteworthy research on statistical strategy, bio-surveillance, online auctions, count data models, quality control and more.

In the Analytically Speaking interview, we’ll focus on her most interesting Explain or Predict work as well as her research on Information Quality and Behavioral Big Data, which was the basis of her plenary talk at the Stu Hunter conference earlier this year. I'll also ask about her books and teaching.

Galit has authored and co-authored many books, two of which — just out this year — include some JMP. First is Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro, with co-authors Peter C. Bruce, Nitin R. Patel, and Mia Stephens of JMP. This first edition release coincides with the third edition release of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner, with the first two co-authors listed above. As Michael Rappa says so well in the foreword of the JMP Pro version of the book, “Learning analytics is ultimately about doing things to and with data to generate insights. Mastering one's dexterity with powerful statistical tools is a necessary and critical step in the learning process.”

The second book is Information Quality: The Potential of Data and Analytics to Generate Knowledge, which Galit co-authored with Professor Ron S. Kenett, CEO and founder of KPA and research professor at the University of Turin in Italy (you may recognize Ron and KPA colleagues as guest bloggers on the JMP Blog on the topic of QbD). As David Hand notes in his foreword, the book explains that “the same data may be high quality for one purpose and low quality for another, and that the adequacy of an analysis depends on the data and the goal, as well as depending on other less obvious aspects, such as the accessibility, completeness, and confidentiality of the data.”

Both Ron and Galit will be plenary speakers at Discovery Summit Prague in March. You can download a chapter from their book, which discusses information quality support with JMP and features an add-in for Information Quality, both written by Ian Cox of JMP. You can see a short demo of JMP support for information quality during the Analytically Speaking webcast on Nov. 16.

Whether your analysis is seeking to explain some phenomena and/or to make useful predictions, you will want to hear Galit’s thoughtful perspective on the tensions between these two goals, as well as what Galit has to say on other topics up for discussion. Join us! If Nov. 16 doesn’t suit your schedule, you can always view the archived version when convenient.

tags: Analytically Speaking, Analytics, Books, Discovery Summit, Statistics

The post To explain or predict with Galit Shmueli appeared first on JMP Blog.

November 4, 2016
 

Elections in the US are a 'target rich environment' for data analysts. There are surveys and forecasts before the election, and the presentation of results during and after the voting. What's your favorite election-related graph of all time? For the current (2016) presidential election, my favorite graphs are on the […]

The post Your chance to vote ... for your favorite election graph! appeared first on SAS Learning Post.

November 4, 2016
 

JMP supports many date/time formats, but some less conventional (or downright esoteric) formats still crop up from time to time. To many users, converting an oddly formatted date/time from string to numeric form is a frustrating endeavor, requiring custom formulas and an assortment of seldom-used string and numeric operations. With the Custom Date Formula Writer, you can simply point-and-click your date/time troubles away, generating the necessary formula without writing code.

To begin, install the Data Table Tools Add-in and navigate to the formula writer:

launchtabletools

Now the new date column is just four steps away:

  1. Choose the table and column containing the character date/time data.
  2. Point and click to delimit the "words" in the data.
  3. Specify the meaning of each word, and various options, using drop-down menus and radio buttons.
  4. Press the "Build formula column" button.

Here's what the process looks like:

Step 1: Choose the table and date/time column.

 

selectfile


Step 2: Point and click the text to delimit the data, then press the "Apply delimiting and choose words" button.

 

initial-delimiting


Step 3: Complete the dialog using the radio buttons and drop-down menus to select options and word roles.

 

characterdelimiting


Step 4: Click the "Build Formula Column" button to write the new formula column to the data table.

 

datatable
The column formula is written automatically. Isn't that nice? Hopefully, your date worries are now a thing of the past.

formula
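To give a sense of what the generated formula can look like, here is a hypothetical example. Suppose the character column (call it Character Date) holds strings like "2016.11.07 14:32"; a formula along these lines, built from Word(), Num() and JMP's date/time constructor functions, converts each string to a numeric date/time value. The column name, delimiters, and word order are assumptions for illustration, not output from the add-in.

// day, month, year, hour, minute pulled out of "2016.11.07 14:32" using ". :" as delimiters
Date DMY(
   Num( Word( 3, :Character Date, ". :" ) ),   // day
   Num( Word( 2, :Character Date, ". :" ) ),   // month
   Num( Word( 1, :Character Date, ". :" ) )    // year
)
+ In Hours( Num( Word( 4, :Character Date, ". :" ) ) )
+ In Minutes( Num( Word( 5, :Character Date, ". :" ) ) )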

This add-in, along with many others, is available for free on the JMP User Community's File Exchange. (A free SAS profile is required for access.)

I'll be blogging on more table tools in the future, so stay tuned!

Note: This blog post is the first in a series exploring the various features of the Data Table Tools add-in.

tags: Add-Ins, Date/Time Format, Tips and Tricks

The post Data table tools part 1: Custom Date Formula Writer appeared first on JMP Blog.

November 4, 2016
 

The digital age has fundamentally changed how brands and organisations interact with consumers. This shift has been a crucial part of the Third Industrial Revolution and helped spark the era of consumers sharing their data with different organisations. But now organisations are heralding the Fourth Industrial Revolution, and data is […]

Analytics: The lifeblood of the Fourth Industrial Revolution was published on SAS Voices.

November 4, 2016
 

My river walk last week turned into a spectacular fall show. But if it rains this week in San Antonio, like the weatherman predicts, what will I do? In the coming days, I’ll be presenting at two user groups,  one in eastern Canada in Halifax, and the other all the […]

The post The difference between the Subsetting IF and the IF—THEN—ELSE—IF statement appeared first on SAS Learning Post.

November 3, 2016
 

Business and production systems have become much more capable of collecting data. Equipment collects a variety of sensor and parametric data, and today all kinds of information on buying habits and consumer preferences is available. This level of detail cannot be analyzed and comprehended with static, conventional reporting. Instead, business analysts, engineers and scientists can unlock insights and make discoveries with the leverage provided by interactive, visual analytical software.

Analytical software has opened up a new world of analytics that is characterized by these important traits:

  1. Data are “self-provisioned.” Users are able to get the data they need without assistance and without delay.
  2. The analytics are visual and interactive. As a result …
  3. Users can now conduct advanced analytics without a PhD in statistics.
  4. Analysts conduct their work “in the moment.” Insights often surface questions that analysts explore immediately, creating an active dynamic that further spawns discovery.
  5. Analytical thinking is completely coupled to the business thinking.
  6. More than descriptive, analytics are inferential.

An 'aha' moment

Consider this insurance example. Here, demographic information from many thousands of current and potential clients was collected and maintained in a database. The insurance company was able to download the data into a spreadsheet and summarize it, but did they get the best exploitable insights? Answering even the simplest questions took days of acquiring, splicing and arranging the data.

Today, with integrated, interactive and visual analytics, insights are revealed in seconds. The big question when it comes to prospective clients is how many of them were converted to new business, and what factors drive the conversion. Knowing this, focus can be brought to the business practices that lead to higher rates of success.

screen-shot-2016-10-17-at-7-23-24-pm
We started by loading the data. With only a few clicks, tens of thousands of prospective client encounters, including demographic information such as income, education, age, marital status, etc., were loaded. You can see from the image above that overall about 12.5% (the blue area) of these prospects were converted into paying customers.

Now to the question at hand: What factors determine success in winning new business? One more click (on the Split button in the lower-left) and an “aha” moment ensued.

screen-shot-2016-10-17-at-7-24-01-pm

The chart above shows that a particular factor (which, due to confidentiality, I can’t disclose, so we’ll call it “factor Xn”) leads to an incredibly high conversion rate (about 90%, as seen in the blue bar on the right) for a good number of prospects, and that the remaining prospects had little chance of converting.

The analysts were stunned at seeing this. This insight had eluded them because the overall conversion rate was masking a major distinction, identified by factor Xn, among the prospects. Keep in mind that these analysts spend day in and day out poring over data, but this important insight, and others that were to follow, had remained locked within it.

This insight spawned a bunch of questions. First, it appears a change to sales representative instructions was in order. Second, why was the conversion rate for the remaining prospects so incredibly low? This led to questions about pricing, packaging and the like, in combination with demographics, that would be investigated with designed experiments.
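For readers who want to try something similar on their own data, a view like the one above can be produced with recursive partitioning, for example in JMP's Partition platform. Here is a minimal JSL sketch; the column names are made up, since the real ones are confidential, and a simple validation portion stands in for whatever validation scheme the analysts actually used.

Partition(
   Y( :Converted ),                                   // 1 = became a paying customer
   X( :Income, :Education, :Age, :Marital Status ),   // hypothetical demographic predictors
   Validation Portion( 0.3 )                          // hold out 30% of rows for validation
);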

Why it worked

Looking back at the six traits above, we can see that in this case:

  1. IT established systems that allowed users to get the data themselves: "self-provisioned data."
  2. Indeed the analytics were highly visual. Yes, all the statistical information is provided, but it is made accessible through graphics and interactivity.
  3. No PhD in statistics was necessary. The analysis above involves recursive partitioning with cross-validation. A mouthful to be sure, but that complexity (and statistical jargon) does not get in the way of a business analyst or engineer gaining the highest possible number and quality of exploitable insights. They can focus on their subject matter unfettered. In fact, my experience is that the tool almost becomes invisible as the focus is on the subject matter.
  4. Unlike the old days, when I started in this game, there was no need to submit a request instructing programmers in IT to amend a report that would arrive several days later. The elapsed time between question and answer was gone, and so was the dependency.
  5. The old division of labor between analytics and business was gone. They must be welded together to be effective and efficient at finding exploitable business, engineering and scientific insights.
  6. Notice that the analysis is not simply descriptive, as it was in the old days. It is inferential because it leads analysts to predict future outcomes and ask further questions.

Not only were the analysts impressed with the insight, but they were also excited about how readily it was derived.

Build your own culture of analytics

What does it take to bring the new world of analytics into your organization and support a culture of analytics?

This is where IT comes in -- obviously, they have a major role to play. IT no longer needs to worry about conducting analytics; that's best left to the analysts. Instead, IT is now an enabler of analytics. It can do this by:

  1. Maintaining the hardware and software infrastructure that supports operational and analytical needs.
  2. Making data available in an analytically-friendly way so that data may be self-provisioned. We do lots of work in this area to ensure that analytical data demands do not affect operations. For example, in pharmaceutical, semiconductor, solar and other industries, unimpeded real-time data must be collected for traceability. Analytical demand on IT infrastructure cannot affect operational systems.
  3. Supporting the likes of our company, Predictum, in developing integrated analytical applications that further facilitate analysis, store and transfer knowledge and insights, and gain other efficiencies and cost savings in operations, research and compliance.
  4. Securing all systems.

Securing systems is a rapidly growing and increasingly demanding responsibility for IT -- so much so that we find IT folks are usually very happy to be relieved of the burden of conducting analytics, or of involving themselves in analytics that analysts can better support themselves. Their enabling role is much more consistent with their other activities and responsibilities. For example, IT supports order/shipping/billing systems, but they do not order, ship or bill themselves -- so why should they conduct business, science or engineering analytics?

With the Internet of Things, new and more capable equipment, and the internet’s expanding reach, we can expect an exponential increase in the amount and quality of data well into the future. It’s best to prepare for the opportunities they present by building a culture of analytics now. That involves designing the right data architecture, providing JMP, and enabling business analysts, scientists and engineers to advance their subject matter expertise with analytics.

Editor's Note: A version of this blog post first appeared in the Predictum blog. Thanks to Wayne Levin for sharing it here as well.

tags: Analytic Culture, Analytics, Discovery

The post Want scientists and engineers to make more discoveries? Here's how appeared first on JMP Blog.