122016
 

We want to help scientists and engineers be successful at developing formulations quickly and efficiently. Success requires good strategies to get the right data in the right amount at the right time. That's why we published the book Strategies for Formulation Development: A Step-by-Step Approach Using JMP.

We have worked with formulation scientists and engineers for decades and have seen many different types of formulation development programs. This has shown us what formulation scientists really need to know rather than what is nice to know. Because JMP software is used in the examples in the book, readers get valuable guidance on the software for the proposed methodology. That means JMP users can immediately apply what they learn in the book.

Key takeaways from the book include:

  • Approach the development process from a strategic viewpoint, with the overall end in mind. Don’t necessarily run the largest design possible. An experimentation plan that implements the strategy provides the right road map for developing a successful formulation.
  • Focus on developing an understanding of how the components blend together. Use designs and models that help find the dominant components, components with large effects, and components with small effects.
  • Use screening experiments early on to identify those components that are most important to the performance of the formulation. This strategy creates a broad view and helps ensure that no important components are overlooked. It also saves significant experimental effort.
  • Analyze both screening and optimization experiments using graphical and numerical methods, which is easily done with JMP. The right graphics can extract additional information from the data.
  • Consider integration of both formulation components and process variables in designs and models, using recently published methods that reduce the required experimentation by up to 50 percent.

This is how you speed up the formulation development process and produce high-quality formulations in a timely manner. Upcoming blog posts will show how to address each of these important issues.

Want more information? You can read a free chapter from the book and learn about authors Ronald D. Snee and Roger W. Hoerl.

tags: Books, Design of Experiments (DOE), Formulation, jmp books

The post Formulation success: Getting the right data in the right amount at the right time appeared first on JMP Blog.

112016
 

Last December, the Every Student Succeeds Act (ESSA) was signed into law to ensure opportunity for all students in the United States. As part of this federal legislation, states now have the flexibility to design their own accountability systems following certain parameters outlined in ESSA. These accountability systems include academic and non-academic indicators. By […]

ESSA – accountability, indicators and analytics to drive informed decision making was published on SAS Voices.

112016
 

This past weekend, Hurricane Matthew came through the Carolinas. Some areas had record flooding, while other areas didn't. I was anxious to get back to work today, so I could use SAS software and create a custom map showing who got how much rain. But before we get to the official […]

The post Where did Hurricane Matthew drop the most water? appeared first on SAS Learning Post.

112016
 

This is a continuation of a series of blog posts on interactive HTML for Graph Builder reports in JMP 13. Here, I'm discussing support for Points, Box Plots, Heat Maps and Map Shapes. These Graph Builder elements are highlighted in the figure below.

toolbar-2

Since this blog post describes interactive web pages output from JMP, images and animations below were captured from a web browser.

Points

Points exist in many graphs in JMP where you can customize the point color, shape, and size, usually by opening a dialog box. Graph Builder’s drag-and-drop interface makes it easy to create colorful graphs with points of all shapes and sizes. The example below using Diamonds Data from the sample data library in JMP sets the following point attributes:

  • Size based on the Table column data
  • Color based on the Depth column data
  • Shape based on the Clarity column data

In addition to these attributes, the graph plots Price versus Cut, grouped by Carat Weight, to help show what influences diamond prices the most. Of course, JMP provides capabilities that specifically target this question, but that’s a topic for another blog post.

DiamondsPoints

This combination of attributes made supporting Graph Builder point plots in Interactive HTML challenging because there are now more ways to determine the size, shape and color of each point. The challenge was compounded by the fact that each point in Graph Builder can represent a statistical summary of multiple rows of data.

In the following Interactive HTML example, each point represents diamonds of a given Cut and Clarity. Although the legend is rearranged, the shape and color are still determined by Clarity and Depth respectively. To accentuate the difference between the diamonds' Table dimensions, a column transform named Relative Table was used in the Size role rather than the raw Table column data.

DiamondsPointsMean

Box Plots

The summarized points above may provide too little information and the raw points may be too busy, so how about a compromise using box plots? In this graph, we see the distribution of prices for each Cut, Clarity, and Carat Weight combination. The legend was moved to the bottom and drawn horizontally to match the arrangement of box plots in each group.

DiamondsBoxPlots

Heat Maps

So far, it might be difficult to see what influences diamond prices the most. We’ve only covered three of the four C’s in diamond quality. So, here’s a heat map including all four. Maybe now it’s easier to draw some conclusions.

DiamondsHeatMap

Adding support for heat maps in Graph Builder gave us a bonus outside of Graph Builder: The Uplift graph in the Uplift Platform is now interactive and can display X Factors and X/Y ranges.

Uplift

Map Shapes

Map shapes can be used in Graph Builder for location-based data, like population data. Grouping can help the viewer focus on one region at a time. With the ‘Show Missing Shapes’ option enabled, the region of interest can be seen in the context of the whole country.

CanadaPopulationByReagion

Map Shapes can be scaled according to a size variable (Population) while being colored by another variable (Vegetable Consumption).

SizedMap

Combinations

To see that some of the interactive power of JMP is available in Interactive HTML, it helps to interact with combined graphs. In JMP this can be accomplished with Combined Windows, Application Builder, or Dashboard Builder. Below are some combination examples using the graph types described above.

This example explores Crime data with a Heat Map and geographical Map Shapes.

CrimeHeatMapsLinkedSelection

The following example uses Points, Box Plots, a Heat Map, and a custom Map Shape to explore office temperatures.

OfficeDashboard

One new feature for Points and Map Shapes in Interactive HTML is the ability to display images in tooltips.

ImagesInTips2

Note that these are just animations. You can interact with the Interactive HTML files shown in this blog here: http://www.jmp.com/jmphtml5/PointsAndMoreBlogExamples.html

tags: Data Visualization, Graph Builder, Interactive HTML, JMP 13

The post Interactive HTML: Points, box plots and more for Graph Builder appeared first on JMP Blog.

102016
 

"Data governance must encompass management of the full life cycle of a data policy – its definition, approval, implementation and the means of ensuring its observance" - David Loshin, Data Policies and Data Governance

I was checking out my Google stats on Data Quality Pro recently and observed that "How […]

The post How I (reluctantly) learned the value of data governance appeared first on The Data Roundtable.

102016
 

The WHERE clause in SAS is a powerful mechanism for selecting observations as you read or write a data set. The WHERE clause supports many operators, including the IN operator, which enables you to compactly specify multiple conditions for a categorical variable.

A common use of the IN operator is to specify a list of US states and territories that should be included in or excluded from an analysis. For example, the following DATA step reads the Sashelp.Zipcode data, but excludes zip codes for the states of Alaska ("AK") and Hawaii ("HI"), and for US territories such as Puerto Rico ("PR"), Guam ("GU"), and so on:

data Lower48;
set Sashelp.Zipcode(where=(   /* exclude multiple states and territories */
    Statecode not in ("AK" "HI" "PR" "VI" "GU" "FM" "MP" "MH" "PW"))
    );
run;
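
The same filter can also be written with a WHERE statement inside the DATA step instead of the WHERE= data set option. The following is a minimal sketch of that alternative; the output data set name Lower48 is reused from the example above, and the exclusion list assumes that Puerto Rico ("PR") should be dropped along with the other territories:

```sas
data Lower48;
   set Sashelp.Zipcode;
   /* WHERE statement: same effect as the WHERE= data set option */
   where Statecode not in ("AK" "HI" "PR" "VI" "GU" "FM" "MP" "MH" "PW");
run;
```

Either form filters observations as they are read, so the choice between them is mostly a matter of style.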

WHERE operators in SAS/IML are vectorized

In my previous article about how to use the WHERE clause in SAS/IML, my examples used scalar comparisons such as where(sex="F") to select only females in the data. The SAS/IML language does not support the IN operator, but there is another compact way to include or exclude multiple values. Because SAS/IML is a matrix-vector language, many operations support vector arguments. In particular, the WHERE clause in the SAS/IML language enables you to use the ordinary equal operator (=) and specify a vector of values on the right-hand side!

For example, the following statements read all ZIP codes in the contiguous US and create a scatter plot of their locations:

proc iml;
excludeList = {"AK" "HI" "PR" "VI" "GU" "FM" "MP" "MH" "PW"};
use Sashelp.Zipcode where(Statecode ^= excludeList);   /* vector equiv of NOT IN */
read all var {X Y "City" "Statecode"};
close;
 
title "Centers of US ZIP Codes";
call scatter(X, Y) group=Statecode option="markerattrs=(size=2)" 
     label={"Longitude" "Latitude"} procopt="noautolegend";
ZIP code locations filtered by a WHERE clause in SAS/IML

The WHERE clause skips observations for which the Statecode variable matches any of the values in the excludeList vector. The scatter plot reveals the basic shape of the contiguous US. You can see that the plot does not display locations from Alaska, Hawaii, or US territories.
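
The inclusion case works the same way: use the equal operator (=) with a vector to mimic the IN operator. The following sketch is illustrative only; the vector name includeList and the three state codes are arbitrary choices, not part of the original example:

```sas
proc iml;
includeList = {"NC" "SC" "VA"};                      /* hypothetical list of states to keep */
use Sashelp.Zipcode where(Statecode = includeList);  /* vector equiv of IN */
read all var {"Statecode"};
close;

u = unique(Statecode);   /* distinct state codes that were read */
print u;
quit;
```

Printing the unique values is a quick way to confirm that only the requested states were read.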

String matching operators in the SAS WHERE clause

As long as we're talking about the WHERE clause in SAS, let's discuss some string-matching operators that might not be familiar to some SAS programmers. I'll use SAS/IML for the examples, but these operators are generally supported (in scalar form) in all SAS WHERE clauses. The operators are

  • The "contains" operator (?)
  • The "not contains" operator (^?)
  • The "begins with" operator (=:)
  • The "sounds like" operator (=*)

All these operators are documented in the list of WHERE clause operators in SAS/IML.




The "contains" operator (?) and the "not contains" operator (^?) match a substring that appears anywhere in the target character variable.
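
As a rough illustration, the following sketch counts the ZIP codes whose city name contains, or does not contain, the substring "ville". The substring is an arbitrary choice, and the sketch assumes that the comparison is case sensitive, so "ville" would not match "VILLE":

```sas
proc iml;
use Sashelp.Zipcode where(City ? "ville");    /* ?   "contains" */
read all var {"City"};
close;
nContains = nrow(City);                       /* number of matching observations */

use Sashelp.Zipcode where(City ^? "ville");   /* ^?  "not contains" */
read all var {"City"};
close;
nNotContains = nrow(City);

print nContains nNotContains;
quit;
```

The two counts should add up to the number of observations in Sashelp.Zipcode, because each city name either contains the substring or does not.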

The "begins with" operator (=:) matches substrings that appear at the beginning of a target variable. For example, the following statements select observations for which the state begins with the letter "B", "C", or "D". (There are no US states that begin with "B.") Notice that the "begins with" operator is also vectorized in SAS/IML:

use Sashelp.Zipcode where(Statecode =: {"B" "C" "D"});  /* =:  "begins with" */
read all var {X Y "City" "Statecode"};
close;
 
u = unique(Statecode);
print u;
t_whereiml1

Fuzzy matching of English words

Perhaps the most unusual operator in the WHERE clause in SAS is the "sounds like" operator (=*), which does "fuzzy matching" of English words. The operator finds English words that are similar to the specified target words by using the SOUNDEX function in SAS. The SOUNDEX function is often used to select different names that sound alike but have different spellings, such as "John" and "Jon" or "Lynn" and "Lynne." In the following WHERE clause, the "sounds like" operator is used to select observations for which the city sounds similar to either "Cary" or "Asheville." The selected observations are plotted in a scatter plot after eliminating duplicate rows.

use Sashelp.Zipcode where(City =* {"Cary" "Asheville"}); /* =*  "sounds like" */
read all var {X Y "City" "Statecode"};
close;
 
start UniqueRows(x);            
   cols = 1:ncol(x);               /* sort by all columns */
   call sortndx(ndx, x, cols);     /* ndx = permutation of rows that sorts matrix */
   uRows = uniqueby(x, cols, ndx); /* locate unique rows of sorted matrix */
   return ( ndx[uRows] );          /* rows in original matrix */
finish;
 
r = UniqueRows(City||Statecode);   /* get row numbers in x for unique rows */
call scatter(X[r], Y[r]) group=Statecode[r] datalabel=City[r]
   option="markerattrs=(symbol=CircleFilled)" procopt="noautolegend";
Cities that sound like Cary or Asheville, filtered by a WHERE clause in SAS/IML

According to the "sounds like" operator, the name "Cary" sounds like "Carey," "Cory," and "Cherry." The name "Asheville" sounds like "Ashville," "Ashfield," and "Ash Flat."
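
To get a feel for why these names are grouped together, you can print the codes that the SOUNDEX function assigns to them. The following DATA step is only a sketch: the variable names name and code are made up for illustration, and the exact rule that the =* operator uses to compare codes is not described in this article, so treat the output as a hint rather than a specification:

```sas
data _null_;
   length name $9 code $9;
   /* loop over a few of the city names mentioned above */
   do name = "Cary", "Carey", "Cory", "Cherry";
      code = soundex(name);     /* phonetic code for the name */
      put name= code=;          /* write the pair to the SAS log */
   end;
run;
```

If the four names share the same code in the log, that is consistent with the "sounds like" operator treating them as matches for "Cary."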

In summary, the WHERE clause in SAS/IML works a little differently than the more-familiar version in the SAS DATA step. Both versions enable you to selectively include or exclude observations that satisfy one or more conditions. However, the SAS/IML WHERE clause is vectorized. You can specify a vector of conditions for operators, thus reproducing the functionality of the IN operator.

This article also demonstrates a few lesser-known string operators, such as "contains" (?), "not contains" (^?), "begins with" (=:), and "sounds like" (=*).

tags: SAS Programming

The post WHERE operators in SAS: Multiple comparisons and fuzzy matching appeared first on The DO Loop.

082016
 

Have you ever been involved in executing an exploratory analysis based on an integrated clinical trial database? If so, you've probably experienced firsthand how elaborate the initial phase of data access and data processing can be. Market analysts estimate the ratio for preparing the data, compared to actually analyzing the information, […]

Building clinical data and insight visually was published on SAS Voices.