8月 172018
 

Data density estimation is often used in statistical analysis as well as in data mining and machine learning. Visualization of data density estimation will show the data’s characteristics like distribution, skewness and modality, etc. The most widely-used visualizations people used for data density are boxplot, histogram, kernel density estimates, and some other plots. SAS has several procedures that can create such plots. Here, I'll visualize the kernel density estimates superimposing on histogram using SAS Visual Analytics.

A histogram shows the data distribution through some continuous interval bins, and it is a very useful visualization to present the data distribution. With a histogram, we can get a rough view of the density of the values distribution. However, the bin width (or number of bins) has significant impact to the shape of a histogram and thus gives different impressions to viewers. For example, we have same data for the two below histograms, the left one with 6 bins and the right one with 4 bins. Different bin width shows different distribution for same data. In addition, histogram is not smooth enough to visually compare with the mathematical density models. Thus, many people use kernel density estimates which looks more smoothly varying in the distribution.

Kernel density estimates (KDE) is a widely-used non-parametric approach of estimating the probability density of a random variable. Non-parametric means the estimation adjusts to the observations in the data, and it is more flexible than parametric estimation. To plot KDE, we need to choose the kernel function and its bandwidth. Kernel function is used to compute kernel density estimates. Bandwidth controls the smoothness of KDE plot, which is essentially the width of the sliding window used to generate the density. SAS offers several ways to generate the kernel density estimates. Here I use the Proc UNIVARIATE to create KDE output as an example (for simplicity, I set c = SJPI to have SAS select the bandwidth by using the Sheather-Jones plug-in method), then make the corresponding visualization in SAS Visual Analytics.

Visualize the kernel density estimates using SAS code

It is straightforward to run kernel density estimates using SAS Proc UNIVARIATE. Take the variable MSRP in SASHELP.CARS dataset as an example. The min/max value of MSRP column is 10280 and 192465 respectively. I plot the histogram with 15 bins here in the example. Below is the sample codes segment I used to construct kernel density estimates of the MSRP column:

title 'Kernel density estimates of MSRP';
proc univariate data = sashelp.cars noprint;	
   histogram MSRP / kernel (c = SJPI) endpoints = 10280 to 192465 by 12145 outkernel = KDE  odstitle = title; 
run;

Run above code in SAS Studio, and we get following graph.

Visualize the kernel density estimates using SAS Visual Analytics

  1. In SAS Visual Analytics, load the SASHELP.CARS and the KDE dataset (from previous Proc UNIVARIATE) to the CAS server.
  2. Drag and drop a ‘Precision Container’ in the canvas, and put a histogram and a numeric series plot in the container.
  3. Assign corresponding data to the histogram plot: assign CARS.MSRP as histogram Measure, and ‘Frequency Percent’ as histogram Frequency; Set the options of the histogram with following settings:
    Object -> Title: No title;

    Graph Frame: Grid lines: disabled

    Histogram -> Bin range: Measure values; check the ‘Set a fixed bin count’ and set ‘Bin count’ to 15.

    X Axis options:

       Fixed minimum: 10280

       Fixed maximum: 192465

       Axis label: disabled

       Axis Line: enabled

       Tick value: enabled

    Y Axis options:

       Fixed minimum: 0

       Fixed maximum: 0.5

       Axis label: disabled

       Axis Line: disabled

       Tick value: disabled

  1. Assign corresponding KDE data to the numeric series plot. Define a calculated item: Percent as (‘Percent of Observations Per Data Unit’n / 100) with the format of ‘PERCENT12.2’, and assign it to the ‘Y axis’; assign the ‘Data Value’ to the ‘X axis.’ Now set the options of the numeric series plot with following settings:
    Object -> Title: No title;

    Style -> Line/Marker: (change the first color to purple)

    Graph Frame -> Grid lines: disabled

    Series -> Line thickness: 2

    X Axis options:

       Axis label: disabled

       Axis Line: disabled

       Tick value: disabled

    Y Axis options:

       Fixed minimum: 0

       Fixed maximum: 0.5

       Axis label: enabled

       Axis Line: enabled

       Tick value: enabled

    Legend:

       Visibility: Off

  1. Now we can start to overlay the two charts. As can be seen in the screenshot below, SAS Visual Analytics 8.3 provides a smart guide with precision container, which shows grids to help you align the objects in it. If you hold the ctrl button while dragging the numeric series plot to overlay the histogram, some fine grids displayed by the smart guide to help you with basic alignment. It is a little tricky though, to make the overlay precisely, you may fine tune the value of the Left/Top/Width/Height in the Layout of VA Options panel. The goal is to make the intersection of the axes coincides with each other.

After that, we can add a text object above the charts we just made, and done with the kernel density estimates superimposing on a histogram shown in below screenshot, similarly as we got from SAS Proc UNIVARIATE. (If you'd like to use PROC KDE UNIVAR statement for data density estimates, you can visualize it in SAS Visual Analytics in a similar way.)

To go further, I make a KDE with a scatter plot where we can also get impression of the data density with those little circles; another KDE plot with a needle plot where the data density is also represented by the barcode-like lines. Both are created in similar ways as described in above histogram example.

So far, I’ve shown you how I visualize KDE using SAS Visual Analytics. There are other approaches to visualize the kernel density estimates in SAS Visual Analytics, for example, you may create a custom graph in Graph Builder and import it into SAS Visual Analytics to do the visualization. Anyway, KDE is a good visualization in helping you understand more about your data. Why not give a try?

Visualizing kernel density estimates in SAS Visual Analytics was published on SAS Users.

8月 172018
 

When I'm at a social gathering, someone always asks what type of work I do. I like to keep my social life separate from my work, therefore I usually give a vague answer such as "software" (and quickly change the topic). How vague or specific is your response? How vague [...]

The post What are the most common occupations in each US state? appeared first on SAS Learning Post.

8月 152018
 

SAS Viya logoSAS Viya 3.4 has some new functionality that provides real help for those who want to transition from SAS Visual Analytics on 9.4 to SAS Viya. In prior releases of SAS Viya you could promote reports and explorations (and a few other supporting objects). In SAS Viya 3.4, promotion support is added for many additional SAS 9.4 resources, making it easier to make the leap to SAS Viya. In this blog, I will review this new functionality.

In SAS Viya 3.4, the following objects participate in promotion from SAS 9.4.

  • Configuration
    • Identities
    • Authorization
    • Data definitions
  • Content
    • Folders
    • Reports
    • Explorations
    • Stored processes
    • Supporting resources (such as themes, images, graphs templates)

The details of support for each resource are unique and are discussed below.

Identities

User and group promotion from SAS 9.4 to SAS Viya is used to support the transition to the target environment of authorization settings that are associated with content.  Metadata is exported to support the mapping of SAS 9.4 identity metadata (Users and Groups) to SAS Viya identities (Users, Groups and Custom groups).

During promotion of identity metadata:

  • Users connections are mapped using metadata DefaultAuth:logonid to SAS Viya identity id
  • Metadata-only groups from SAS 9.4 are converted to SAS Viya Custom groups (except SAS General Servers and SAS System Services)
  • If custom groups of the same name (or sometimes the same purpose but a different name) exist in the target, the group is preserved and any mapped members from the source system are added to the group.

Authorization

Identities are “promoted” to support re-implementation of authorization. You do not have to explicitly export authorization as it is included with libraries, tables, folders and reports when they are exported. Promotion of authorization is optional. If you don’t wish to include authorization, but rather re-implement it in
SAS Viya, you can switch this functionality off at import time.

SAS Viya has two authorization systems, the general authorization system for folders and content, and the CAS authorization system for data. These authorization systems are different than the metadata authorization model in SAS 9.4. So what happens when you promote content that includes authorization?

General Authorization (folders and content)

Promotion will attempt to convert SAS 9.4 authorization to rules in the General authorization system.  During the process:

  • Explicit Access Control Entries are converted to SAS Viya Rules
  • Access Control Entries with denials are discarded
  • Access Control Templates are not promoted

In addition, if an object (folder/report):

  • does not exist in the target environment,relevant authorization is set for the object and the access control entries from the source are implemented as rules on the object.
  • existsin the target environment, then access control entries from the source are merged with any pre-existing authorizations in the target environment.

CAS Authorization

The CAS authorization system covers CASlibs and data.  Promotion will attempt to convert SAS 9.4 authorization on libraries and tables to access controls in the CAS authorization system. During the process:

  • Access Control Entries are not promoted unless they are applied directly to a library or table.
  • Access Control Entries are converted to CAS access controls.
  • Row-level permissions are preserved.
  • If an object exists in the target environment no authorization settings are imported.
  • Access Control Templates are not promoted.

For details of how individual permissions for both data and content are mapped from SAS 9.4 to SAS Viya see the documentation has great coverage of the steps to follow.

The Process

To finish off, I'll share few observations on the process of exporting from 9.4 and importing in SAS Viya. Like SAS 9.4 promotion, you need to import in a specific order. This allows the software to make the relevant connections to dependent resources. For example, if the CASLIB already exists in the target, then import tables can be mapped to it. Typically, the order is: identities > library definitions > tables > reports and folders. To support this process, make sure, during export, you have a separate package for each resource type. Some considerations for the export process.

You should export:

  • Identities (users and groups) from the security folders in SAS 9.4 metadata to a separate package.
  • Only groups that you need in the target environment (you can subset any irrelevant SAS 9 groups at export time).
  • LASR and Base Libraries and tables directly from the library definition in the folder tree (this prevents extraneous folders being created in the target environment).
  • Libraries in a separate package from tables so that they may be imported first and be available for mapping when the tables are imported.
  • Content and reports from the base of the folder tree so that all directly applied access control entries will be included in the package.

Prior to importing, make sure that users and groups are configured correctly in LDAP. As I already mentioned, physical data is not promoted so ensure that required data and formats are accessible to the SAS Viya environment.

The new functionality for promotion is a great start in helping with the transition from SAS 9.4 to SAS Viya. Look for more functionality in future releases.

New functionality for transitioning from SAS Visual Analytics on 9.4 to SAS Viya was published on SAS Users.

8月 152018
 

What if? Two simple words. A bold question. Driver of change and progress. Fuel for innovation. “What if?” perfectly captures core values of the SAS culture: to be curious, forward-thinking and to challenge assumptions in solving problems. Today, I would like to introduce to you the people behind four of [...]

Curiosity: The force that drives innovation in SAS® technologies was published on SAS Voices by Oliver Schabenberger

8月 152018
 

What if? Two simple words. A bold question. Driver of change and progress. Fuel for innovation. “What if?” perfectly captures core values of the SAS culture: to be curious, forward-thinking and to challenge assumptions in solving problems. Today, I would like to introduce to you the people behind four of [...]

Curiosity: The force that drives innovation in SAS® technologies was published on SAS Voices by Oliver Schabenberger

8月 152018
 

This article shows how to perform an optimization in SAS when the parameters are restricted by nonlinear constraints. In particular, it solves an optimization problem where the parameters are constrained to lie in the annular region between two circles. The end of the article shows the path of partial solutions which converges to the solution and verifies that all partial solutions lie within the constraints of the problem.

The four kinds of constraints in optimization

The term "nonlinear constraints" refers to bounds placed on the parameters. There are four types of constraints in optimization problems. From simplest to most complicated, they are as follows:

  • Unconstrained optimization: In this class of problems, the parameters are not subject to any constraints.
  • Bound-constrained optimization: One or more parameters are bounded. For example, the scale parameter in a probability distribution is constrained by σ > 0.
  • Linearly constrained optimization: Two or more parameters are linearly bounded. In a previous example, I showed images of a polygonal "feasible region" that can result from linear constraints.
  • Nonlinearly constrained optimization: Two or more parameters are nonlinearly bounded. This article provides an example.

A nonlinearly constrained optimization

Objective function in a nonlinear constrained region

This article solves a two-dimensional nonlinearly constrained optimization problem. The constraint region will be the annular region defined by the two equations
x2 + y2 ≥ 1
x2 + y2 ≤ 9

Let f(x,y) be the objective function to be optimized. For this example,
f(x,y) = cos(π r) - Dist( (x,y), (2,0) )
where
r = sqrt(x2 + y2) and
Dist( (x,y), (2,0) ) = sqrt( (x-2)2 + (y-0)2 )
is the distance from (x,y) to the point (2,0).

By inspection, the maximum value of f occurs at (x,y) = (2,0) because the objective function obtains its largest value (1) at that point. The following SAS statements use a scatter plot to visualize the value of the objective function inside the constraint region. The graph is shown to the right.

/* evaluate the objective function on a regular grid */
data FuncViz;
do x = -3 to 3 by 0.05;
   do y = -3 to 3 by 0.05;
      r2 = x**2 + y**2;
      r = sqrt(r2);
      ObjFunc = cos(constant('pi')*r) - sqrt( (x-2)**2+(y-0)**2 );
      if 1 <= r2 <= 9 then output;  /* output points in feasible region */
   end;
end;
run;
 
/* draw the boundary of the constraint region */
%macro DrawCircle(R);
   R = &R;
   do t = 0 to 2*constant('pi') by 0.05; 
      xx= R*cos(t); yy= R*sin(t); output;
   end;
%mend;
data ConstrBoundary;
   %DrawCircle(1)
   %DrawCircle(3)
run;
 
ods graphics / width=400px height=360px antialias=on antialiasmax=12000 labelmax=12000;
title "Objective Function and Constraint Region";
data Viz; set FuncViz ConstrBoundary;  run;
 
proc sgplot data=Viz noautolegend;
   xaxis grid; yaxis grid;
   scatter x=x y=y / markerattrs=(symbol=SquareFilled size=4)
                     colorresponse=ObjFunc colormodel=ThreeColorRamp; 
   polygon ID=R x=xx y=yy / lineattrs=(thickness=3);
   gradlegend;
run;

Solving a constrained optimization problem in SAS/IML

SAS/IML and SAS/OR each provide general-purpose tools for optimization in SAS. This section solves the problem by using SAS/IML. A solution that uses PROC OPTMODEL is more compact and is shown at the end of this article.

SAS/IML supports 10 different optimization algorithms. All support unconstrained, bound-constrained, and linearly constrained optimization. However, only two support nonlinear constraints: the quasi-Newton method (implemented by using the NLPQN subroutine) and the Nelder-Mead simplex method (implemented by using the NLPNMS subroutine).

To specify k nonlinear constraints in SAS/IML, you need to do two things:

  • Use the options vector to the NLP subroutines to specify the number of nonlinear constraints. Set optn[10] = k to tell SAS/IML that there are a total of k constraints. If any of the constraints are equality constraints, specify that number in the optn[11] field. That is, if there are keq equality constraints, then optn[11] = keq.
  • Write a SAS/IML module that accepts a parameter value as input and produces a column vector that contains k rows. Each row representing the evaluation of one of the constraints on the input parameters. For consistency, all nonlinear inequality constraints should be written in the form g(c) ≥ 0. The first keq rows contains the equality constraints and the remaining kkeq elements contain the inequality constraints. You specify the module that evaluates the nonlinear constraints by using the NLC= option on the NLPQN or NLPNMS subroutine.

For the current example, k = 2 and keq = 0. Therefore the module that specifies the nonlinear constraints should return two rows. The first row evaluates whether a parameter value is outside the unit circle. The second row evaluates whether a parameter value is inside the circle with radius 3. Both constraints need to be written as "greater than" inequalities. This leads to the following SAS/IML program which solves the nonlinear optimization problem:

proc iml;
start ObjFunc(z);
   r = sqrt(z[,##]);        /* z = (x,y) ==> z[,##] = x##2 + y##2 */
   f = cos( constant('pi')*r ) - sqrt( (z[,1]-2)##2 + (z[,2]-0)##2 ); 
   return f;
finish;
 
start NLConstr(z);
   con = {0, 0};            /* column vector for two constraints */
   con[1] = z[,##] - 1;     /* (x##2 + y##2) >= 1    ==> (x##2 + y##2) - 1 >= 0 */
   con[2] = 9 - z[,##];     /* (x##2 + y##3) <= 3##2 ==> 9 - (x##2 + y##2) >= 0 */
   return con;
finish;
 
optn= j(1,11,.);     /* allocate options vector (missing ==> 'use default') */
optn[1] = 1;         /* maximize objective function */
optn[2] = 4;         /* include iteration history in printed output */
optn[10]= 2;         /* there are two nonlinear constraints */
optn[11]= 0;         /* there are no equality constraints */
 
ods output grad = IterPath;   /* save iteration history in 'IterPath' data set */
z0 = {-2 0};                  /* initial guess */
call NLPNMS(rc,xOpt,"ObjFunc",z0,optn) nlc="NLConstr";   /* solve optimization */
ods output close;

The call to the NLPNMS subroutine generates a lot of output (because optn[2]=4). The final parameter estimates are shown. The routine estimates the optimal parameters as (x,y) = (2.000002, 0.000015), which is very close to the exact optimal value (x,y)=(2,0). The objective function at the estimated parameters is within 1.5E-5 of the true maximum.

Visualizing the iteration path towards an optimum

I intentionally chose the initial guess for this problem to be (x0,y0)=(-2,0), which is diametrically opposite the true value in the annular region. I wanted to see how the optimization algorithm traverses the annular region. I knew that every partial solution would be within the feasible region, but I suspected that it might be possible for the iterates to "jump across" the hole in the middle of the annulus. The following SAS statements visualize the iteration history of the Nelder-Mead method as it iterates towards an optimal solution:

data IterPath2;
   set IterPath;
   Iteration = '  ';
   if _N_ IN (1,5,6,7,9) then Iteration = put(_N_,2.); /* label certain step numbers */
   format Iteration 2.;
run;
 
data Path;  set Viz IterPath2;  run;
 
title "Objective Function and Solution Path";
title2 "Nelder-Mead Simplex Method";
proc sgplot data=Path noautolegend;
   xaxis grid; yaxis grid;
   scatter x=x y=y / markerattrs=(symbol=SquareFilled size=4)
                     colorresponse=ObjFunc colormodel=ThreeColorRamp; 
   polygon ID=R x=xx y=yy / lineattrs=(thickness=3);
   series x=x1 y=x2 / markers datalabel=Iteration datalabelattrs=(size=8) markerattrs=(size=5);
   gradlegend;
run;
Iteration history of an initial guess for the Nelder-Mead simplex method in a nonlinear constrained region

For this initial guess and this optimization algorithm, the iteration path "jumps across" the hole in the annulus on the seventh iteration. For other initial guesses or methods, the iteration might follow a different path. For example, if you change the initial guess and use the quasi-Newton method, the optimization requires many more iterations and follows the "ridge" near the circle of radius 2, as shown below:
z0 = {-2 -0.1}; /* initial guess */
call NLPQN(rc,xOpt,"ObjFunc",z0,optn) nlc="NLConstr"; /* solve optimization */

Iteration history of an initial guess for the quasi-Newton method in a nonlinear constrained region

Nonlinear constraints with PROC OPTMODEL

If you have access to SAS/OR software, PROC OPTMODEL provides a syntax that is compact and readable. For this problem, you can use a single CONSTRAINT statement to specify both the inner and outer bounds of the annular constraints. The following call to PROC OPTMODEL uses an interior-point method to compute a solution that numerically equivalent to (x,y) = (-2, 0):

proc optmodel;
   var x, y;
   max f = cos(constant('pi')*sqrt(x^2+y^2)) - sqrt((x-2)^2+(y-0)^2);
   constraint 1 <= x^2 + y^2 <= 9;       /* define constraint region */
   x = -2; y = -0.1;                     /* initial guess */
   solve with NLP / maxiter=25;
   print x y;
quit;

Summary

In summary, you can use SAS software to solve nonlinear optimization problems that are unconstrained, bound-constrained, linearly constrained, or nonlinearly constrained. This article shows how to solve a nonlinearly constrained problem by using SAS/IML or SAS/OR software. In SAS/IML, you have to specify the number of nonlinear constraints and write a function that can evaluate a parameter value and return a vector of positive numbers if the parameter is in the constrained region.

The post Optimization with nonlinear constraints in SAS appeared first on The DO Loop.

8月 142018
 

Even though I’ve worked at SAS for nearly 30 years, I still get excited when great things come together for our customers! This year we are hosting the very first hackathon at our Analytics Experience conference in San Diego - the AnalyticsX Hackathon. Analytics Experience is in its third year [...]

The post Announcing the AnalyticsX Hackathon: Are you up for it? appeared first on SAS Learning Post.

8月 132018
 

Data in the cloud makes it easily accessible, and can help businesses run more smoothly. SAS Viya runs its calculations on Cloud Analytics Service (CAS). David Shannon of Amadeus Software spoke at SAS Global Forum 2018 and presented his paper, Come On, Baby, Light my SAS Viya: Programming for CAS. (In addition to being an avid SAS user and partner, David must be an avid Doors fan.) This article summarizes David's overview of how to run SAS programs in SAS Viya and how to use CAS sessions and libraries.

If you're using SAS Viya, you're going to need to know the basics of CAS to be able to perform calculations and use SAS Viya to its potential. SAS 9 programs are compatible with SAS Viya, and will run as-is through the CAS engine.

Using CAS sessions and libraries

Use a CAS statement to kick off a session, then use CAS libraries (caslibs) to store data and resources. To start the session, simply code "cas;" Each CAS session is given its own unique identifier (UUID) that you can use to reconnect to the session.

Handpicked Related VIDEO: SAS programming in the cloud: CASL code

There are a few significant codes that can help you to master CAS operations. Consider these examples, based on a CAS session that David labeled "speedyanalytics":

  • What CAS sessions do I have running?
    cas _all_ list;
  • Get the version and license specifics from the CAS server hosting my session:
    cas speedyanalytics listabout;
  • I want to sign out of SAS Studio for now, so I will disconnect from my CAS session, but return to it later…
    cas speedyanalytics disconnect;
  • ...later in the same or different SAS Studio session, I want to reconnect to the CAS session I started earlier using the UUID I previous grabbed from the macro variable or SAS log:
    cas uuid="&speedyanalytics_uuid";
  • At the end of my program(s), shutdown all my CAS sessions to release resources on the server:
    cas _all_ terminate;

Using CAS libraries

CAS libraries (caslib) are the method to access data that is being stored in memory, as well as the related metadata.

From the library, you can load data into CAS tables in a couple of different ways:

  1. Takes a sample data set, calculate a new measure and stores the output in memory
  2. Proc COPY can bring existing SAS data into a caslib
  3. Proc CASUTIL loads tables into caslibs

The Proc CASUTIL allows you to save your tables (named "classsi" data in David's examples) for future use through the SAVE statement:

proc casutil;
 save casdata="classsi" casout="classsi";
run;

And reload like this in a future session, using the LOAD statement:

proc casutil;
 load casdata="classsi" casout="classsi";
run;

When accessing your CAS libraries, remember that there are multiple levels of scope that can apply. "Session" refers to data from just the current session, whereas "Global" allows you to reach data from all CAS sessions.

Programming in CAS

Showing how to put CAS into action, David shared this diagram of a typical load/save/share flow:

Existing SAS 9 programs and CAS code can both be run in SAS Viya. The calculations and data memory occurs through CAS, the Cloud Analytics Service. Before beginning, it's important to understand a general overview of CAS, to be able to access CAS libraries and your data. For more about CAS architecture, read this paper from CAS developer Jerry Pendergrass.

The performance case for SAS Viya

To close out his paper, David outlined a small experiment he ran to demonstrate performance advantages that can be seen by using SAS Viya v3.3 over a standard, stand-alone SAS v9.4 environment. The test was basic, but performed reads, writes, and analytics on a 5GB table. The tests revealed about a 50 percent increase in performance between CAS and SAS 9 (see the paper for a detailed table of comparison metrics). SAS Viya is engineered for distributive computing (which works especially well in cloud deployments), so more extensive tests could certainly reveal even further increases in performance in many use cases.

Additional resources

A quick introduction to CAS in SAS Viya was published on SAS Users.

8月 132018
 

In the oil industry you can make or lose money based on how good your forecasts are, so I’ve pulled together six papers that discuss different ways in which you can leverage analytics to optimize your output and more accurately predict your production performance. Written by employees at oil and [...]

6 real-world ways to use optimization and forecasting in the oil and gas industry was published on SAS Voices by David Pope