Avoid frustrations by following these 5 tips from David Loshin to create a successful data management strategy for analytics.
The post Data management strategy for analytics appeared first on The Data Roundtable.
Data density estimation is often used in statistical analysis as well as in data mining and machine learning. Visualizing a density estimate reveals characteristics of the data such as its distribution, skewness, and modality. The most widely used visualizations of data density are box plots, histograms, and kernel density estimates, among others. SAS has several procedures that can create such plots. Here, I'll visualize kernel density estimates superimposed on a histogram using SAS Visual Analytics.
A histogram shows the distribution of data across contiguous interval bins, and it is a very useful way to present that distribution. With a histogram, we get a rough view of the density of the values. However, the bin width (or number of bins) has a significant impact on the shape of a histogram and can therefore give viewers different impressions. For example, the two histograms below use the same data, the left one with 6 bins and the right one with 4 bins. Different bin widths show different distributions for the same data. In addition, a histogram is not smooth enough to compare visually with mathematical density models. Thus, many people use kernel density estimates, which vary more smoothly across the distribution.
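The effect of the bin count is easy to reproduce. The sketch below is only an illustration: it assumes the MSRP variable in SASHELP.CARS (the data used later in this post) rather than the data behind the two histograms mentioned above, and it uses the NBINS= option of the HISTOGRAM statement in PROC SGPLOT to request 6 and then 4 bins.

```sas
/* Same data, two bin counts: compare how the apparent shape changes.    */
/* Assumption: SASHELP.CARS / MSRP stand in for the data discussed above. */
title 'MSRP histogram with 6 bins';
proc sgplot data=sashelp.cars;
   histogram MSRP / nbins=6;
run;

title 'MSRP histogram with 4 bins';
proc sgplot data=sashelp.cars;
   histogram MSRP / nbins=4;
run;
title;   /* clear the title */
```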
Kernel density estimation (KDE) is a widely used non-parametric approach to estimating the probability density of a random variable. Non-parametric means the estimate adjusts to the observations in the data, which makes it more flexible than parametric estimation. To plot a KDE, we need to choose the kernel function and its bandwidth. The kernel function is used to compute the density estimate. The bandwidth controls the smoothness of the KDE plot; it is essentially the width of the sliding window used to generate the density. SAS offers several ways to generate kernel density estimates. Here I use PROC UNIVARIATE to create the KDE output as an example (for simplicity, I set c = SJPI so that SAS selects the bandwidth by using the Sheather-Jones plug-in method), and then build the corresponding visualization in SAS Visual Analytics.
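For reference, the roles of the kernel function and the bandwidth can be seen in the textbook form of the estimator. The notation here (n observations x_1, ..., x_n, bandwidth h, kernel K) is introduced for illustration and is not taken from the SAS documentation; the normal kernel is shown as one common choice.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{u^2}{2} \right)
```

A larger h spreads each observation's contribution over a wider window and gives a smoother curve; a smaller h follows the individual observations more closely.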
It is straightforward to compute kernel density estimates using PROC UNIVARIATE. Take the variable MSRP in the SASHELP.CARS data set as an example. The minimum and maximum values of the MSRP column are 10280 and 192465, respectively. I plot the histogram with 15 bins in this example. Below is the sample code I used to construct kernel density estimates of the MSRP column:
title 'Kernel density estimates of MSRP';
proc univariate data = sashelp.cars noprint;
   histogram MSRP / kernel (c = SJPI)
                    endpoints = 10280 to 192465 by 12145
                    outkernel = KDE
                    odstitle = title;
run;
Run the above code in SAS Studio, and we get the following graph.
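Before moving to SAS Visual Analytics, it can help to look at what the OUTKERNEL= data set actually contains, since that table supplies the points for the density curve. A minimal sketch, assuming the KDE data set created by the PROC UNIVARIATE step above is still in the WORK library:

```sas
/* List the variables written by OUTKERNEL= and preview a few rows */
proc contents data=KDE;
run;

proc print data=KDE(obs=5);
run;
```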
Graph Frame: Grid lines: disabled
Histogram -> Bin range: Measure values; check the ‘Set a fixed bin count’ and set ‘Bin count’ to 15.
X Axis options:
Fixed minimum: 10280
Fixed maximum: 192465
Axis label: disabled
Axis Line: enabled
Tick value: enabled
Y Axis options:
Fixed minimum: 0
Fixed maximum: 0.5
Axis label: disabled
Axis Line: disabled
Tick value: disabled
Style -> Line/Marker: (change the first color to purple)
Graph Frame -> Grid lines: disabled
Series -> Line thickness: 2
X Axis options:
Axis label: disabled
Axis Line: disabled
Tick value: disabled
Y Axis options:
Fixed minimum: 0
Fixed maximum: 0.5
Axis label: enabled
Axis Line: enabled
Tick value: enabled
Legend:
Visibility: Off
After that, we can add a text object above the charts we just made. The result is the kernel density estimate superimposed on a histogram, shown in the screenshot below, similar to what we got from PROC UNIVARIATE. (If you'd like to use the UNIVAR statement in PROC KDE for density estimates, you can visualize the output in SAS Visual Analytics in a similar way.)
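As a rough sketch of the PROC KDE alternative mentioned above: the UNIVAR statement can write the estimated density to an output data set, which can then be loaded into SAS Visual Analytics like the OUTKERNEL= table. The data set name KDE2 is a hypothetical name chosen for this illustration, and METHOD=SJPI is used to request the Sheather-Jones plug-in bandwidth so that the result is comparable to the earlier example.

```sas
/* Alternative density estimate with PROC KDE; KDE2 is an illustrative name */
proc kde data=sashelp.cars;
   univar MSRP / method=sjpi out=KDE2;
run;

/* Preview the output before loading it into SAS Visual Analytics */
proc print data=KDE2(obs=5);
run;
```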
To go further, I made a KDE with a scatter plot, where the little circles also give an impression of the data density, and another KDE with a needle plot, where the density is represented by barcode-like lines. Both are created in a similar way to the histogram example above.
So far, I’ve shown how I visualize KDE using SAS Visual Analytics. There are other ways to visualize kernel density estimates in SAS Visual Analytics; for example, you can create a custom graph in Graph Builder and import it into SAS Visual Analytics. In any case, a KDE is a good visualization for understanding more about your data. Why not give it a try?
Visualizing kernel density estimates in SAS Visual Analytics was published on SAS Users.
When I'm at a social gathering, someone always asks what type of work I do. I like to keep my social life separate from my work, therefore I usually give a vague answer such as "software" (and quickly change the topic). How vague or specific is your response? How vague [...]
The post What are the most common occupations in each US state? appeared first on SAS Learning Post.
SAS Viya 3.4 has some new functionality that provides real help for those who want to transition from SAS Visual Analytics on 9.4 to SAS Viya. In prior releases of SAS Viya you could promote reports and explorations (and a few other supporting objects). In SAS Viya 3.4, promotion support is added for many additional SAS 9.4 resources, making it easier to make the leap to SAS Viya. In this blog, I will review this new functionality.
In SAS Viya 3.4, the following objects participate in promotion from SAS 9.4.
The details of support for each resource are unique and are discussed below.
User and group promotion from SAS 9.4 to SAS Viya is used to support the transition to the target environment of authorization settings that are associated with content. Metadata is exported to support the mapping of SAS 9.4 identity metadata (Users and Groups) to SAS Viya identities (Users, Groups and Custom groups).
During promotion of identity metadata:
Identities are “promoted” to support re-implementation of authorization. You do not have to explicitly export authorization as it is included with libraries, tables, folders and reports when they are exported. Promotion of authorization is optional. If you don’t wish to include authorization, but rather re-implement it in SAS Viya, you can switch this functionality off at import time.
SAS Viya has two authorization systems, the general authorization system for folders and content, and the CAS authorization system for data. These authorization systems are different than the metadata authorization model in SAS 9.4. So what happens when you promote content that includes authorization?
Promotion will attempt to convert SAS 9.4 authorization to rules in the General authorization system. During the process:
In addition, if an object (folder/report):
The CAS authorization system covers CASlibs and data. Promotion will attempt to convert SAS 9.4 authorization on libraries and tables to access controls in the CAS authorization system. During the process:
For details of how individual permissions for both data and content are mapped from SAS 9.4 to SAS Viya, see the documentation, which has great coverage of the steps to follow.
To finish off, I'll share a few observations on the process of exporting from SAS 9.4 and importing into SAS Viya. As with SAS 9.4 promotion, you need to import in a specific order. This allows the software to make the relevant connections to dependent resources. For example, if the caslib already exists in the target, then imported tables can be mapped to it. Typically, the order is: identities > library definitions > tables > reports and folders. To support this process, make sure that, during export, you create a separate package for each resource type. Here are some considerations for the export process.
You should export:
Prior to importing, make sure that users and groups are configured correctly in LDAP. As I already mentioned, physical data is not promoted so ensure that required data and formats are accessible to the SAS Viya environment.
The new functionality for promotion is a great start in helping with the transition from SAS 9.4 to SAS Viya. Look for more functionality in future releases.
New functionality for transitioning from SAS Visual Analytics on 9.4 to SAS Viya was published on SAS Users.
What if? Two simple words. A bold question. Driver of change and progress. Fuel for innovation. “What if?” perfectly captures core values of the SAS culture: to be curious, forward-thinking and to challenge assumptions in solving problems. Today, I would like to introduce to you the people behind four of [...]
Curiosity: The force that drives innovation in SAS® technologies was published on SAS Voices by Oliver Schabenberger
This article shows how to perform an optimization in SAS when the parameters are restricted by nonlinear constraints. In particular, it solves an optimization problem where the parameters are constrained to lie in the annular region between two circles. The end of the article shows the path of partial solutions which converges to the solution and verifies that all partial solutions lie within the constraints of the problem.
The term "nonlinear constraints" refers to bounds placed on the parameters. There are four types of constraints in optimization problems. From simplest to most complicated, they are as follows:
This article solves a two-dimensional nonlinearly constrained optimization problem.
The constraint region will be the annular region defined by the two equations
x^{2} + y^{2} ≥ 1
x^{2} + y^{2} ≤ 9
Let f(x,y) be the objective function to be optimized. For this example,
f(x,y) = cos(π r) - Dist( (x,y), (2,0) )
where
r = sqrt(x^{2} + y^{2}) and
Dist( (x,y), (2,0) ) = sqrt( (x-2)^{2} + (y-0)^{2} )
is the distance from (x,y) to the point (2,0).
By inspection, the maximum value of f occurs at (x,y) = (2,0) because the objective function obtains its largest value (1) at that point. The following SAS statements use a scatter plot to visualize the value of the objective function inside the constraint region. The graph is shown to the right.
/* evaluate the objective function on a regular grid */
data FuncViz;
do x = -3 to 3 by 0.05;
   do y = -3 to 3 by 0.05;
      r2 = x**2 + y**2;
      r = sqrt(r2);
      ObjFunc = cos(constant('pi')*r) - sqrt( (x-2)**2+(y-0)**2 );
      if 1 <= r2 <= 9 then output;   /* output points in feasible region */
   end;
end;
run;

/* draw the boundary of the constraint region */
%macro DrawCircle(R);
   R = &R;
   do t = 0 to 2*constant('pi') by 0.05;
      xx= R*cos(t);  yy= R*sin(t);
      output;
   end;
%mend;
data ConstrBoundary;
   %DrawCircle(1)
   %DrawCircle(3)
run;

ods graphics / width=400px height=360px antialias=on antialiasmax=12000 labelmax=12000;
title "Objective Function and Constraint Region";
data Viz;
   set FuncViz ConstrBoundary;
run;

proc sgplot data=Viz noautolegend;
   xaxis grid; yaxis grid;
   scatter x=x y=y / markerattrs=(symbol=SquareFilled size=4)
                     colorresponse=ObjFunc colormodel=ThreeColorRamp;
   polygon ID=R x=xx y=yy / lineattrs=(thickness=3);
   gradlegend;
run;
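The claim that f attains the value 1 at (2,0) can also be checked numerically. The short DATA step below is a sketch that evaluates the same objective expression used in the grid above at that single point and tests whether the point satisfies the annular constraint; the variable names f and Feasible are chosen for this illustration.

```sas
/* Evaluate the objective function and the constraint at (x,y) = (2,0) */
data _null_;
   x = 2;  y = 0;
   r = sqrt(x**2 + y**2);
   f = cos(constant('pi')*r) - sqrt( (x-2)**2 + (y-0)**2 );
   Feasible = (1 <= x**2 + y**2 <= 9);   /* 1 if inside the annulus, 0 otherwise */
   put f= Feasible=;                     /* written to the SAS log */
run;
```

Because cos(2π) = 1 and the distance term is zero at that point, the log should report f=1 together with Feasible=1.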
SAS/IML and SAS/OR each provide general-purpose tools for optimization in SAS. This section solves the problem by using SAS/IML. A solution that uses PROC OPTMODEL is more compact and is shown at the end of this article.
SAS/IML supports 10 different optimization algorithms. All support unconstrained, bound-constrained, and linearly constrained optimization. However, only two support nonlinear constraints: the quasi-Newton method (implemented by using the NLPQN subroutine) and the Nelder-Mead simplex method (implemented by using the NLPNMS subroutine).
To specify k nonlinear constraints in SAS/IML, you need to do two things:
For the current example, k = 2 and k_{eq} = 0. Therefore the module that specifies the nonlinear constraints should return two rows. The first row evaluates whether a parameter value is outside the unit circle. The second row evaluates whether a parameter value is inside the circle with radius 3. Both constraints need to be written as "greater than" inequalities. This leads to the following SAS/IML program which solves the nonlinear optimization problem:
proc iml;
start ObjFunc(z);
   r = sqrt(z[,##]);       /* z = (x,y) ==> z[,##] = x##2 + y##2 */
   f = cos( constant('pi')*r ) - sqrt( (z[,1]-2)##2 + (z[,2]-0)##2 );
   return f;
finish;

start NLConstr(z);
   con = {0, 0};           /* column vector for two constraints */
   con[1] = z[,##] - 1;    /* (x##2 + y##2) >= 1    ==> (x##2 + y##2) - 1 >= 0 */
   con[2] = 9 - z[,##];    /* (x##2 + y##2) <= 3##2 ==> 9 - (x##2 + y##2) >= 0 */
   return con;
finish;

optn= j(1,11,.);           /* allocate options vector (missing ==> 'use default') */
optn[1] = 1;               /* maximize objective function */
optn[2] = 4;               /* include iteration history in printed output */
optn[10]= 2;               /* there are two nonlinear constraints */
optn[11]= 0;               /* there are no equality constraints */

ods output grad = IterPath;   /* save iteration history in 'IterPath' data set */
z0 = {-2 0};                  /* initial guess */
call NLPNMS(rc,xOpt,"ObjFunc",z0,optn) nlc="NLConstr";   /* solve optimization */
ods output close;
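To see how the constraint module encodes feasibility, it can be evaluated at a few hand-picked points. The sketch below repeats the NLConstr module from the program above in a separate PROC IML session so that it runs on its own; the three test points and the names inside, inHole, and outside are chosen for illustration only.

```sas
proc iml;
start NLConstr(z);
   con = {0, 0};
   con[1] = z[,##] - 1;    /* >= 0 when the point is on or outside the unit circle */
   con[2] = 9 - z[,##];    /* >= 0 when the point is on or inside the circle of radius 3 */
   return con;
finish;

inside  = NLConstr({2 0});     /* in the annulus: both rows are nonnegative */
inHole  = NLConstr({0.5 0});   /* in the hole: first row is negative */
outside = NLConstr({4 0});     /* beyond radius 3: second row is negative */
print inside inHole outside;
quit;
```

For these points the module returns (3, 5), (-0.75, 8.75), and (15, -7), so a parameter value is feasible exactly when every row is nonnegative.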
The call to the NLPNMS subroutine generates a lot of output (because optn[2]=4). The final parameter estimates are shown. The routine estimates the optimal parameters as (x,y) = (2.000002, 0.000015), which is very close to the exact optimal value (x,y)=(2,0). The objective function at the estimated parameters is within 1.5E-5 of the true maximum.
I intentionally chose the initial guess for this problem to be (x0,y0)=(-2,0), which is diametrically opposite the true value in the annular region. I wanted to see how the optimization algorithm traverses the annular region. I knew that every partial solution would be within the feasible region, but I suspected that it might be possible for the iterates to "jump across" the hole in the middle of the annulus. The following SAS statements visualize the iteration history of the Nelder-Mead method as it iterates towards an optimal solution:
data IterPath2;
   set IterPath;
   Iteration = ' ';
   if _N_ IN (1,5,6,7,9) then Iteration = put(_N_,2.);   /* label certain step numbers */
   format Iteration 2.;
run;

data Path;
   set Viz IterPath2;
run;

title "Objective Function and Solution Path";
title2 "Nelder-Mead Simplex Method";
proc sgplot data=Path noautolegend;
   xaxis grid; yaxis grid;
   scatter x=x y=y / markerattrs=(symbol=SquareFilled size=4)
                     colorresponse=ObjFunc colormodel=ThreeColorRamp;
   polygon ID=R x=xx y=yy / lineattrs=(thickness=3);
   series x=x1 y=x2 / markers datalabel=Iteration datalabelattrs=(size=8)
                      markerattrs=(size=5);
   gradlegend;
run;
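The plot can be complemented by a numerical check that every partial solution satisfies the constraints. This is a sketch under two assumptions: that the IterPath data set stores the iterates in the variables x1 and x2 (the names used in the SERIES statement above), and that a small tolerance of 1e-8, chosen here for illustration, is acceptable for points that sit on a boundary circle.

```sas
/* Flag each iterate as feasible (1) or infeasible (0) */
data CheckPath;
   set IterPath;
   r2 = x1**2 + x2**2;                         /* squared distance from the origin */
   Feasible = (1 - 1e-8 <= r2 <= 9 + 1e-8);    /* inside the annulus, up to tolerance */
run;

/* Count feasible and infeasible iterates */
proc freq data=CheckPath;
   tables Feasible;
run;
```

If every iterate lies in the annulus, the frequency table contains only the level Feasible=1.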
For this initial guess and this optimization algorithm, the iteration path "jumps across" the hole in the annulus on the seventh iteration. For other initial guesses or methods, the iteration might follow a different path. For example, if you change the initial guess and use the quasi-Newton method, the optimization requires many more iterations and follows the "ridge" near the circle of radius 2, as shown below:
z0 = {-2 -0.1}; /* initial guess */
call NLPQN(rc,xOpt,"ObjFunc",z0,optn) nlc="NLConstr"; /* solve optimization */
If you have access to SAS/OR software, PROC OPTMODEL provides a syntax that is compact and readable. For this problem, you can use a single CONSTRAINT statement to specify both the inner and outer bounds of the annular constraints. The following call to PROC OPTMODEL uses an interior-point method to compute a solution that is numerically equivalent to (x,y) = (2, 0):
proc optmodel;
   var x, y;
   max f = cos(constant('pi')*sqrt(x^2+y^2)) - sqrt((x-2)^2+(y-0)^2);
   constraint 1 <= x^2 + y^2 <= 9;   /* define constraint region */
   x = -2;  y = -0.1;                /* initial guess */
   solve with NLP / maxiter=25;
   print x y;
quit;
In summary, you can use SAS software to solve nonlinear optimization problems that are unconstrained, bound-constrained, linearly constrained, or nonlinearly constrained. This article shows how to solve a nonlinearly constrained problem by using SAS/IML or SAS/OR software. In SAS/IML, you have to specify the number of nonlinear constraints and write a function that evaluates a parameter value and returns a vector of nonnegative numbers if the parameter is in the feasible region.
The post Optimization with nonlinear constraints in SAS appeared first on The DO Loop.
Even though I’ve worked at SAS for nearly 30 years, I still get excited when great things come together for our customers! This year we are hosting the very first hackathon at our Analytics Experience conference in San Diego - the AnalyticsX Hackathon. Analytics Experience is in its third year [...]
The post Announcing the AnalyticsX Hackathon: Are you up for it? appeared first on SAS Learning Post.
Storing data in the cloud makes it easily accessible and can help businesses run more smoothly. SAS Viya runs its calculations on SAS Cloud Analytic Services (CAS). David Shannon of Amadeus Software spoke at SAS Global Forum 2018 and presented his paper, Come On, Baby, Light my SAS Viya: Programming for CAS. (In addition to being an avid SAS user and partner, David must be an avid Doors fan.) This article summarizes David's overview of how to run SAS programs in SAS Viya and how to use CAS sessions and libraries.
If you're using SAS Viya, you're going to need to know the basics of CAS to be able to perform calculations and use SAS Viya to its potential. SAS 9 programs are compatible with SAS Viya, and will run as-is through the CAS engine.
Use a CAS statement to kick off a session, then use CAS libraries (caslibs) to store data and resources. To start the session, simply code "cas;". Each CAS session is given its own unique identifier (UUID) that you can use to reconnect to the session.
There are a few key CAS statements that can help you master CAS operations. Consider these examples, based on a CAS session that David labeled "speedyanalytics":
cas _all_ list;
cas speedyanalytics listabout;
cas speedyanalytics disconnect;
cas uuid="&speedyanalytics_uuid";
cas _all_ terminate;
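Put together, a session life cycle might look like the sketch below. It reuses the statements listed above with the session name "speedyanalytics" from David's examples. One assumption is added: the UUIDMAC= option is used when starting the session to store the session's UUID in the macro variable speedyanalytics_uuid, which is how that macro variable is presumed to be populated here.

```sas
/* Start a named session; UUIDMAC= (assumed here) saves the UUID in a macro variable */
cas speedyanalytics uuidmac=speedyanalytics_uuid;
%put NOTE: Session UUID is &speedyanalytics_uuid;

cas speedyanalytics listabout;      /* report details about the session */
cas speedyanalytics disconnect;     /* detach; the session keeps running on the server */

cas uuid="&speedyanalytics_uuid";   /* reconnect by using the saved UUID */
cas _all_ list;                     /* list the sessions known to this SAS session */
cas _all_ terminate;                /* end all sessions when finished */
```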
CAS libraries (caslib) are the method to access data that is being stored in memory, as well as the related metadata.
From the library, you can load data into CAS tables in a couple of different ways:
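Two common ways to load data are sketched below as an illustration; they are not necessarily the ones David lists. The first writes a table through a CAS engine libref in a DATA step, and the second uses the LOAD statement in PROC CASUTIL. The libref mycas, the Casuser caslib, and the SASHELP source tables are assumptions made for this example.

```sas
cas;                                 /* start a CAS session */
libname mycas cas caslib=casuser;    /* libref that points to the Casuser caslib */

/* Option 1: DATA step that writes to an in-memory CAS table */
data mycas.class;
   set sashelp.class;
run;

/* Option 2: PROC CASUTIL loads a client-side SAS data set into CAS */
proc casutil;
   load data=sashelp.cars outcaslib="casuser" casout="cars" replace;
quit;
```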
PROC CASUTIL allows you to save your tables (the "classsi" data in David's examples) for future use through the SAVE statement:
proc casutil; save casdata="classsi" casout="classsi"; run;
And reload like this in a future session, using the LOAD statement:
proc casutil; load casdata="classsi" casout="classsi"; run;
When accessing your CAS libraries, remember that there are multiple levels of scope that can apply. "Session" refers to data from just the current session, whereas "Global" allows you to reach data from all CAS sessions.
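A table loaded in a session has session scope by default. As a sketch of how it might be given global scope, the PROMOTE statement in PROC CASUTIL can be used; the table name "cars" and the Casuser caslib are assumptions carried over for illustration, and the table is presumed to be loaded in the current session already.

```sas
/* Promote a session-scope table so that other CAS sessions can see it */
proc casutil;
   promote casdata="cars" incaslib="casuser" outcaslib="casuser";
quit;
```

Which other sessions can actually read the promoted table still depends on the access controls of the caslib that holds it.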
Showing how to put CAS into action, David shared this diagram of a typical load/save/share flow:
Existing SAS 9 programs and CAS code can both be run in SAS Viya. Calculations and in-memory data handling occur in CAS (SAS Cloud Analytic Services). Before beginning, it's important to have a general understanding of CAS so that you can access CAS libraries and your data. For more about CAS architecture, read this paper from CAS developer Jerry Pendergrass.
To close out his paper, David outlined a small experiment he ran to demonstrate the performance advantages of SAS Viya v3.3 over a standard, stand-alone SAS v9.4 environment. The test was basic, but it performed reads, writes, and analytics on a 5GB table. The tests revealed about a 50 percent increase in performance between CAS and SAS 9 (see the paper for a detailed table of comparison metrics). SAS Viya is engineered for distributed computing (which works especially well in cloud deployments), so more extensive tests could reveal even greater increases in performance in many use cases.
A quick introduction to CAS in SAS Viya was published on SAS Users.
In the oil industry you can make or lose money based on how good your forecasts are, so I’ve pulled together six papers that discuss different ways in which you can leverage analytics to optimize your output and more accurately predict your production performance. Written by employees at oil and [...]
6 real-world ways to use optimization and forecasting in the oil and gas industry was published on SAS Voices by David Pope