Tech

July 21, 2017
 

SAS® offers several ways to find the top n% and bottom n% of data values based on a numeric variable. The RANK procedure with the GROUPS= option is one method. Another is the UNIVARIATE procedure with the PCTLPTS= option. Because there are several ways to perform this task, you can choose the procedure that you are most familiar with. In this blog post, I use the SUMMARY procedure to generate the percentile values and macro logic to dynamically choose the desired percentile statistics. After the percentiles are generated, I subset the data set based on those values. This blog post provides two detailed examples: one calculates percentiles for a single variable, and one calculates percentiles within a grouping variable.
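
For reference, here is a minimal sketch of the PROC UNIVARIATE alternative mentioned above, assuming a numeric variable X in a data set named TEST (like the sample data created below); the output data set name and the PCTLPRE= prefix are arbitrary choices:

/* Write the 1st and 99th percentiles of X to an output data set. */
proc univariate data=test noprint;
   var x;
   output out=pctls pctlpts=1 99 pctlpre=P_;
run;
/* Work.Pctls now contains the variables P_1 and P_99. */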

Calculate Percentiles of a Single Variable

Calculating percentiles of a single variable includes the following steps. Within the macro, a PROC SUMMARY step calculates the percentiles. The subsequent DATA step uses CALL SYMPUTX to create macro variables for the percentile values, and the final DATA step uses those macro variables to subset the data. Here is the code, which is explained in detail below:

/* Create sample data */
data test;                   
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
   drop i; 
run;     
 
proc sort data=test;
   by x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values */                         
proc summary data=test;                                               
   var x;                                                       
   output out=test1 &ptile1= &ptile2= / autoname;                               
run;                                                                 
 
/* Create macro variables for the percentile values */     
data _null_;                                                         
   set test1;                                                         
   call symputx("&ptile1", x_&ptile1);                                     
   call symputx("&ptile2", x_&ptile2);                                     
run;    
%put &&&ptile1;
%put &&&ptile2; 
 
data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &&&ptile1 or x ge &&&ptile2;                                       
run;  
 
proc print;
run; 
 
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

After creating and sorting the sample data, I begin my macro definition with two parameters that enable me to substitute the desired percentiles in my macro invocation:

%macro generate_percentiles(ptile1,ptile2);

The PROC SUMMARY step writes the desired percentiles for variable X to the Test1 data set. The AUTONAME option names the percentile statistics in the following format, <varname>_<percentile> (for example, x_p25).

proc summary data=test;                                               
   var x;                                                       
   output out=test1 &ptile1= &ptile2= / autoname;                               
run;

Next, I want to store the values of the percentile statistics in macro variables so that I can use them in later processing. I use CALL SYMPUTX to do this, which gives the macro variables the same name as the statistic. To see the resulting values in the log, I use %PUT statements:

data _null_;                                                         
   set test1;                                                         
   call symputx("&amp;ptile1", x_&amp;ptile1);                                     
   call symputx("&amp;ptile2", x_&amp;ptile2);                                     
run;    
%put &amp;&amp;&amp;ptile1;
%put &amp;&amp;&amp;ptile2;

The SAS log shows the following:

MLOGIC(GENERATE_PERCENTILES):  %PUT &&&ptile1
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  Macro variable PTILE1 resolves to p1
SYMBOLGEN:  Macro variable P1 resolves to 123.22158288
123.22158288
MLOGIC(GENERATE_PERCENTILES):  %PUT &&&ptile2
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  Macro variable PTILE2 resolves to p99
SYMBOLGEN:  Macro variable P99 resolves to 12232.136483
12232.136483

I use these macro variables in a WHERE statement within a DATA step to subset the data set based on the percentile values:

data test2;                                                             
   set test;                                                           
/* Use a WHERE statement to subset the data */                         
   where x le &&&ptile1 or x ge &&&ptile2;                                       
run;

Finally, the macro invocations below pass in the desired percentile statistics:

%generate_percentiles(p1,p99)
%generate_percentiles(p25,p75)

The percentile statistics that are available with PROC SUMMARY are listed in the PROC SUMMARY documentation.

Calculate Percentiles Within a Grouping Variable

The second example calculates percentiles within each level of a grouping variable. The macro again uses PROC SUMMARY, this time with a BY statement, and the DATA _NULL_ step creates one set of macro variables per BY group along with a count of the groups. A macro %DO loop then subsets the data for each group. Here is the code:

/* Create sample data */
data test; 
 do group='a','b'; 
   do i=1 to 10000;                                                     
      x=ranuni(i)*12345;                                         
      output;                                                         
   end; 
 end;
   drop i; 
run;     
 
proc sort data=test;
   by group x;
run; 
 
%macro generate_percentiles(ptile1,ptile2); 
/* Output desired percentile values by group */                         
proc summary data=test; 
   by group; 
   var x;                                                       
   output out=test1 &ptile1= &ptile2= / autoname;                               
run;                                                                 
 
/* Create macro variables for each value of the BY variable */
/* Create macro variables for the percentile values for each BY group */ 
/* Create a macro variable that is the count of the unique
values of the BY variable */ 
data _null_;   
  retain count 0; 
   set test1;   
   by group;
   if first.group then do;
    count+1;
    call symputx('val'||left(count),group); 
    call symputx("&amp;ptile1"||'_'||left(count), x_&amp;ptile1);                                     
    call symputx("&amp;ptile2"||'_'||left(count), x_&amp;ptile2);  
   end; 
  call symput('last',left(count));
run;    
%put _user_;
 
/* Loops through each value of the BY variable */ 
%do i=1 %to &last;
 
data test&i;                                                             
   set test;  
   where group="&&val&i"; 
/* Use an IF statement to subset the data */
   if x le &&&ptile1._&i or x ge &&&ptile2._&i;                                       
run;  
 
proc print;
run; 
 
%end;
%mend;
 
options mprint mlogic symbolgen;
%generate_percentiles(p1,p99)

Calculating percentiles has many applications, including ranking data, finding outliers, and subsetting data. By combining a Base SAS® procedure that can generate percentile statistics with the power of the macro language, you can dynamically generate the desired values for further processing and analysis.

Selecting the top n% and bottom n% of observations from a data set was published on SAS Users.

July 18, 2017
 

In the digital world, where billions of customers are making trillions of visits in a multi-channel marketing environment, big data has drawn researchers’ attention all over the world. Customers leave behind a huge trail of data in digital channels. Given exploding data volumes, finding the right data that can potentially help make the right decision is becoming an extremely difficult task.

This can be a big issue for brands. Traditional databases have not been efficient enough to capture the sheer amount of information or complexity of datasets we accumulate on the web, on social media and other places.

A leading consulting firm, for example, boasts of one of its clients having 35 million customers and 9 million unique visitors daily on their website, leaving behind a huge amount of shopper information every second. The right tools for segmenting this large amount of data to target marketing activities are not readily available. To make matters more complicated, this data can be structured and unstructured, making traditional methods of analysing data unsuitable.

Tackling market segmentation

Market segmentation is a process by which market researchers identify key attributes about customers and potential customers which can be used to create distinct target market groups. Without a market segmentation base, advertising and sales can lose large amounts of money targeting the wrong set of customers.

Some known methods of segmenting consumer markets include geographic segmentation, demographic segmentation, behavioural segmentation, multi-variable account segmentation, and others. Common approaches that use statistical methods to segment various markets include:

  • Clustering algorithms such as K-Means clustering
  • Statistical mixture models such as Latent Class Analysis
  • Ensemble approaches such as Random Forests

Most of these methods assume the number of clusters to be known, which in reality is never the case. There are several approaches to estimating the number of clusters. However, strong evidence about the quality of these clusters does not exist.
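
As an illustration (not from the original article), here is a minimal SAS sketch of one common approach: run K-means clustering for several candidate values of k and compare the fit statistics, such as the pseudo F statistic and the cubic clustering criterion (CCC), that PROC FASTCLUS prints. The data set and variable names are hypothetical:

%macro try_k(data=, vars=, maxk=6);
   %do k=2 %to &maxk;
      /* K-means with k clusters; the printed output includes fit
         statistics that help compare candidate values of k */
      proc fastclus data=&data maxclusters=&k out=clus_&k;
         var &vars;
      run;
   %end;
%mend try_k;
 
%try_k(data=customers, vars=recency frequency monetary)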

To add to the above issues, clusters could be domain specific, which means they are built to solve certain domain problems such as:

  • Delimitation of species of plants or animals in biology.
  • Medical classification of diseases.
  • Discovery and segmentation of settlements and periods in archaeology.
  • Image segmentation and object recognition.
  • Social stratification.
  • Market segmentation.
  • Efficient organization of databases for search queries.

There are also quite general tasks for which clustering is applied in many subject areas:

  • Exploratory data analysis looking for “interesting patterns” without prescribing any specific interpretation, potentially creating new research questions and hypotheses.
  • Information reduction and structuring of sets of entities from any subject area for simplification, effective communication, or effective access/action such as complexity reduction for further data analysis, or classification systems.
  • Investigating the correspondence of a clustering in specific data with other groupings or characteristics, either hypothesized or derived from other data.

Depending on the application, it may differ a lot what is meant by a “cluster,” and cluster definition and methodology have to be adapted to the specific aim of clustering in the application of interest.

Van Mechelen et al. (1993) set out objective characteristics that “true clusters” should possess, including the following:

  • Within-cluster dissimilarities should be small.
  • Between-cluster dissimilarities should be large.
  • Clusters should be fitted well by certain homogeneous probability models such as the Gaussian or a uniform distribution on a convex set, or by linear, time series or spatial process models.
  • Members of a cluster should be well represented by its centroid.
  • The dissimilarity matrix of the data should be well represented by the clustering (i.e., by the ultrametric induced by a dendrogram, or by defining a binary metric “in same cluster/in different clusters”).
  • Clusters should be stable.
  • Clusters should correspond to connected areas in data space with high density.
  • The areas in data space corresponding to clusters should have certain characteristics (such as being convex or linear).
  • It should be possible to characterize the clusters using a small number of variables.
  • Clusters should correspond well to an externally given partition or values of one or more variables that were not used for computing the clustering.
  • Features should be approximately independent within clusters.
  • The number of clusters should be low.

Reference

  1. I. Van Mechelen, J. Hampton, R. S. Michalski, and P. Theuns (1993). Categories and Concepts: Theoretical Views and Inductive Data Analysis. Academic Press, London.

What are the characteristics of “true clusters?” was published on SAS Users.

July 10, 2017
 

In SAS Viya 3.2, SAS Visual Data Builder provides a mechanism for performing simple, self-service data preparation tasks for SAS Visual Analytics or other applications. SAS Visual Data Builder is NOT an Extract, Transform and Load (ETL) or data quality tool. You may still need one of those tools to perform more complex data preparation.

SAS Visual Data Builder can perform the following tasks:

  • View table and column profiles – provides information on the number of rows and columns in the table, as well as standard and advanced metrics for the columns.
  • Perform data transformations – includes items such as joining tables, transposing columns, creating calculated columns, filtering data and splitting columns.
  • Create plans – a plan is a collection of data transformations (actions) performed on one or more tables.  Plans can be saved and executed again.

SAS Visual Data Builder

To access SAS Visual Data Builder from SAS Home, select ≡ > SAS Visual Data Builder from the menu.
Note: The user must belong to the pre-defined custom user group Data Builders to have permission to access the application.

For SAS Visual Data Builder, the user can select their preferred default start screen in their application Settings.

The options are:

  • Show welcome dialog.
  • Start with data.
  • Start with new plan.
  • Choose existing plan.

With the SAS Viya 3.2 release, SAS Visual Data Builder is now a separate application from Visual Analytics (VA). There is not a one-to-one mapping of the feature set in SAS 9.4: VA 7.3 Data Preparation to SAS Viya 3.2: SAS Visual Data Builder.

For more information on SAS Visual Data Builder, refer to the SAS Viya 3.2: Visual Data Builder documentation.

SAS Viya 3.2: Visual Data Builder was published on SAS Users.

July 1, 2017
 

As a technical consultant for SAS, I have the privilege of meeting with SAS customers, learning more about how they use our software, and then helping them solve their problems. Recently, a client of mine was having trouble finding a way to implement the use of multiple application servers in SAS Visual Analytics 7.3 based on what business unit a user is a part of. Guessing that some of our SAS Visual Analytics Administrators may have the same question, I thought I’d share how we solved his problem.

The outline below is the order of operations of how a server context is selected. The example provided in the screenshots is how you would control access so that user requests will only be sent through the appropriate business unit server.

Preliminary Requirements

  • The Server is registered with the job execution service.
  • The Server is visible to the requesting user.

Assuming that all servers are registered with the job execution service, the following steps describe how an application server is selected:

Step 1: The server associated with the target LASR library will be used. If the server is not visible to the user, proceed to step 2.

Figure 1: In this example, the server context associated with the target library is SASApp, which the user is denied access to in the server context permissions

Step 2: The suite-level default server defined in the SAS Visual Analytics configuration properties at “va.defaultWorkspaceServer” will be used. If the server is not visible to the user, proceed to step 3.

Figure 2: In this example, the configuration property is also set to SASApp, which the user is denied access to in the server context permissions

Step 3: Lastly, use any server that is registered with the job execution service and visible to the requesting user.

Figure 3: In this example the BU server context would end up being chosen because it is the only context that the user has permission to access

Some Additional Info to Keep in Mind

  • The preferences in the administrator and data builder tabs allow the forced use of a specific server by opting out of automatic selection. These preferences can be set separately, and one does not affect the other.
  • In SAS Visual Analytics Explorer, the pooled workspace server is used to populate the available data to import field, and the workspace server is used to perform the actual importing.

Figure 4: The pooled workspace server is denied access so the import data area will not be populated

I hope you found this blog helpful. Please feel free to leave a question or comment below.

Using multiple server contexts in SAS Visual Analytics was published on SAS Users.

June 30, 2017
 

In this blog post I’d like to explore how to create a custom group in SAS Viya to restrict access to functionality. To illustrate my points, we will create a report developers custom group and ensure that only users of that group can create reports and analysis in SAS Visual Analytics.

What a user or group can do (and see) is controlled by rules. A rule is a composite of authorization elements including:

  • Principal: user or group.
  • Target: a resource for example a service, folder or report.
  • Permissions: type of access for example read or write.
  • Setting: indication of whether access is provided, for example grant or prohibit.

The target of a rule is identified using a uniform resource identifier (uri). The uri can represent a folder, content such as a report or data plan, or functionality and features such as being able to import data. Here are some examples of uri’s in SAS Viya:

  • Data Plan: /dataPreparationPlans/plans/810e2c6b-4733-4d53-94fd-dfeb4df0de9e
  • Folder: /folders/folders/e28e35af-2673-4fc7-81fa-1a074f4c0de9
  • Functionality: /SASVisualAnalytics/**

In our example, we will look at restricting access to SAS Visual Analytics for a subset of users. In SAS 9.4, this would have been accomplished using roles and capabilities. In SAS Viya, we will:

  • Create a custom group.
  • Govern that groups access to functionality using rules.

Create a New Custom Group

In SAS Environment Manager, as an administrator (only administrators can manage users and groups) select Users > Custom Groups > New.

In the new custom group screen give the group a name, a unique id and a description. We will call our group Report Developers.

After the new group is created, click the edit button to add new members to the group. You can add users or other groups as members of the new group.

Change the Rules so that Only Report Developers can Access the SAS Visual Analytics Application

Now that we have a new group called Report Developers, the next step is to create or update the rule that determines who can access this functionality. First, we will look at what rules currently apply to SAS Visual Analytics.

In SAS Environment Manager select the Security menu item and select the Rules view.

Select Filter by: ObjectURI and enter SASVisualA in the search box.

The second rule listed is the one we are interested in. Notice that the URI ends with /**. URIs can end with /* or /**. An objectUri that includes the /** suffix affects access to all descendant functionality. For example, /SASVisualAnalytics/** means all functionality in the SAS Visual Analytics application.

Select /SASVisualAnalytics/** and click the Edit icon.  The attributes show that this rule determines who can use SAS Visual Analytics. Currently you’ll see:

  • Grants Read access
    • to /SASVisualAnalytics/** and all its descendent functionality
      • to all authenticated users.

The rule works because the general authorization system implicitly disallows any access that is not granted. The current rule overrides the implicit deny to allow authenticated users to access SAS Visual Analytics. We will edit the rule and change the principal from Authenticated Users to ReportDevelopers.

In the edit rule screen under Principal, select ReportDevelopers.

The impact of the change is that now only users who are members of the Report Developers group can access the Visual Analytics application to create reports.

To test this, you can log on as a user who is not a member of the group. Those users will be able to navigate to reports and open them using the report viewer, but they will not be able to access SAS Visual Analytics to create new reports.

That is a quick look at using custom groups and rules to dictate what users can do in SAS Viya. There is much more detail on these topics in the SAS® Viya 3.2 Administration Guide.

New rules for authorization in SAS Viya was published on SAS Users.

June 29, 2017
 

In my last blog, we examined the data pane in SAS Visual Analytics 8.1. That blog discussed how to have the data pane display the data items of your active data source, and how to perform tasks such as viewing measure details, changing data item properties, and creating geographic data items, hierarchies, and custom categories.  In this blog, we’ll look at creating new calculated data items and calculated aggregations.

If you recall, you display the Data pane in the Visual Analytics interface by clicking the Data icon on the left menu.

A calculated data item is a new data item created from existing data by using an expression.

  • Calculations are performed on un-aggregated data—the expression is evaluated on each row before aggregations are calculated.
  • Calculated data items can accept parameters.
  • A hierarchy can contain calculated category data items.
  • Calculated data items can be changed to geography data items and used in geo maps.

You can create a derived calculation from a category or measure data item by right-clicking on the data item and selecting Create calculation from data item.

For a category data item, you can create a distinct count, count, or number missing. Creating a derived calculation from a category data item:

For a measure data item, you can create a percent of total, or a periodic calculation based on one of your date data items. Creating a derived calculation from a measure data item:

Notice that in both cases, the new data item is an aggregation, so the new item will appear under the Aggregated Measure category in the data pane.

Note:  In order to use the periodic calculation types, your selected data item must include the year.

You can also edit these new data items by right-clicking on the data item and selecting Edit. Editing a derived calculation:

There is now a single interface for creating calculated data items of type Numeric, Character, Date, or Datetime, as well as aggregated measures.

  • This interface provides both Visual mode and Text mode for viewing and editing the expression.
  • You can drag and drop data items or parameters and operators onto the expression in either mode.
  • In text mode, you can also type in your expression.

Creating a calculated data item or aggregated measure:

Specifying the calculation result type and format:

Some notes for using operators in calculations and aggregations:

  • Operators are provided for both calculations and aggregations.
  • You can expand and collapse each category of operators.
  • If you add an Aggregated operator to an expression, the result type will be changed to Aggregated Measure.
  • You cannot have nested aggregations in an expression.

In the same interface, you have access to simple and advanced numeric operators, simple and advanced text operators, along with boolean, date and time, and comparison operators for your calculations.

You also have access to periodic operators and simple and advanced aggregated operators for calculation aggregations.

The most important point to remember in using this interface is to think ahead as to whether you are creating a calculation (operating on each row) or an aggregation (operating across rows) and specify the data type and format before you begin to drag and drop data items and operators.  The default data type is Numeric, but if you add an aggregation operator, the type will automatically switch to Aggregated Measure.

Remember that you can also create calculated items of character, date, and datetime data types, and you can choose from a list of date and datetime formats for those data types.

The SAS Visual Analytics 8.1 Data Pane: Creating Calculations and Aggregations was published on SAS Users.

June 21, 2017
 

IT organizations today are constantly challenged to do more with less. Reusing data processing jobs and employing best practices in monitoring the health of your data are proven ways to improve the productivity of data professionals. Dataflux Data Management Studio is a component of both the SAS Data Quality and the SAS Data Management offerings that allows you to create data processing jobs to integrate, cleanse and monitor your data.

You can write global functions for SAS Data Management jobs that can be reused in any expression in the system, in either data or process flow jobs. Global functions can be called from expression nodes, monitor rules, profile filters, surviving record indicators, process flow if-nodes and more.

Global functions are defined in a text file and saved in the Data Management installation directory under “etc/udf” in Data Management Studio or Data Management Server respectively.

Each global function must have a unique name. It is wrapped in a function / end function block, can process any number of input parameters, and returns a single value of type integer, real, date, string, or boolean.

Hello World

For a start, let’s create a “hello world” function.

  • If it does not exist, create a folder in the installation directory under “etc/udf” (DM Studio and DM Server).
  • In “etc/udf” create a file named hello_world.txt.
  • In the hello_world.txt file, create the function as follows:
function hello_world return string
return "hello world!"
end function
  • Save the file and restart DM Studio, if necessary, in order to use hello_world().

The new function is fully integrated in Data Management Studio. You can see the new function in an expression node under Function->Other or as expression language help in the expression node editor.

Handling Parameters

Global functions can handle any number of parameters. Parameter helper functions are available to access input parameters inside a function:

  • parametercount() returns the number of parameters that have been passed into the function call. This is helpful if the number of incoming parameters is unknown.
integer i
for i = 1 to parametercount() 
begin
   // process each parameter
end

 

  • parametertype(integer) returns the type of the parameter for the given parameter position. The first parameter is 1. The return value will either be integer, real, date, string or Boolean.
  • parameterstring(integer), parameterinteger(integer), parameterboolean(integer), parameterdate(integer), parameterreal(integer) these functions return the value of the parameter as specified by position, or null if the parameter doesn’t exist. You can use these functions if you know the incoming parameter type at a given position.
  • parameter(integer) returns the value of the parameter as specified by position, or null if the parameter doesn’t exist. If you don’t know the incoming parameter type you can use this function. Note: Using the parameter() function may require additional overhead to coerce the values to the field type. Using the specific data type parameter functions above will eliminate the cost of coercion.
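
As a small, hypothetical illustration (the function name and logic are made up), the helper functions above can be combined to process an arbitrary number of numeric inputs:

// Hypothetical sketch: sum an arbitrary number of numeric parameters
// using parametercount() and parameterreal().
function sum_all return real
	real total
	integer i
	total= 0
	for i = 1 to parametercount()
	begin
	   total= total + parameterreal(i)
	end
	return total
end function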

Global Function Example

This global function will check if password rules are followed.

//////////////////////////////////////////////////////////////////////////
// Function:     check_psw_rule
// Inputs:       string
// Output:       boolean -> true == passed check; false == failed check
// Description:  Check the rules for password. The rules are:
//               Need to be at least 8 characters long
//               Need to have at least one lower case character
//               Need to have at least one upper case character
//               Need to have at least one number
//////////////////////////////////////////////////////////////////////////
function check_psw_rule return boolean
	string check_str
	boolean rc
	regex r
 
	check_str= parameterstring(1)   //copy input parameter to variable
 
	rc= false                       //set default return value to failed (false)
	if(len(check_str) < 8)          //check if at least 8 characters
		return rc
	r.compile("[a-z]")              
	if (!r.findfirst(check_str))    //check if at least one lower case character
		return rc
	r.compile("[A-Z]")
	if (!r.findfirst(check_str))    //check if at least one upper case character
		return rc
	r.compile("[0-9]")
	if (!r.findfirst(check_str))    //check if at least one number
		return rc
	rc= true                        //return true if all checks passed
	return rc
end function

 

This function can be called from any expression in a Data Management job:

boolean  check_result
check_result= check_psw_rule(password)

Global functions can also call other global functions.

Just a few things to be aware of: there is a late-binding process, which means that if function B() wants to call function A(), then function A() needs to be loaded first. The files that global functions are stored in are loaded alphabetically by file name. This means that the file name containing function A() has to occur alphabetically before the file name containing function B().
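
As a hypothetical sketch (the file and function names are made up), splitting dependent functions across alphabetically ordered files satisfies this load order:

// File: etc/udf/a_base.txt -- loads first alphabetically
function double_it return real
	return parameterreal(1) * 2
end function
 
// File: etc/udf/b_wrapper.txt -- loads after a_base.txt, so it can safely call double_it()
function quadruple_it return real
	return double_it(parameterreal(1)) * 2
end function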

Best Practices

Here are some best practice tips which will help you to be most successful writing global functions:

  1. Create one file per expression function.
    This allows for global functions to easily be deployed and shared.
  2. Use lots of comments.
    Describe the function’s purpose, expected parameters, and outputs to improve the readability and reusability of your code.
  3. Test the expressions in data jobs first.
    Write the global function body as an expression first and test it via preview. This way it is easier to find typos and syntax errors, and to ensure that the code does what you want it to do.
  4. Debugging - If the global function is not loading, check the platform_date.log. For Studio, this can be found, for example, under: C:\Users\<your_id>\AppData\Roaming\DataFlux\DMStudio\studio1

You now have a taste of how to create reusable functions in Data Management Studio to help you both improve the quality of your data as well as improve the productivity of your data professionals. Good luck and please let us know what kind of jobs you are using to help your organization succeed.

Writing your own functions in SAS Data Quality using Dataflux Data Management Studio was published on SAS Users.

June 16, 2017
 

Have you ever written a macro and wondered if there was an easy way to pass values to the macro? You can, by using macro parameters. Macro parameters enable you to pass values into the macro at macro invocation and set default values for macro variables within the macro definition. In this blog post, I also discuss how you can pass in a varying number of parameter values.

There are two types of macro parameters: positional and keyword.

Positional Parameters

You can use positional parameters to assign values based on their position in the macro definition and at invocation. The order that you use to specify the values must match the order in which they are listed in the %MACRO statement. When specifying multiple positional parameters, use a comma to separate the parameters. If you do not pass a value to the macro when it is invoked, a null value is assigned to the macro variable specified in the %MACRO statement.

Here is an example:

%macro test(var1,var2,var3);                                                                                                            
 %put &=var1;                                                                                                                           
 %put &=var2;                                                                                                                           
 %put &=var3;                                                                                                                           
%mend test;                                                                                                                             
 
/** Each value corresponds to the position of each variable in the definition. **/ 
/** Here, I am passing numeric values.                                         **/                                                            
%test(1,2,3)                                                                                                                            
/** The first position matches with var1 and is given a null value.            **/                                                             
%test(,2,3)                                                                                                                             
/** I pass no values, so var1-var3 are created with null values.               **/                                                             
%test()                                                                                                                                 
/** The first value contains a comma, so I use %STR to mask the comma.         **/                                                             
/** Otherwise, I would receive an error similar to this: ERROR: More           **/
/** positional parameters found than defined.                                  **/                                                             
%test(%str(1,1.1),2,3)                                                                                                                  
/** Each value corresponds to the position of each variable in the definition. **/ 
/** Here, I am passing character values.                                       **/                                                            
%test(a,b,c) 
/** I gave the first (var1) and second (var2) positions a value of             **/
/** b and c, so var3 is left with a null value.                                **/                                                             
%test(b,c)

 

Here are the log results:

173  /** Each value corresponds to the position of each variable in the definition. **/
174  /** Here, I am passing numeric values.                                         **/
175  %test(1,2,3)
VAR1=1
VAR2=2
VAR3=3
176  /** The first position matches with var1 and is given a null value.            **/                                                             
177  %test(,2,3)
VAR1=
VAR2=2
VAR3=3
 
178  /** I pass no values, so var1-var3 are created with null values.               **/
179  %test()
VAR1=
VAR2=
VAR3=
180  /** The first value contains a comma, so I use %STR to mask the comma.         **/                                                             
181  /** Otherwise, I would receive an error similar to this: ERROR: More           **/
182  /** positional parameters found than defined.                                  **/                                                             
183  %test(%str(1,1.1),2,3)
VAR1=1,1.1
VAR2=2
VAR3=3
184  /** Each value corresponds to the position of each variable in the definition. **/
185  /** Here, I am passing character values.                                       **/
186  %test(a,b,c)
VAR1=a
VAR2=b
VAR3=c
187  /** I gave the first (var1) and second (var2) positions a value of             **/
188  /** b and c, so var3 is left with a null value.                               **/
189  %test(b,c)
VAR1=b
VAR2=c
VAR3=

 

Keyword Parameters

The benefit of using keyword parameters is the ability to give the macro variables a default value within the macro definition. When you assign values using keyword parameters, you must include an equal sign after the macro variable name.

Here is an example:

%macro test(color=blue,id=123);                                                                                                         
 %put &=color;                                                                                                                          
 %put &=id;                                                                                                                             
%mend test;                                                                                                                             
 
/** Values passed to the macro overwrite default values from the definition. **/                                                                 
%test(color=red,id=456)                                                                                                                 
/** Passing in no values allows the default values to take precedence.      **/                                                                 
%test()                                                                                                                                 
/** You are not required to pass in a value for each keyword parameter.    **/                                                                 
%test(color=green)                                                                                                                      
/** The order of variables does not matter.                               **/                                                                                                 
%test(id=789,color=yellow)

 

Here are the log results:

270  /** Values passed to the macro overwrite default values from the definition. **/
271  %test(color=red,id=456)
COLOR=red
ID=456
272  /** Passing in no values allows the default values to take precedence.     **/
273  %test()
COLOR=blue
ID=123
274  /** You are not required to pass in a value for each keyword parameter.   **/
275  %test(color=green)
COLOR=green
ID=123
276  /** The order of variables does not matter.                              **/
277  %test(id=789,color=yellow)
COLOR=yellow
ID=789

 

If the macro definition combines positional and keyword parameters, positional parameters must come first. If you do not follow this order, this error is generated:

ERROR: All positional parameters must precede keyword parameters.

 

Here is an example:

%macro test(val,color=blue,id=123);                                                                                                     
 %put &=color;                                                                                                                          
 %put &=id;                                                                                                                             
 %put &=val;                                                                                                                            
%mend test;                                                                                                                             
 
/** The positional parameter is listed first. **/                                                                 
%test(1,color=red,id=456)
 
Here are the log results:
 
318  /** The positional parameter is listed first. **/
319  %test(1,color=red,id=456)
COLOR=red
ID=456
VAL=1

 

PARMBUFF

The PARMBUFF option creates a macro variable called &SYSPBUFF that contains the entire list of parameter values, including the parentheses. This enables you to pass in a varying number of parameter values. In the following example, you can pass any number of parameter values to the macro. The example illustrates how to parse each word in the parameter list:

%macro makes/parmbuff; 
  /** The COUNTW function counts the number of words within &SYSPBUFF.            **/                                                                                                                 
   %let cnt=%sysfunc(countw(&syspbuff)); 
  /** The %DO loop increments based on the number of words returned to the **/
  /** macro variable &CNT.                                                 **/                                
   %do i= 1 %to &cnt;  
  /** The %SCAN function extracts each word from &SYSPBUFF.                      **/                                                                                                                  
     %let make=%scan(&syspbuff,&i);                                                                                                     
     %put &make;                                                                                                                        
   %end;                                                                                                                                
%mend makes;                                                                                                                            
 
%makes(toyota,ford,chevy)

 

Here are the log results:

19  %macro makes/parmbuff;
20    /** The COUNTW function counts the number of words within &SYSPBUFF.            **/
21     %let cnt=%sysfunc(countw(&syspbuff));
22    /** The %DO loop increments based on the number of words returned to the macro  **/
23    /** variable &CNT.                                                              **/
24     %do i= 1 %to &cnt;
25    /** The %SCAN function extracts each word from &SYSPBUFF.                       **/
26       %let make=%scan(&syspbuff,&i);
27       %put &make;
28     %end;
29  %mend makes;
30
31  %makes(toyota,ford,chevy)
toyota
ford
chevy

 

When you specify the PARMBUFF option and the macro definition includes both positional and keyword parameters, the parameters still receive values when you invoke the macro. In this scenario, the entire invocation list of values is assigned to &SYSPBUFF. Here is an example:

%macro test(b,a=300)/parmbuff;                                                                                                      
 %put &=syspbuff;                                                                                                                        
 %put _local_;                                                                                                                          
%mend;                                                                                                                                  
 
%test(200,a=100)

 

Here are the log results:

SYSPBUFF=(200,a=100)
TEST A 100
TEST B 200

 

Notice that &SYSPBUFF includes the entire parameter list (including the parentheses), but each individual parameter still receives its own value.

If you need to know all the parameter values that are passed to the macro, specify the PARMBUFF option in the macro definition to get access to &SYSPBUFF, which contains all the parameter values. For more information about PARMBUFF, see %MACRO Statement in SAS® 9.4 Macro Language: Reference, Fifth Edition.

I hope this blog post has helped you understand how to pass values to a macro. If you have SAS macro questions that you would like me to cover in future blog posts, please comment below.

Using parameters within the macro facility was published on SAS Users.

June 14, 2017
 

In SAS Viya 3.2, the Self-Service Import provides a mechanism for a user to import (copy) data into the SAS Cloud Analytic Services (CAS) environment. The data is copied as a .sashdat file into the selected CAS Library location when it is imported.  Self-Service Import data can only be imported into CAS libraries of type PATH, HDFS, or DNFS.

The Self-Service Import functionality is available in the following applications:

  • SAS Visual Data Builder
  • SAS Visual Analytics
  • SAS Environment Manager – Data

To have access to Self-Service Import, the end user must be granted the Read permission on the /casManagement_capabilities/importData object URI in the Security ⇨ Rules area of SAS Environment Manager.

Self-Service Import supports importing data to CAS from Local, Server, and Social Media sources.

Self-Service Import in SAS Viya

SAS Viya 3.2: Self-Service Import

Local

Local file data can be imported from Microsoft Excel (.XLSX or .XLS), text file (.TXT or .CSV), the clipboard, or a SAS Data Set (SASHDAT or SAS7BDAT). The file(s) must exist on a file system available to your PC.

Server

After providing the appropriate server connection information, a table from LASR or select database types can be imported. The currently supported database types are:  Oracle, Teradata, Hadoop, PostgreSQL, and Impala. The Server selections displayed are dependent on your licensing and configuration.

Social Media

After authentication with the social media provider (Twitter, Facebook, Google Analytics, or YouTube), data can be imported through the social media provider’s public API. Access to these APIs is subject to the social media provider’s applicable license terms, terms of use, and other usage terms and policies.

Currently, there is a size limit for file imports that is set on the CAS Management service Configuration screen in SAS Environment Manager. The default is 4GB because that is the limit imposed by the most restrictive browser (Internet Explorer); Chrome and other browsers allow larger file sizes, which is why there is a property that allows an administrator to set a higher limit. A modification to the max-file-size property requires a restart of the casManagement service.

Social Media and DBMS importers have no explicit limits. However, they are limited by the available disk space on the machine where casManagement is running, because the uploaded data is written to a temporary file on the server relative to casManagement.
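
For comparison with the point-and-click importers, the same copy-into-CAS result can be produced programmatically. Here is a minimal sketch using PROC CASUTIL; the session name, file path, and table name are placeholders:

cas mysess;                       /* start a CAS session */
 
proc casutil;
   /* load a CSV file that is available to the SAS client into the
      Casuser caslib as an in-memory table */
   load file="/local/path/customers.csv"
        outcaslib="casuser" casout="customers" replace;
   /* persist the in-memory table as a .sashdat file in the caslib's
      data source path */
   save casdata="customers" incaslib="casuser" outcaslib="casuser" replace;
quit;
 
cas mysess terminate;             /* end the session */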

For more information, refer to the Self-Service Import section of the SAS Viya 3.2 documentation.

The Self-Service Import in SAS Viya 3.2 was published on SAS Users.