8月 282020
 

Remember back to your early school days, singing with all your classmates “If you’re happy and you know it clap your hands!” and then we’d all clap our hands. Being happy back then was so simple. Today, it’s hard to get away from all the negative headlines of 2020! It’s [...]

Analyzing happiness data in 2020 was published on SAS Voices by Melanie Carey

8月 272020
 

Decision trees are a fundamental machine learning technique that every data scientist should know. Luckily, the construction and implementation of decision trees in SAS is straightforward and easy to produce.

There are simply three sections to review for the development of decision trees:

  1. Data
  2. Tree development
  3. Model evaluation

Data

The data that we will use for this example is found in the fantastic UCI Machine Learning Repository. The data set is titled “Bank Marketing Dataset,” and it can be found at: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing#

This data set represents a direct marketing campaign (phone calls) conducted by a Portuguese banking institution. The goal of the direct marketing campaign was to have customers subscribe to a term deposit product. The data set consists of 15 independent variables that represent customer attributes (age, job, marital status, education, etc.) and marketing campaign attributes (month, day of week, number of marketing campaigns, etc.).

The target variable in the data set is represented as “y.” This variable is a binary indicator of whether the phone solicitation resulted in a sale of a term deposit product (“yes”) or did not result in a sale (“no”). For our purposes, we will recode this variable and label it as “TARGET,” and the binary outcomes will be 1 for “yes” and 0 for “no.”

The data set is randomly split into two data sets at a 70/30 ratio. The larger data set will be labeled “bank_train” and the smaller data set will be labeled “bank_test”. The decision tree will be developed on the bank_train data set. Once the decision tree has been developed, we will apply the model to the holdout bank_test data set.

Tree development

The code below specifies how to build a decision tree in SAS. The data set mydata.bank_train is used to develop the decision tree. The output code file will enable us to apply the model to our unseen bank_test data set.

ODS GRAPHICS ON;
 
PROC HPSPLIT DATA=mydata.bank_train;
 
    CLASS TARGET _CHARACTER_;
 
    MODEL TARGET(EVENT='1') = _NUMERIC_ _CHARACTER_;
 
    PRUNE costcomplexity;
 
    PARTITION FRACTION(VALIDATE=<strong>0.3</strong> SEED=<strong>42</strong>);
 
    CODE FILE='C:/Users/James Gearheart/Desktop/SAS Book Stuff/Data/bank_tree.sas';
 
    OUTPUT OUT = SCORED;
 
run;

The output of the decision tree algorithm is a new column labeled “P_TARGET1”. This column shows the probability of a positive outcome for each observation. The output also contains the standard tree diagram that demonstrates the model split points.

Model evaluation

Once you have developed your model, you will need to evaluate it to see whether it meets the needs of the project. In this example, we want to make sure that the model adequately predicts which observation will lead to a sale.

The first step is to apply the model to the holdout bank_test data set.

DATA test_scored;
 
    SET MYDATA.bank_test;
 
    %INCLUDE 'C:/Users/James Gearheart/Desktop/SAS Book Stuff/Data/bank_tree.sas';
 
RUN;

The %INCLUDE statement applied the decision tree algorithm to the bank_test data set and created the P_TARGET1 column for the bank_test data set.

Now that the model has been applied to the bank_test data set, we will need to evaluate the performance of the model by creating a lift table. Lift tables provide additional information that has been summarized in the ROC chart. Remember that every point along the ROC chart is a probability threshold. The lift table provides detailed information for every point along the ROC curve.

The model evaluation macro that we will use was developed by Wensui Liu. This easy-to-use macro is labeled “separation” and can be applied to any binary classification model output to evaluate the model results.

You can find this macro in my GitHub repository for my new book, End-to-End Data Science with SAS®. This GitHub repository contains all of the code demonstrated in the book along with all of the macros that were used in the book.

This macro on my C drive, and we call it with a %INCLUDE statement.

%INCLUDE 'C:/Users/James Gearheart/Desktop/SAS Book Stuff/Projects/separation.sas';
 
%<em>separation</em>(data = test_scored, score = P_TARGET1, y = target);

The score script that was generated from the CODE FILE statement in the PROC HPSPLIT procedure is applied to the holdout bank_test data set through the use of the %INCLUDE statement.

The table below is generated from the lift table macro.

This table shows that that model adequately separated the positive and negative observations. If we examine the top two rows of data in the table, we can see that the cumulative bad percent for the top 20% of observations is 47.03%. This can be interpreted as we can identify 47.03% of positive cases by selecting the top 20% of the population. This selection is made by selecting observations with a P_TARGET1 score greater than or equal to 0.8276 as defined by the MAX SCORE column.

Additional information about decision trees along with several other model designs are reviewed in detail in my new book End-to-End Data Science with SAS® available at Amazon and SAS.com.

Build a decision tree in SAS was published on SAS Users.

8月 262020
 

Fraud, waste and abuse (FWA) ravages the US health care system. Estimates from the National Health Care Anti-Fraud Association show fraud costs health care organizations $70 billion to $230 billion each year. The precise figure is unknowable because only 3 to 10% of this fraud is ever detected. With more [...]

Catch a fraudster: Finding the needle in the haystack with AI was published on SAS Voices by Alyssa Farrell

8月 262020
 

In the paper "Tips and Techniques for Using the Random-Number Generators in SAS" (Sarle and Wicklin, 2018), I discussed an example that uses the new STREAMREWIND subroutine in Base SAS 9.4M5. As its name implies, the STREAMREWIND subroutine rewinds a random number stream, essentially resetting the stream to the beginning. I struggled to create a compelling example for the STREAMREWIND routine because using the subroutine "results in dependent streams of numbers" and because "it is usually not necessary in simulation studies" (p. 12). Regarding an application, I asserted that the subroutine "is convenient for testing."

But recently I was thinking about two-factor authentication and realized that I could use the STREAMREWIND subroutine to emulate generating a random token that changes every 30 seconds. I think it is a cool example, and it gives me the opportunity to revisit some of the newer features of random-number generation in SAS, including new generators and random number keys.

A brief overview of two-factor authentication

I am not an expert on two-factor authentication (TFA), but I use it to access my work computer, my bank accounts, and other sensitive accounts. The main idea behind TFA is that before you can access a secure account, you must authenticate yourself in two ways:

  • Provide a valid username and password.
  • Provide information that depends on a physical device that you own and that you have previously registered.

Most people use a smartphone as the physical device, but it can also be a PC or laptop. If you do an internet search for "two factor authentication tokens," you can find many images like the one on the right. This is the display from a software program that runs on a PC, laptop, or phone. The "Credential ID" field is a long string that is unique to each device. (For simplicity, I've replaced the long string with "12345.") The "Security Code" field displays a pseudorandom number that changes every 30 seconds. The Security Code depends on the device and on the time of day (within a 30-second interval). In the image, you can see a small clock and the number 28, which indicates that the Security Code will be valid for another 28 seconds before a new number is generated.

After you provide a valid username and password, the account challenges you to type in the current Security Code for your registered device. When you submit the Security Code, the remote server checks whether the code is valid for your device and for the current time of day. If so, you can access your account.

Two-factor random number streams

I love the fact that the Security Code is pseudorandom and yet verifiable. And it occurred to me that I can use the main idea of TFA to demonstrate some of the newer features in the SAS random number generators (RNGs).

Long-time SAS programmers know that each stream is determined by a random number seed. But a newer feature is that you can also set a "key" for a random number stream. For several of the new RNGs, streams that have the same seed but different keys are independent. You can use this fact to emulate the TFA app:

  • The Credential ID (which is unique to each device) is the "seed" for an RNG.
  • The time of day is the "key" for an RNG. Because the Security Code must be valid for 30 seconds, round the time to the nearest 30-second boundary.
  • Usually each call to the RAND function advances the state of the RNG so that the next call to RAND produces a new pseudorandom number. For this application, we want to get the same number for any call within a 30-second period. One way to do this is to reset the random number stream before each call so that RAND always returns the FIRST number in the stream for the (seed, time) combination.

Using a key to change a random-number stream

Before worrying about using the time of day as the key value, let's look at a simpler program that returns the first pseudorandom number from independent streams that have the same seed but different key values. I will use PROC FCMP to write a function that can be called from the SAS DATA step. Within the DATA step, I set the seed value and use the "Threefry 2" (TF2) RNG. I then call the Rnd6Int function for six different key values.

proc fcmp outlib=work.TFAFunc.Access;
   /* this function sets the key of a random-numbers stream and 
      returns the first 6-digit pseudorandom number in that stream */
   function Rnd6Int(Key);
      call stream(Key);               /* set the Key for the stream */
      call streamrewind(Key);         /* rewind stream with this Key */
      x = rand("Integer", 0, 999999); /* first 6-digit random number in stream */
      return( x );
   endsub;
quit;
 
options cmplib=(work.TFAFunc);       /* DATA step looks here for unresolved functions */
data Test;
DeviceID = 12345;                    /* ID for some device */
call streaminit('TF2', DeviceID);    /* set RNG and seed (once per data step) */
do Key = 1 to 6;
   SecCode = Rnd6Int(Key);           /* get random number from seed and key values */
   /* Call the function again. Should produce the same value b/c of STREAMREWIND */
   SecCodeDup = Rnd6Int(Key);  
   output;
end;
keep DeviceID Key SecCode:;
format SecCode SecCodeDup Z6.;
run;
 
proc print data=Test noobs; run;

Each key generates a different pseudorandom six-digit integer. Notice that the program calls the Rnd6Int function twice for each seed value. The function returns the same number each time because the random number stream for the (seed, key) combination gets reset by the STREAMREWIND call during each call. Without the STREAMREWIND call, the function would return a different value for each call.

Using a time value as a key

With a slight modification, the program in the previous section can be made to emulate the program/app that generates a new TFA token every 30 seconds. However, so that we don't have to wait so long, the following program sets the time interval (the DT macro) to 10 seconds instead of 30. Instead of talking about a 30-second interval or a 10-second interval, I will use the term "DT-second interval," where DT can be any time interval.

The program below gets the "key" by looking at the current datetime value and rounding it to the nearest DT-second interval. This value (the RefTime variable) is sent to the Rnd6Int function to generate a pseudorandom Security Code. To demonstrate that the program generates a new Security Code every DT seconds, I call the Rnd6Int function 10 times, waiting 3 seconds between each call. The results are printed below:

%let DT = 10;                  /* change the Security Code every DT seconds */
 
/* The following DATA step takes 30 seconds to run because it
   performs 10 iterations and waits 3 secs between iterations */
data TFA_Device;
keep DeviceID Time SecCode;
DeviceID = 12345;
call streaminit('TF2', DeviceID);   /* set the RNG and seed */
do i = 1 to 10;
   t = datetime();                  /* get the current time */
   /* round to the nearest DT seconds and save the "reference time" */
   RefTime = round(t, &DT); 
   SecCode = Rnd6Int(RefTime);      /* get a random Security Code */
   Time = timepart(t);              /* output only the time */
   call sleep(3, 1);                /* delay 3 seconds; unit=1 sec */
   output;
end;
format Time TIME10. SecCode Z6.;
run;
 
proc print data=TFA_Device noobs; 
   var DeviceId Time SecCode;
run;

The output shows that the program generated three different Security Codes. Each code is constant for a DT-second period (here, DT=10) and then changes to a new value. For example, when the seconds are in the interval [05, 15), the Security Code has the same value. The Security Code is also constant when the seconds are in the interval [15, 25) and so forth. A program like this emulates the behavior of an app that generates a new pseudorandom Security Code every DT seconds.

Different seeds for different devices

For TFA, every device has a unique Device ID. Because the Device ID is used to set the random number seed, the pseudorandom numbers that are generated on one device will be different than the numbers generated on another device. The following program uses the Device ID as the seed value for the RNG and the time of day for the key value. I wrapped a macro around the program and called it for three hypothetical values of the Device ID.

%macro GenerateCode(ID, DT);
data GenCode;
   keep DeviceID Time SecCode;
   format DeviceID 10. Time TIME10. SecCode Z6.;
   DeviceID = &ID;
   call streaminit('TF2', DeviceID); /* set the seed from the device */
   t = datetime();                   /* look at the current time */
   /* round to the nearest DT seconds and save the "reference time" */
   RefTime = round(t, &DT);          /* round to nearest DT seconds */
   SecCode = Rnd6Int(RefTime);       /* get a random Security Code */
   Time = timepart(t);               /* output only the time */
run;
 
proc print data=GenCode noobs; run;
%mend;
 
/* each device has a unique ID */
%GenerateCode(12345, 30);
%GenerateCode(24680, 30);
%GenerateCode(97531, 30);

As expected, the program produces different Security Codes for different Device IDs, even though the time (key) value is the same.

Summary

In summary, you can use features of the SAS random number generators in SAS 9.4M5 to emulate the behavior of a TFA token generator. The SAS program in this article uses the Device ID as the "seed" and the time of day as a "key" to select an independent stream. (Round the time into a certain time interval.) For this application, you don't want the RAND function to advance the state of the RNG, so you can use the STREAMREWIND call to rewind the stream before each call. In this way, you can generate a pseudorandom Security Code that depends on the device and is valid for a certain length of time.

The post Rewinding random number streams: An application appeared first on The DO Loop.

8月 252020
 

Analytics is playing an increasingly strategic role in the ongoing digital transformation of organizations today. However, to succeed and scale your digital transformation efforts, it is critical to enable analytics skills at all tiers of your organization. In a recent blog post covering 4 principles of analytics you cannot ignore, SAS COO Oliver Schabenberger articulated the importance of democratizing analytics. By scaling your analytics efforts beyond traditional data science teams and involving more people with strong business domain knowledge, you can gain more valuable insights and make more significant impacts.

SAS Viya was built from the ground up to fulfill this vision of democratizing analytics. At SAS, we believe analytics should be accessible to everyone. While SAS Viya offers tremendous support and will continue to be the tool of choice for many advanced users and programmers, it is also highly accessible for business analysts and insights team who prefer a more visual approach to analytics and insights discovery.

Self-service data management

First of all, SAS Viya makes it easy for anyone to ingest and prepare data without a single line of code. The integrated data preparation components within SAS Viya support ad-hoc, agile-oriented data management tasks where you can profile, cleanse, and join data easily and rapidly.

Automatically Generated Data Profiling Report

You can execute complex joins, create custom columns, and cleanse your data via a completely drag-and-drop interface. The automation built into SAS Viya eases the often tedious task of data profiling and data cleansing via automated data type identification and transform suggestions. In an area that can be both complex and intimidating, SAS Viya makes data management tasks easy and approachable, helping you to analyze more data and uncover more insights.

Data Join Using a Visual Interface

A visual approach supporting low-code and no-code programming

Speaking of no-code, SAS Viya’s visual approach and support extend deep into data exploration and advanced modeling. Not only can you quickly build charts such as histograms and box plots using a drag and drop interface, but you can also build complex machine learning models using algorithms such as decision trees and logistic regression on the same visual canvas.

Building a Decision Tree Model Using SAS Viya

By putting the appropriate guard rails and providing relevant and context-rich help for the user, SAS Viya empowers users to undertake data analysis using other advanced analytics techniques such as forecasting and correlation analysis. These techniques empower users to ask more complex questions and can potentially help uncover more actionable and valuable insights.

Correlation Analysis Using the Correlation Matrix within SAS Viya

Augmented analytics

Augmented analytics is an emerging area of analytics that leverages machine learning to streamline and automate the process of doing analytics and building machine learning models. SAS Viya leverages augmented analytics throughout the platform to automate various tasks. My favorite use of augmented analytics in SAS Viya, though, is the hyperparameters autotuning feature.

In machine learning, hyperparameters are parameters that you need to set before the learning processing can begin. They are only used during the training process and contribute significantly to the model training process. It can often be challenging to set the optimal hyperparameter settings, especially if you are not an experienced modeler. This is where SAS Viya can help by making building machine learning models easier for everyone one hyperparameter at a time.

Here is an example of using the SAS Viya autotuning feature to improve my decision tree model. Using the autotuning window, all I needed to do was tell SAS Viya how long I want the autotuning process to run for. It will then work its magic and determine the best hyperparameters to use, which, in this case, include the Maximum tree level and the number of Predictor bins. In most cases, you get a better model after coming back from getting a glass of water!

Hyperparameters Autotuning in SAS Viya

Under the hood, SAS Viya uses complex optimization techniques to try to find the best hyperparameter combinations to use all without you having to understand how it manages this impressive feat. I should add that hyperparameters autotuning is supported with many other algorithms in SAS Viya, and you have even more autotuning options when using it via the programmatic interface!

By leveraging a visually oriented framework and augmented analytics capabilities, SAS Viya is making analytics easier and machine learning models more accessible for everyone within an organization. For more on how SAS Viya enables everyone to ask more complex questions and uncover more valuable insights, check out my book Smart Data Discovery Using SAS® Viya®.

Analytics for everyone with SAS Viya was published on SAS Users.

8月 242020
 

I got a lot of feedback about my recent article about how to find roots of nonlinear functions by using the SOLVE function in PROC FCMP. A colleague asked how the FCMP procedure stores the functions. Specifically, why the OUTLIB= option on the PROC FCMP statement use a three-level syntax: OUTLIB=libref.DataSetName.PackageName. The three levels are a libref, a data set name, and a package name. The documentation is terse about what the third level (the package name) is used for and why it is important. This article describes how the FCMP-defined functions are stored, and how you can use the package name to call different versions of a function.

This article is my attempt to "reverse engineer" how PROC FCMP stores functions based on what I have read and observed. In addition to the FCMP documentation, I recommend reading Secosky (2007) and Eberhardt (2009). Feel free to add your own knowledge in the comments.

How FCMP functions are stored

I started writing about the capabilities of the FCMP procedure in 2012, but the procedure itself goes back to SAS 9.2. Modern versions of SAS store functions in an analytic store (which is read by using PROC ASTORE) or in an item store (which is read by using PROC PLM). But these binary storage formats had not yet been developed back in the pre-9.2 days. So PROC FCMP stores functions in a SAS data set. That means you can use PROC PRINT to investigate how PROC FCMP stores functions.

When you use the OUTLIB= option in PROC FCMP, you specify a three-level name: OUTLIB=libref.DataSetName.PackageName. The first two levels specify the name of a SAS data set. This data set is created if it doesn't exist, or it is modified if it already exists. The third level is used as a text field in a variable named _KEY_, which enables one data set to contain functions that belong to different packages. The package name becomes important if two packages define a function that has the same name.

To demonstrate, let's define some functions and store them in a data set named Work.MyFuncs. The following statements create two functions (A and B) that belong to the 'PROD' (for 'Production') package and one function (A) that belongs to the 'DEV' (for 'Development') package. Notice that both packages have a function named 'A'. The following statements define the functions and use PROC PRINT to display a portion of the Work.MyFuncs data set:

/* Store all functions in the data set WORK.MyFuncs */
/* Define functions in 'PROD' package */
proc fcmp outlib=work.MyFuncs.Prod;
   function A(x);
      return( x );      /* in the 'Prod' pkg, A(x) equals x */
   endsub;
   function B(x);
      return( x > 0 );
   endsub;
quit;
 
/* Define functions in 'DEV' package */
proc fcmp outlib=work.MyFuncs.Dev;
   function A(x);
      return( 2*x );    /* the 'Dev' pkg uses a different definition for A(x) */
   endsub;
quit;
 
proc print data=work.MyFuncs;
   var _Key_ Sequence Type Subtype Name;
run;

The output from PROC PRINT is shown. The data set contains 20 rows. I have put a red rectangle around rows 1–13 and another around rows 14–20. Each rectangle defines a package. The names of the packages are defined by the observations where Subtype='Package', which are highlighted in yellow. The Type, Subtype, and Name columns indicate how the FCMP statements that define the functions are stored in the data set. The _KEY_ column identifies which rows define which functions. There are other columns (not shown) that store the actual content of each function.

This output shows how the third level of the OUTLIB= option is used. The _KEY_ column records the package name and appends each function name ('A' or 'B') to the name of the package. So PROC FCMP knows that there are three stored functions whose full names are PROD.A, PROD.B, and DEV.A.

Calling a function from the DATA step

Since there are two functions called 'A', what happens if I call 'A' from a SAS DATA step? The answer is that the DATA step uses the most recent definition, which in this example is DEV.A. To alert you to the fact that calling 'A' is ambiguous, the SAS log displays a warning. You can also use the _DISPLAYLOC_ flag on the CMPLIB= system option to display the origin of each call to an FCMP function, as follows:

/* Tell the DATA step where to look for unresolved functions.
   The _DISPLAYLOC_ flag shows the full name for each call to an FCMP function */
options cmplib=(work.MyFuncs _DISPLAYLOC_); 
data Want;
   x = 1; y = A(x);   /* y is the result of the latest definition */
run;
 
proc print data=Want noobs; run;
WARNING: Function 'A' was defined in a previous package. 'A' in current
         package DEV will be used as default when the package name is not
         specified.
 
NOTE: Function 'A' loaded from work.MyFuncs.DEV.

The value of the X and Y variables make it clear that the function DEV.A was called (because A(x)=2*x in that definition). The WARNING and NOTE in the SAS log reinforce this fact.

Choosing which package to call

The WARNING in the previous section says that the current (most recent) package "will be used as default when the package name is not specified." This message seems to imply that you can somehow call PROD.A, which is the other stored function that is named 'A'. This is, in fact, true. The PROC FCMP documentation states, "to select a specific subroutine when there is ambiguity, use the package name and a period as the prefix to the subroutine name."

You cannot specify the package name directly in the DATA step, but you can specify the package name in an FCMP function. So, for example, you can define a function called 'ChooseA' that includes a flag that indicates which package to use. The following PROC FCMP statements define a function that will call either PROD.A or DEV.A, depending on the value of a parameter. This wrapper function can then be called in the DATA step:

/* In PROC FCMP, you can "dis-ambiguate" by using a two-level function name */
proc fcmp outlib=work.MyFuncs.Choose;
   function ChooseA(x, choice $);
      if upcase(choice)="DEV" then
        return( Dev.A(x) );
      else
        return( Prod.A(x) );
   endsub;
quit;
 
data WantChoice;
   x = 1;
   y_Dev  = ChooseA(x, "Dev");   /* call Dev.A */
   y_Prod = ChooseA(x, "Prod");  /* call Prod.A */
run;
 
proc print data=WantChoice noobs; run;

From the definitions of DEV.A and PROD.A, you can verify that each function was called correctly. Because the _DISPLAYLOC_ option is still active, the SAS log also indicates that each function was called.

Summary

This article was motivated by a question about how the FCMP procedure stores functions. The answer is that the OUTLIB= option on the PROC FCMP statement requires a libref, a data set name, and a package name. In most circumstances, you do not need to use the package name. The package name becomes important only if two different packages each support a function that has the same name. In that case, you can use the package name to disambiguate the function call.

Personally, I prefer to avoid having two packages that define the same function, but if you cannot avoid it, this trick shows you how to handle it. Eberhardt (2009, p. 15) discusses a related issue, which is how to call functions that are stored in two (or more) different data sets.

The post How does PROC FCMP store functions? appeared first on The DO Loop.

8月 212020
 

SAS Viya is an open analytics platform accessible from interfaces or various coding languages. REST API is one of the widely used interfaces. Multiple resources exist on how to access SAS Visual Analytics reports using SAS Viya REST API. For example Programmatically listing data sources in SAS Visual Analytics by my colleague Michael Drutar. His post shows how to list the data sources of VA reports. Also, in Using SAS Viya REST APIs to access images from SAS Visual Analytics, Joe Furbee demonstrates how to retrieve report images. In this post, I am going to show you how to get the path for SAS Visual Analytics reports using REST APIs.

Full API reference documentation for SAS REST APIs is on developer.sas.com. You can exercise REST APIs in several ways such as curl, browsers, browser plugins, or any other REST client. Here I am going to access the SAS Viya Visualization and Core Services REST API with SAS Code. The Visualization service APIs provide access for reports, report images, and report transforms. The Core Services APIs provides operations for resources like folders, files, authorization, and so on.

Composition of a report object

The chart below describes the object composition of VA reports, from an API perspective. We see the report object itself has metadata storing the report properties like id, name, creator, modified date, and links, etc. Each VA report object is identified uniquely by its ID in SAS Viya. The report content object, presented in either XML or JSON format, is stored separately from the report object. The report content object enumerates the data and image resources, generating visual elements such as graphs, tables, and images.

Reports API definition

Get a list of reports

Let's begin with a scenario of getting a list of reports. These reports may be returned from a search or a filter in Viya, or a list you've got at hand. (The SAS Viya support filter link has more information on using the filter.) Here I'm using a filter to get a list of reports named 'Report 2'. I use Proc HTTP to access the Reports API in the Visualization service with a 'GET' request and '/reports/reports?filter=eq(name,'Report 2')' in the URL. Note, the HEADERS of Proc HTTP need to be set properly to generate expected results. Below is a snippet for this.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD = "GET" oauth_bearer=sas_services OUT = rptFile
      /* get a list of reports, say report name is 'Report 2' */
      URL = "&BASE_URI/reports/reports?filter=eq(name,'Report 2')";
      HEADERS "Accept" = "application/vnd.sas.collection+json"
               "Accept-Item" = "application/vnd.sas.summary+json";
RUN;
LIBNAME rptFile json;

The results of running the code above returns a list in the ITEMS table, in the rptFile json library. It returns about 10 reports with the same name of 'Report 2', each with a unique id.

ITEMS table report for 'Report2' query

Get the report content object of a VA report

Using the Reports API of the Visualization service, we can get the report content object of a VA report. As shown in the snippet below, by making a 'GET' request to the SAS Viya server followed by the '/reports/reports//content' in the URL, the report content object is retrieved.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=rptFile
           URL = "&BASE_URI/reports/reports/<report id>/content";
           HEADERS "Accept" = "application/vnd.sas.report.content+json";
RUN;
LIBNAME rptFile json;

In the output, we see the rptFile json library enumerates the data and image resources in the report content object. Below shows what I retrieved from a report content object.

Contents of rptFile json library

Notice the DATASOURCES_CASRESOURCE table, which Michael uses in Programmatically listing data sources in SAS Visual Analytics. You may explore more information in these tables if interested, such as report states, visual elements, etc. In this post, I won't dig further into the report content object.

Get the metadata of a report object

Next, I am going to get the metadata of a report object with its unique report id using the Reports API in the Visualization service. I use the 'GET' request and '/reports/reports/' in the URL. By runing the code snippet below, I get the metadata of the report object in the rptFile json library.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME rptFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=rptFile
        URL = "&BASE_URI/reports/reports/cecac7d7-b957-412e-9709-a3fe504f00b1";
        HEADERS "Accept" = "application/vnd.sas.report+json";
RUN;
LIBNAME rptFile json;

Below is part of the ALLDATA table from the rptFile library. The table contains metadata of the report object, including its unique id, name, creator, creationTimeStamp, modifiedTimeStamp, links, and so on. But in the table, I can't find the folder location of the report object.

ALLDATA table from the rptFile library

Get the report object folder location

So far, I've retrieved most of the metadata info we are looking for, but not the report object folder location. All VA reports are put under the /SAS Content/ folder or its subfolders in SAS Viya. Yet, no such information exists in the report object or the report content object. How can I get the path of a VA report under the /SAS Content/ folder?

The answer is to use the Folders service on the Core Services API. Folders provide an organizational structure for SAS content as well as external content in Viya. The Folders object itself is a virtual container for other resources or folders, and it persists only the URI of resources managed by other services.

A folder object has two types of members: child and reference. Whereas resources can have references in multiple folders, they are restricted to being the child in a single folder. Resources like VA reports are added as child members of a folder, and the folder persists the URI of the is VA report. Thus, we get the folder reversely from the child report by looking for the ancestors of this report object.

By using the Folders API in Core services with a 'GET' request and '/folders/ancestors?childUri=' in the URL, the Proc HTTP code below gets the ancestors of the VA report before getting the full path.

%let BASE_URI=%sysfunc(getoption(SERVICESBASEURL));
FILENAME fldFile TEMP ENCODING='UTF-8';
PROC HTTP METHOD="GET" oauth_bearer=sas_services OUT=fldFile
          URL = "&BASE_URI/folders/ancestors?childUri=/reports/reports/cecac7d7-b957-412e-9709-a3fe504f00b1";
          HEADERS "Accept" = "application/vnd.sas.content.folder.ancestor+json";
RUN;
LIBNAME fldFile json;

From the fldFile.ANCESTORS table, we see the metadata of the ancestor folders, including folder id, folder name, creator, type, and its parentFolderURI, etc. The screenshot below is part of the ANCESTORS table. Thus, the path of the specific report concatenates these subfolders to a full path of /SAS Content/NLS/Cindy/.

Folder path detailed in the ANCESTORS table

Get the path for VA reports

Now I have several reports, I need to go through the above steps repeatedly for each report. So, I wrote SAS code to handle these:

  1. Filter those reports named 'Report 2', using the reports API in Visualization service. Save the list of reports in the ds_rpts dataset. The results include metadata for report id, name, createdBy, CreatedAt, and LastModified.
  2. For each report in the ds_rpts data set, call the macro named 'save_VA_Report_Path(reportURI)'. The macro accesses the Folders API in Core Services, and saves the path for a given report back in the rptPath column of the ds_rpts data set.
  3. Print the list of reports with path and other metadata.

The code yields the following output:

List of reports with paths and metadata

You may access my code samples from GitHub and give it a try in your environment. I run the code with SAS Studio 5.2 and VA on SAS Viya 3.5. You may prefer to modify the filter condition as needed (such as createdBy, contains, or more from SAS Viya support filter).

Finally

The Reports API is one of many SAS Viya REST APIs. In this post, I've provided multiple discovery paths to follow. You can find more information about this and other APIs on the SAS Viya REST APIs page on the developers portal.

Discover Visual Analytics Report Paths with REST APIs was published on SAS Users.

8月 192020
 

No matter what your brand's level of marketing maturity is, SAS can help you move from data to insight to action with rich functionality for adaptive planning, journey activation and an embedded real-time decision engine – all fueled by powerful analytics and artificial intelligence (AI) capabilities. Let's begin with a [...]

SAS Customer Intelligence 360: Introduction to marketing data management was published on Customer Intelligence Blog.