January 17, 2018
 

I've used SAS with a bunch of different REST APIs: GitHub, Brightcove, Google Analytics, Lithium, LinkedIn, and more. For most of these I have to send user/password or "secret" application tokens to the web service so that it knows who I am and what data I can retrieve. I do not want to keep this secret information in my SAS program files -- that would be a bad idea. If my credentials were part of the program -- even if they were obfuscated and not stored in clear text -- then anyone who managed to get a copy of my program could run it. And they could gain access to my data, as if they were me.

I've written about this topic for SAS-related passwords. In this article, I'll share the approach that I use for API credentials and tokens.

REST APIs: Each service requires different types of secrets

My REST API services don't require just simple user ID and password combos. It depends on the API, but usually the information is in the form of one or more tokens that I've generated using the vendor's developer console, or perhaps that have been granted by an administrator.

For example, to access the Google Analytics API, I need three things: a client ID, a client secret token, and a valid "refresh" token. I can send these three items to the Google OAuth2 API, and in return I'll receive a live "access" token that I can use to request my data. I think of this like checking into a hotel. I show my ID and a credit card at the front desk, and in exchange I receive a room key. Just like my hotel room key, the access token doesn't last forever and cannot be reused on my next visit.

Other APIs are simpler and require just a single token that never expires. That's more like a house key -- it's mine to use forever, or until someone decides to change the locks.

Whether it's a static token or a token-for-token exchange, I don't want to leave these keys lying around for just anyone to find and use.

Hide your tokens in a file that only you can read

My simple approach is to store the token values in a text file within my home directory. Then, I change the permissions on the file such that only my account can read it. Whether I submit my program interactively (in SAS Enterprise Guide or SAS Studio) or as a scheduled batch job, it's running under my account. I'm showing the instructions here for UNIX/Linux, but Windows users can accomplish something similar with Windows permissions.

On Linux, I've used the chmod command to specify the octal value that says "only the owner can read/write." That's "chmod 600 filename". The "ls -l" command shows that this permissions mask has been applied.

chmod 600 ./.google_creds.csv
ls -l ./.google_creds.csv
> -rw------- 1 myid mygroup 184 Jan 15 12:41 ./.google_creds.csv

I stored my tokens in a standard CSV format because it's easy for SAS to read and it's easy for me to read if I ever need to change it.

Use INFILE to read the tokens dynamically

With this critical data now stored externally, and the file permissions in place, I can use SAS to read the credentials/tokens within my program and store the values in SAS macro variables. In the following SAS program, I assigned a macro variable to my user root folder. Since I might run this program on Linux or Windows, I used this trick to determine the proper path notation. I also used the &SYSUSERID macro variable to make my program more portable. If I want to supply this program to any colleagues (or to you!), the only thing that's needed is to create and store the token CSV files in the proper location.

/* My path is different for UNIX vs Windows */
%let authpath = %sysfunc(ifc(&SYSSCP. = WIN,
	 \\netshare\root\u\&sysuserid.,
	 /u/&sysuserid.));
 
/* This should be a file that only YOU or trusted group members can read */
/* Use "chmod 0600 filename" in UNIX environment */
/* "dotfile" notation is the convention on UNIX for "hidden" files */
filename auth "&authpath./.google_creds.csv";
 
/* Read in the secret account keys from another file */
data _null_;
 infile auth firstobs=2 dsd delimiter=',' termstr=crlf;
 length client_id $ 100 client_secret $ 30 refresh_token $ 60;
 input client_id client_secret refresh_token;
 call symputx('client_id',client_id);
 call symputx('client_secret',client_secret);
 call symputx('refresh_token',refresh_token);
run;

When I run this code in my production job, I can see the result:

NOTE: The infile AUTH is:
      Filename=/u/myid/.google_creds.csv,
      Owner Name=myid,Group Name=mygroup,
      Access Permission=rw-------,
      Last Modified=Mon Jan 15 12:41:58 2018,
      File Size (bytes)=184

NOTE: 1 record was read from the infile AUTH.
      The minimum record length was 145.
      The maximum record length was 145.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds

For this example, my next step is to call the Google API to get my access token. I'll use the macro variables that my program created with CALL SYMPUTX to build the proper API call.

/* Call Google API to exchange the refresh token for an active access token */
%let oauth2=https://www.googleapis.com/oauth2/v4/token;
filename rtoken temp;
proc http
 method="POST"
 url="&oauth2.?client_id=&client_id.%str(&)client_secret=&client_secret.%str(&)grant_type=refresh_token%str(&)refresh_token=&refresh_token."
 out=rtoken;
run;
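The response from the token service arrives as JSON in the RTOKEN fileref. One way to pull the access token into a macro variable is the JSON libname engine, sketched below. This is an illustration under assumptions: it requires a SAS release that includes the JSON engine (SAS 9.4 Maintenance 4 or later), the libref name rtok is arbitrary, and it assumes the response contains a top-level access_token field, which the engine would surface as a column in the ROOT table.

```sas
/* Map the JSON response to SAS tables (needs the JSON libname engine) */
libname rtok json fileref=rtoken;

/* Assumption: the response has a top-level "access_token" field, */
/* which appears as a column in the automatically created ROOT table */
data _null_;
  set rtok.root;
  call symputx('access_token', access_token);
run;

libname rtok clear;
```

After this step, &access_token. can be used in subsequent PROC HTTP calls until the token expires, at which point the exchange has to be repeated.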

See the full explanation of this Google Analytics example in this article.

The post How to secure your REST API credentials in SAS programs appeared first on The SAS Dummy.

January 16, 2018
 

If you are a SAS programmer, you may wonder why you should read this. After all isn’t SAS Enterprise Guide just for folks who don’t want to be bothered writing SAS code? SAS Enterprise Guide is just point, click, and get results, right? Well there is a lot more to [...]

The post SAS Enterprise Guide tools for programmers appeared first on SAS Learning Post.

January 15, 2018
 

Last week I got the following message:

Dear Rick: How can I create a normal distribution within a specified range (min and max)? I need to simulate a normal distribution that fits within a specified range. I realize that a normal distribution is by definition infinite... Are there any alternatives, such as a distribution that has shape and properties similar to a normal distribution, but for which I can restrict the range? - Carol

Mathematically, this request doesn't make sense, as Carol (not her real name) acknowledges. However, if you are working with a client who is not statistically savvy, you might be asked to do the impossible! How can Carol make sense of this request?

I provide a two-part answer. First I discuss general ways to choose a distribution that models a data sample. You can use these methods when you have access to the data. Then I address the specific question and present five distributions that you can use to model data that looks like a "bounded" normal distribution. These distributions can be used when you don't have access to the data.

How to choose a distribution that matches the data?

In general, statisticians are often asked to choose a model that seems to fit an observed set of data. In simulation studies, the model is used to simulate samples that have the same distributional properties as the real data. In the book Simulating Data with SAS (Chapter 16), I discuss many strategies, including the following:

  • Use domain-specific knowledge to guide you. There might be physical or biological reasons to prefer one probability distribution over another.
  • Sample from the empirical distribution of the data, which is equivalent to bootstrap resampling. For more information, see the article about how to bootstrap in SAS or Chapter 15 of Simulating Data with SAS.
  • Use a well-known “named” parametric distribution to model the data, such as the normal, gamma, and beta distributions. Typically you will fit several candidate distributions to the data to see which fits best. (In SAS, you can use PROC UNIVARIATE, PROC SEVERITY, or PROC NLMIXED to fit distributions.) After you choose a distribution that fits the data, you can draw random samples from the fitted distribution.
  • For univariate data, you can choose a flexible system of distributions such as the Pearson system or the Johnson system. The Johnson system is supported by PROC UNIVARIATE (Wicklin 2013, pp. 112–114).
  • Use a graphical tool, called the moment-ratio diagram, to help select candidate distributions for the data. Traditionally the moment-ratio diagram is a theoretical tool for understanding the relationships of families and systems of distributions, but Chapter 16 shows how to use the moment-ratio diagram as a tool to organize simulation studies.

These ideas (and others) are illustrated in the following diagram, which is also from Wicklin (2013, p. 298):

The flowchart shows a few paths that a researcher can follow to model data. Most of the paths are also applicable to multivariate data.

Simulate data from the "eyeball distribution"

Now let's discuss the specific question that I was asked: how to simulate a "bounded" normal distribution. If Carol has access to the data, she could use the techniques in the previous section. Consequently, I assume that Carol does not have access to the data. This can happen in several ways, such as when you are trying to reproduce the results in a published article but the article includes only summary statistics or a graph of the data. If you cannot obtain the original data from the authors, you might be forced to simulate fake data based only on the graph and the summary statistics.

I call this using the "eyeball distribution." For non-native speakers of English, "to eyeball" means to look at or observe something. When used as an adjective, “eyeball” indicates that a quantity was obtained by visual inspection, rather than through formal measurements. Thus an "eyeball distribution" is one that is chosen heuristically because it looks similar to the histogram of the data. (You can extend these ideas to multivariate data.)

From Carol's description, the histogram of her data is symmetric, unimodal, and "bell-shaped." The data are in the interval [a, b]. I can think of several "eyeball distributions" that could model data like this:

  1. If you know the sample mean (m) and standard deviation (s), then draw random samples from N(m, s).
  2. If you know sample quantiles of the data, you can approximate the CDF and use inverse probability sampling for the simulation.
  3. Theory tells us that 99.7% of the normal distribution is within three standard deviations of the mean. For symmetric distributions, the mean equals the midrange m = (a + b)/2. Thus you could use the three-sigma rule to approximate s ≈ (b - a)/6 and sample from N(m, s).
  4. If you have domain knowledge that guarantees that the data are truly bounded on [a, b], you could use the truncated normal distribution, which samples from N(m, s) but discards any random variates that are outside of the interval.
  5. Alternatively, you can sample from a symmetric bounded distribution such as the Beta(α, α) distribution. For example, Beta(5, 5) is approximately bell-shaped on the interval [0,1].

The first option is often the most reasonable choice. Even though the original data are in the interval [a, b], the simulated data do not have to be in the same interval. If you believe that the original data are normally distributed, then simulate from that model and accept that you might obtain simulated data that are beyond the bounds of the original sample. The same fact holds for the third option, which estimates the standard deviation from a histogram.
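The second option (inverse probability sampling from known quantiles) is not implemented in the program later in this article, so here is a rough sketch of the idea. It approximates the CDF by linear interpolation between the known quantiles and then maps uniform random numbers through the inverse of that piecewise-linear CDF. The probabilities and quantile values in the arrays are made-up illustrative numbers, not data from Carol or from any real source, and the data set name QuantSim is hypothetical.

```sas
data QuantSim(keep=x);
   /* Hypothetical inputs: cumulative probabilities and matching quantiles */
   array p[5] _temporary_ (0   0.25 0.5 0.75 1);
   array q[5] _temporary_ (5   8.9  10  11.1 15);
   call streaminit(54321);
   do i = 1 to 500;
      u = rand("Uniform");
      /* Find the index k such that u lies in the interval [p[k], p[k+1]] */
      k = 1;
      do while (k < dim(p)-1 and u > p[k+1]);
         k = k + 1;
      end;
      /* Invert the piecewise-linear CDF on that interval */
      x = q[k] + (u - p[k]) / (p[k+1] - p[k]) * (q[k+1] - q[k]);
      output;
   end;
run;
```

Because the first and last quantiles are the minimum and maximum, every simulated value falls in [5, 15]. With only five quantiles the simulated density is a step function, so more quantiles (or a smoother interpolation) would be needed to get a convincing bell shape.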

The following SAS DATA step implements the last three options. The sample distributions are displayed by using a comparative histogram in SAS:

%let N = 500;      /* sample size */
%let min = 5;
%let max = 15;
 
proc format;
value MethodFmt 1 = "Normal, 3-Sigma Rule"
                2 = "Truncated Normal"
                3 = "Beta(5, 5), Scaled";
run;
data BellSim(keep=x Method);
format method MethodFmt.;
call streaminit(54321);  /* set random number seed */
numSD = 3;               /* use 3-sigma rule to estimate std dev */
mu = (&min + &max)/2;
sigma = (&max - &min)/ (2*numSD);
 
method = 1;        /* Normal distribution N(mu, sigma) */
   do i = 1 to &N;
      x = rand("Normal", mu, sigma);  output;
   end;
 
method = 2;        /* truncated normal distribution TN(mu, sigma; a, b) */
   i = 0;
   do until (i = &N);
      x = rand("Normal", mu, sigma);
      if &min < x < &max then do;
         output;
         i + 1;
      end;
   end;
 
method = 3;        /* symmetric beta distribution, scaled into [a, b] */
   alpha = 5;
   do i = 1 to &N;
      x = &min + (&max - &min)*rand("beta", alpha, alpha);
      output;
   end;
run;
 
ods graphics / width=480px height=480px;
title "Simulate Bell-Shaped Data on [&min, &max]";
proc sgpanel data=BellSim;
   panelby method / columns=1 novarname;
   histogram x;
   refline &min &max / axis=x;
   colaxis values=(&min to &max) valueshint;
run;

The histograms show a simulated sample of size 500 for each of the three models. The top sample is from the normal distribution. It contains simulated observations that are outside of the interval [a,b]. In many situations, that is fine. The second sample is from the truncated normal distribution on [a,b]. The third is from a Beta(5,5) distribution, which has been scaled onto the interval [a,b].
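A numerical summary can complement the histograms. The following step is a small addition that assumes the BellSim data set from the preceding program exists. It reports the sample size, mean, standard deviation, minimum, and maximum for each method.

```sas
/* Summarize each simulated sample; MethodFmt. labels the three groups */
proc means data=BellSim n mean std min max maxdec=3;
   class method;
   var x;
run;
```

For min=5 and max=15, the three-sigma rule gives sigma = (15 - 5)/6, or about 1.67, so the standard deviation of the first sample should be near that value. The scaled Beta(5, 5) distribution has a theoretical standard deviation of 10/sqrt(44), or about 1.51. The minimum and maximum for the second and third samples should lie inside [5, 15], whereas the first sample is free to exceed those bounds.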

Although the preceding SAS program answers the question Carol asked, let me emphasize that if you (and Carol) have access to the data, you should use it to choose a model. When the data are unobtainable, you can use an "eyeball distribution," which is basically a guess. However, if your goal is to generate fake data so that you can test a computational method, an eyeball distribution might be sufficient.

The post Data unavailable? Use the "eyeball distribution" to simulate appeared first on The DO Loop.

January 13, 2018
 

Each year the SAS Global Users Group Executive Board (SGUGEB) solicits applications for the SAS Global Forum Conference Chair for the conference three years from now. Individuals are identified, applications are requested, submitted applications are reviewed, candidates are interviewed, and finally a choice is made.

We are asking for interested individuals to submit their application for SAS Global Forum 2021 Conference Chair. Yep, 2021! The SGUGEB wants to ensure that each conference chair has time to learn, gather ideas, generate ideas, learn from their predecessors and determine the focus for their conference.

Three years?

Is three years really necessary? Yep! In the first year, you will work with the current conference team and begin to understand all the ins and outs of planning, organizing, and delivering the content. You will play a key role on the conference team, either on the Content Advisory Team or on the Content Delivery Team, which will help you understand the roles and responsibilities of each team. In the second year, you will again play a key role on the conference team and will use the experience gained from the previous year to begin determining your content focus, identifying potential new initiatives, and building your team. The third year is all about your conference and the implementation of the focus and initiatives you identified… all with the aid of your team, of course.

Who are we looking for?

Good candidates should be active SAS users, authors, administrators, managers, and/or practitioners. Individuals should also be active in the SAS community and in other professional conferences and organizations. Good presentation and collaboration skills are a must. Candidates should also have a vision for how they want to shape their conference to benefit the SAS community. As SASGF or regional conference attendees, we have all benefited from the content and education we received. Those who have been a conference chair will tell you that it is an honor and a privilege to be able to shape the educational content delivered to our SAS community.

My experience

As conference chair for SASGF 2016, I can tell you it was one of the most rewarding professional and personal experiences I have had. I was given the opportunity to work with a lot of intelligent and talented individuals who, like me, wanted to ensure that current and future SAS users have a place to learn and grow professionally. With over 5,000 attendees and Livestream content available to millions, my institution had increased visibility, I developed additional leadership skills (by chairing such a large international conference), and I got to know and spend time with some exceptional SAS users, SAS leaders and executives. The experience was worth all the time and effort I expended.

Ready to Apply

So, are you interested? If so, we invite you to peruse information about Conference Leadership and SAS Global Forum Conference Chair roles and responsibilities, as well as the many different volunteer opportunities that exist before, during and after SAS Global Forum, and then make an informed decision about whether to apply for conference chair.

I encourage anyone who is interested to submit an application. Information on how to apply is available here. Also, share this information with anyone you feel would make a great conference chair, and remember that the application deadline is February 18, 2018.

Interested in being the SAS Global Forum 2021 Conference Chair? Apply Now! was published on SAS Users.


January 13, 2018
 

The purpose of this blog is simply to post this link to an article in The Economist.

Click here to read the article “How soon will computers replace The Economist’s science writers?”

At this point, I could stop writing because this article falls into the category of No Comment Needed, but that would be too easy. As a conscientious blogger, I will add that while computers have clearly failed to master technical and scientific writing, they should have no trouble writing literary reviews.

January 13, 2018
 

Former U.S. Chief Technology Officer Megan Smith stressed the importance of continued investments in science, technology, engineering and mathematics (STEM) in her keynote address at SAS Analytics Experience, sharing a quote from George Washington. In his first address to Congress, in 1790, Washington said, “There is nothing which can better [...]

Former US Chief Technology Officer encourages STEM diversity was published on SAS Voices by Paula Henderson

January 11, 2018
 

If you work in a team environment, you might be accustomed to using mapped network drives for source data folders or to publish results. If you've recently moved to a SAS server environment, you might not have those mapped drives available. How can you tell?

This question was posted on the SAS Support Communities, and SAS Communities member Patrick provided a handy code snippet that does the trick. Patrick learned the trick himself from SAS Note 24818.

The following code uses a special DRIVEMAP device for the FILENAME statement:

filename diskinfo DRIVEMAP;
data _null_;
  infile diskinfo;
  input;
  put _infile_;
run;

Here's the output on my PC, where I have two mapped network drives, U: and W:. (C: and D: are part of my laptop device.)

NOTE: The infile DISKINFO is:
      Drivemap Access Device,
      PROCESS,RECFM=V,LRECL=32767
C:
D:
U:
W:
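If a program needs to act on this information rather than just print it, the same DRIVEMAP fileref can be read into a data set. The following sketch is an assumption-laden variation on the snippet above: the data set name work.drives, the macro variable has_u, and the choice of the U: drive are all hypothetical, and the DRIVEMAP device applies only to SAS running on Windows.

```sas
filename diskinfo DRIVEMAP;

/* Capture each drive letter (for example, "C:") in a data set */
data work.drives;
  infile diskinfo;
  length drive $ 2;
  input drive $;
run;

/* Hypothetical check: is the U: drive visible to this SAS session? */
%let has_u = 0;
data _null_;
  set work.drives;
  if upcase(drive) = "U:" then call symputx('has_u', 1);
run;

%put NOTE: U: drive available = &has_u.;
```

A program could then branch on &has_u. and fall back to a different path when the mapped drive is not defined in the SAS session.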

Mapped drives are typically available only from your local machine. Even if you use SAS on a remote Windows server, the process that defines the mapped drive aliases is usually not triggered when you connect. If this is the case for you, you'll have to adjust your process to use UNC path notation (example: "\\server\folder\project" instead of the familiar "P:\project" shortcut). (Additional note for SAS admins: on a Windows SAS server,

The post How to list your mapped drives in a SAS program appeared first on The SAS Dummy.

January 11, 2018
 

The SAS® platform is now open to be accessed from open-source clients such as Python, Lua, Java, the R language, and REST APIs to leverage the capabilities of SAS® Viya® products and solutions. You can analyze your data in a cloud-enabled environment that handles large amounts of data in a variety of different formats. To find out more about SAS Viya, see the “SAS Viya: What's in it for me? The user.” article.

This blog post focuses on the openness of SAS® 9.4 and discusses features such as the SASPy package and the SAS kernel for Jupyter Notebook and more as clients to SAS. Note: This blog post is relevant for all maintenance releases of SAS 9.4.

SASPy

The SASPy package enables you to connect to SAS 9.4 and run your analysis by using the object-oriented methods and objects of the Python language as well as the Python magic methods. SASPy translates the objects and methods that you use into SAS code before executing that code. To use SASPy, you must have SAS 9.4 and Python 3.x or later.
Note: SASPy is an open-source project that encourages your contributions.

After you have completed the installation and configuration of SASPy, you can import the SASPy package as demonstrated below:
Note: I used Jupyter Notebook to run the examples in this blog post.

1.   Import the SASPy package:


2.   Start a new session. The sas object is created as a result of starting a SAS session using a locally installed version of SAS under Microsoft Windows. After this session is successfully established, the following note is generated:

Adding Data

Now that the SAS session is started, you need to add some data to analyze. This example uses SASPy to read a CSV file that provides census data based on the ZIP Codes in Los Angeles County and create a SASdata object named tabl:

To view the attributes of this SASdata object named tabl, use the Python print() function below, which shows the libref and the SAS data set name. It shows the results as Pandas, which is the default result output for tables.

Using Methods to Display and Analyze Data

This section provides some examples of how to use different methods to interact with SAS data via SASPy.

Head() Method

After loading the data, you can look at the first few records of the ZIP Code data, which is easy using the familiar head() method in Python. This example uses the head() method on the SASdata object tabl to display the first five records. The output is shown below:

Describe() Method

After verifying that the data is what you expected, you can now analyze the data. To generate a simple summary of the data, use the Python describe() method in conjunction with the index [1:3]. This combination generates a summary of all the numeric fields within the table and displays only the second and third records. The subscript works only when the result is set to Pandas and does not work if set to HTML or Text, which are also valid options.

Teach_me_SAS() Method

The SAS code generated from the object-oriented Python syntax can also be displayed using SASPy with the teach_me_SAS() method. When you set the argument in this method to True, which is done using a Boolean value, the SAS code is displayed without executing the code:

ColumnInfo() Method

In the next cell, use the columnInfo() method to display the information about each variable in the SAS data set. Note: The SAS code is generated as a result of adding the teach_me_SAS() method in the last section:

Submit() Method

Then, use the submit() method to execute the PROC CONTENTS step that is displayed in the cell above directly from Python. The submit() method returns a dictionary with two keys, LST and LOG. The LST key contains the results and the LOG key contains the SAS log. The results are displayed as HTML. The HTML package is imported to display the results.

The SAS Kernel Using Jupyter Notebook

Jupyter Notebook can run programs in various programming languages including SAS when you install and configure the SAS kernel. Using the SAS kernel is another way to run SAS interactively using a web-based program, which also enables you to save the analysis in a notebook. See the links above for details about installation and configuration of the SAS kernel. To verify that the SAS kernel installed successfully, you can run the following code: jupyter kernelspec list

From the command line, use the following command to start Jupyter Notebook: jupyter notebook. The screenshot below shows the Jupyter Notebook session that starts when you run the command. To execute SAS syntax from Jupyter Notebook, select SAS from the New drop-down list as shown below:

You can add SAS code to a cell in Jupyter Notebook and execute it. The following code adds a PRINT procedure and a SGPLOT procedure. The output is in HTML5 by default. However, you can specify a different output format if needed.
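The original notebook cell is not reproduced here. As a stand-in, a cell along the following lines would exercise both procedures. It assumes the SASHELP.CLASS sample data set that ships with SAS, and the choice of variables is purely illustrative.

```sas
/* Print the first five rows of a sample data set */
proc print data=sashelp.class(obs=5);
run;

/* Scatter plot of weight versus height, colored by sex */
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
run;
```

When the cell runs under the SAS kernel, the table and the plot appear inline beneath the cell, in the same way that output from a Python cell would.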

You can also use magics in the cell such as the %%python magic even though you are using the SAS kernel. You can do this for any kernel that you have installed.

Other SAS Goodness

There are more ways for SAS to interact with other languages as well. For example, you can use the GROOVY procedure to run Groovy statements on the Java Virtual Machine (JVM). You can also use the LUA procedure to run Lua code from SAS, along with the ability to call most SAS functions from Lua. For more information, see “Using Lua within your SAS programs.” Another very powerful feature is the DATA step JavaObject, which provides the ability to instantiate Java classes and access fields and methods. The DATA step JavaObject has been available since SAS® 9.2.
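As a small illustration of the LUA procedure, the following sketch mixes plain Lua with a call back into SAS. It assumes a SAS 9.4 release in which PROC LUA is available and uses the SASHELP.CLASS sample data set. The variable name and the choice of PROC MEANS are arbitrary.

```sas
proc lua;
submit;
   -- Plain Lua: build a string and print it to the SAS log
   local greeting = "Hello from Lua"
   print(greeting)

   -- sas.submit() runs a block of SAS code from inside Lua
   sas.submit([[
      proc means data=sashelp.class n mean;
         var height weight;
      run;
   ]])
endsubmit;
run;
```

The Lua code sits between the SUBMIT and ENDSUBMIT statements, and the double square brackets are the Lua syntax for a multi-line string, which makes it convenient to embed several lines of SAS code.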

Resources

SASPy Documentation

Introducing SASPy: Use Python code to access SAS

Come on in, we're open: The openness of SAS® 9.4 was published on SAS Users.