R

March 19, 2020
 

At SAS Press, we agree with the saying “The best things in life are free.” And one of the best things in life is knowledge. That’s why we offer free e-books to help you learn SAS or improve your skills. In this blog post, we will introduce you to one of our amazing titles that is absolutely free.

SAS Programming for R Users

Many data scientists today need to know multiple programming languages including SAS, R, and Python. If you already know basic statistical concepts and how to program in R but want to learn SAS, then SAS Programming for R Users by Jordan Bakerman was designed specifically for you! This free e-book explains how to write programs in SAS that replicate familiar functions and capabilities in R. This book covers a wide range of topics including the basics of the SAS programming language, how to import data, how to create new variables, random number generation, linear modeling, Interactive Matrix Language (IML), and many other SAS procedures. This book also explains how to write R code directly in the SAS code editor for seamless integration between the two tools.

The book is based on the free, 14-hour course of the same name offered by SAS Education available here. Keep reading to learn more about the differences between SAS and R.

SAS versus R

R is an object-oriented programming language. Results of a function are stored in an object and desired results are pulled from the object as needed. SAS revolves around the data table and uses procedures to create and print output. Results can be saved to a new data table.

Let’s briefly compare SAS and R in a general way. Look at the following table, which outlines some of the major differences between SAS and R.

Here are a few other things about SAS to note:

  • SAS has the flexibility to interact with objects. (However, the book focuses on procedural methods.)
  • SAS does not have a command line. Code must be run in order to return results.

SAS Programs

A SAS program is a sequence of one or more steps. A step is a sequence of SAS statements. There are only two types of steps in SAS: DATA and PROC steps.

  • DATA steps read from an input source and create a SAS data set.
  • PROC steps read and process a SAS data set, often generating an output report. “Procedure” is an umbrella term for the routines that carry out the global analysis. Think of a PROC step as a function in R.

Every step has a beginning and ending boundary. SAS steps begin with either of the following statements:

  • a DATA statement
  • a PROC statement

After a DATA or PROC statement, there can be additional SAS statements. These contain keywords that either request that SAS perform an operation or give information to the system. Think of them as additional arguments to a procedure. Statements always end with a semicolon!

SAS options are additional arguments and they are specific to SAS statements. Unfortunately, there is no rule to say what is a statement versus what is an option. Understanding the difference comes with a little bit of experience. Options can be used to do the following:

  • generate additional output like results and plots
  • save output to a SAS data table
  • alter the analytical method

SAS detects the end of a step when it encounters one of the following statements:

  • a RUN statement (for most steps)
  • a QUIT statement (for some procedures)

Most SAS steps end with a RUN statement. Think of the RUN statement as the right parenthesis of an R function. The following table shows an example of a SAS program that has a DATA step and a PROC step. You can see that both SAS steps end with RUN statements, while the R functions begin and end with parentheses.

If you want to learn more about this book or any other free e-books from SAS Press, visit https://support.sas.com/en/books/free-books.html. Subscribe to our newsletter to get the latest information on new books.

Free e-book: SAS Programming for R Users was published on SAS Users.

December 4, 2019
 

Site relaunches with improved content, organization and navigation.

In 2016, a cross-divisional SAS team created developer.sas.com. Their mission: Build a bridge between SAS (and our software) and open source developers.

The initial effort made available basic information about SAS® Viya® and integration with open source technologies. In June 2018, the Developer Advocate role was created to build on that foundation. Collaborating with many of you, the SAS Communities team has improved the site by clarifying its scope and updating it consistently with helpful content.

Design is an iterative process. One idea often builds on another.

-- businessman Mark Parker

The team is happy to report that recently developer.sas.com relaunched, with marked improvements in content, organization and navigation. Please check it out and share with others.

New overview page on developer.sas.com

The developer experience

The developer experience goes beyond the developer.sas.com portal. The Q&A below provides more perspective and background.

What is the developer experience?

Think of the developer experience (DX) as equivalent to the user experience (UX), only the developer interacts with the software through code, not points and clicks. Developers expect and require an easy interface to software code, good documentation, support resources and open communication. All this interaction occurs on the developer portal.

What is a developer portal?

The white paper Developer Portal Components captures the key elements of a developer portal. Without going into detail, the portal must contain (or link to) these resources: an overview page, onboarding pages, guides, API reference, forums and support, and software development kits (SDKs). In conjunction with the Developers Community, the site’s relaunch includes most of these items.

Who are these developers?

Many developers fit somewhere in these categories:

  • Data scientists and analysts who code in open source languages (mainly Python and R in this case).
  • Web application developers who create apps that require data and processing from SAS.
  • IT service admins who manage customer environments.

All need to interact with SAS but may not have written SAS code. We want this population to benefit from our software.

What is open source and how is SAS involved?

Simply put, open source software is just what the name implies: the source code is open to all. Many of the programs in use every day are based on open source technologies: operating systems, programming languages, web browsers and servers, etc. Leveraging open source technologies and integrating them with commercial software is a popular industry trend today. SAS is keeping up with the market by providing tools that allow open source developers to interact with SAS software.

What is an API?

All communications between open source and SAS are possible through APIs, or application programming interfaces. APIs allow software systems to communicate with one another. Software companies expose their APIs so developers can incorporate functionality and send or request data from the software.

Why does SAS care about APIs?

APIs allow the use of SAS analytics outside of SAS software. By allowing developers to communicate with SAS through APIs, customer applications easily incorporate SAS functions. SAS has created various libraries to aid in open source integration. These tools allow developers to code in the language of their choice, yet still interface with SAS. Most of these tools exist on github.com/sassoftware or on the REST API guides page.

A use case for SAS APIs

A classic use of SAS APIs is for a loan default application. A bank creates a model in SAS that determines the likelihood of a customer defaulting on a loan based on multiple factors. The bank also builds an application where a bank representative enters the information for a new potential customer. The bank application code uses APIs to communicate this information to the SAS model and return a credit decision.

What is a developer advocate?

A developer advocate is someone who helps developers succeed with a platform or technology. Their role is to act as a bridge between the engineering team and the developer community. At SAS, the developer advocate fields questions and comments on the Developers Community and works with R&D to provide answers. The administration of developer.sas.com also falls under the responsibility of the developer advocate.

We’re not done

The site will continue to evolve, with additions of other SAS products and offerings, and other initiatives. Check back often to see what’s new.

Now that you are an open source and SAS expert, please check out the new developer.sas.com. We encourage feedback and suggestions for content. Leave comments and questions on the site or contact Joe Furbee: joe.furbee@sas.com.

developer.sas.com 2.0: More than just a pretty interface was published on SAS Users.

March 7, 2018
 

The R SWAT package (SAS Wrapper for Analytics Transfer) enables you to upload big data into an in-memory distributed environment to manage data and create predictive models using familiar R syntax. In the SAS Viya Integration with Open Source Languages: R course, you learn the syntax and methodology required to [...]

The post Use R to interface with SAS Cloud Analytics Services appeared first on SAS Learning Post.

May 24, 2017
 

According to Hyndman and Fan ("Sample Quantiles in Statistical Packages," TAS, 1996), there are nine definitions of sample quantiles that commonly appear in statistical software packages. Hyndman and Fan identify three definitions that are based on rounding and six methods that are based on linear interpolation. This blog post shows how to use SAS to visualize and compare the nine common definitions of sample quantiles. It also compares the default definitions of sample quantiles in SAS and R.

Definitions of sample quantiles

Suppose that a sample has N observations that are sorted so that x[1] ≤ x[2] ≤ ... ≤ x[N], and suppose that you are interested in estimating the p_th quantile (0 ≤ p ≤ 1) for the population. Intuitively, the data values near x[j], where j = floor(Np) are reasonable values to use to estimate the quantile. For example, if N=10 and you want to estimate the quantile for p=0.64, then j = floor(Np) = 6, so you can use the sixth ordered value (x[6]) and maybe other nearby values to estimate the quantile.

Hyndman and Fan (henceforth H&F) note that the quantile definitions in statistical software have three properties in common:

  • The value p and the sample size N are used to determine two adjacent data values, x[j] and x[j+1]. The quantile estimate will be in the closed interval between those data points. For the previous example, the quantile estimate would be in the closed interval between x[6] and x[7].
  • For many methods, a fractional quantity is used to determine an interpolation parameter, λ. For the previous example, the fractional quantity is (Np - j) = (6.4 - 6) = 0.4. If you use λ = 0.4, then an estimate of the 64th percentile would be the value 40% of the way between x[6] and x[7].
  • Each definition has a parameter m, 0 ≤ m ≤ 1, which determines how the method interpolates between adjacent data points. In general, the methods define the index j by using j = floor(Np + m). The previous example used m=0, but other choices include m=0.5 or values of m that depend on p.

Thus a general formula for quantile estimates is q = (1 - λ) x[j] + λ x[j+1], where λ and j depend on the values of p, N, and a method-specific parameter m.

You can read Hyndman and Fan (1996) for details or see the Wikipedia article about quantiles for a summary. The Wikipedia article points out a practical consideration: for values of p that are very close to 0 or 1, some definitions need to be slightly modified. For example, if p < 1/N, the quantity Np < 1 and so j = floor(Np) equals 0, which is an invalid index. The convention is to return x[1] when p is very small and return x[N] when p is very close to 1.
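The general formula can be sketched in a few lines. The following Python sketch is for illustration only and is not the SAS program that accompanies this post. The function name hf_quantile is hypothetical, the sketch covers only the interpolation-style definitions, and it applies the boundary convention by returning x[1] or x[N] whenever the index j falls outside 1..N-1.

```python
import math

def hf_quantile(data, p, m=0.0):
    """Estimate the p-th quantile as (1 - lam)*x[j] + lam*x[j+1],
    where j = floor(N*p + m) and lam = N*p + m - j (1-based indices)."""
    x = sorted(data)
    n = len(x)
    t = n * p + m
    j = math.floor(t)
    lam = t - j
    if j < 1:          # p very small: return the smallest value, x[1]
        return x[0]
    if j >= n:         # p very close to 1: return the largest value, x[N]
        return x[-1]
    # x[j-1] and x[j] are the 0-based positions of x[j] and x[j+1]
    return (1 - lam) * x[j - 1] + lam * x[j]

data = [0, 1, 1, 1, 2, 2, 2, 4, 5, 8]
print(hf_quantile(data, 0.64))               # m = 0: 40% of the way from x[6] to x[7]
print(hf_quantile(data, 0.75, m=1 - 0.75))   # m = 1 - p, a choice of m that depends on p
```

Because x[6] and x[7] are both 2 in this sample, the first estimate is 2 regardless of λ; the second call interpolates between x[7] = 2 and x[8] = 4.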

Compute all nine sample quantile definitions in SAS

SAS has built-in support for five of the quantile definitions, notably in PROC UNIVARIATE, PROC MEANS, and in the QNTL subroutine in SAS/IML. You can use the QNTLDEF= option to choose from the five definitions. The following table associates the five QNTLDEF= definitions in SAS to the corresponding definitions from H&F, which are also used by R. In R you choose the definition by using the type parameter in the quantile function.

SAS definitions of sample quantiles

It is straightforward to write a SAS/IML function to compute the other four definitions in H&F. In fact, H&F present the quantile interpolation functions as specific instances of one general formula that contains a parameter, which they call m. As mentioned above, you can also define a small value c (which depends on the method) such that the method returns x[1] if p < c, and the method returns x[N] if p ≥ 1 - c.

The following table presents the parameters for computing the four sample quantile definitions that are not natively supported in SAS:

Definitions of sample quantiles that are not natively supported in SAS

Visualizing the definitions of sample quantiles

Visualization of nine definitions of sample quantiles, from Hyndman and Fan (1996)

You can download the SAS program that shows how to compute sample quantiles and graphs for any of the nine definitions in H&F. The differences between the definitions are most evident for small data sets and when there is a large "gap" between one or more adjacent data values. The following panel of graphs shows the nine sample quantile methods for a data set that has 10 observations, {0 1 1 1 2 2 2 4 5 8}. Each cell in the panel shows the quantiles for p = 0.001, 0.002, ..., 0.999. The bottom of each cell is a fringe plot that shows the six unique data values.

In these graphs, the horizontal axis represents the data and quantiles. For any value of x, the graph estimates the cumulative proportion of the population that is less than or equal to x. Notice that if you turn your head sideways, you can see the quantile function, which is the inverse function that estimates the quantile for each value of the cumulative probability.

You can see that although the nine quantile functions have the same basic shape, the first three methods estimate quantiles by using a discrete rounding scheme, whereas the other methods use a continuous interpolation scheme.

You can use the same data to compare methods. Instead of plotting each quantile definition in its own cell, you can overlay two or more methods. For example, by default, SAS computes sample quantiles by using the type=2 method, whereas R uses type=7 by default. The following graph overlays the sample quantiles to compare the default methods in SAS and R on this tiny data set. The default method in SAS always returns a data value or the average of adjacent data values; the default method in R can return any value in the range of the data.

Comparison of the default quantile estimates in SAS and R on a tiny data set
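To make the comparison concrete without the graph, here is a minimal Python sketch of the two default rules as they are usually stated: type=2 returns a data value or the average of two adjacent values, and type=7 interpolates linearly at position 1 + (N-1)p. The function names are hypothetical, and the sketch is neither the SAS nor the R implementation. The type=2 rule tests whether Np is an exact integer, so floating-point rounding can matter for some values of p.

```python
import math

def quantile_type2(data, p):
    """Discrete rule: a data value, or the average of two adjacent values."""
    x = sorted(data)
    n = len(x)
    t = n * p
    j = math.floor(t)
    if j < 1:
        return x[0]
    if j >= n:
        return x[-1]
    if t > j:                       # N*p is not an integer: take x[j+1]
        return x[j]
    return (x[j - 1] + x[j]) / 2    # N*p is an integer: average x[j] and x[j+1]

def quantile_type7(data, p):
    """Linear interpolation between order statistics at position 1 + (N-1)p."""
    x = sorted(data)
    n = len(x)
    t = 1 + (n - 1) * p
    j = math.floor(t)
    if j >= n:
        return x[-1]
    lam = t - j
    return (1 - lam) * x[j - 1] + lam * x[j]

data = [0, 1, 1, 1, 2, 2, 2, 4, 5, 8]
for p in (0.5, 0.75, 0.8):
    print(p, quantile_type2(data, p), quantile_type7(data, p))
```

At p = 0.75, for example, the discrete rule returns the data value 4, whereas the interpolation rule returns 3.5, a value that is not in the data.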

Does the definition of sample quantiles matter?

As shown above, different software packages use different defaults for sample quantiles. Consequently, when you report quantiles for a small data set, it is important to report how the quantiles were computed.

However, in practice analysts don't worry too much about which definition they are using because the difference between methods is typically small for larger data sets (100 or more observations). The biggest differences are often between the discrete methods, which always report a data value or the average between two adjacent data values, and the interpolation methods, which can return any value in the range of the data. Extreme quantiles can also differ between the methods because the tails of the data often have fewer observations and wider gaps.

The following graph shows the sample quantiles for 100 observations that were generated from a random uniform distribution. As before, the two sample quantiles are type=2 (the SAS default) and type=7 (the R default). At this scale, you can barely detect any differences between the estimates. The red dots (type=7) are on top of the corresponding blue dots (type=2), so few blue dots are visible.

Comparison of the default quantile estimates in SAS and R on a larger data set

So does the definition of the sample quantile matter? Yes and no. Theoretically, the different methods compute different estimates and have different properties. If you want to use an estimator that is unbiased or one that is based on distribution-free computations, feel free to read Hyndman and Fan and choose the definition that suits your needs. The differences are evident for tiny data sets. On the other hand, the previous graph shows that there is little difference between the methods for moderately sized samples and for quantiles that are not near gaps. In practice, most data analysts just accept the default method for whichever software they are using.

In closing, I will mention that there are other quantile estimation methods that are not simple formulas. In SAS, the QUANTREG procedure solves a minimization problem to estimate the quantiles. The QUANTREG procedure enables you to not only estimate quantiles, but also estimate confidence intervals, weighted quantiles, the difference between quantiles, conditional quantiles, and more.

SAS program to compute nine sample quantiles.

The post Sample quantiles: A comparison of 9 definitions appeared first on The DO Loop.

July 31, 2015
 

Last week, SAS released the 14.1 version of its analytics products, which are shipped as part of the third maintenance release of 9.4. If you run SAS/IML programs from a 64-bit Windows PC, you might be interested to know that you can now create matrices with about 2^31 ≈ 2 billion elements, provided that your system has enough RAM. (On Linux operating systems, this feature has been available since SAS 9.3.)

A numerical matrix with 2 billion elements requires 16 GB of RAM. In terms of matrix dimensions, this corresponds to a square numerical matrix that has approximately 46,000 rows and columns. I've written a handy SAS/IML program to determine how much RAM is required to store a matrix of a given size.
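The arithmetic behind those numbers is easy to check. The following Python sketch is only a stand-in for the SAS/IML program mentioned above, not a copy of it. It assumes that each numeric element occupies 8 bytes (a double), and the function names are hypothetical. It ignores any overhead beyond the raw element storage.

```python
import math

BYTES_PER_DOUBLE = 8   # assumption: each numeric element is an 8-byte double
GIB = 2**30            # bytes in one gigabyte (binary)

def matrix_bytes(rows, cols):
    """Raw storage needed for a rows x cols numeric matrix."""
    return rows * cols * BYTES_PER_DOUBLE

def max_square_dim(memory_bytes):
    """Largest n such that an n x n numeric matrix fits in memory_bytes."""
    return math.isqrt(memory_bytes // BYTES_PER_DOUBLE)

print(matrix_bytes(2**31, 1) / GIB)   # 16.0 GB for 2^31 elements
print(max_square_dim(16 * GIB))       # 46340 rows and columns
```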

If you are running 64-bit SAS on Windows, this article describes how to set an upper limit for the amount of memory that SAS can allocate for large matrices.

The MEMSIZE option

The amount of memory that SAS can allocate depends on the value of the MEMSIZE system option, which has a default value of 2GB on Windows. Many SAS sites do not override the default value, which means that SAS cannot allocate more than 2 GB of system memory.

You can run PROC OPTIONS to display the current value of the MEMSIZE option.

proc options option=memsize value;
run;
Option Value Information For SAS Option MEMSIZE
    Value: 2147483648
    Scope: SAS Session
    How option value set: Config File
    Config file name:
            C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg

The value 2,147,483,648 is shown in the SAS log. The value is unfortunately in bytes. This number corresponds to 2 GB. Unless you change the MEMSIZE option, you will not be able to allocate a square matrix with more than about 16,000 rows and columns. For example, unless SAS can allocate 5 GB or more of RAM, the following SAS/IML program will produce an error message:

proc iml; 
/* allocate 25,000 x 25,000 matrix, which requires 4.7 GB */
x = j(25000, 25000, 0);


ERROR: Unable to allocate sufficient memory.

You can use the MEMSIZE system option to permit SAS to allocate a greater amount of system memory. SAS does not grab this memory and hold onto it. Instead, the MEMSIZE option specifies a maximum value for dynamic allocations.

The MEMSIZE option only applies when you launch SAS, so if SAS is currently running, save your work and exit SAS before continuing.

Changing the command-line invocation for SAS

If you run SAS locally on your PC, you can add the -MEMSIZE command-line option to the shortcut that you use to invoke SAS. This example uses "12G" to permit SAS to allocate up to 12 GB of RAM, but you can use different numbers, such as 8G or 16G.

  1. Locate the "SAS 9.4" icon on your Desktop or the "SAS 9.4" item on the Start menu.
  2. Right-click on the shortcut and select Properties.
  3. A dialog box appears. Edit the Target field and insert -MEMSIZE 12G at the end of the current text.
  4. Click OK.

Every time you use this shortcut to launch SAS, the SAS process can allocate up to 12 GB of RAM. You can also specify -MEMSIZE 0, which permits allocations up to 80% of the available RAM. Personally, I do not use -MEMSIZE 0 because it permits SAS to consume most of the system memory, which does not leave much for other applications. I rarely permit SAS to use more than 75% of my RAM.

After editing the shortcut, launch SAS and call PROC OPTIONS. This time you should see something like the following:

Option Value Information For SAS Option MEMSIZE
    Value: 12884901888
    Scope: SAS Session
    How option value set: SAS Session Startup Command Line

SAS configuration files

A drawback of the command-line approach is that it only applies to a SAS session that is launched from the shortcut that you modified. In particular, it does not apply to launching SAS by double-clicking on a .sas or .sas7bdat file.

An alternative is to create or edit a configuration file. The SAS documentation has long and complete instructions about how to edit the sasv9.cfg file that sets the system options for SAS when SAS is launched.

SAS 9 creates two default configuration files during installation. Both configuration files are named SASV9.CFG. I suggest that you edit the one in !SASHOME\SASFoundation\9.4, which on many installations is C:\Program Files\SASHome\SASFoundation\9.4. By default, that configuration file has a -CONFIG option that points to a language-specific configuration file. Put the -MEMSIZE option and any other system options after the -CONFIG option, as follows:

-config "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"
-RLANG
-MEMSIZE 12G

Notice that I also put the -RLANG option in this sasv9.cfg file. The -RLANG system option specifies that SAS/IML software can interface with the R language.

If you now double-click on a .sas file to launch SAS, PROC OPTIONS reports the following information:

Option Value Information For SAS Option MEMSIZE
    Value: 12884901888
    Scope: SAS Session
    How option value set: Config File
    Config file name:
            C:\Program Files\SASHome\SASFoundation\9.4\SASV9.CFG

If you add multiple system options to the configuration file, you might want to go back to the SAS 9.4 Properties dialog box (in the previous section) and edit the Target value to point to the configuration file that you just edited.

Remote SAS servers

If you connect to a remote SAS server and submit SAS/IML programs through SAS/IML Studio, SAS Enterprise Guide, or SAS Studio, a SAS administrator has probably provided a configuration file that specifies how much RAM can be allocated by your SAS process. If you need a larger limit, discuss the situation with your SAS administrator.

Final thoughts on big matrices

You can create SAS/IML matrices that have millions of rows and hundreds of columns. However, you need to recognize that many matrix computations scale cubically with the dimension of the matrix. For example, many computations on an n x n matrix require on the order of n^3 floating point operations. Consequently, although you might be able to create extremely large matrices, computing with them can be very time consuming.

In short, allocating a large matrix is only the first step. The wise programmer will time a computation on a sequence of smaller problems as a way of estimating the time required to tackle The Big Problem.
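One way to act on that advice is to fit the cubic model to a few small timings and extrapolate. The Python sketch below assumes that the run time behaves like c * n^3 and averages the constant c implied by each measurement. The function name is hypothetical, and the timings are made-up numbers for illustration, not measured results.

```python
def estimate_cubic_time(samples, target_n):
    """samples: list of (n, seconds) pairs measured on small n x n problems.
    Assumes time ~ c * n**3 and averages the implied constant c."""
    c = sum(t / n**3 for n, t in samples) / len(samples)
    return c * target_n**3

# Hypothetical timings, for illustration only (not measured results)
samples = [(1000, 0.5), (2000, 4.0), (4000, 32.0)]
print(estimate_cubic_time(samples, 20000))   # estimated seconds for n = 20,000
```

If the implied constants differ widely across the small problems, the cubic assumption probably does not hold for that computation, and the extrapolation should not be trusted.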

tags: 14.1, R, SAS Programming

The post Large matrices in SAS/IML 14.1 appeared first on The DO Loop.

May 13, 2015
 

I hadn’t played with SAS/IML for a while. I came back to it when I needed to read some R-format data.

Technically, .Rdata is not a data format. Rather, it’s a big container that holds a bunch of R objects:

[Image: Rdata]

In this example, when the .Rdata file is loaded, it contains 3 objects, of which ‘data’ (the ‘real’ data) and ‘desc’ (the data description) are the ones of interest.

SAS/IML offers a nice interface for calling R commands, which can be used to read the R-format data:

proc iml;
submit / R;
load("C:/data/w5/R data sets for 5e/GPA1.RData")
endsubmit;

/* copy the R objects 'data' and 'desc' into SAS data sets */
call ImportDataSetFromR("work.GPA1", "data");
call ImportDataSetFromR("work.GPA1desc", "desc");

quit;

data _null_;
set GPA1desc end = eof ;
i+1;
II=left(put(i,3.));
call symputx('var'||II,variable);
call symputx('label'||II,label);
if eof then call symputx('n',II);
run;

%macro labelit;
data gpa1;
set gpa1;

    label
%do i=1 %to &n;
&&var&i = "&&label&i"
%end;
;
run;
%mend;

%labelit

April 2, 2014
 

Last year I gave a talk at SESUG 2013 on list manipulation in SAS using a collection of function-like macros. Today I discovered in my recently upgraded SAS 9.4 that I can play with lists natively, which means I can create a list, slice a list, and do other list operations in DATA steps! This is not documented yet (which means it will not be supported by the software vendor), and I can see a warning message in the Log window like “WARNING: List object is preproduction in this release”. It is still somewhat limited, so use it at your own risk (and, of course, for fun). Adding such a versatile list object will definitely make SAS programmers more powerful. I will keep watching its further development.

*************Update********

Some readers emailed me that they can’t get the results I got here. I think it’s best to check your own system:

I. Make sure you use the latest SAS software. I only tested on a 64-bit Windows 7 machine with SAS 9.4 TS1M1:

[Image: SAS94]

II. Make sure all hotfixes were applied (You can use this SAS Hot Fix Analysis, Download and Deployment Tool).

[Image: hotfix]

*************Update End********

The following are some quick experiments; I will report more after more research:

1. Create a List

It’s easy to create a list:

data _null_;
a = ['apple', 'orange', 'banana'];
put a;
run;

The output in the Log window:

[Image: list1]

You can also transfer a string to a list:

data _null_;
a = 'SAS94';
b = list(a);
put b;
run;

[Image: list2]

2. Slice a List

Slicing a list is also pretty straightforward, like in R and Python:

data _null_;
a = ['apple', 'orange', 'banana'];
b = a[0];
c = a[:-1];
d = a[1:2];
put a;
put b;
put c;
put d;
run;

[Image: list3]

3. List is Immutable in SAS!?

I felt quite comfortable playing with list operations in SAS, but then a weird thing happened. I tried to change a value in a list:

data _null_;
a = ['apple', 'orange', 'banana'];
a[0] = 'Kivi';
put a;
run;

Unexpectedly, I got an error:

[Image: list4]

hhh, I need to create a new list to hold such modification? This is funny.

Based on my quick exploration, the list object in SAS is pretty intuitive from a programmer’s point of view. But since it’s undocumented and I don’t know how long it will stay in the “preproduction” phase, be careful about using it in your production work.

Personally, I am very excited to “hack” such wonderful list features in SAS 9.4. If well implemented, they will easily beat R and Python (which claim to support rich data types and objects) as a scripting language for SAS programmers. I will keep this page updated.

December 31, 2013
 
Just some tips:

options(error=recover):  tells R to launch a debugging session when an error occurs, where you can choose which call frame to debug


options(show.error.locations=TRUE):   makes R show the source line number

Something else:

Use traceback() to locate where the last error occurred, then insert browser() and run the function again to check what is wrong.
December 25, 2013
 


## first explain what is type="terms":  If type="terms" is selected, a matrix of predictions 
## on the additive scale is produced, each column giving the deviations from the overall mean
## (of the original data's response, on the additive scale), which is given by the attribute "constant".
 
set.seed(9999)
x1=rnorm(10)
x2=rnorm(10)
y=rnorm(10)
lmm=lm(y~x1+x2)
predlm=predict(lmm, type="terms")  # store the term-wise predictions used in the checks below
predlm
 
## each check below should be (numerically) zero
lmm$coefficients[1]+lmm$coefficients[2]*x1+mean(lmm$coefficients[3]*x2)-mean(y)-predlm[,1]
lmm$coefficients[1]+lmm$coefficients[3]*x2+mean(lmm$coefficients[2]*x1)-mean(y)-predlm[,2]
 Posted by at 3:19 PM
December 24, 2013
 
library(ROCR)
library(Hmisc)
 
## calculate AUC from the package ROCR and compare with it from Hmisc
 
# method 1: from ROCR
data(ROCR.simple)
pred=prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf=performance(pred, 'tpr', 'fpr') # true positive rate vs. false positive rate
plot(perf, colorize=TRUE)
 
perf2=performance(pred, 'auc')
auc=unlist(slot(perf2, 'y.values')) # this is the AUC
 
# method 2: from Hmisc
rcorrstat=rcorr.cens(ROCR.simple$predictions, ROCR.simple$labels)
rcorrstat[1] # 1st is AUC, 2nd is Accuracy Ratio (Gini Coefficient, or PowerStat, or Somers' D)
 Posted by at 2:48 PM