R

Jan 9, 2023
 

Since 2008, SAS has supported an interface for calling R from the SAS/IML matrix language. Many years ago, I wrote blog posts that describe how to call R from PROC IML. For SAS 9.4, the process of installing R and calling R from PROC IML is documented in the SAS/IML User's Guide. Essentially, you install R on the same computer that runs the SAS Workspace Server so that SAS and R can communicate with each other. A SAS programmer can install R on his desktop machine that runs SAS; a SAS administrator can install R on a remote SAS Workspace Server.

Fast forward to 2023. Today, many SAS customers run SAS Viya in the cloud. If you want to call R from PROC IML in SAS Viya, R must be installed and deployed by a SAS Viya administrator. The steps to install R and deploy SAS are very different from the old SAS 9 days. This article provides a high-level overview.

Call SAS from R or call R from SAS?

In SAS 9.4, you can call R from SAS. In SAS Viya, you can call SAS actions from open-source languages, which means that you can also call SAS from R. This capability has been available since circa 2019 and can be used on Viya 3.5 as well as modern releases of Viya.

Thus, R programmers have a choice. Do you want to use R as a client to drive the flow of the program and occasionally call some computation in SAS? Or do you want to use SAS as a client and occasionally call computations in R? SAS supports both options.

Call SAS from R

If you are primarily an R programmer who wants to call a Viya action for a special computation (maybe a huge parallel computation in the cloud), you can use the SWAT package in R to connect to a CAS server and to call actions. SWAT stands for SAS Wrapper for Analytics Transfer, and SAS provides SWAT packages for several open-source languages, including R and Python. The following two resources can help you get started with calling CAS actions from the R language:

Call R from SAS

As mentioned earlier, the process of calling R from SAS is relatively straightforward in SAS 9.4. It is more complex in SAS Viya because both SAS and R must run "in the cloud." In practice, this means that SAS and R should be in the same container that is deployed. Thus, a SAS Viya administrator must build R, add it to a container, and deploy the container to the cloud. The main steps are summarized below. They are taken from Scott McCauley's article, where you can find the details. The process assumes that you already have an existing SAS Viya deployment and want to add R to it.

  1. Build R and packages: SAS provides a utility application called the "SAS Configurator for Open Source," which automates downloading, building, and installing R from source. The tool builds R and any packages from source on Linux and puts the compiled files in a location called the Persistent Volume Claim (PVC). Note: Scott's example installs both Python and R. In his example, the SAS Configurator for Open Source creates and executes a job called sas-pyconfig that (despite its name) installs both open-source products. He does not show an example that installs only R.
  2. Configure SAS Viya to use the installation: Scott calls this "make R visible to SAS Viya."
  3. Tell SAS Viya how to connect to R: You need to define some environment variables and SAS settings that enable PROC IML and other SAS products to call R. This step is similar in SAS 9.4, but in SAS Viya the changes must be made to a YAML file.
  4. Optionally configure access: You can limit who can access external languages such as R.
  5. Rebuild the SAS deployment and apply the changes: To apply these changes, you must update your SAS Viya deployment.

The purpose of this article is to inform people that SAS provides tools to include R as part of a SAS Viya deployment. These steps were tested on the Viya release Stable 2022.12. I am not a SAS Viya administrator, so I confess that I have never implemented this process myself. If you have questions about this process, please post them to Scott McCauley's article. He is much more qualified than I am to answer questions about configuration and deployment.

The post Installing R for SAS IML in SAS Viya appeared first on The DO Loop.

Jul 26, 2021
 

A SAS programmer recently asked why his SAS program and his colleague's R program display different estimates for the quantiles of a very small data set (fewer than 10 observations). I pointed the programmer to my article that compares the nine common definitions for sample quantiles. The article has a section that explicitly compares the default sample quantiles in SAS and R. The function in the article is written to support all nine definitions. The programmer asked whether I could provide a simpler function that computes only the default definition in R.

This article compares the default sample quantiles in SAS and R. It is a misnomer to refer to one definition as "the SAS method" and to another as "the R method." In SAS, procedures such as PROC UNIVARIATE and PROC MEANS enable you to use the QNTLDEF= option to choose among five different quantile estimates. By using SAS/IML, you can compute all nine estimation methods. Similarly, R supports all nine definitions. However, users of statistical software often use the default methods, then wonder why they get different answers from different software. This article explains the difference between the DEFAULT method in SAS and the DEFAULT method in R. The default in R is also the default method in Julia and in the Python packages SciPy and NumPy.

The Hyndman and Fan taxonomy

The purpose of a sample statistic is to estimate the corresponding population parameter. That is, the sample quantiles are data-based estimates of the unknown quantiles in the population. Hyndman and Fan ("Sample Quantiles in Statistical Packages," TAS, 1996) discuss nine definitions of sample quantiles that commonly appear in statistical software packages. All nine definitions result in valid estimates. For large data sets (say, 100 or more observations), they tend to give similar results. The differences between the definitions are most evident if you use a small data set that has wide gaps between adjacent values (after sorting the data). The example in this article is small and has a wide gap between the largest value and the next largest value.

By default, SAS uses Hyndman and Fan's Type=2 method, whereas R (and Julia, SciPy, and NumPy) use the Type=7 method. The Type=2 method uses the empirical cumulative distribution of the data (empirical CDF) to estimate the quantiles, whereas the Type=7 method uses a piecewise-linear estimate of the cumulative distribution function. This is demonstrated in the next section.

An example of sample quantiles

To focus the discussion, consider the data {0, 1, 1, 1, 2, 2, 2, 4, 5, 8}. There are 10 observations, but only six unique values. The following graphs show the estimates of the cumulative distribution function used by the Type=2 and Type=7 methods. The fringe plot (rug plot) below the CDF shows the locations of the data:

The sample quantiles are determined by the estimates of the CDF. The largest gap in the data is between the values X=5 and X=8, so we expect the Type=2 and Type=7 estimates to differ for extreme quantiles (greater than 0.9). The following examples show that the two methods agree for some quantiles, but not for others:

  • The 0.5 quantile (the median) is determined by drawing a horizontal line at Y=0.5 and seeing where the horizontal line crosses the estimate of the CDF. For both graphs, the corresponding X value is X=2, which means that both methods give the same estimate (2) for the median.
  • The 0.75 quantile (the 75th percentile) estimates are different between the two methods. In the upper graph, a horizontal line at 0.75 crosses the empirical CDF at X=4, which is a data value. In the lower graph, the estimate for the 0.75 quantile is X=3.5, which is neither a data value nor the average of adjacent values.
  • The 0.95 quantile (the 95th percentile) estimates are different. In the upper graph, a horizontal line at 0.95 crosses the empirical CDF at X=8, which is the maximum data value. In the lower graph, the estimate for the 0.95 quantile is X=6.65, which is between the two largest data values.
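The two default rules are short enough to reproduce outside of SAS or R. The following Python sketch is an illustrative port, not SAS or R code, and the function names `quantile_type2` and `quantile_type7` are hypothetical. It assumes Hyndman and Fan's Type 2 rule (take the next order statistic, or average two adjacent order statistics when Np is an integer) and Type 7 rule (linearly interpolate at the 0-based position (N-1)p).

```python
import math


def quantile_type2(data, p):
    """Hyndman-Fan Type 2: averaged inverse of the empirical CDF (SAS default)."""
    x = sorted(data)
    n = len(x)
    t = n * p
    j = math.floor(t)
    g = t - j
    if j < 1:
        return x[0]                  # p < 1/N: use the minimum
    if j >= n:
        return x[-1]                 # p = 1: use the maximum
    if g > 0:
        return x[j]                  # 1-based x[j+1]
    return (x[j - 1] + x[j]) / 2     # Np is an integer: average 1-based x[j] and x[j+1]


def quantile_type7(data, p):
    """Hyndman-Fan Type 7: linear interpolation at position (N-1)p (R default)."""
    x = sorted(data)
    n = len(x)
    h = (n - 1) * p                  # 0-based fractional position
    j = math.floor(h)
    g = h - j
    if j >= n - 1:
        return x[-1]
    return (1 - g) * x[j] + g * x[j + 1]


data = [0, 1, 1, 1, 2, 2, 2, 4, 5, 8]
for p in (0.5, 0.75, 0.95):
    print(p, quantile_type2(data, p), round(quantile_type7(data, p), 4))
```

On this data the printed values agree with the bullets: both rules give 2 for the median, Type 2 gives 4 and 8 for the 0.75 and 0.95 quantiles, and Type 7 gives 3.5 and 6.65.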

Comments on the CDF estimates

The Type=2 method (the default in SAS) uses an empirical CDF (ECDF) to estimate the population quantiles. The ECDF has a long history of being used for fitting and comparing distributions. For example, the Kolmogorov-Smirnov test uses the ECDF to compute nonparametric goodness-of-fit tests. When you use the ECDF, a quantile is always an observed data value or the average of two adjacent data values.

The Type=7 method (the default in R) uses a piecewise-linear estimate of the CDF. There are many ways to create a piecewise-linear estimate, and many papers (going back to the 1930s) have discussed the advantages and disadvantages of each choice. In Hyndman and Fan's taxonomy, six of the nine methods use piecewise-linear estimates. Some people prefer the piecewise-linear estimates because the inverse CDF is continuous: a small change to the probability value (such as 0.9 to 0.91) results in a small change to the quantile estimates. This property is not present in the methods that use the ECDF.
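To make the continuity argument concrete, the following Python sketch (an illustrative port with hypothetical function names `type2` and `type7`) evaluates both rules at p = 0.90 and p = 0.91 on the example data {0, 1, 1, 1, 2, 2, 2, 4, 5, 8}.

```python
import math

DATA = sorted([0, 1, 1, 1, 2, 2, 2, 4, 5, 8])


def type2(x, p):
    """ECDF-based rule (H&F Type 2): the estimate jumps between data values."""
    n = len(x)
    j = math.floor(n * p)
    g = n * p - j
    if j < 1:
        return x[0]
    if j >= n:
        return x[-1]
    return x[j] if g > 0 else (x[j - 1] + x[j]) / 2


def type7(x, p):
    """Piecewise-linear rule (H&F Type 7): the estimate is continuous in p."""
    h = (len(x) - 1) * p
    j = math.floor(h)
    if j >= len(x) - 1:
        return x[-1]
    return x[j] + (h - j) * (x[j + 1] - x[j])


for rule in (type2, type7):
    lo, hi = rule(DATA, 0.90), rule(DATA, 0.91)
    print(f"{rule.__name__}: Q(0.90)={lo:.2f}  Q(0.91)={hi:.2f}  change={hi - lo:.2f}")
```

The Type 2 estimate jumps from 6.5 to 8 (the maximum), whereas the Type 7 estimate moves only from about 5.30 to 5.57.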

A function to compute the default sample quantiles in R

Back in 2017, I wrote a SAS/IML function that can compute all of the common definitions of sample quantiles. If you only want to compute the default (Type=7) definition in R, you can use the following simpler function:

proc iml;
/* By default, R (and Julia, and some Python packages) uses
   Hyndman and Fan's Type=7 definition. Compute Type=7 sample quantiles.
*/
start GetQuantile7(y, probs);
   x = colvec(y);
   N = nrow(x);
   if N=1 then return (y);  /* handle the degenerate case, N=1 */
 
   /* remove missing values, if any */
   idx = loc(x^=.);
   if ncol(idx)=0 then  
      return (.);           /* all values are missing */
   else if ncol(idx)<N then do;
      x = x[idx,]; N = nrow(x);  /* remove missing */
   end;
 
   /* Main computation: Compute Type=7 sample quantile. 
      Estimate is a linear interpolation between x[j] and x[j+1]. */
   call sort(x);
   p = colvec(probs);
   m = 1-p;
   j = floor(N*p + m);      /* indices into sorted data values */
   g = N*p + m - j;         /* 0 <= g <= 1 for interpolation */
 
   q = j(nrow(p), 1, x[N]); /* if p=1, estimate by x[N]=max(x) */
   idx = loc(p < 1);
   if ncol(idx) >0 then do;
      j = j[idx]; g = g[idx];
      q[idx] = (1-g)#x[j] + g#x[j+1]; /* linear interpolation */
   end;   
   return q;
finish;
 
/* Compare the SAS and R default definitions.
   The differences between definitions are most apparent 
   for small samples that have large gaps between adjacent data values. */
x = {0 1 1 1 2 2 2 4 5 8}`;
prob = {0.5, 0.75, 0.9, 0.95};
 
call qntl(SASDefaultQntl, x, prob);
RDefaultQntl = GetQuantile7(x, prob);
print prob SASDefaultQntl RDefaultQntl;

The table shows some of the quantiles that were discussed previously. If you choose prob to be evenly spaced points in [0,1], you get the values on the graphs shown previously.

Summary

There are many ways to estimate quantiles. Hyndman and Fan (1996) list nine common definitions. By default, SAS uses the Type=2 method, whereas R (and other software) uses the Type=7 method. SAS procedures support five of the nine common definitions of sample quantiles, and you can use SAS/IML to compute the remaining definitions. To make it easy to reproduce the default values of sample quantiles from other software, I have written a SAS/IML function that computes the Type=7 quantiles.

If you do not have SAS/IML software, but you want to compute estimates that are based on a piecewise-linear estimate of the CDF, I suggest you use the QNTLDEF=1 option in PROC UNIVARIATE or PROC MEANS. This produces the Type=4 method in Hyndman and Fan's taxonomy. For more information about the quantile definitions that are natively available in SAS procedures, see "Quantile definitions in SAS."

The post Compare the default definitions for sample quantiles in SAS, R, and Python appeared first on The DO Loop.

Mar 19, 2020
 

At SAS Press, we agree with the saying “The best things in life are free.” And one of the best things in life is knowledge. That’s why we offer free e-books to help you learn SAS or improve your skills. In this blog post, we will introduce you to one of our amazing titles that is absolutely free.

SAS Programming for R Users

Many data scientists today need to know multiple programming languages including SAS, R, and Python. If you already know basic statistical concepts and how to program in R but want to learn SAS, then SAS Programming for R Users by Jordan Bakerman was designed specifically for you! This free e-book explains how to write programs in SAS that replicate familiar functions and capabilities in R. This book covers a wide range of topics including the basics of the SAS programming language, how to import data, how to create new variables, random number generation, linear modeling, Interactive Matrix Language (IML), and many other SAS procedures. This book also explains how to write R code directly in the SAS code editor for seamless integration between the two tools.

The book is based on the free, 14-hour course of the same name offered by SAS Education. Keep reading to learn more about the differences between SAS and R.

SAS versus R

R is an object-oriented programming language. Results of a function are stored in an object and desired results are pulled from the object as needed. SAS revolves around the data table and uses procedures to create and print output. Results can be saved to a new data table.

Let’s briefly compare SAS and R in a general way. Look at the following table, which outlines some of the major differences between SAS and R.

Here are a few other things about SAS to note:

  • SAS has the flexibility to interact with objects. (However, the book focuses on procedural methods.)
  • SAS does not have a command line. Code must be run in order to return results.

SAS Programs

A SAS program is a sequence of one or more steps. A step is a sequence of SAS statements. There are only two types of steps in SAS: DATA and PROC steps.

  • DATA steps read from an input source and create a SAS data set.
  • PROC steps read and process a SAS data set, often generating an output report. "Procedure" is an umbrella term for the steps that carry out the analysis. Think of a PROC step as a function in R.

Every step has a beginning and ending boundary. SAS steps begin with either of the following statements:

  • a DATA statement
  • a PROC statement

After a DATA or PROC statement, there can be additional SAS statements that contain keywords that request that SAS perform an operation or that give information to the system. Think of them as additional arguments to a procedure. Statements always end with a semicolon!

SAS options are additional arguments and they are specific to SAS statements. Unfortunately, there is no rule to say what is a statement versus what is an option. Understanding the difference comes with a little bit of experience. Options can be used to do the following:

  • generate additional output like results and plots
  • save output to a SAS data table
  • alter the analytical method

SAS detects the end of a step when it encounters one of the following statements:

  • a RUN statement (for most steps)
  • a QUIT statement (for some procedures)

Most SAS steps end with a RUN statement. Think of the RUN statement as the closing parenthesis of an R function call. The following table shows an example of a SAS program that has a DATA step and a PROC step. You can see that both SAS steps end with RUN statements, whereas the R functions begin and end with parentheses.

If you want to learn more about this book or any other free e-books from SAS Press, visit https://support.sas.com/en/books/free-books.html. Subscribe to our newsletter to get the latest information on new books.

Free e-book: SAS Programming for R Users was published on SAS Users.

Dec 4, 2019
 

Site relaunches with improved content, organization and navigation.

In 2016, a cross-divisional SAS team created developer.sas.com. Their mission: Build a bridge between SAS (and our software) and open source developers.

The initial effort made available basic information about SAS® Viya® and integration with open source technologies. In June 2018, the Developer Advocate role was created to build on that foundation. Collaborating with many of you, the SAS Communities team has improved the site by clarifying its scope and updating it consistently with helpful content.

Design is an iterative process. One idea often builds on another.

-- businessman Mark Parker

The team is happy to report that recently developer.sas.com relaunched, with marked improvements in content, organization and navigation. Please check it out and share with others.

New overview page on developer.sas.com

The developer experience

The developer experience goes beyond the developer.sas.com portal. The Q&A below provides more perspective and background.

What is the developer experience?

Think of the developer experience (DX) as equivalent to the user experience (UX), only the developer interacts with the software through code, not points and clicks. Developers expect and require an easy interface to software code, good documentation, support resources and open communication. All this interaction occurs on the developer portal.

What is a developer portal?

The white paper Developer Portal Components captures the key elements of a developer portal. Without going into detail, the portal must contain (or link to) these resources: an overview page, onboarding pages, guides, API reference, forums and support, and software development kits (SDKs). In conjunction with the Developers Community, the site’s relaunch includes most of these items.

Who are these developers?

Many developers fit somewhere in these categories:

  • Data scientists and analysts who code in open source languages (mainly Python and R in this case).
  • Web application developers who create apps that require data and processing from SAS.
  • IT service admins who manage customer environments.

All need to interact with SAS but may not have written SAS code. We want this population to benefit from our software.

What is open source and how is SAS involved?

Simply put, open source software is just what the name implies: the source code is open to all. Many of the programs in use every day are based on open source technologies: operating systems, programming languages, web browsers and servers, etc. Leveraging open source technologies and integrating them with commercial software is a popular industry trend today. SAS is keeping up with the market by providing tools that allow open source developers to interact with SAS software.

What is an API?

All communication between open-source software and SAS takes place through APIs, or application programming interfaces. APIs allow software systems to communicate with one another. Software companies expose their APIs so developers can incorporate functionality and send or request data from the software.

Why does SAS care about APIs?

APIs allow the use of SAS analytics outside of SAS software. Because developers can communicate with SAS through APIs, customer applications can easily incorporate SAS functions. SAS has created various libraries to aid in open source integration. These tools allow developers to code in the language of their choice, yet still interface with SAS. Most of these tools exist on github.com/sassoftware or on the REST API guides page.

A use case for SAS APIs

A classic use of SAS APIs is for a loan default application. A bank creates a model in SAS that determines the likelihood of a customer defaulting on a loan based on multiple factors. The bank also builds an application where a bank representative enters the information for a new potential customer. The bank application code uses APIs to communicate this information to the SAS model and return a credit decision.

What is a developer advocate?

A developer advocate is someone who helps developers succeed with a platform or technology. Their role is to act as a bridge between the engineering team and the developer community. At SAS, the developer advocate fields questions and comments on the Developers Community and works with R&D to provide answers. The administration of developer.sas.com also falls under the responsibility of the developer advocate.

We’re not done

The site will continue to evolve, with additions of other SAS products and offerings, and other initiatives. Check back often to see what's new.
Now that you are an open source and SAS expert, please check out the new developer.sas.com. We encourage feedback and suggestions for content. Leave comments and questions on the site or contact Joe Furbee: joe.furbee@sas.com.

developer.sas.com 2.0: More than just a pretty interface was published on SAS Users.

Mar 7, 2018
 

The R SWAT package (SAS Wrapper for Analytics Transfer) enables you to upload big data into an in-memory distributed environment to manage data and create predictive models using familiar R syntax. In the SAS Viya Integration with Open Source Languages: R course, you learn the syntax and methodology required to [...]

The post Use R to interface with SAS Cloud Analytics Services appeared first on SAS Learning Post.

May 24, 2017
 

According to Hyndman and Fan ("Sample Quantiles in Statistical Packages," TAS, 1996), there are nine definitions of sample quantiles that commonly appear in statistical software packages. Hyndman and Fan identify three definitions that are based on rounding and six methods that are based on linear interpolation. This blog post shows how to use SAS to visualize and compare the nine common definitions of sample quantiles. It also compares the default definitions of sample quantiles in SAS and R.

Definitions of sample quantiles

Suppose that a sample has N observations that are sorted so that x[1] ≤ x[2] ≤ ... ≤ x[N], and suppose that you are interested in estimating the pth quantile (0 ≤ p ≤ 1) for the population. Intuitively, the data values near x[j], where j = floor(Np), are reasonable values to use to estimate the quantile. For example, if N=10 and you want to estimate the quantile for p=0.64, then j = floor(Np) = 6, so you can use the sixth ordered value (x[6]) and maybe other nearby values to estimate the quantile.

Hyndman and Fan (henceforth H&F) note that the quantile definitions in statistical software have three properties in common:

  • The value p and the sample size N are used to determine two adjacent data values, x[j] and x[j+1]. The quantile estimate will be in the closed interval between those data points. For the previous example, the quantile estimate would be in the closed interval between x[6] and x[7].
  • For many methods, a fractional quantity is used to determine an interpolation parameter, λ. For the previous example, the fractional quantity is (Np - j) = (6.4 - 6) = 0.4. If you use λ = 0.4, then an estimate of the 64th percentile would be the value 40% of the way between x[6] and x[7].
  • Each definition has a parameter m, 0 ≤ m ≤ 1, which determines how the method interpolates between adjacent data points. In general, the methods define the index j by using j = floor(Np + m). The previous example used m=0, but other choices include m=0.5 or values of m that depend on p.

Thus a general formula for quantile estimates is q = (1 - λ) x[j] + λ x[j+1], where λ and j depend on the values of p, N, and a method-specific parameter m.
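For the six interpolation methods (H&F Types 4 through 9), λ is simply the fractional part of Np + m, so one function covers all of them; the methods differ only in m. The following Python sketch is illustrative: the function name `hf_quantile` is hypothetical, the m values are the ones H&F publish for Types 4 through 9, and the end-point rule (return x[1] when j < 1 and x[N] when j ≥ N) follows the convention described in the next paragraph.

```python
import math

# Method-specific parameter m for H&F Types 4-9 (all use lambda = fractional part of Np + m)
M_PARAM = {
    4: lambda p: 0.0,
    5: lambda p: 0.5,
    6: lambda p: p,
    7: lambda p: 1.0 - p,
    8: lambda p: (p + 1.0) / 3.0,
    9: lambda p: p / 4.0 + 3.0 / 8.0,
}


def hf_quantile(data, p, qtype):
    """q = (1 - lam) * x[j] + lam * x[j+1] with j = floor(Np + m), 1-based indices."""
    x = sorted(data)
    n = len(x)
    t = n * p + M_PARAM[qtype](p)
    j = math.floor(t)       # 1-based index of the lower data value
    lam = t - j             # interpolation weight in [0, 1)
    if j < 1:
        return x[0]         # convention: use x[1] for very small p
    if j >= n:
        return x[n - 1]     # convention: use x[N] for p near 1
    # 1-based x[j] and x[j+1] are x[j-1] and x[j] in a 0-based Python list
    return (1.0 - lam) * x[j - 1] + lam * x[j]


data = [0, 1, 1, 1, 2, 2, 2, 4, 5, 8]
for qtype in range(4, 10):
    print(qtype, round(hf_quantile(data, 0.95, qtype), 3))
```

With m = 0 (Type 4) and p = 0.64, the function interpolates 40% of the way between x[6] and x[7], exactly as in the example in the bullets.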

You can read Hyndman and Fan (1996) for details or see the Wikipedia article about quantiles for a summary. The Wikipedia article points out a practical consideration: for values of p that are very close to 0 or 1, some definitions need to be slightly modified. For example, if p < 1/N, the quantity Np < 1 and so j = floor(Np) equals 0, which is an invalid index. The convention is to return x[1] when p is very small and return x[N] when p is very close to 1.

Compute all nine sample quantile definitions in SAS

SAS has built-in support for five of the quantile definitions, notably in PROC UNIVARIATE, PROC MEANS, and in the QNTL subroutine in SAS/IML. You can use the QNTLDEF= option to choose from the five definitions. The following table associates the five QNTLDEF= definitions in SAS to the corresponding definitions from H&F, which are also used by R. In R you choose the definition by using the type parameter in the quantile function.

SAS definitions of sample quantiles

It is straightforward to write a SAS/IML function to compute the other four definitions in H&F. In fact, H&F present the quantile interpolation functions as specific instances of one general formula that contains a parameter, which they call m. As mentioned above, you can also define a small value c (which depends on the method) such that the method returns x[1] if p < c, and the method returns x[N] if p ≥ 1 - c.

The following table presents the parameters for computing the four sample quantile definitions that are not natively supported in SAS:

Definitions of sample quantiles that are not natively supported in SAS

Visualizing the definitions of sample quantiles

Visualization of nine definitions of sample quantiles, from Hyndman and Fan (1996)

You can download the SAS program that shows how to compute sample quantiles and graphs for any of the nine definitions in H&F. The differences between the definitions are most evident for small data sets and when there is a large "gap" between one or more adjacent data values. The following panel of graphs shows the nine sample quantile methods for a data set that has 10 observations, {0 1 1 1 2 2 2 4 5 8}. Each cell in the panel shows the quantiles for p = 0.001, 0.002, ..., 0.999. The bottom of each cell is a fringe plot that shows the six unique data values.

In these graphs, the horizontal axis represents the data and quantiles. For any value of x, the graph estimates the cumulative proportion of the population that is less than or equal to x. Notice that if you turn your head sideways, you can see the quantile function, which is the inverse function that estimates the quantile for each value of the cumulative probability.

You can see that although the nine quantile functions have the same basic shape, the first three methods estimate quantiles by using a discrete rounding scheme, whereas the other methods use a continuous interpolation scheme.

You can use the same data to compare methods. Instead of plotting each quantile definition in its own cell, you can overlay two or more methods. For example, by default, SAS computes sample quantiles by using the type=2 method, whereas R uses type=7 by default. The following graph overlays the sample quantiles to compare the default methods in SAS and R on this tiny data set. The default method in SAS always returns a data value or the average of adjacent data values; the default method in R can return any value in the range of the data.

Comparison of the default quantile estimates in SAS and R on a tiny data set

Does the definition of sample quantiles matter?

As shown above, different software packages use different defaults for sample quantiles. Consequently, when you report quantiles for a small data set, it is important to report how the quantiles were computed.

However, in practice analysts don't worry too much about which definition they are using because the difference between methods is typically small for larger data sets (100 or more observations). The biggest differences are often between the discrete methods, which always report a data value or the average between two adjacent data values, and the interpolation methods, which can return any value in the range of the data. Extreme quantiles can also differ between the methods because the tails of the data often have fewer observations and wider gaps.

The following graph shows the sample quantiles for 100 observations that were generated from a random uniform distribution. As before, the two sample quantiles are type=2 (the SAS default) and type=7 (the R default). At this scale, you can barely detect any differences between the estimates. The red dots (type=7) are on top of the corresponding blue dots (type=2), so few blue dots are visible.

Comparison of the default quantile estimates in SAS and R on a larger data set

So does the definition of the sample quantile matter? Yes and no. Theoretically, the different methods compute different estimates and have different properties. If you want to use an estimator that is unbiased or one that is based on distribution-free computations, feel free to read Hyndman and Fan and choose the definition that suits your needs. The differences are evident for tiny data sets. On the other hand, the previous graph shows that there is little difference between the methods for moderately sized samples and for quantiles that are not near gaps. In practice, most data analysts just accept the default method for whichever software they are using.

In closing, I will mention that there are other quantile estimation methods that are not simple formulas. In SAS, the QUANTREG procedure solves a minimization problem to estimate the quantiles. The QUANTREG procedure enables you to not only estimate quantiles, but also estimate confidence intervals, weighted quantiles, the difference between quantiles, conditional quantiles, and more.

SAS program to compute nine sample quantiles.

The post Sample quantiles: A comparison of 9 definitions appeared first on The DO Loop.

Jul 31, 2015
 

Last week, SAS released the 14.1 version of its analytics products, which are shipped as part of the third maintenance release of 9.4. If you run SAS/IML programs from a 64-bit Windows PC, you might be interested to know that you can now create matrices with about 2^31 ≈ 2 billion elements, provided that your system has enough RAM. (On Linux operating systems, this feature has been available since SAS 9.3.)

A numerical matrix with 2 billion elements requires 16 GB of RAM. In terms of matrix dimensions, this corresponds to a square numerical matrix that has approximately 46,000 rows and columns. I've written a handy SAS/IML program to determine how much RAM is required to store a matrix of a given size.
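The arithmetic behind these figures is simple: each numeric element is stored in 8 bytes, so an r × c numeric matrix needs 8rc bytes. The following Python sketch is not the SAS/IML program mentioned in the previous paragraph; it is a hypothetical helper (`matrix_bytes`, `max_square_dim`) that reproduces the numbers quoted in this article, using 1 GB = 2^30 bytes.

```python
import math

BYTES_PER_ELEMENT = 8   # a numeric matrix element is an 8-byte double
GiB = 2**30


def matrix_bytes(nrow, ncol):
    """Bytes needed to store an nrow x ncol numeric matrix."""
    return nrow * ncol * BYTES_PER_ELEMENT


def max_square_dim(mem_bytes):
    """Largest n such that an n x n numeric matrix fits in mem_bytes."""
    return math.isqrt(mem_bytes // BYTES_PER_ELEMENT)


print(matrix_bytes(2**31, 1) / GiB)               # 2^31 elements
print(max_square_dim(16 * GiB))                   # square matrix that fits in 16 GB
print(max_square_dim(2 * GiB))                    # square matrix that fits in 2 GB
print(round(matrix_bytes(25000, 25000) / GiB, 2)) # 25,000 x 25,000 matrix
```

The results correspond to the claims in the text: 2^31 elements need 16 GB, the largest square matrix in 16 GB has about 46,000 rows, the 2 GB default limits you to about 16,000 rows, and a 25,000 × 25,000 matrix needs about 4.7 GB.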

If you are running 64-bit SAS on Windows, this article describes how to set an upper limit for the amount of memory that SAS can allocate for large matrices.

The MEMSIZE option

The amount of memory that SAS can allocate depends on the value of the MEMSIZE system option, which has a default value of 2 GB on Windows. Many SAS sites do not override the default value, which means that SAS cannot allocate more than 2 GB of system memory.

You can run PROC OPTIONS to display the current value of the MEMSIZE option.

proc options option=memsize value;
run;
Option Value Information For SAS Option MEMSIZE
    Value: 2147483648
    Scope: SAS Session
    How option value set: Config File
    Config file name:
            C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg

The SAS log shows the value 2,147,483,648. Unfortunately, the value is in bytes; it corresponds to 2 GB. Unless you change the MEMSIZE option, you will not be able to allocate a square matrix with more than about 16,000 rows and columns. For example, unless SAS can allocate 5 GB or more of RAM, the following SAS/IML program will produce an error message:

proc iml; 
/* allocate 25,000 x 25,000 matrix, which requires 4.7 GB */
x = j(25000, 25000, 0);


ERROR: Unable to allocate sufficient memory.

You can use the MEMSIZE system option to permit SAS to allocate a greater amount of system memory. SAS does not grab this memory and hold onto it. Instead, the MEMSIZE option specifies a maximum value for dynamic allocations.

The MEMSIZE option can be set only when you launch SAS, so if SAS is currently running, save your work and exit SAS before continuing.

Changing the command-line invocation for SAS

If you run SAS locally on your PC, you can add the -MEMSIZE command-line option to the shortcut that you use to invoke SAS. This example uses "12G" to permit SAS to allocate up to 12 GB of RAM, but you can use different numbers, such as 8G or 16G.

  1. Locate the "SAS 9.4" icon on your Desktop or the "SAS 9.4" item on the Start menu.
  2. Right-click the shortcut and select Properties.
  3. A dialog box appears. Edit the Target field and insert -MEMSIZE 12G at the end of the current text, as shown in the image.
  4. Click OK.

Every time you use this shortcut to launch SAS, the SAS process can allocate up to 12 GB of RAM. You can also specify -MEMSIZE 0, which permits allocations up to 80% of the available RAM. Personally, I do not use -MEMSIZE 0 because it permits SAS to consume most of the system memory, which does not leave much for other applications. I rarely permit SAS to use more than 75% of my RAM.
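The 75% rule of thumb translates directly into a -MEMSIZE value. The Python sketch below is a hypothetical helper, not part of SAS. It rounds down to whole gigabytes, so for a PC with 16 GB of RAM it produces the 12G value used in this article.

```python
def memsize_option(total_ram_gb: float, fraction: float = 0.75) -> str:
    """Build a -MEMSIZE option that caps SAS at a fraction of the installed RAM.

    The 0.75 default mirrors the rule of thumb in the text; it is not a SAS default.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    limit_gb = int(total_ram_gb * fraction)   # round down to whole gigabytes
    return f"-MEMSIZE {limit_gb}G"


print(memsize_option(16))   # -MEMSIZE 12G
```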

After editing the shortcut, launch SAS and call PROC OPTIONS. This time you should see something like the following:

Option Value Information For SAS Option MEMSIZE
    Value: 12884901888
    Scope: SAS Session
    How option value set: SAS Session Startup Command Line
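The value 12884901888 is 12 x 2^30, which shows that the G suffix means 2^30 bytes. The following Python sketch (a hypothetical helper) converts MEMSIZE values such as 12G into the byte counts that PROC OPTIONS reports. It assumes binary multipliers for K, M, and G and does not handle other forms, such as 0.

```python
import re

# Multipliers for size suffixes, assuming binary units (1K = 1024 bytes)
_SUFFIX = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}


def memsize_to_bytes(value: str) -> int:
    """Convert a MEMSIZE value such as '12G' or '2147483648' to bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value.upper())
    if m is None:
        raise ValueError(f"unrecognized MEMSIZE value: {value!r}")
    number, suffix = m.groups()
    return int(number) * _SUFFIX[suffix]


print(memsize_to_bytes("12G"))   # 12884901888
print(memsize_to_bytes("2G"))    # 2147483648
```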

SAS configuration files

A drawback of the command-line approach is that it only applies to a SAS session that is launched from the shortcut that you modified. In particular, it does not apply to launching SAS by double-clicking on a .sas or .sas7bdat file.

An alternative is to create or edit a configuration file. The SAS documentation has long and complete instructions about how to edit the sasv9.cfg file that sets the system options for SAS when SAS is launched.

SAS 9 creates two default configuration files during installation. Both configuration files are named SASV9.CFG. I suggest that you edit the one in !SASHOME\SASFoundation\9.4, which on many installations is C:\Program Files\SASHome\SASFoundation\9.4. By default, that configuration file has a -CONFIG option that points to a language-specific configuration file. Put the -MEMSIZE option and any other system options after the -CONFIG option, as follows:

-config "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"
-RLANG
-MEMSIZE 12G

Notice that I also put the -RLANG option in this sasv9.cfg file. The -RLANG system option specifies that SAS/IML software can interface with the R language.

If you now double-click on a .sas file to launch SAS, PROC OPTIONS reports the following information:

Option Value Information For SAS Option MEMSIZE
    Value: 12884901888
    Scope: SAS Session
    How option value set: Config File
    Config file name:
            C:\Program Files\SASHome\SASFoundation\9.4\SASV9.CFG

If you add multiple system options to the configuration file, you might want to go back to the SAS 9.4 Properties dialog box (in the previous section) and edit the Target value to point to the configuration file that you just edited.

Remote SAS servers

If you connect to a remote SAS server and submit SAS/IML programs through SAS/IML Studio, SAS Enterprise Guide, or SAS Studio, a SAS administrator has probably provided a configuration file that specifies how much RAM can be allocated by your SAS process. If you need a larger limit, discuss the situation with your SAS administrator.

Final thoughts on big matrices

You can create SAS/IML matrices that have millions of rows and hundreds of columns. However, you need to recognize that many matrix computations scale cubically with the dimensions of the matrix. For example, many computations on an n x n matrix require on the order of n^3 floating-point operations. Consequently, although you might be able to create extremely large matrices, computing with them can be very time consuming.

In short, allocating a large matrix is only the first step. The wise programmer will time a computation on a sequence of smaller problems as a way of estimating the time required to tackle The Big Problem.
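A minimal sketch of that strategy, assuming the run time follows a power law t ≈ c·n^p: fit log t against log n by least squares on the small problems, then extrapolate to the big one. The function names and the timings below are hypothetical. The timings were chosen to grow exactly like n^3, so the fitted exponent is 3; real timings are noisy, and the fitted exponent for a given computation may differ from 3.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(time) = log(c) + p*log(n); returns (c, p)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    k = len(xs)
    xbar = sum(xs) / k
    ybar = sum(ys) / k
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = math.exp(ybar - p * xbar)
    return c, p


def predict(c, p, n):
    """Predicted run time for a problem of size n under the fitted power law."""
    return c * n ** p


# Hypothetical timings (seconds) for n x n problems that grow exactly like n**3
sizes = [1000, 2000, 4000]
times = [0.5, 4.0, 32.0]
c, p = fit_power_law(sizes, times)
print(round(p, 6))                      # 3.0
print(round(predict(c, p, 16000), 3))   # 2048.0
```

Fitting on a log-log scale turns the power law into a straight line, so the slope estimates the exponent directly; doubling n then multiplies the predicted time by 2^p, which is 8 for a cubic computation.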

tags: 14.1, R, SAS Programming

The post Large matrices in SAS/IML 14.1 appeared first on The DO Loop.

May 13, 2015
 

I hadn't used SAS/IML for a while, but I came back to it when I needed to read some data in R format.

Technically, .RData is not a data format. Rather, it is a container that holds a bunch of R objects.


In this example, loading the .RData file creates three objects, of which 'data' (the actual data) and 'desc' (the data description) are of interest.

SAS/IML offers a nice interface for calling R commands, which you can use to read R-format data:

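proc iml;
/* run R statements; SAS must be started with the -RLANG option */
submit / R;
load("C:/data/w5/R data sets for 5e/GPA1.RData")
endsubmit;

/* copy the R objects 'data' and 'desc' into SAS data sets */
call ImportDataSetFromR("work.GPA1", "data");
call ImportDataSetFromR("work.GPA1desc", "desc");

quit;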

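data _null_;
   set GPA1desc end=eof;
   i + 1;
   II = left(put(i, 3.));
   /* store each variable name and its label in macro variables var1, label1, ... */
   call symputx('var'||II, variable);
   call symputx('label'||II, label);
   /* store the number of variables in &n */
   if eof then call symputx('n', II);
run;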

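%macro labelit;
data gpa1;
   set gpa1;
   /* generates: label var1 = "label1" var2 = "label2" ... ; */
   /* quoting each label lets it contain '=' or ';' */
   label
   %do i=1 %to &n;
      &&var&i = "&&label&i"
   %end;
   ;
run;
%mend;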

%labelit
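The DATA _NULL_ step and the %LABELIT macro together generate a LABEL statement from the rows of the description table. To make the generated text visible, the following Python sketch builds the same kind of statement from (variable, label) pairs. The function name and the sample rows are hypothetical, not the real contents of GPA1desc. The sketch quotes each label and doubles any embedded double quotes, so labels that contain '=' or ';' remain valid SAS.

```python
def label_statement(desc):
    """Build a SAS LABEL statement from (variable, label) pairs."""
    parts = []
    for variable, label in desc:
        escaped = label.replace('"', '""')   # SAS writes a literal " as "" inside "..."
        parts.append(f'{variable} = "{escaped}"')
    return "label " + " ".join(parts) + ";"


desc = [("gpa", "cumulative GPA"), ("sat", "SAT score")]   # hypothetical rows
print(label_statement(desc))
# label gpa = "cumulative GPA" sat = "SAT score";
```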

April 2, 2014
 

Last year I gave a talk at SESUG 2013 on list manipulation in SAS using a collection of function-like macros. Today I discovered that in my recently upgraded SAS 9.4 I can work with lists natively, which means I can create a list, slice a list, and do other list operations in DATA steps! This feature is not documented yet (which means it is not supported by the software vendor), the Log window shows a warning message like "WARNING: List object is preproduction in this release", and it is still somewhat limited, so use it at your own risk (and, of course, for fun). Adding such a versatile list object would definitely make SAS programmers more powerful. I will keep watching its further development.

*************Update********

Some readers emailed me that they can't get the same results that I got here. I think it's best to check your own system:

I. Make sure you use the latest SAS software. I have tested only on a 64-bit Windows 7 machine with SAS 9.4 TS1M1.


II. Make sure all hotfixes have been applied. (You can use the SAS Hot Fix Analysis, Download and Deployment Tool.)


*************Update End********

The following are some quick experiments; I will report more after further research:

1. Create a List

It’s easy to create a list:

data _null_;
a = ['apple', 'orange', 'banana'];
put a;
run;

The output appears in the Log window.


You can also convert a string to a list:

data _null_;
a = 'SAS94';
b = list(a);
put b;
run;


2. Slice a List

Slicing a list is also pretty straightforward, like in R and Python:

data _null_;
a = ['apple', 'orange', 'banana'];
b = a[0];
c = a[:-1];
d = a[1:2];
put a;
put b;
put c;
put d;
run;

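For comparison, here are the same slices in Python. This shows Python semantics only; the preproduction SAS list object might not treat negative indices or slice end points the same way.

```python
# Python semantics, for comparison with the SAS DATA step above
a = ['apple', 'orange', 'banana']
b = a[0]      # first element (indexing starts at 0): 'apple'
c = a[:-1]    # everything except the last element: ['apple', 'orange']
d = a[1:2]    # from index 1 up to, but not including, index 2: ['orange']
print(a)
print(b)
print(c)
print(d)
```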

3. List is Immutable in SAS!?

I felt quite comfortable with list operations in SAS, but then something weird happened. I tried to change a value in a list:

data _null_;
a = ['apple', 'orange', 'banana'];
a[0] = 'Kiwi';
put a;
run;

Unexpectedly, I got an error.


Ha, do I need to create a new list to hold such a modification? This is funny.
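The behavior resembles an immutable sequence such as a Python tuple. Whether the SAS list object is truly immutable, or item assignment is simply unimplemented in this preproduction release, is not documented, so the Python sketch below only illustrates the "create a new list" workaround by analogy.

```python
# Python tuples are immutable, so item assignment fails, much like the SAS example
a = ('apple', 'orange', 'banana')
try:
    a[0] = 'Kiwi'
except TypeError as err:
    print("error:", err)   # error: 'tuple' object does not support item assignment

# The workaround: build a new tuple that holds the modification
a = ('Kiwi',) + a[1:]
print(a)                   # ('Kiwi', 'orange', 'banana')
```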

Based on my quick exploration, the list object in SAS is pretty intuitive from a programmer's point of view. But since it is undocumented and I don't know how long it will stay in the "preproduction" phase, be careful about using it in your production work.

Personally, I am very excited to "hack" such wonderful list features in SAS 9.4. If the feature is well implemented, SAS could easily beat R and Python (which claim to support rich data types and objects) as a scripting language for SAS programmers. I will keep updating this page.

December 31, 2013
 
Just some tips:

options(error=recover): when an error occurs, R enters a debugging session that lists the active function calls and lets you choose which one to browse.


options(show.error.locations=TRUE): makes R show the source location (line number) of an error.

Something else:

Use traceback() to print the sequence of calls that led to the last error, then put a call to browser() inside the suspect function and run it again to step through the code and check what is wrong.