sas programming

11月 122019
 

The t-test is a very useful test that compares one variable (perhaps blood pressure) between two groups. T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated from sample data during a hypothesis test. It is used to determine whether there is a significant difference between the means of two groups. With all inferential statistics, we assume the dependent variable fits a normal distribution. When we assume a normal distribution exists, we can identify the probability of a particular outcome. The procedure that calculates the test statistic compares your data to what is expected under the null hypothesis. There are several SAS Studio tasks that include options to test this assumption. Let's use the t-test task as an example.

You start by selecting:

Tasks and Utilities □ Tasks □ Statistics □ t Tests

On the DATA tab, select the Cars data set in the SASHELP library. Next request a Two-sample test, with Horsepower as the Analysis variable and Cylinders as the Groups variable. Use a filter to include only 4- or 6-cylinder cars. It should look like this:

On the OPTIONS tab, check the box for Tests for normality as shown below.

All the tests for normality for both 4-cylinder and 6-cylinder cars reject the null hypothesis that the data values come from a population that is normally distributed. (See the figure below.)

Should you abandon the t-test results and run a nonparametric test analysis such as a Wilcoxon Rank Sum test that does not require normal distributions?

This is the point where many people make a mistake. You cannot simply look at the results of the tests for normality to decide if a parametric test is valid or not. Here is the reason: When you have large sample sizes (in this data set, there were 136 4-cylinder cars and 190 6-cylinder cars), the tests for normality have more power to reject the null hypothesis and often result in p-values less than .05. When you have small sample sizes, the tests for normality will not be significant unless there are drastic departures from normality. It is with small sample sizes where departures from normality are important.

The bottom line is that the tests for normality often lead you to make the wrong decision. You need to look at the distributions and decide if they are somewhat symmetrical. The central limit theory states that the sampling distribution of means will be normally distributed if the sample size is sufficiently large. "Sufficiently large" is a judgment call. If the distribution is symmetrical, you may perform a t-test with sample sizes as small as 10 or 20.

The figure below shows you the distribution of horsepower for 4- and 6-cylinder cars.

With the large sample sizes in this data set, you should feel comfortable in using a t-test. The results, shown below, are highly significant.

If you are in doubt of your decision to use a parametric test, feel free to check the box for a nonparametric test on the OPTIONS tab. Running a Wilcoxon Rank Sum test (a nonparametric alternative to a t-test), you also find a highly significant difference in horsepower between 4- and 6-cylinder cars. (See the figure below.)

You can read more about assumptions for parametric tests in my new book, A Gentle Introduction to Statistics Using SAS Studio.

For a sneak preview check out the free book excerpt. You can also learn more about SAS Press, check out the up-and-coming titles, and receive exclusive discounts. To do so, make sure to subscribe to the newsletter.

Testing the Assumption of Normality for Parametric Tests was published on SAS Users.

11月 062019
 

I have been programming SAS for a LONG time and have never seen much in the way of programming standards. For example, most SAS programmers indent DATA and PROC statements (I like three spaces). Most programmers do not like to see more than one statement on a line and most agree that there should be blank lines between program boundaries (DATA and PROC steps).

I thought I would share some of my thoughts on programming standards, with the hope that others will chime in with their ideas.

    • I like to indent all the statements in a DO group or DO loop. If there are nested groups, each one gets indented as well.
    • I prefer variable names in proper case.
    • I am not a fan of camel-case. For example, I prefer Weight_Kg to WeightKg. The reason that some programmers like camel-case is that SAS will automatically split a variable name at a capital letter in some headings.
    • I like my TITLE statements in open code, not inside a PROC. To me, that makes sense because TITLE statements are global.
    • There should be no conversion messages (character to numeric or numeric to character) in the SAS log. For example use Num = INPUT(Char_Num,12.); instead of Num = 1*Char_Num;. The latter statement forces an automatic character to numeric conversion and places a message in the log.
    • I always use the statement ODS NOPROCTITLE;. This eliminates the default SAS procedure name at the top of the output.
    • Although fewer and fewer people are reading raw text data, I like my @ signs to all line up in my INPUT statement.
    • I like to use the /* and */ comments to define all macro variables. For example:

Notice that I prefer named parameters in my macros, instead of positional parameters.

If this seems like too much work - SAS Studio has an automatic formatting tool that can help standardize your programs. For example, look at the code below:

Really ugly, right? Here is how you can use the automatic formatting tool in SAS Studio.

When you click this icon, the program now looks like this:

That’s pretty much the way I would write it. By the way, if you don't like how Studio formatted your code, enter a control-z to undo it.

For more tips on writing code and how to get started in SAS Studio – check out my book, Learning SAS by Example: A Programmer’s Guide, Second Edition. You can also download a free book excerpt. To also learn more about SAS Press, check out the up-and-coming titles, and receive exclusive discounts make sure to subscribe to the SAS Books newsletter.

Making your SAS code more readable was published on SAS Users.

10月 292019
 

Thank you to Lora Delwiche and Susan Slaughter for providing the following information:

Six editions is a lot! If you had told us back when we wrote the first edition of The Little SAS Book that someday we would write a sixth, we would have wondered how we could possibly find that much to say. After all, it is supposed to be The Little SAS Book, isn’t it? But the developers at SAS are constantly hard at work inventing new and better ways of analyzing and visualizing data. And some of those ways turn out to be so fundamental that they belong even in a little book about SAS.

Interface independence

One of the biggest changes to SAS software in recent years is the proliferation of interfaces. SAS programmers have more choices than ever before. Previous editions contained some sections specific to the SAS windowing environment (also called Display Manager). We wrote this edition for all SAS programmers whether you use SAS Studio, SAS Enterprise Guide, the SAS windowing environment, or run in batch. That sounds easy, but it wasn’t. There are differences in how SAS behaves with different interfaces, and these differences can be very fundamental. In particular, the system option that sets the rules for names of variables varies depending on how you run SAS. So old sections had to be rewritten, and we added a whole new section showing how to use variable names containing blanks and special characters.

New ways to read and write Microsoft Excel files

Previous editions already covered how to read and write Microsoft Excel files, but SAS developers have created new ways that are even better. This edition contains new sections about the XLSX LIBNAME engine and the ODS EXCEL destination.

More PROC SQL

From the very first edition, The Little SAS Book always covered PROC SQL. But it was in an appendix, and over time we noticed that most people ignore appendices. So for this edition, we removed the appendix and added new sections on using PROC SQL to:

• Subset your data
• Join data sets
• Add summary statistics to a data set
• Create macro variables with the INTO clause

For people who are new to SQL, these sections provide a good introduction; for people who already know SQL, they provide a model of how to leverage SQL in your SAS programs.

Updates and additions throughout the book

Almost every section in this edition has been changed in some way. We added new options, made sure everything is up-to-date, and ran every example in every SAS interface noting any differences. For example, PROC SGPLOT has some new options, the default ODS style for PDF has changed, and the LISTING destination behaves differently in different interfaces. Here’s a short list, in no particular order, of new or expanded topics in the sixth edition:

• More examples with permanent SAS data sets, CSV files, or tab-delimited files
• More log notes throughout the book showing what to look for
• LIKE or sounds-like (=*) operators in WHERE statements
• CROSSLIST, NOCUM, and NOPRINT options in PROC FREQ
• Grouping data with a user-defined format and the PUT function
• Iterative DO groups
• DO WHILE and DO UNTIL statements
• %DO statements

Even though we have added a lot to this edition, it is still a little book. In fact, this edition is shorter than the last—by 12 pages! We think this is the best edition yet. For a sneak preview check out the free book excerpt. You can also learn more about SAS Press, check out the up-and-coming titles, and to exclusive discounts -- make sure to subscribe to the newsletter.

The Little SAS Book 6.0: The best-selling SAS book gets even better was published on SAS Users.

9月 232019
 

Although I do not typically blog about undocumented SAS options, I'll make an exception this time. For many years, I have known that the CONTENTS and COMPARE procedures support the BRIEF and SHORT options, but I always forget which option goes with which procedure. For the record, here are the documented options:

SAS provides an autocomplete feature that programmers can use to remind themselves which options are supported in each procedure. Both SAS Studio and SAS Enterprise Guide support the autocomplete feature, which displays a list of options as you type SAS code. Unfortunately, I turn off that feature because I find it distracting. Consequently, I cannot remember whether to use SHORT or BRIEF.

My "solution" to this dilemma is to randomly guess an option, run the program, and fix the error if I guessed wrong. I should be wrong 50% of the time, but I noticed that I am wrong only about 25% of the time. Upon investigating, I discovered that some seemingly "wrong" code actually works. Although it is not documented, it turns out that the COMPARE procedure supports the SHORT option as an alias for BRIEFSUMMARY.

This is awesome! No longer do I need to randomly guess the option. I can use the SHORT option in both PROC CONTENTS and PROC COMPARE! As a bonus, the SHORT option is also supported by7 PROC OPTIONS and PROC DATASETS.

If you have never used the SHORT option before, it's a great time saver because it produces condensed output, as shown in the following examples. For PROC CONTENTS, I like to use the options VARNUM SHORT when I want to display a list of the variables in a data set:

proc contents data=Sashelp.Cars varnum SHORT;      /* SHORT is documented option */
run;

For PROC COMPARE, I use the undocumented SHORT option (which is an alias for BRIEF) when I want to display whether two data sets are equal:

data cars; set Sashelp.cars; run;
 
proc compare base=sashelp.cars compare=cars SHORT; /* SHORT is not documented, but it works */
run;

As I mentioned, the SHORT option is also supported in PROC OPTIONS. For completeness, here is an example that writes the values of several options to the SAS log:

proc options option=(linesize pagesize memsize _LAST_) SHORT; run;    /* SHORT is documented option */
 LINESIZE=75 PAGESIZE=24 MEMSIZE=17179869184 _LAST_=WORK.CARS

In short (see what I did there?), you can use the SHORT option in several Base SAS procedures. If you can't remember that the SHORT option applies to PROC CONTENTS and the BRIEF option applies to PROC COMPARE, just use the SHORT option in both procedures and in PROC OPTIONS, too.

The post Use the SHORT option in Base SAS procedures to reduce output appeared first on The DO Loop.

9月 162019
 

SAS SPDS is lightning fastJust when you think you’ve seen it all, life can surprise you in a big way, making you wonder what else you've missed.

That is what happened when I recently had a chance to work with the SAS® Scalable Performance Data Server, a product that's been around a while, but I never crossed paths with before. I open an SPDS table of a hundred million records in SAS® Enterprise Guide, and I can scroll it as fast as if it were an Excel “baby” spreadsheet of a hundred rows. That’s how powerful it feels, to say nothing about the lighting speed of the queries.

What is the SAS Scalable Performance Data Server?

Also known as the SAS SPD Server (or SPDS), it's a data storage system designed for high-performance data delivery. Its primary purpose is to provide rapid queries of vast amounts of data. We are talking terabytes of data with tables containing billions of rows. SPDS employs parallel storage and efficient indexing, coupled with a multi-threaded server system concurrently processing tasks distributed across multiple processors.

Availability of the SPDS client in SAS® Viya effectively integrates SAS SPDS with SAS Viya, extending functionality of its applications beyond the native Cloud Analytic Services (CAS) where you can continue reaping all the benefits of the SAS SPDS.

SPDS library

In addition to connecting to SPD Server with explicit SQL pass-through, connection to SPD Server with a LIBNAME statement is available as well, for example:

libname mylibref sasspds 'serverdomain' host='nodename_or_ip' service='5400'
                         user='mySPDuserid' password='{SAS003}XXXXXXX...XXX';

This effectively creates an SPDS library, and the tables in that library can be referenced by two-level name mylibref.tablename as if this were a SAS BASE library.

Cluster tables vs. member tables

Besides ordinary data tables, SPDS library offers so called dynamic cluster tables – or clusters for short – enabling transparent access to large amounts of data.

Dynamic cluster tables (cluster tables or clusters) are virtual tables that allow users to access many server tables (member tables) as if they were one table. A dynamic cluster table is a collection of SPD Server tables that are presented to the end-user application as a single table through a metadata layer acting like a view.

Member tables can be added to the cluster as well as replaced and removed from the cluster.

The role of PROC SPDO

PROC SPDO is the SAS procedure for the SPD Server operator interface. It performs a wide range of SPD server, user and table management tasks:

  • create, list, modify, destroy, and undo dynamic cluster tables
  • add, remove, replace, and fix cluster table members
  • add, modify, list, and delete access control lists (ACLs) for server resources
  • define, describe, and remove WHERE constraints on tables for row-level security definition and management
  • issue system commands on server nodes

In addition to PROC SPDO, SPD Server plug-in for SAS® Data Management Console is also available.

Retrieving SPDS library contents

If you open an SPDS library in SAS Enterprise Guide, you won’t be able to tell which table in that library is a member table and which is a cluster table – they all look the same. But in many cases, we need to know what is what. Moreover, for data-driven processing we need to capture the SPDS library objects into a dataset and identify them whether they are clusters or member tables.

Luckily, PROC CONTENTS with OUT= option allows us to do just that. While MEMTYPE column is equal to ‘DATA’ for both, clusters and member tables, there is another, less known column inversely called TYPEMEM that has value of 'DATA' for clusters and blank value ' ' for member tables. The following simple code allows you to retrieve SPDS library objects list into WORK.SPDSTYPES dataset where TABLETYPE column specifies whether it’s a cluster or a member for each library object MEMNAME:

proc contents data=SPDSLIB._all_ out=WORK.ALLOBJECTS (keep=MEMNAME TYPEMEM);
run;
 
proc sort data=WORK.ALLOBJECTS nodupkey;
   by MEMNAME;
run;
 
data WORK.SPDSTYPES;
   set WORK.ALLOBJECT;
   attrib TABLETYPE $7 label='SPDS table type';
   select(TYPEMEM);
      when('DATA') TABLETYPE = 'CLUSTER';
      when('')     TABLETYPE = 'MEMBER';
      otherwise    TABLETYPE = '';
   end;
run;

In this code PROC CONTENTS produces one record per column NAME in every object MEMNAME in the SPDS library; PROC SORT reduces (un-duplicates) this list to one record per MEMNAME; finally, data step creates TABLETYPE column indicating which MEMNAME is CLUSTER and which is MEMBER.

Retrieving SPDS cluster’s member list

In addition to retrieving a list of objects in the SPDS library described above, we also need a way of capturing the content (a list of members) of the cluster itself in order to control removing or replacing its members. PROC SPDO’s CLUSTER LIST statement produces such a list, and its OUT= option allows you to dump that list into a dataset:

proc spdo lib=SPDSLIB;
   cluster list CLUSTER1 out=CLUSTER1_MEMBERS;
   cluster list CLUSTER1 out=CLUSTER2_MEMBERS;
   /* ... */
   cluster list CLUSTER1 out=CLUSTERN_MEMBERS;
quit;

This approach creates one output table per cluster, and you can’t use the same OUT= destination table for different clusters, for they will be overwritten with each subsequent CLUSTER LIST statement, not appended.

If you need to capture contents of several clusters into one dataset, then instead of the above method of outputting each cluster content into separate table and then appending (concatenating) them, the good old ODS OUTPUT with CLUSTERLIST= option allows us to do it in a single step:

ods noresults;
ods output clusterlist=WORK.CLUSTER_MEMS;
proc spdo lib=SPDSLIB;
   cluster list CLUSTER1;
   cluster list CLUSTER2;
   /* ... */
   cluster list CLUSTERN;
quit;
ods output close;
ods results;

As additional bonus ODS NORESULTS suppresses printed output when it’s not needed, e.g. for automatic data-driven processing.

Your thoughts?

What is your experience with SAS SPDS? How might you use it in the future? Please comment below.

How to retrieve contents of SAS® Scalable Performance Data Server library was published on SAS Users.

8月 282019
 

This article is not a tutorial on Hadoop, Spark, or big data. At the same time, no prerequisite knowledge of these technologies is required for understanding. We’ll give you enough background prior to diving into the details. In simplest terms, the Hadoop framework maintains the data and Spark controls and directs data processing. As an analogy, think of Hadoop as a train, big data as the payload, and Spark as the crew driving the train and organizing and distributing the goods.

Big data

I recently read that data volumes are doubling each year. Not that long ago we talked in terms of gigabytes. This quickly turned into terabytes and we’re now in the age of petabytes. The type of data is also changing. Data used to fit neatly into rows and columns. Now, nearly eighty percent of data is unstructured. All these trends and facts have led us to deal with massive amounts of data, aka big data. Maintaining and processing big data required creating technical frameworks. Next, we’ll investigate a couple of these tools.

Hadoop

Hadoop is a technology stack utilizing parallel processing on a distributed filesystem. Hadoop is useful to companies when data sets become so large or complex that their current solutions cannot effectively process the information in a reasonable amount of time. As the data science field has matured over the past few years, so has the need for a different approach to processing data.

Apache Spark

Apache Spark is a cluster-computing framework utilizing both iterative algorithms and interactive/exploratory data analysis. The goal of Spark is to keep the benefits of Hadoop’s scalable, distributed, fault-tolerant processing framework, while making it more efficient and easier to use. Using in-memory distributed computing, Spark provides capabilities over and above the batch model of Hadoop MapReduce. As a result, this brings to the big data world new applications of data science that were previously too expensive or slow on massive data sets.

Now let’s explore how SAS integrates with these technologies to maximize capturing, managing, and analyzing big data.

SAS capabilities to leverage Spark

SAS provides Hadoop data processing and data scoring capabilities using SAS/ACCESS Interface to Hadoop and In-Database Technologies to Hadoop with MapReduce or Spark as the processing framework. This addresses some of the traditional data management batch processing, huge volumes of extract, transform, load (ETL) data as well as faster, interactive and in-memory processing for quicker response with Spark.

In SAS Viya, SAS/ACCESS Interface to Hadoop includes SAS Data Connector to Hadoop. All users with SAS/ACCESS Interface to Hadoop can use the serial. Likewise, SAS Data Connect Accelerator to Hadoop can load or save data in parallel between Hadoop and SAS using SAS Embedded Process, as a Hive/MapReduce or Spark job.

Connecting to Spark in a Hadoop Cluster

There are two ways to connect to a Hadoop cluster using SAS/ACCESS Interface to Hadoop, based on the SAS environment: LIBNAME and CASLIB statements.

LIBNAME statement to connect to Spark from MVA SAS

options set=SAS_HADOOP_JAR_PATH="/third_party/Hadoop/jars/lib:/third_party/Hadoop/jars/lib/spark"; 
options set=SAS_HADOOP_CONFIG_PATH="/third_party/Hadoop/conf"; 
 
libname hdplib hadoop server="hadoop.server.com" port=10000 user="hive"
schema='default' properties="hive.execution.engine=SPARK";

Parameters

SAS_HADOOP_JAR_PATH Directory path for the Hadoop and Spark JAR files
SAS_HADOOP_CONFIG_PATH Directory path for the Hadoop cluster configuration files
Libref The hdplib libref specifies the location where SAS will find the data
SAS/ACCESS Engine Name HADOOP option to connect Hadoop engine
SERVER Hadoop Hive server to connect
PORT Listening Hive server Port. 10000 is the default, so it is not required. It is included just in case
USER and PASSWORD Are not always required
SCHEMA Hive schema to access. It is optional; by default, it connects to the “default” schema
PROPERTIES Hadoop properties. Choosing SPARK for the property hive.execution.engine enables SAS Viya to use Spark as the execution platform

 
CASLIB statement to connect from CAS

caslib splib sessref=mysession datasource=(srctype="hadoop", dataTransferMode="auto",username="hive", server="hadoop.server.com", 
hadoopjarpath="/opt/sas/viya/config/data/hadoop/lib:/opt/sas/viya/conf ig/data/hadoop/lib/spark", 
hadoopconfigdir="/opt/sas/viya/config/data/hadoop/conf", schema="default"
platform="spark"
dfdebug="EPALL" 
properties="hive.execution.engine=SPARK");

Parameters

CASLIB Space holder for the specified data access. The splib CAS library specifies the Hadoop data source
sessref Holds the CAS library in a specific CAS session. mysession is the current active CAS session
SRCTYPE Type of data source
DATATRANSFERMODE Type of data movement between CAS and Hadoop. Accepts one of three values – serial, parallel, auto. When AUTO is specified, CAS choose the type of data transfer based on available license in the system. If Data Connect Accelerator to Hadoop has been licensed, parallel data transfer will be used, otherwise serial mode of transfer is used
HADOOPJARPATH Hadoop and Spark JAR files location path on the CAS cluster
HADOOPCONFIGDIR Hadoop configuration files location path on the CAS cluster
PLATFORM Type of Hadoop platform to execute the job or transfer data using SAS Embedded Process. Default value is “mapred” for Hive MapReduce. When using “Spark”, data transfer and job executes as a Spark job
DFDEBUG Used to receive additional information back from SAS Embedded Process transfers data in the SAS log
PROPERTIES Hadoop properties. Choosing SPARK for the property hive.execution.engine enables SAS Viya to use Spark as the execution platform

 

Data Access using Spark

SAS Data Connect Accelerator for Hadoop with the Spark platform option uses Hive as the query engine to access Spark data. Data movement happens between Spark and CAS through SAS generated Scala code. This approach is useful when data already exists in Spark and either needs to be used for SAS analytics processing or moved to CAS for massively parallel data and analytics processing.

Loading Data from Hadoop to CAS using Spark

Processing data in CAS offers advanced data preparation, visualization, modeling and model pipelines, and finally model deployment. Model deployment can be performed using available CAS modules or pushed back to Spark if the data is already in Hadoop.

Load data from Hadoop to CAS using Spark

proc casutil 
      incaslib=splib
      outcaslib=casuser;
      load casdata="gas"
      casout="gas"
      replace;
run;

Parameters

PROC CASUTIL Used to process CAS action routines to process data
INCASLIB Input CAS library to read data
OUTCASLIB Output CAS library to write data
CASDATA Table to load to the CAS in-memory server
CASOUT Output CAS table name

 

We can look at the status of the data load job using Hadoop' resource management and job scheduling application, YARN. YARN is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.

Loading Data from Hadoop to Viya CAS using Spark

In the figure above, the YARN application executed the data load as a Spark job. This was possible because the CASLIB statement had Platform= Spark option specified. The data movement direction, in this case Hadoop to CAS uses the Spark job name, “SAS CAS/DC Input,” where “Input” is data loaded into CAS.

Saving Data from CAS to Hadoop using Spark

You can save data back to Hadoop from CAS at many stages of the analytic life cycle. For example, use data in CAS to prepare, blend, visualize, and model. Once the data meets the business use case, data can be saved in parallel to Hadoop using Spark jobs to share with other parts of the organization.

Using the SAVE CAS action to move data to Hadoop using Spark

proc cas;
session mysession; 
      table.save /
      caslib="splib"
      table={caslib="casuser", name="gas"},
      name="gas.sashdat"
      replace=True;
quit;

Parameters

PROC CAS Used to execute CAS actionsets and actions to process data.
“table” is the actionset and “save” is the action
TABLE Location and name of the source table
NAME Name of the target table saved to the Hadoop library using Spark

 

We can verify the status of saving data from CAS to Hadoop using YARN application. Data from CAS saves as a Hadoop table using, Spark as the execution platform. Furthermore, as SAS Data Connect Accelerator for Hadoop transfers data in parallel, individual Spark executors in each of the Spark executor nodes handles data execution for that specific Hadoop cluster node.

Saving Data from Viya CAS to Hadoop using Spark

Finally, the SAVE data executed as a Spark job. As we can see from YARN, the Spark job named “SAS CAS/DC Output” specifies that the data moves from CAS to Hadoop.

Where we are; where we're going

We have so far traveled across the Spark pond to setup SAS libraries for Spark, Load and Save data from and to Hadoop using Spark. In the next section we’ll look at ways to Score data and execute SAS code inside Hadoop using Spark.

Data and Analytics Innovation using SAS & Spark - part 1 was published on SAS Users.

8月 072019
 
2-D binning of counts

Do you want to bin a numeric variable into a small number of discrete groups? This article compiles a dozen resources and examples related to binning a continuous variable. The examples show both equal-width binning and quantile binning. In addition to standard one-dimensional techniques, this article also discusses various techniques for 2-D binning.

SAS procedures that support binning include the HPBIN, IML, KDE, RANK, and UNIVARIATE procedures.

Equal-width binning in SAS

The simplest binning technique is to form equal-width bins, which is also known as bucket binning. If a variable has the range [Min, Max] and you want to split the data into k equal-width bins (or buckets), each bin will have width (Max - Min) / k.

Quantile binning in SAS

In bucket binning, some bins have more observations than others. This enables you to estimate the density of the data, as in a histogram. However, you might want all bins to contain about the same number of observations. In that case, you can use quantiles of the data as cutpoints. If you want four bins, use the 25th, 50th, and 75th percentiles as cutpoints. If you want 10 bins, use the sample deciles as cutpoints. Here are several resources for quantile binning:

Binning by using arbitrary cutpoints in SAS

Sometimes you need to bin based on scientific standards or business rules. For example, the Saffir-Simpson hurricane scale uses specific wind speeds to classify a hurricane as Category 1, Category 2, and so forth. In these cases, you need to be able to define custom cutpoints and assign observations to bins based on those cutpoints.

2-D binning and bivariate histograms in SAS

A histogram is a visualization of a univariate equal-width binning scheme. You can perform similar computations and visualizations for two-dimensional data. If your goal is to understand the density of continuous bivariate data, you might want to use a bivariate histogram rather than a scatter plot (which, for large samples, suffers from overplotting).

In summary, this guide provides many links to programs and examples that bin data in SAS. Whether you want to use equal-width bins, quantile bins, or two-dimensional bins, hopefully, you will find an example to get you started. If I've missed an important topic, or if you have a favorite binning method that I have not covered, leave a comment.

The post The essential guide to binning in SAS appeared first on The DO Loop.

8月 052019
 

Binning transforms a continuous numerical variable into a discrete variable with a small number of values. When you bin univariate data, you define cut point that define discrete groups. I've previously shown how to use PROC FORMAT in SAS to bin numerical variables and give each group a meaningful name such as 'Low,' 'Medium,' and 'High.' This article uses PROC HPBIN to create bins that are assigned numbers. If you bin the data into k groups, the groups have the integer values 1, 2, 3, ..., k. Missing values are put into the zeroth group.

There are two popular ways to choose the cut points. You can use evenly spaced points, or you can use quantiles of the data. If you use evenly spaced cut points (as in a histogram), the number of observations in each bin will usually vary. Using evenly spaced cut points is called the "bucket binning" method. Alternatively, if you use quantiles as cut points (such as tertiles, quartiles, or deciles), the number of observations in each bin tend to be more balanced. This article shows how to use PROC HPBIN in SAS to perform bucket binning and quantile binning. It also shows how to use the CODE statement in PROC HPBIN to create a DATA step that uses the same cut points to bin future observations.

Create sample data

The following statements create sample data from the Sashelp.Heart data. An ID variable is added to the data so that you can identify each observation. A call to PROC MEANS produces descriptive statistics about two variables: Cholesterol and Systolic blood pressure.

data Heart;
format PatientID Z5.;
set Sashelp.Heart(keep=Sex Cholesterol Systolic);
PatientID = 10000 + _N_;
run;
 
proc means data=Heart nolabels N NMISS Min Max Skewness;
   var Cholesterol Systolic;
run;
Desriptive statistics for two variables

The output shows the range of the data for each variable. It also shows that the Cholesterol variable has 152 missing values. If your analysis requires nonmissing observations, you can use PROC HPIMPUTE to replace the missing values. For this article, I will not replace the missing values so that you can see how PROC HPBIN handles missing values.

Each variable has a skewed distribution, as indicated by the values of the skewness statistic. This usually indicates that equal-length binning will result in bins in the tail of the distribution that have only a few observations.

Use PROC HPBIN to bin data into equal-length bins

A histogram divides the range of the data by using k evenly spaced cutpoints. The width of each bin is (Max – Min) / k. PROC HPBIN enables you to create new variables that indicate to which bin each observation belongs. You can use the global NUMBIN= option on the PROC HPBIN statement to set the default number of bins for each variable. You can use the INPUT statement to specify which variables to bin. You can override the default number of bins by using the NUMBIN= option on any INPUT statement.

Suppose that you want to bin the Cholesterol data into five bins and the remaining variables into three bins.

  • The range of the Cholesterol data is [96, 568], so the width of the five bins that contain nonmissing values will be 94.4.
  • The range of the Systolic data is [82, 300], so the width of the three bins will be 72.66.

The following call to PROC HPBIN bins the variables. The output data set, HeartBin, contains the bin numbers for each observation.

/* equally divide the range each variable (bucket binning) */
proc hpbin data=Heart output=HeartBin numbin=3;  /* global NUMBIN= option */
   input Cholesterol / numbin=5;                 /* override global NUMBIN= option */
   input Systolic;                 
   id PatientID Sex;
run;
Cutpoints and frequency of observations for bucket binning using PROC HPBIN in SAS

Part of the output from PROC HPBIN is shown. (You can suppress the output by using the NOPRINT option.) The first table shows that PROC HPBIN used four threads on my PC to compute the results in parallel. The second table summarizes the transformation that bins the data. For each variable, the second column gives the names of the binned variables in the OUTPUT= data set. The third column shows the cutpoints for each bin. The Frequency and Proportion column show the frequency and proportion (respectively) of observations in each bin. As expected for these skewed variables, bins in the tail of each variable contain very few observations (less than 1% of the total).

The OUTPUT= option creates an output data set that contains the indicator variables for the bins. You can use PROC FREQ to enumerate the bin values and (again) count the number of observations in each bin:

proc freq data=HeartBin;
   tables BIN_Cholesterol BIN_Systolic / nocum;
run;
Frequency of observations in each bin for bucket binning using PROC HPBIN in SAS

Notice that the Cholesterol variable was split into six bins even though the syntax specified NUMBIN=5. If a variable contains missing values, a separate bin is created for them. In this case, the zeroth bin contains the 152 missing values for the Cholesterol variable.

Bucket binning divides the range of the variables into equal-width intervals. For long-tailed data, the number of observations in each bin might vary widely, as for these data. The next section shows an alternative binning strategy in which the width of the bins vary and each bin contains approximately the same number of observations.

Use PROC HPBIN to bin data by using quantiles

You can use evenly-spaced quantiles as cutpoints in an attempt to balance the number of observations in the bins. However, if the data are rounded or have duplicate values, the number of observations in each bin can still vary. PROC HPBIN has two ways methods for quantile binning. The slower method (the QUANTILE option) computes cutpoints based on the sample quantiles and then bins the observations. The faster method (the PSEUDO_QUANTILE option) uses approximate quantiles to bin the data. The following call uses the PSEUDO_QUANTILE option to bin the data into approximately equal groups:

/* bin by quantiles of each variable */
proc hpbin data=Heart output=HeartBin numbin=3 pseudo_quantile;
   input Cholesterol / numbin=5;    /* override global NUMBIN= option */
   input Systolic;                  /* use global NUMBIN= option */
   id PatientID Sex;
   code file='C:/Temp/BinCode.sas'; /* generate scoring code */
run;
Cutpoints and frequency of observations in each bin for quantile binning using PROC HPBIN in SAS

The output shows that the number of observations in each bin is more balanced. For the Systolic variable, each bin has between 1,697 and 1,773 observations. For the Cholesterol variable, each bin contains between 975 and 1,056 observations. Although not shown in the table, the BIN_Cholesterol variable also contains a bin for the 152 missing values for the Cholesterol variable.

Use PROC HPBIN to write DATA step code to bin future observations

In the previous section, I used the CODE statement to specify a file that contains SAS DATA step code that can be used to bin future observations. The statements in the BinCode.sas file are shown below:

*****************      BIN_Systolic     ********************;
length BIN_Systolic 8;
if missing(Systolic) then do; BIN_Systolic = 0; end;
else if Systolic < 124.0086 then do; BIN_Systolic =     1; end;
else if 124.0086 <= Systolic < 140.0098 then do; BIN_Systolic =     2; end;
else if 140.0098 <= Systolic then do; BIN_Systolic =     3; end;
 
*****************      BIN_Cholesterol     ********************;
length BIN_Cholesterol 8;
if missing(Cholesterol) then do; BIN_Cholesterol = 0; end;
else if Cholesterol < 190.0224 then do; BIN_Cholesterol =     1; end;
else if 190.0224 <= Cholesterol < 213.0088 then do; BIN_Cholesterol =     2; end;
else if 213.0088 <= Cholesterol < 234.0128 then do; BIN_Cholesterol =     3; end;
else if 234.0128 <= Cholesterol < 263.0408 then do; BIN_Cholesterol =     4; end;
else if 263.0408 <= Cholesterol then do; BIN_Cholesterol =     5; end;

You can see from these statements that the zeroth bin is reserved for missing values. Nonmissing values will be split into bins according to the approximate tertiles (NUMBIN=3) or quintiles (NUMBIN=5) of the training data.

The following statements show how to use the file that was created by the CODE statement. New data is contained in the Patients data set. Simply use the SET statement and the %INCLUDE statement to bin the new data, as follows:

data Patients;
length Sex $6;
input PatientID Sex Systolic Cholesterol;
datalines;
13021 Male    96 . 
13022 Male   148 242 
13023 Female 144 217 
13024 Female 164 376 
13025 Female .   248 
13026 Male   190 238 
13027 Female 158 326 
13028 Female 188 266 
;
 
data MakeBins;
set Patients;
%include 'C:/Temp/BinCode.sas';   /* include the binning statements */
run;
 
/* Note: HPBIN puts missing values into bin 0 */
proc print data=MakeBins;  run;
Binning new observations by using PROC HPBIN in SAS

The input data can contain other variables (PatientID, Sex) that are not binned. However, the data should contain the Systolic and Cholesterol variables because the statements in the BinCode.sas file refer to those variables.

Summary

In summary, you can use PROC HPBIN in SAS to create a new discrete variable by binning a continuous variable. This transformation is common in machine learning algorithms. Two common binning methods include bucket binning (equal-length bins) and quantile binning (unequal-length bins). Missing values are put into their own bin (the zeroth bin). The CODE statement in PROC HPBIN enables you to write DATA step code that you can use to bin future observations.

The post How to use PROC HPBIN to bin numerical variables appeared first on The DO Loop.

8月 052019
 

Binning transforms a continuous numerical variable into a discrete variable with a small number of values. When you bin univariate data, you define cut point that define discrete groups. I've previously shown how to use PROC FORMAT in SAS to bin numerical variables and give each group a meaningful name such as 'Low,' 'Medium,' and 'High.' This article uses PROC HPBIN to create bins that are assigned numbers. If you bin the data into k groups, the groups have the integer values 1, 2, 3, ..., k. Missing values are put into the zeroth group.

There are two popular ways to choose the cut points. You can use evenly spaced points, or you can use quantiles of the data. If you use evenly spaced cut points (as in a histogram), the number of observations in each bin will usually vary. Using evenly spaced cut points is called the "bucket binning" method. Alternatively, if you use quantiles as cut points (such as tertiles, quartiles, or deciles), the number of observations in each bin tend to be more balanced. This article shows how to use PROC HPBIN in SAS to perform bucket binning and quantile binning. It also shows how to use the CODE statement in PROC HPBIN to create a DATA step that uses the same cut points to bin future observations.

Create sample data

The following statements create sample data from the Sashelp.Heart data. An ID variable is added to the data so that you can identify each observation. A call to PROC MEANS produces descriptive statistics about two variables: Cholesterol and Systolic blood pressure.

data Heart;
format PatientID Z5.;
set Sashelp.Heart(keep=Sex Cholesterol Systolic);
PatientID = 10000 + _N_;
run;
 
proc means data=Heart nolabels N NMISS Min Max Skewness;
   var Cholesterol Systolic;
run;
Desriptive statistics for two variables

The output shows the range of the data for each variable. It also shows that the Cholesterol variable has 152 missing values. If your analysis requires nonmissing observations, you can use PROC HPIMPUTE to replace the missing values. For this article, I will not replace the missing values so that you can see how PROC HPBIN handles missing values.

Each variable has a skewed distribution, as indicated by the values of the skewness statistic. This usually indicates that equal-length binning will result in bins in the tail of the distribution that have only a few observations.

Use PROC HPBIN to bin data into equal-length bins

A histogram divides the range of the data by using k evenly spaced cutpoints. The width of each bin is (Max – Min) / k. PROC HPBIN enables you to create new variables that indicate to which bin each observation belongs. You can use the global NUMBIN= option on the PROC HPBIN statement to set the default number of bins for each variable. You can use the INPUT statement to specify which variables to bin. You can override the default number of bins by using the NUMBIN= option on any INPUT statement.

Suppose that you want to bin the Cholesterol data into five bins and the remaining variables into three bins.

  • The range of the Cholesterol data is [96, 568], so the width of the five bins that contain nonmissing values will be 94.4.
  • The range of the Systolic data is [82, 300], so the width of the three bins will be 72.66.

The following call to PROC HPBIN bins the variables. The output data set, HeartBin, contains the bin numbers for each observation.

/* equally divide the range each variable (bucket binning) */
proc hpbin data=Heart output=HeartBin numbin=3;  /* global NUMBIN= option */
   input Cholesterol / numbin=5;                 /* override global NUMBIN= option */
   input Systolic;                 
   id PatientID Sex;
run;
Cutpoints and frequency of observations for bucket binning using PROC HPBIN in SAS

Part of the output from PROC HPBIN is shown. (You can suppress the output by using the NOPRINT option.) The first table shows that PROC HPBIN used four threads on my PC to compute the results in parallel. The second table summarizes the transformation that bins the data. For each variable, the second column gives the names of the binned variables in the OUTPUT= data set. The third column shows the cutpoints for each bin. The Frequency and Proportion column show the frequency and proportion (respectively) of observations in each bin. As expected for these skewed variables, bins in the tail of each variable contain very few observations (less than 1% of the total).

The OUTPUT= option creates an output data set that contains the indicator variables for the bins. You can use PROC FREQ to enumerate the bin values and (again) count the number of observations in each bin:

proc freq data=HeartBin;
   tables BIN_Cholesterol BIN_Systolic / nocum;
run;
Frequency of observations in each bin for bucket binning using PROC HPBIN in SAS

Notice that the Cholesterol variable was split into six bins even though the syntax specified NUMBIN=5. If a variable contains missing values, a separate bin is created for them. In this case, the zeroth bin contains the 152 missing values for the Cholesterol variable.

Bucket binning divides the range of the variables into equal-width intervals. For long-tailed data, the number of observations in each bin might vary widely, as for these data. The next section shows an alternative binning strategy in which the width of the bins vary and each bin contains approximately the same number of observations.

Use PROC HPBIN to bin data by using quantiles

You can use evenly-spaced quantiles as cutpoints in an attempt to balance the number of observations in the bins. However, if the data are rounded or have duplicate values, the number of observations in each bin can still vary. PROC HPBIN has two ways methods for quantile binning. The slower method (the QUANTILE option) computes cutpoints based on the sample quantiles and then bins the observations. The faster method (the PSEUDO_QUANTILE option) uses approximate quantiles to bin the data. The following call uses the PSEUDO_QUANTILE option to bin the data into approximately equal groups:

/* bin by quantiles of each variable */
proc hpbin data=Heart output=HeartBin numbin=3 pseudo_quantile;
   input Cholesterol / numbin=5;    /* override global NUMBIN= option */
   input Systolic;                  /* use global NUMBIN= option */
   id PatientID Sex;
   code file='C:/Temp/BinCode.sas'; /* generate scoring code */
run;
Cutpoints and frequency of observations in each bin for quantile binning using PROC HPBIN in SAS

The output shows that the number of observations in each bin is more balanced. For the Systolic variable, each bin has between 1,697 and 1,773 observations. For the Cholesterol variable, each bin contains between 975 and 1,056 observations. Although not shown in the table, the BIN_Cholesterol variable also contains a bin for the 152 missing values for the Cholesterol variable.

Use PROC HPBIN to write DATA step code to bin future observations

In the previous section, I used the CODE statement to specify a file that contains SAS DATA step code that can be used to bin future observations. The statements in the BinCode.sas file are shown below:

*****************      BIN_Systolic     ********************;
length BIN_Systolic 8;
if missing(Systolic) then do; BIN_Systolic = 0; end;
else if Systolic < 124.0086 then do; BIN_Systolic =     1; end;
else if 124.0086 <= Systolic < 140.0098 then do; BIN_Systolic =     2; end;
else if 140.0098 <= Systolic then do; BIN_Systolic =     3; end;
 
*****************      BIN_Cholesterol     ********************;
length BIN_Cholesterol 8;
if missing(Cholesterol) then do; BIN_Cholesterol = 0; end;
else if Cholesterol < 190.0224 then do; BIN_Cholesterol =     1; end;
else if 190.0224 <= Cholesterol < 213.0088 then do; BIN_Cholesterol =     2; end;
else if 213.0088 <= Cholesterol < 234.0128 then do; BIN_Cholesterol =     3; end;
else if 234.0128 <= Cholesterol < 263.0408 then do; BIN_Cholesterol =     4; end;
else if 263.0408 <= Cholesterol then do; BIN_Cholesterol =     5; end;

You can see from these statements that the zeroth bin is reserved for missing values. Nonmissing values will be split into bins according to the approximate tertiles (NUMBIN=3) or quintiles (NUMBIN=5) of the training data.

The following statements show how to use the file that was created by the CODE statement. New data is contained in the Patients data set. Simply use the SET statement and the %INCLUDE statement to bin the new data, as follows:

data Patients;
length Sex $6;
input PatientID Sex Systolic Cholesterol;
datalines;
13021 Male    96 . 
13022 Male   148 242 
13023 Female 144 217 
13024 Female 164 376 
13025 Female .   248 
13026 Male   190 238 
13027 Female 158 326 
13028 Female 188 266 
;
 
data MakeBins;
set Patients;
%include 'C:/Temp/BinCode.sas';   /* include the binning statements */
run;
 
/* Note: HPBIN puts missing values into bin 0 */
proc print data=MakeBins;  run;
Binning new observations by using PROC HPBIN in SAS

The input data can contain other variables (PatientID, Sex) that are not binned. However, the data should contain the Systolic and Cholesterol variables because the statements in the BinCode.sas file refer to those variables.

Summary

In summary, you can use PROC HPBIN in SAS to create a new discrete variable by binning a continuous variable. This transformation is common in machine learning algorithms. Two common binning methods include bucket binning (equal-length bins) and quantile binning (unequal-length bins). Missing values are put into their own bin (the zeroth bin). The CODE statement in PROC HPBIN enables you to write DATA step code that you can use to bin future observations.

The post How to use PROC HPBIN to bin numerical variables appeared first on The DO Loop.

7月 172019
 

Problem solving: thinking inside the box

Have you ever tried to pass comma-delimited values to SAS macro or to a SAS macro function? How can SAS distinguish commas separating parameters or arguments from commas separating parts of the values?

Passing comma-delimited value as an argument to a SAS macro function

Let’s say you want to extract the first word from the following string of characters (these words represent column names in the SASHELP.CARS data table):

make, model, type, origin

If you run the following code:

%let firstvar = %scan(make, model, type, origin, 1);

you get is the following ERROR in your SAS log:

ERROR: Macro function %SCAN has too many arguments.

That is because %scan macro function sees and treats those make, model, type and origin as arguments since commas between them are interpreted as argument separators.

Even if you “hide” your comma-delimited value within a macro variable, it still won’t do any good since the macro variable gets resolved during macro compilation before being passed on to a macro or macro function for execution.

%let mylist = make, model, type, origin;
%let firstvar = %scan(&mylist, 1);

You will still get the same ERROR:

ERROR: Macro function %SCAN has too many arguments.

Passing comma-delimited value as a parameter to a SAS macro

Try submitting the following code that passes your macro variable value to a SAS macro as a parameter:

%let mylist = make, model, type, origin;
%macro subset(dsname=, varlist=);
   proc sql;
      select &varlist
      from &dsname;
   quit;
%mend subset;
%subset(dsname=SASHELP.CARS, varlist=&mylist)

You will get another version of the SAS log ERROR:

ERROR: All positional parameters must precede keyword parameters.
NOTE: Line generated by the macro variable "MYLIST".
1                 type, origin
                  ----
                  180
ERROR 180-322: Statement is not valid or it is used out of proper order.

In this case, macro %subset gets as confused as the %scan function above because your macro variable will get resolved during macro compilation, and SAS macro processor will see the macro invocation as:

%subset(dsname=SASHELP.CARS, varlist=make, model, type, origin)

treating each comma as a parameter separator.

All this confusion happens because SAS functions’ arguments and SAS macros’ parameters use commas as their separators, while resolved macro variables introduce their own values’ comma delimiters into the functions/macros constructs’ picture, thus wreaking havoc on your SAS program.

It’s time for a vacation

But don’t panic! To fight that chaos, you need to take a vacation. Not a stay-home, do-nothing vacation, but some serious vacation, with faraway destination and travel arrangements. While real vacation is preferable, an imaginary one would do it too. I mean to start fighting the mess with comma-separated values, pick your destination, book your hotel and flight, and start packing your stuff.

Do you have a “vacation items list”? In my family, we have an individual vacation list for every family member. How many items do you usually take with you? Ten, twenty, a hundred?

Regardless, you don’t show up at the airport checkpoint with a pile of your vacation items. That would’ve been too messy. I don’t think you would be even allowed boarding with an unpacked heap of your stuff. You come to an airport neatly rolling a single item that is called a suitcase. Well, I suppose that some of you may have two of them, but I can’t imagine more than that.

You only started your fantasy vacation, you haven’t even checked in to your flight, but you have already have a solution in your sight, a perfect combine-and-conquer solution for passing comma-delimited values. Even if you have not yet realized that it’s in your plain view.

Thinking inside the box

Forget about “thinking outside the box” metaphor. You can’t solve all your problems with a single strategy. Sometimes, you need to turn your thinking on its head to solve, or even to see the problem.

As for your airport check-in, instead of thinking outside the box, you thought “inside the box” and brought your many items “boxed” as a single item – a suitcase. A container, in a broader sense.

That is exactly how we are going to approach our comma-delimited lists problem. We are going to check them in to a macro or a macro function as a single, boxed item. Just like this:
Passing a comma-separated value to SAS macro or SAS macro function
Or like this:
passing SAS macro variable with comma-separated value to SAS macro or SAS macro function

Not surprisingly, SAS macro language provides a variety of these wonder boxes for many special occasions collectively known as macro quoting functions. Personally, I would prefer calling them “macro masking functions,” as they have nothing to do with “quoting” per se and have everything to do with masking various characters during macro compilation or macro processing. But that is what “macro quoting” means – masking, boxing, - similar to “quoting” a character string to make it a single entity.

Different macro quoting functions mask different special characters (+ - , / ; = etc.) and mnemonics (AND OR GT EQ etc.) so that the macro facility interprets them as text instead of as language symbols.

Here are all 7 SAS macro quoting functions, two of which work at macro compilation - %STR() and %NRSTR(), while other 5 work at macro execution - %QUOTE() and %NRQUOTE(), %BQUOTE() and %NRBQUOTE(), and %SUPERQ().

You may look up what symbols they mask and the timing they apply (macro compilation vs. macro execution) in this macro quoting functions summary. You may also want to look at the following cheat sheet: Deciding When to Use a Macro Quoting Function and Which Function to Use.

As general rule of thumb, use macro quoting functions at compilation time when you mask text constants - (make, model, type, origin); use macro quoting functions at execution time when you mask macro or macro variable references containing & or % - (&mylist).

NOTE: There are many other SAS macro functions that besides their main role also perform macro quoting, e.g. %QSCAN(), %QSUBSTR() and others; they all start with %Q.

Masking commas within a comma-delimited value passed as an argument or a parameter

It turns out that to mask (or to “box”) comma-separated values in a macro function or a SAS macro, any macro quoting function will work. In this case I would suggest using the simplest (and shortest) %STR(). %STR() applies during macro compilation and serves as a perfect “box” for our comma-delimited values to hide (mask) commas to receiving macro function or a macro does not confuse them with its own commas separating arguments / parameters.

With it we can re-write our above examples as:

%let firstvar = %scan(%str(make, model, type, origin), 1);
%put &=firstvar;

SAS log will produce exactly what we expected:

FIRSTVAR=make

Similarly, we can call the above SAS macro as:

%subset(dsname=SASHELP.CARS, varlist=%str(make, model, type, origin) )

It will run without ERRORs and produce a print of the SASHELP.CARS data table with 4 columns specified by the varlist parameter value:

SAS output table as a result of macro run

Masking commas within a macro variable value passed as an argument or parameter

When you assign a comma-delimited list as a value to a macro variable, we want to mask commas within the resolved value during execution. Any of the execution time macro quoting functions will mask comma.

Again, in case of multiple possibilities I would use the shortest one - %QUOTE().

With it we can re-write our above examples as:

%let mylist = make, model, type, origin;
 
%let firstvar = %scan(%quote(&mylist), 1);
 
%subset(dsname=SASHELP.CARS, varlist=%quote(&mylist))

But just keep in mind that the remaining 4 execution time macro quoting functions - %NRQUOTE(), %BQUOTE(), %NRBQUOTE() and %SUPERQ() - will work too.

NOTE: The syntax of the %SUPERQ() function is quite different from the rest of the pack. The %SUPERQ() macro function takes as its argument either a macro variable name without an ampersand or a macro text expression that yields a macro variable name without an ampersand.

Get it going

I realize that macro quoting is not a trivial matter. That is why I attempted to explain its concept on a very simple yet powerful use case. Hope you will expand on this to empower your SAS coding skills.

Passing comma-delimited values into SAS macros and macro functions was published on SAS Users.