3月 182020
 

A SAS/IML programmer asked about the best way to print multiple SAS/IML variables when each variable needs a different format. He wanted the output to resemble the "Parameter Estimates" table that is produced by PROC REG and other SAS/STAT procedures.

This article shows four ways to print SAS/IML vectors in a table in which each variable has a different format. The methods are:

  1. Use the SAS/IML PRINT statement and assign a format to each column. Optionally, you can also assign a label to each variable.
  2. Create a SAS/IML table and use the TablePrint statement. You can use the TableSetVarFormat subroutine to assign formats to columns.
  3. If you want to format the variables according to a SAS/STAT template, you can use the TablePrint subroutine and specify the name of the template.
  4. Write the vectors to a data set and use PROC PRINT.

An example table

Suppose you have computed a linear regression in the SAS/IML language. You have six vectors (each with three observations) that you want to print in table that resembles the ParameterEstimates table that is produced by PROC REG and other SAS/STAT procedures. Here are the vectors:

proc iml;
factors = {'Intercept', 'MPG_Highway','Weight'};      /* Variable */
df      = {1,           1,            1};             /* DF = degrees of freedom */
est     = {-3.73649,    0.87086,      0.00011747};    /* Estimate */
SE      = {1.25091,     0.02446,      0.0001851};     /* StdErr = standard error */
t       = {-2.99,       35.6,         0.63};          /* tValue = value t test that the parameter is zero */
pvalue= {0.0029803,     1.36E-129,    0.525902};      /* Probt = two-sided p-value for hypothesis test */

The following PRINT statement displays these vectors:
     PRINT factors df est SE t pvalue;
However, there are two problems with the output:

  1. The column headers are the names of the variables. You might want to display more meaningful column headers.
  2. By default, each vector is printed by using the BEST9. format. However, it can be desirable to print each vector in a different format.

For this table, the programmer wanted to use the same formats that PROC REG uses for the ParameterEstimates table, which are as follows:

1. The PRINT statement

IML programmers would use the PRINT statement to print all six columns in a single table. You can use the LABEL= option (or L=, for short) on the PRINT statement to specify the column header. You can use the FORMAT= option (or F=, for short) to specify the format. Thus, you can use the following PRINT statement to create a table in which each column uses its own format:

/* 1: Use PRINT statement. Assign format to each column. 
      You cannot get a spanning header when you print individual columns */
print "-------- Parameter Estimates --------";
print factors[L='Variable']
      df[L='DF'        F=2.0]
      est[L='Estimate' F=D11.3]
      SE[L='StdErr'    F=D11.3]
      t[L='tValue'     F=7.2]
      pvalue[L='Probt' F=PVALUE6.4];

The result is acceptable. The PRINT statement enables you to apply formats to each column. However, unfortunately, there is no way to put a spanning header that says "Parameter Estimates" across the top of the table. This example uses a separate PRINT statement to emulate a header.

2. The TablePrint statement

If you have SAS/IML 14.2 or later (or the iml action in SAS Viya 3.5 or later), you can put the variables into a SAS/IML table and use the TablePrint statement to display the table. The IML language supports the TableSetVarFormat statement. which you can use to set the formats for individual columns, as shown in the following statements:

/* 2: Put variables into a table and use the TABLEPRINT statement. */
Tbl = TableCreate('Variable', factors);
call TableAddVar(Tbl, {'DF' 'Estimate' 'StdErr' 'tValue' 'Probt'},
                        df || est    || SE    || t     || pvalue);
call TableSetVarFormat(Tbl, {'DF' 'Estimate' 'StdErr' 'tValue' 'Probt'},
                            {'2.0' 'D11.3'   'D11.3'  '7.2'    'PVALUE6.4'});
call TablePrint(Tbl) ID='Variable' label='Parameter Estimates';

You can see that the TablePrint routine does an excellent job printing the table. You can use the ID= option to specify that the Variable column should be used as row headers. The LABEL= option puts a spanning header across the top of the table.

3. TablePrint can use a template

In the previous section, I used the TableSetVarFormat function to manually apply formats to specific columns. However, if your goal is to apply the same formats that are used by a SAS/STAT procedure, there is an easier way. You can use the TEMPLATE= option on the TablePrint subroutine to specify the name of an ODS template. You need to make sure that the names of variables in the table are the same as the names that are used in the template, but if you do that then the template will automatically format the variables and display a spanning header for the table.

/* 3. An easier way to duplicate the ParameterEstimates table from SAS/STAT procedure. 
      Use the TEMPLATE= option. (Make sure the names of variables are what the template expects. */
Tbl = TableCreate('Variable', factors);
call TableAddVar(Tbl, {'DF' 'Estimate' 'StdErr' 'tValue' 'Probt'},
                        df || est    || SE    || t     || pvalue);
call TablePrint(Tbl) label='Parameter Estimates'
                     template="Stat.REG.ParameterEstimates";

Remember: You can use ODS TRCE ON or the SAS/STAT documentation to find the name of ODS tables that a procedure creates.

4. Use PROC PRINT

If none of the previous methods satisfy your needs, you can always write the data to a SAS data set and then use Base SAS methods to display the table. For example, the following statements create a SAS data set and use PROC PRINT to display the table. The LABEL statement is used to specify the column headers. The FORMAT statement is used to apply formats to the columns:

/* 4. Write a SAS data set and use PROC PRINT */
create PE var {'factors' 'df' 'est' 'SE' 't' 'pvalue'};
append;
close;
QUIT;
 
proc print data=PE noobs label;
   label factors='Variable' df='DF' est='Estimate' SE='StdErr' t='tValue' pvalue='Probt';
   format df 2.0 est SE D11.3 t 7.2 pvalue PVALUE6.4;
   var factors df est SE t pvalue;
run;

Further reading

To learn more about SAS/IML table and other non-matrix data structures, see Wicklin (2017), More Than Matrices: SAS/IML Software Supports New Data Structures."

The post Print SAS/IML variables with formats appeared first on The DO Loop.

3月 182020
 

COVID-19 is truly a global health issue affecting everyone and causing concern for you, our customers. As we all continue to navigate the uncertainty of this outbreak, SAS is committed to supporting your business as a valued partner and will ensure continuity of service. We created this guide to address customers’ queries about our products, services and business continuity management program.

Will SAS products and services be impacted by COVID-19?

We do not anticipate any limits or restrictions regarding the availability of SAS products or services as a result of COVID-19. While certain SAS office locations have closed or implemented a remote working policy, we are continuing to ensure excellent customer service.

Will my cloud services be impacted?

We do not anticipate any impact to SAS’ critical supply chain, logistics arrangements and facilities.  Additionally, our SaaS solutions and managed services are supported and operated by teams of skilled personnel who can work remotely to provide agreed service-level coverage.

What are SAS’ strategies for providing customer support?

In response to the current COVID-19 pandemic, SAS is taking precautions to protect employees and minimize business interruptions. SAS has Business Continuity Management (BCM) plans in place for crisis management as well as for resumption of key customer-facing services, such as technical support.  SAS’ business continuity plans provide for staff to maintain critical services and, where appropriate, for staff to work remotely, or to utilize geographically disperse staff. Plans also address comprehensive security measures and regular off-site rotation of data backups for SAS’ hosted solutions. SAS’ global 24-hour technical support is always ready to provide services and is not impacted by the virus outbreak.

SAS incorporates a Communicable Disease Plan response in its BCM policies to prepare SAS’ timely and efficient response in the event a communicable disease affects or poses a credible threat of transmission in one or more of SAS’ global workplaces. In a communicable disease event, as with other disruptive incidents, SAS’ Emergency Operations Command (EOC) may be engaged along with other key internal parties. SAS’ Communicable Disease Plan provides pandemic guidance aligned to the latest guidance from World Health Organization (WHO) and Centers for Disease Control and Prevention (CDC), including points of contact for international agencies.

Will there be delays in SAS’ supply chain due to COVID-19?

We do not anticipate any impact to SAS’ critical supply chain, logistics arrangements and facilities. SAS has minimal external dependencies. Third-party dependencies are qualified for their criticality and as appropriate, additional steps may be taken to assure continuity of SAS' business and/or consideration of the use of alternate suppliers. If service levels are disrupted, customers will be informed through communication channels which may include company phone messaging, the SAS customer support website, personal contact with account managers and/or other staff and business partners.

I have questions about my license relating to expiration or expanding usage while employees are required to work from home. Are you making any special accommodations for businesses coping with disruption related to the virus?

We will work with you to make accommodations that make it possible for you to continue using the software even if your staff is working from home or your business is disrupted in other ways due to the virus outbreak. We are here to help.

If you cannot reach your account team, contact the global office closest to you or reach out to the SAS Customer Contact Center.

Who at SAS can help address any COVID-19 requests regarding specific data and analytic needs?

We recognize that today -- more than ever -- there is opportunity for SAS to help its customers with critical business challenges. Please reach out to your account team or global office for additional assistance. Some examples of how SAS is already helping customers use data and analytics in the fight against the virus:

  • Helping the public sector predict the spread of the virus.
  • Ensuring strong supply chains for medical, food and retail supplies.
  • Optimizing health care workforce and facilities.

What is happening with SAS Global Forum and all other planned customer events?

Our top priority is the health and well-being of attendees and employees. After careful consideration of the evolving concerns around COVID-19, and based on guidelines and recommendations from various global health organizations, we’ve made the very difficult decision to cancel in-person events through June – including SAS Global Forum 2020. We are postponing SAS Global Forum at this time and will provide updates on future plans at the SAS Global Forum event site. SAS is looking at options to provide certain event content virtually and will keep customers updated.

What is the operational status of the SAS office near me?

The SAS executive team has been meeting daily to discuss how we can safeguard our employees, continue to support our customers and partners, and meet SAS’ business obligations during this very difficult period. SAS has always had a flexible work environment that allows employees to work with their managers on arrangements that best suit their personal needs.  With the latest US national emergency and North Carolina state of emergency declared, SAS has directed US employees to work remotely through March 31st, and the Cary campus is not allowing visitors during this time. Outside the US, regional leadership will continue to make decisions based on the situation in the region, following local guidelines and directives.

Any offices experiencing mandatory remote working policies will maintain essential business processes and customer support to prevent disruptions to customer services.

Please contact your SAS account executive with any further questions or reach out to the SAS Customer Contact Center.

How can I get updates on COVID-19 from SAS?

As we continue to closely follow developments related to COVID-19, including recommendations from the CDC, WHO and other government and health authorities, SAS will share any new developments through this FAQ page. In the meantime, know that our thoughts are with all those around the world who have been affected by COVID-19.

Frequently asked questions related to COVID-19 and SAS customer support was published on SAS Users.

3月 182020
 

If you are reading this, you are part of the extended SAS family. Together, we are navigating uncharted territory as we care for our families, our friends and our communities during the coronavirus outbreak. Over the past few weeks, the SAS executive team has been meeting throughout the day, every day, to discuss how we [...]

Caring for customers, employees and our community during the coronavirus outbreak was published on SAS Voices by Oliver Schabenberger

3月 172020
 

Data management has never been the shiny object that caught the imagination of the mainstream. And let’s be honest, it's not nearly as interesting as analytics, machine learning or artificial intelligence. In fact, entire movies get created about analytics, and people actually pay to see them! Data management? Not so [...]

Past, present and future ... it's always been about data management was published on SAS Voices by Todd Wright

3月 162020
 

As a long-time SAS 9 programmer, I typically accomplish my data preparation tasks through some combination of the DATA Step, Proc SQL, Proc Transpose and some housekeeping procs like Proc Contents and Proc Datasets. With the introduction of SAS Viya, SAS released a new scripting language called CASL – a language that interacts with SAS Cloud Analytics Services (CAS).

CASL statements include actions, logically organized into action sets based on common functionality. For example, the Table action set allows you to load a table in CAS, view table metadata, change table metadata such as drop or rename a column, fetch (print) sample rows, save or drop a table from CAS, among other things. Steven Sober provides a great overview of CASL in his 2019 SAS Global Forum paper.

Learning CASL is a good idea assuming you want to leverage the power of CAS, because CASL is the language of CAS. While you can continue to use Viya-enabled procs for many of your data processing needs, certain new functionality is only available through CASL. CAS actions also provide a more granular access to options which otherwise may not be available as procedure options. But old habits die hard, and for a while I found myself bouncing between SAS 9.4 and CASL. I'd pull the data down from CAS just to get it to process in the SAS Programming Runtime Environment (SPRE) because it took less effort than figuring out how to get it done properly in CAS.

Then I started a project with a seriously large data set and quickly hit the limit on how much data I could pull down to process in SPRE. And although I could adjust the DATALIMIT option to retrieve more data than the default limit, I was wasting time and server resources unnecessarily moving the data between CAS and SPRE. All this, just so I could process the data “old school.”

I decided to challenge myself to do ALL my data preparation in CASL. I love a good challenge! I started collecting various useful CASL code snippets. In this post, I am sharing the tidbits I’ve accumulated, along with some commentary. Note, you can execute CAS actions from multiple clients, including SAS, Python, R, Lua and Java. Since my objective was to transition from traditional SAS code to CASL, I’ll focus solely on CAS actions from the SAS client perspective. While I used SAS Viya 3.5 for this work, most of the code snippets should work on prior versions as well.

The sections below cover: how to submit CASL code; loading, saving, dropping and deleting data; exploring data; table metadata management; and data transformation. Feel free to jump ahead to any section of interest.

How do you submit CASL code?

You use PROC CAS to submit CASL code from a SAS client. For example:

proc cas;
   <cas action 1>;
   <cas action 2>;
   …;
quit;

Similarly to other interactive procs that use run-group processing, separate CAS actions by run; statements. For example:

proc cas;
   <cas action 1>;
   run;
   <cas action 2>;
   run;
quit;

In fact, you can have the entire data preparation and analysis pipeline wrapped inside a single PROC CAS, passing data and results in the form of CASL variables from one action to the next. It can really be quite elegant.

Moving Data Using PROC CAS

Loading SASHDAT data in CAS

Your data must be in the SASHDAT format for CAS to process it. To load a SASHDAT table into CAS, use the table.loadtable CAS action. The code below assumes your SASHDAT table is saved to a directory on disk associated with your current active caslib, and you are loading it into the same caslib. (This usually occurs when you already performed the conversion to SASHDAT format, but the data has been unloaded. If you are just starting out and are wondering how to get your data into the SASHDAT format in the first place, the next session covers it, so keep reading.)

proc cas; 
     table.loadtable / path="TABLE_NAME.sashdat" casOut="TABLE_NAME"; 
     table.promote /name="TABLE_NAME" drop=true; 
quit;

The table.promote action elevates your newly loaded CAS table to global scope, making it available to other CAS sessions, including any additional sessions you start, or to other users assuming they have the right privileges. I can’t tell you how many times I forgot to promote my data, only to find that my hard-earned output table disappeared because I took a longer coffee break than expected! Don’t forget to promote or save off your data (or both, to be safe).

If you are loading from a directory other than the one associated with your active caslib, modify the path= statement to include the relative path to the source directory – relative to your active caslib. If you are looking to load to a different caslib, modify the casOut= statement by placing the output table name and library in curly brackets. For example:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME" caslib="CASLIB2"};
    table.promote /name="TABLE_NAME" drop=true;
quit;

You can also place a promote=true option inside the casOut= curly brackets instead of calling the table.promote action, like so:

proc cas;
    table.loadtable / path="TABLE_NAME.sashdat" 
                      casOut={name="TABLE_NAME" caslib="CASLIB2" promote=true};
quit;

Curly brackets are ubiquitous in CASL (and quite unusual for SAS 9.4). If you take away one thing from this post, make it “watch your curly brackets.”

Loading SAS7BDAT, delimited data, and other file formats in CAS

If you have a SAS7BDAT file already on disk, load it in CAS with this code:

proc cas;
    table.loadtable /path="TABLE_NAME.sas7bdat" casout="TABLE_NAME" 
                     importoptions={filetype="basesas"};
quit;

Other file formats load similarly – just use the corresponding filetype= option to indicate the type of data you are loading, such as CSV, Excel, Document (.docx, .pdf, etc.), Image, Video, etc. The impressive list of supported file types is available here.

proc cas;
    table.loadtable / path="TABLE_NAME.csv" casout="TABLE_NAME" 
                      importoptions={filetype="csv"};
    run;
quit;

You can include additional parameters inside the importOptions= curly brackets, which differ by the file type. If you don’t need any additional parameters, use the filetype=”auto” and let CAS determine the best way to load the file.

When loading a table in SAS7BDAT, delimited or some other format, the table.loadtable action automatically converts your data to SASHDAT format.

Loading data in CAS conditionally

Imagine you are building a script to load data conditionally – only if it’s not already loaded. This is handy if you have a reason to believe the data might already be in CAS. To check if the data exists in CAS and load conditionally, you can leverage the table.tableExists action in combination with if-then-else logic. For example:

proc cas;
    table.tableExists result =r / name="TABLE_NAME";
    if r=0  then do;
        table.loadtable / path="TABLE_NAME.sashdat" casOut={name="TABLE_NAME"};
        table.promote /name="YOUR_TABLE_NAME" drop=true;
    end;
    else print("Table already loaded");
quit;

Notice that the result=r syntax captures the result code from the tableExists action, which is evaluated before the loadtable and promote actions are executed. If the table is already loaded in CAS, “Table already loaded” is printed to the log. Otherwise, the loadtable and promote actions are executed.

The ability to output CAS action results to a CASL variable (such as result=r in this example) is an extremely powerful feature of CASL. I include another example of this further down, but you can learn more about this functionality from documentation or this handy blog post.

Saving your CAS data

Let’s pretend you’ve loaded your data, transformed it, and promoted it to global scope. You or your colleagues can access it from other CAS sessions. You finished your data preparation, right? Wrong. As the header of this section suggests, you also need to save your prepared CAS data. Why? Because up to this point, your processed and promoted data exists only in memory. You will lose your work if your SAS administrator reboots the server or restarts the CAS controller. If you need to quickly reload prepared data, you must back it up to a caslib’s data source. See the CAS data lifecycle for more details.

To save off CAS data, naturally, you use the table.save action. For example:

proc cas;
    table.save / table="TABLE_NAME" name="TABLE_NAME.sashdat" replace=true;
quit;

In this example, you save off the CAS table to disk as a SASHDAT file, defaulting to the location associated with your active caslib. You can modify the table.save parameters to save or export the data to an alternative data storage solution with full control over the file format (including but not limited to such popular options as HDFS, Oracle, SQL Server, Salesforce, Snowflake and Teradata), compression, partitioning and other options.

Dropping and deleting data

To drop a table from CAS, execute a table.droptable action. For example:

proc cas;
    table.droptable / name="TABLE_NAME" quiet=true;
quit;

The quiet=true option prevents CAS from generating an error if the table does not exist in CAS. Dropping a table deletes it from memory. It’s a good practice to drop tables you no longer need, particularly the one you have promoted. Local-scope tables disappear on their own when the session expires, whereas global tables will stay in memory until they are unloaded.

Dropping a table does not delete the underlying source data. To delete the source of a CAS table, use the table.deleteSource action. For example:

proc cas;
    table.deletesource / source="TABLE_NAME.sashdat" quiet=true;
quit;

Exploring Data Using PROC CAS

After taking a close look at moving the data using PROC CAS, let’s look at some useful ways to start exploring and manipulating CAS data.

Fetching sample data

When preparing data, I find it useful to look at sample data. The table.fetch action is conceptually similar to PROC PRINT and, by default, outputs the first 20 rows of a CAS table:

proc cas;
table.fetch / table="Table_Name";
quit;

You can modify the table.fetch options to control which observations and variables to display and how to display them. For example:

proc cas;
table.fetch / table={name="TABLE_NAME" where="VAR1 in ('value1','value2')"},              /*1*/
	      orderby={{name="VAR1"},                                                     /*2*/
                             {name="VAR2", order="descending"}
                             },	
	     fetchvars={{name="VAR1", label="Variable 1"},                                /*3*/
	                     {name="VAR2", label="Variable 2"}, 
 		             {name="VAR3", label="Variable 3", format=comma12.1}
                             },
	     to=50,                                                                       /*4*/
	     index=false;							          /*5*/
quit;

In the code snippet above:

  • #1 – where= statement limits the records to those meeting the where criteria.
  • #2 – orderby= option defines the sort order. Ascending is the default and is not required. If sorting by more than one variable, put them in a list inside curly brackets, as shown in this example. If a list item has a subparameter (such as order= here), encase each item in curly brackets.
  • #3 – fetchvars= option defines the variables to print as well as their display labels and formats. If you select more than one variable, put them in a list inside curly brackets, as shown here. And again, if a list item includes a subparmeter, then enclose each list item in curly brackets.
  • #4 – to= option defines the number of rows to print.
  • #5 – index= false option deactivates the index column in the output (the default is index=true). This is similar to the noobs option in PROC PRINT.

As mentioned earlier, make sure to watch your curly brackets!

Descriptive statistics and variable distributions

The next step in data exploration is looking at descriptive statistics and variable distributions. I would need a separate blog post to cover this in detail, so I only touch upon a few of the many useful CAS actions.

To look at statistics for numeric variables, use the simple.summary action, which computes standard descriptive statistics, such as minimum, maximum, mean, standard deviation, number missing, and so on. For example:

proc cas;
    simple.summary / table="TABLE_NAME";
quit;

Among its other features, the simple.summary action allows analysis by one or more group-by variables, as well as define the list of desired descriptive statistics. For example:

proc cas;
simple.summary / table={name="TABLE_NAME", groupBy="VAR1", vars={"NUMVAR1","NUMVAR2”}},
                 subSet={"MAX", "MIN", "MEAN", "NMISS"};
quit;

Another useful action is simple.topK, which selects the top K and bottom K values for variables in a data set, based on a user-specified ranking order. The example below returns the top 5 and bottom 5 values for two variables based on their frequency:

proc cas;
simple.topk / table="TABLE_NAME" 
              aggregator="N",                        
              inputs={"VAR1","VAR2"},
              topk=5,
              bottomk=5;
quit;

Simple is a rich action set with heaps of useful options covered in the documentation.

You may be wondering – what about crosstabs and frequency tables? The simple action set includes freq and crosstab actions. In addition, the action closely imitating the functionality of the beloved PROC FREQ is freqTab.freqTab. For example, the code snippet below creates frequency tables for VAR1, VAR2 and a crosstab of the two.

proc cas;
freqtab.freqtab / table="TABLE_NAME"
                  tabulate={"VAR1","VAR2",
                  {vars={"VAR1","VAR2"}}
                  };
quit;

Managing CAS Table Variables

Changing table metadata

One of the basic tasks after exploring your data is changing table metadata, such as dropping unnecessary variables, renaming tables and columns, and changing variable formats and labels. The table.altertable action helps you with these housekeeping tasks. For example, the code snippet below renames the table, drops two variables and renames and changes labels for two variables:

proc cas;
    table.altertable / table="TABLE_NAME" rename="ANALYTIC_TABLE"
                       drop={"VAR1",”VAR2”}
                       columns={{name="VAR3" rename="ROW_ID" label="Row ID"},
                                {name="VAR4" rename="TARGET" label="Outcome Variable"}
                                }
                       ;
quit;

Outputting variable list to a data set

Another useful trick I frequently use is extracting table columns as a SAS data set. Having a list of variables as values in a data set makes it easy to build data-driven scripts leveraging macro programming. The code snippet below provides an example. Here we encounter another example of capturing action result as a CASL variable and using it in further processing – I can’t stress enough how helpful this is!

proc cas;
    table.columninfo r=collinfo / table={name="TABLE_NAME"};       /*1*/
    collist=collinfo["ColumnInfo"];                                /*2*/
    saveresult collist casout="collist";                           /*3*/
quit;

In the snippet above:

  • #1 - the columninfo action collects column information. The action result is passed to a CASL variable collinfo. Notice, instead of writing out result=, I am using an alias r =.
  • #2 - the portion of the a CASL variable collinfo containing column data is extracted into another CASL variable collist.
  • #3 - the saveresult statement sends the data to a CAS table collist. If you want to send the results to a SAS7BDAT data set, replace casout= with dataout=, and provide the library.table_name information.

Transforming the Data

Lastly, let’s look at some ways to use CAS actions to transform your data. Proc SQL and DATA step are the two swiss-army knives in SAS 9 developers’ toolkit that take care of 90% of the data prep. The good news is you can execute both DATA Step and SQL directly from PROC CAS. In addition, call the transpose action to transpose your data.

Executing DATA Step code

The dataStep.runCode action enables you to run DATA step code directly inside PROC CAS. You must enclose your DATA step code in quotation marks after the code= statement. For example:

proc cas;
    dataStep.runCode /
    code="
        data table_name;
        set table_name;
        run;
        ";
quit;

Running DATA step code in CAS allows access to sophisticated group-by processing and the use of such popular programming techniques as first- and last-dot. Refer to the documentation for important nuances related to processing in a distributed, multi-threaded environment of CAS.

Executing FedSQL

To run SQL in CASL, use the fedSQL.execDirect action. Enclose the SQL query in quotation marks following the query= statement. Optionally, you can use the casout= statement to save the results to a CAS table. For example:

proc cas;
    fedsql.execDirect/
    query=
         "
          select  *
          from TABLE1 a  inner join TABLE2 b
          on a.VAR1 = b.VAR1
         "
    casout={name="TABLE3", replace=True};
quit;

Similarly to DATA step, be aware of the many nuances when executing SQL in CAS via FedSQL. Brian Kinnebrew provides an excellent overview of FedSQL in his SAS Communities article, and the documentation has up-to-date details on the supported functionality.

Transposing data

Transposing data in PROC CAS is a breeze. The example below uses transpose.transpose action to restructure rows into columns.

proc cas;
    transpose.transpose /
          table={name="TABLE_NAME", groupby={"VAR1"}}
          transpose={"VAR2"}
          id={"VAR3"}
          prefix="Prefix"
    casout={name="TRANSPOSED" replace=true};
run;

You can transpose multiple variables in the same transpose action. Simply place additional variables inside the curly brackets following transpose=, in quotes, separated by a comma.

Conclusion

PROC CAS is a wrapper procedure enabling you to leverage SAS’ new programming language - CASL. CASL enables you to submit CAS actions directly to SAS Cloud Analytic Services engine from a SAS client. This post provided examples of loading, managing, exploring and transforming your data through CAS actions. Certain new functionality in CAS is only available through CAS actions, so getting comfortable with CASL makes sense. Fear not, and let the curly brackets guide the way 😊.

Acknowledgement

I would like to thank Brian Kinnebrew for his thoughtful review and generous help on my journey learning CASL.

Challenge accepted: Learning data prep in CASL was published on SAS Users.

3月 162020
 

Books about statistics and machine learning often discuss the tradeoff between bias and variance for an estimator. These discussions are often motivated by a sophisticated predictive model such as a regression or a decision tree. But the basic idea can be seen in much simpler situations. This article presents a simple situation that is discussed in a short paper by Dan Jeske (1993). Namely, if a random process produces integers with a known set of probabilities, what method should you use to predict future values of the process?

I will start by briefly summarizing Jeske's result, which uses probability theory to derive the best biased and unbiased estimators. I then present a SAS program that simulates the problem and compares two estimators, one biased and one unbiased.

A random process that produces integers

Suppose a gambler asks you to predict the next roll of a six-sided die. He will reward you based on how close your guess is to the actual value he rolls. No matter what number you pick, you only have a 1/6 chance of being correct. But if the strategy is to be close to the value rolled, you can compute the expected value of the six faces, which is 3.5. Assuming that the gambler doesn't let you guess 3.5 (which is not a valid outcome), one good strategy is to round the expected value to the nearest integer. For dice, that means you would guess ROUND(3.5) = 4. Another good strategy is to randomly guess either 3 or 4 with equal probability.

Jeske's paper generalizes this problem. Suppose a random process produces the integers 1, 2, ..., N, with probabilities p1, p2, ..., pN, where the sum of the probabilities is 1. (This random distribution is sometimes called the "table distribution.") If your goal is to minimize the mean squared error (MSE) between your guess and a series of future random values, Jeske shows that the optimal solution is to guess the value that is closest to the expected value of the random variable. I call this method the ROUNDING estimator. In general, this method is biased, but it has the smallest expected MSE. Recall that the MSE is a measure of the quality of an estimator (smaller is better).

An alternative method is to randomly guess either of the two integers that are closest to the expected value, giving extra weight to the integer that is closer to the expected value. I call this method the RANDOM estimator. The random estimator is unbiased, but it has a higher MSE.

An example

The following example is from Jeske's paper. A discrete process generates a random variable, X, which can take the values 1, 2, and 3 according to the following probabilities:

  • P(X=1) = 0.2, which means that the value 1 appears with probability 0.2.
  • P(X=2) = 0.3, which means that the value 2 appears with probability 0.3.
  • P(X=3) = 0.5, which means that the value 3 appears with probability 0.5.

A graph of the probabilities is shown to the right. The expected value of this random variable is E(X) = 1(0.2) + 2(0.3) + 3(0.5) = 2.3. However, your guess must be one of the feasible values of X, so you can't guess 2.3. The best prediction (in the MSE sense) is to round the expected value. Since ROUND(2.3) is 2, the best guess for this example is 2.

Recall that an estimator for X is biased if its expected value is different from the expected value of X. Since E(X) ≠ 2, the rounding estimator is biased.

You can construct an unbiased estimator by randomly choosing the values 2 and 3, which are the two closest integers to E(X). Because E(X) = 2.3 is closer to 2 than to 3, you want to choose 2 more often than 3. You can make sure that the guesses average to 2.3 by guessing 2 with probability 0.7 and guessing 3 with probability 0.3. Then the weighted average of the guesses is 2(0.7) + 3(0.3) = 2.3, and this method produces an unbiased estimate. The random estimator is unbiased, but it will have a larger MSE.

Simulate the prediction of a random integer

Jeske proves these facts for an arbitrary table distribution, but let's use SAS to simulate the problem for the previous example. The first step is to compute the expected values of X. This is done by the following DATA step, which puts the expected value into a macro variable named MEAN:

/* Compute the expected value of X where 
   P(X=1) = 0.2
   P(X=2) = 0.3
   P(X=3) = 0.5
*/
data _null_;
array prob[3] _temporary_ (0.2, 0.3, 0.5);
do i = 1 to dim(prob);
   ExpectedValue + i*prob[i];       /* Sum of i*prob[i] */
end;
call symputx("Mean", ExpectedValue);
run;
 
%put &=Mean;
MEAN=2.3

The next step is to predict future values of X. For the rounding estimator, the predicted value is always 2. For the random estimator, let k be the greatest integer less than E(X) and let F = E(X) - k be the fractional part of E(x). To get an unbiased estimator, you can randomly choose k with probability 1-F and randomly choose k+1 with probability F. This is done in the following DATA step, which makes the predictions, generates a realization of X, and computes the residual difference for each method for 1,000 random values of X:

/* If you know mean=E(X)=expected value of X, Jeske (1993) shows that round(mean) is the best 
   MSE predictor, but it is biased.
   Randomly guessing the two closest integers is the best UNBIASED MSE predictor
   https://www.academia.edu/15728006/Predicting_the_value_of_an_integer-valued_random_variable
 
   Use these two predictors for 1000 future random variates.
*/
%let NumGuesses = 1000;
data Guess(keep = x PredRound diffRound PredRand diffRand);
call streaminit(12345);
array prob[3] _temporary_ (0.2, 0.3, 0.5);  /* P(X=i) */
 
/* z = floor(z) + frac(z) where frac(z) >= 0 */
/* https://blogs.sas.com/content/iml/2020/02/10/fractional-part-of-a-number-sas.html */
k = floor(&Mean);
Frac = &Mean - k;                        /* distance from E(X) to x */
do i = 1 to &NumGuesses;
   PredRound = round(&Mean);             /* guess the nearest integer */
   PredRand = k + rand("Bern", Frac);    /* random guesses between k and k+1, weighted by Frac */
   /* The guesses are made. Now generate a new instance of X and compute residual difference */
   x = rand("Table", of prob[*]);
   diffRound = x - PredRound;            /* rounding estimate */
   diffRand  = x - PredRand;             /* unbiased estimate */
   output;
end;
run;
 
/* sure enough, ROUND is the best predictor in the MSE sense */
proc means data=Guess n USS mean;
   var diffRound DiffRand;
run;

The output from PROC MEANS shows the results of generating 1,000 random integers from X. The uncorrected sum of squares (USS) column shows the sum of the squared residuals for each estimator. (The MSE estimate is USS / 1000 for these data.) The table shows that the USS (and MSE) for the rounding estimator is smaller than for the random estimator. On the other hand, The mean of the residuals is not close to zero for the rounding method because it is a biased method. In contrast, the mean of the residuals for the random method, which is unbiased, is close to zero.

It might be easier to see the bias of the estimators if you look at the predicted values themselves, rather than at the residuals. The following call to PROC MEANS computes the sample mean for X and the two methods of predicting X:

/* the rounding method is biased; the random guess is unbiased */
proc means data=Guess n mean stddev;
   var x PredRound PredRand;
run;

This output shows that the simulated values of X have a sample mean of 2.34, which is close to the expected value. In contrast, the rounding method always predicts 2, so the sample mean for that column is exactly 2.0. The sample mean for the unbiased random method is 2.32, which is close to the expected value.

In summary, you can use SAS to simulate a simple example that compares two methods of predicting the value of a discrete random process. One method is biased but has the lowest MSE. The other is unbiased but has a larger MSE. In statistics and machine learning, practitioners often choose between an unbiased method (such as ordinary least squares regression) and a biased method (such as ridge regression or LASSO regression). The example in this article provides a very simple situation that you can use to think about these issues.

The post Predict a random integer: The tradeoff between bias and variance appeared first on The DO Loop.

3月 132020
 

If you have been using SAS for long, you have probably noticed that there is generally more than one way to do anything. The Little SAS Book has long covered reading and writing Microsoft Excel files with the IMPORT and EXPORT procedures, but for the Sixth Edition we decided it was time to add two more ways: The ODS EXCEL destination makes it easy to convert procedure results into Excel files, while the XLSX LIBNAME engine allows you to access Excel files as if they were SAS data sets.

With the XLSX LIBNAME engine, you can convert an Excel file to a SAS data set (or vice versa) if you want to, but you can also access an Excel file directly without the need for a SAS data set. This engine works for files created using any version of Microsoft Excel 2007 or later in the Windows or UNIX operating environments. You must have SAS 9.4M2 or higher and SAS/ACCESS Interface to PC Files software. A nice thing about this engine is that it works with any combination of 32 bit and 64 bit systems.

The XLSX LIBNAME engine uses the first line in your file for the variable names, scans each full column to determine the variable type (character or numeric), assigns lengths to character variables, and recognizes dates, and numeric values containing commas or dollar signs. While the XLSX LIBNAME engine does not offer many options, because you are using an Excel file like a SAS data set, you can use some standard data set options. For example, you can use the RENAME= data set option to change the names of variables, and FIRSTOBS= and OBS= to select a subset of rows.

Reading an Excel file as is 

Suppose you have the following Excel file containing data about magnolia trees:


With the XLSX LIBNAME engine, SAS can read the file, without first converting it to a SAS data set. Here is a PROC PRINT that prints the data directly from the Excel file.

* Read Excel spreadsheet XLSX LIBNAME;
LIBNAME exfiles XLSX ‘c:\MyExcel\Trees.xlsx’;
PROC PRINT DATA = exfiles.sheet1;
   TITLE ‘PROC PRINT of Excel File’;
RUN;

Here are the results of the PROC PRINT. Notice that the variable names were taken from the first row in the file.

Converting an Excel file to a SAS data set 

If you want to convert an Excel file to a SAS data set, you can do that too. Here is a DATA step that reads the Excel file. The RENAME= data set option changes the variable name MaxHeight to MaxHeightFeet. Then a new variable is computed which is equal to the height in meters.

* Import Excel into a SAS data set;
DATA magnolia;
SET exfiles.sheet1 (RENAME = (MaxHeight = MaxHeightFeet));
MaxHeightMeters = ROUND(MaxHeightFeet * 0.3048);
RUN;

Here is the SAS data set with the renamed and new variables:

Writing to an Excel file 

It is just as easy to write to an Excel file as it is to read from it.

* Write a new sheet to the Excel file;
DATA exfiles.trees;
   SET magnolia;
RUN;
LIBNAME exfiles CLEAR;

Here is what the Excel file looks like with the new sheet. Notice that the new tab is labeled with the name of the SAS data set TREES.

Another nice thing about the XLSX LIBNAME is that it only locks a spreadsheet while SAS is accessing it. So generally speaking, it’s not necessary to issue a second LIBNAME statement to clear the libref. However, I did find, when I ran this in SAS Enterprise Guide, that I could not open the Excel spreadsheet unless I cleared the libref. So you can probably skip the LIBNAME CLEAR statement if you are using Display Manager or SAS Studio.

The XLSX LIBNAME engine is so flexible and easy to use that we think it’s a great addition to any SAS programmer’s skill set.

For more about the XLSX LIBNAME engine, I recommend this blog by Chris Hemedinger.

3月 122020
 

In parts one and two of this blog series, we introduced hybrid marketing as a method that combines both direct and digital marketing capabilities while absorbing insights from machine learning. According to Daniel Newman (Futurum Research) and Wilson Raj (SAS) in the October 2019 research study Experience 2030: “Brands must [...]

SAS Customer Intelligence 360: Hybrid marketing and analytic's last mile [Part 3] was published on Customer Intelligence Blog.

3月 112020
 
Regular polygons approximate a circle

Recently, I saw a graphic on Twitter by @neilrkaye that showed the rapid convergence of a regular polygon to a circle as you increase the number of sides for the polygon. The author remarked that polygons that have 40 or more sides "all look like circles to me." That is, a regular polygon with a large number of sides is visually indistinguishable from a circle.

I had two reactions to the graphic. The first was "Cool! I want to create a similar figure in SAS!" (I have done so on the right; click to enlarge.) The second reaction was that the idea is both mathematically trivial and mathematically sophisticated. It is trivial because it is known to Archimedes more than 2,000 years ago and is readily understood by using high school math. But it is sophisticated because it is a particular instance of a mathematical limit, which is the foundation of calculus and the mathematical field of analysis.

The figure also demonstrates the fact that you can approximate a smooth object (a circle, a curve, a surface, ...) by a large number of small, local, linear approximations. It is not an exaggeration to say that that concept is among the most important concepts in applied mathematics. It is used in numerical methods to solve nonlinear equations, integrals, differential equations, and more. It is used in numerical optimization. It forms the foundations of computer graphics. A standard technique in computational algorithms that involves a nonlinear function is to approximate the function by the first term of the Taylor expansion—a process known as linearization.

An approximation of pi

Archimedes used this process (local approximation by a linear function) to approximate pi (π), which is the ratio of the circumference of a circle to its diameter. He used two sets of regular polygons: one that is inscribed in the unit circle and the other that circumscribes the unit circle. The circumference of an inscribed polygon is less than the circumference of the circle. The circumference of a circumscribed polygon is greater than the circumference of the circle. They converge to a common value, which is the circumference of the circle. If you apply this process to a unit circle, you approximate the value 2π. Archimedes used the same construction to approximate the area of the unit circle, which is π.

You can use Archimedes's ideas to approximate π by using trigonometry and regular polygons, as shown in the following paragraph.

Suppose that a regular polygon has n sides. Then the central angle between adjacent vertices is θ = 2π/n radians. The following diagram illustrates the geometry of inscribed and circumscribed polygons. The right triangle that is shown is half of the triangle between adjacent vertices. Consequently,

  • The half-length of a side is b, where b = cos(θ/2) for the inscribed polygon and b = tan(θ/2) for the circumscribed polygon.
  • The height of the triangle is h, where h = |sin(θ/2)| for the inscribed polygon and h = 1 for the circumscribed polygon.
  • The circumfernce of the regular polygon is 2 n b
  • The area of the regular polygon is 2 n ( b h/2 ).

Approximate pi by using Archimedes's method

Although Archimedes did not have access to a modern computer, you can easily write a SAS DATA step program to reproduce Archimedes's approximations to the circumference and area of the unit circle, as shown below:

/* area and circomference */
data ApproxCircle;
pi = constant('pi');           /* Hah! We must know pi in order to evaluate the trig functions! */
south = 3*pi/2;                /* angle that is straight down */
do n = 3 to 100;
   angle = 2*pi/n;
   theta = south + angle/2;    /* first vertex for this n-gon */
   /* Circumference and Area for circumscribed polygon */
   b = tan(angle/2);
   h = 1;
   C_Out = 2*n * b;
   A_Out = 2*n * 0.5*b*h;
   /* Circumference and Area for inscribed polygon */
   b = cos(theta);
   h = abs( sin(theta) );
   C_In = 2*n * b;
   A_In = 2*n * 0.5*b*h;
   /* difference between the circumscribed and inscribed circles */
   CDiff = C_Out - C_In;
   ADiff = A_Out - A_In;
   output;
end;
label CDiff="Circumference Difference" ADiff="Area Difference";
keep n C_In C_Out A_In A_Out CDiff ADiff;
run;
 
/* graph the circumference and area of the n-gons as a function of n */
ods graphics / reset height=300px width=640px;
%let curveoptions = curvelabel curvelabelpos=min;
title "Circumference of Regular Polygons";
proc sgplot data=ApproxCircle;
   series x=n y=C_Out / &curveoptions;
   series x=n y=C_In  / &curveoptions;
   refline 6.2831853 / axis=y;              /* reference line at 2 pi */
   xaxis grid values=(0 to 100 by 10) valueshint;
   yaxis values=(2 to 10) valueshint label="Regular Polygon Approximation";
run;
 
title "Area of Regular Polygons";
proc sgplot data=ApproxCircle;
   series x=n y=A_Out / &curveoptions;
   series x=n y=A_In  / &curveoptions;
   refline 3.1415927 / axis=y;              /* reference line at pi */
   xaxis grid values=(0 to 100 by 10) valueshint;
   yaxis  values=(2 to 10) valueshint label="Regular Polygon Approximation";
run;

As you can see from the graph, the circumference and area of the regular polygons converge quickly to 2π and π, respectively. After n = 40 sides, the curves are visually indistinguishable from their limits, which is the same result that we noticed visually when looking at the grid of regular polygons.

The DATA step also computes the difference between the measurements of the circumscribed and inscribed polygons. You can print out a few of the differences to determine how close these estimates are to the circumference and area of the unit circle:

 
proc print data=ApproxCircle noobs;
   where n in (40, 50, 56, 60, 80, 100);
   var n CDiff ADiff;
run;

The table shows that the difference between the circumference of the circumscribed and inscribed polygons is about 0.02 when n = 40. For n = 56, the difference is less than 0.01, which means that the circumference of a regular polynomial approximates the circumference of the unit circle to two decimal places when n ≥ 56. If you use a regular 100-gon, the circumference is within 0.003 of the circumference of the unit circle. Although it is not shown, it turns out you need to use 177 sides before the difference is within 0.001, meaning that a 177-gon approximates the circumference of the unit circle to three decimal places.

Similar results hold for the area of the polygons and the area of a unit circle.

In conclusion, not only does a regular n-gon look very similar to the circle when n is large, but you can quantify how quickly the circumference and areas of an n-gon converges to the values 2 π and π, respectively. For n=56, the polygon values are accurate to two decimal places; for n=177, the polygon values are accurate to three decimal places.

Approximating a smooth curve by a series of discrete approximations is the foundation of calculus and modern numerical methods. The idea had its start in ancient Greece, but the world had to wait for Newton, Leibnitz, and others to provide the mathematical machinery (limits, convergence, ...) to understand the concepts rigorously.

The post Polygons, pi, and linear approximations appeared first on The DO Loop.

3月 112020
 

Automating SAS applications development

SAS variable labels are unique features of SAS data tables (aka data sets) that allow SAS users to enhance reading and interpretation of tables and reports.

Whether you use SAS data table as a data source in any of the reporting procedures or interactive interface such as SAS Visual Analytics, you will benefit from pre-assigning meaningful labels during the data preparation process. Besides being more efficient, such an early label assignment secures consistency of the data elements descriptions (labels) across different developers.

The most direct way of creating column labels is by explicitly assigning them to the data variables. You can do it during the data table creation in a DATA step using either LABEL statement or ATTRIB statement. Alternatively, you can do it after your data table is already created by using PROC DATASETS’ MODIFY statement with the LABEL= option.

However, in many situations there are ways of automating this tedious and voluminous process of column labels creation. Let’s look at one of them that I found useful for bulk column labeling. Plus, we are going to explore SAS coding technique using _DATA_and_LAST_special data sets.

Deriving variable labels from variable names

This method is suitable when variable names are well-formed, for example CUSTOMER_ADDRESS, FIRST_NAME, LAST_NAME, COMPANY_NAME, PLACE_OF_BIRTH, etc. Kudos to data designer!

We can transform these names into labels by replacing underscores with space characters and converting words from upper case to proper case. These are the labels we will get: Customer Address, First Name, Last Name, Company Name, Place Of Birth.

Let’s say our original data table is DEMO:

data DEMO;
   input CUSTOMER_CITY $ 1-15 FIRST_NAME $ 16-26 LAST_NAME $27-37 COMPANY_NAME $38-50 COUNTRY_OF_BIRTH $51-65;
   datalines;
Washington     Peter      Birn       Citibank     USA
Denver         Lisa       Roth       IBM          UK
Cary           Antony     Bessen     SAS          Spain
;

Then the following macro will create variable labels out of the variable names as described above:

options mprint; 
%macro ilabel (dataset);
   %local lbref dsname vname vlabel nvars;
 
   %if %index(&dataset,.) %then
   %do; /* 2-level dataset name */
      %let lbref  = %scan(&dataset,1,'.');
      %let dsname = %scan(&dataset,2,'.');
   %end;
   %else
   %do; /* 1-level dataset name */
      %let lbref  = WORK;
      %let dsname = &dataset;
   %end;
 
   /* get variable names */
   proc contents data=&dataset out=_data_(keep=name) noprint;
   run;
 
   /* create name/label pairs */
   data _null_;
      set _last_ end=eof nobs=n;
      call symput('vname'!!strip(put(_n_,best.)),name);
      lbl = propcase(translate(name,' ','_'));
      call symput('vlabel'!!strip(put(_n_,best.)),trim(lbl));
      if eof then call symputx('nvars',n);
   run;
 
   /* modify variable labels */
   proc datasets lib=&lbref nolist;
      modify &dsname;
         label
            %do i=1 %to &nvars;
               &&vname&i = "&&vlabel&i"
            %end;
         ;
   quit;
 
%mend ilabel;

You can invoke this macro by either one line of code:

%ilabel(DEMO)

or

%ilabel(WORK.DEMO)

Here are how our DEMO table looks before and after %ilabel macro modifies/assigns the labels based on the column names:

BEFORE:

Data table showing column names

 

AFTER:

Data table showing column labels

Macro code highlights

In this macro, we:

  1. Define local macro variables to make sure their names will not interfere with possible namesakes in the calling program.
  2. Determine libref and one-level data set name for the input data set.
  3. Create a table containing variable names in the input data set using PROC CONTENTS.
  4. Use DATA _NULL_ step to read through the variable names, and derive labels as

    lbl = propcase(translate(name,' ','_'));

    Here, transalate() function replaces underscores with blanks, then propcase() function converts every word in an argument to proper case (upper case for the first character and lower case for the remaining characters). We also create macro variables for each name/label pair (vname1, vlabel1, vname2, vlabel2, …) and macro variable nvars representing the number of such pairs.

  5. Use PROC DATASETS with MODIFY and LABEL statements to assign generated column labels to the source data set.

If some of the labels assigned by this macro are not what you need you may run another PROC DATASETS to individually adjust (re-assign) them according to your wishes or specification. But when you need to label data set columns on a large scale (many tables with dozens or hundreds of columns) this can be a good first draft that can save you time and efforts.

_DATA_ and _LAST_ special data sets

You might notice that I used _data_ data set name in the out= option of the PROC CONTENTS. This is not an explicit data set name; it is a keyword, a special data set that allows SAS to assign one of the available data set names dynamically. The created output data set will have a name that looks something like DATA1 or DATA2, etc. Try running this code:

data _data_;
   x=1;
run;

and look in the SAS LOG at what data set is created. I got:

NOTE: The data set WORK.DATA1 has 1 observations and 1 variables.

Special data set name _data_ tells SAS to create a data set in the WORK library from a list of names DATA1, DATA2, … according to the DATAn naming convention. These names (as well as WORK library) are unique for a given SAS session. The first time you use _data_ within a SAS session it will create data set named WORK.DATA1, the second time you use _data_ it will create WORK.DATA2, and so on.

Consequently, I used special data set name _last_ in the SET statement of the DATA step following the PROC CONTENTS. Again, here _last_ is a keyword, not a data set name; it is a special data set that refers to the name of the last created data set during your SAS session. That causes SAS to use the latest data set created prior to the _last_ reference.

Special data sets _data_ and _last_ are reserved names (or SAS keywords) along with special data set _null_ that is used in the DATA _NULL_ statement and causes SAS to execute the DATA step without creating a data set. (By the way, using DATA _NULL_ can increase your code efficiency when you use the DATA step for custom report writing or creating macro variables or other processing for which the output data set is not needed as it does not consume computer resources for writing and storing the output data set.)

If I were using an explicit table name in this macro instead, and your calling program accidentally were using the same table name, then the macro would overwrite your table which would wreak havoc to your program. Using _data_ and _last_ special data sets protect your SAS program from a possibility of inadvertently overwriting your other data set with the same name by executing the %ilabel macro. It is similar to using %LOCAL for macro variable names for protecting from possible overwrites of your %GLOBAL macro variables with the same names.

A WORD OF CAUTION: Remember, that the _data_ keyword creates table names that are unique only within a SAS session, so it works perfectly for the WORK data library which itself is a unique instance for a SAS session. While it is syntactically correct to use special data set notation _data_ for creating permanent data sets such as libref._data_ (including SASUSER._data_), I have to warn you against using it as it will not guarantee the name uniqueness in the permanent data library, and you may end up overwriting data sets that already exist there.

Your thoughts?

Do you find this post useful? How do you handle the task of assigning variable labels on a mass scale? Do you use _data_ and _last_ special data sets in your SAS coding? Please share in the comments section below.

Automating SAS variable labels creation was published on SAS Users.