tips & techniques

January 13, 2021
 

Running SAS programs in parallel reduces run time

As earth completes its routine annual circle around the sun and a new (and hopefully better) year kicks in, it is a perfect occasion to reflect on the idiosyncrasy of time.

While it is customary to think that 3+2=5, that is only true in the sequential world. In the parallel world, however, 3+2=3. Think about it: if you have two SAS programs, one of which runs for 3 hours and the other for 2 hours, their total duration will be 5 hours if you run them one after another, sequentially, but only 3 hours if you run them simultaneously, in parallel.

I am sure you remember those “filling up a swimming pool” math problems from elementary school. They clearly and convincingly demonstrate that two pipes will fill up a swimming pool faster than one. That’s the power of running water in parallel.

The same principle of parallel processing (or parallel computing) is applicable to SAS programs (or non-SAS programs) by running their different independent pieces in separate SAS sessions at the same time (in parallel).  Divide and conquer.
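The 3+2=3 arithmetic above can be seen in miniature with a toy example in Python's standard library (not SAS — just a quick, language-neutral illustration of running two jobs concurrently; the `job` helper and the scaled-down sleep times are mine):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def job(hours):
    """Stand-in for a long-running program, scaled down to fractions of a second."""
    time.sleep(hours / 10)
    return hours

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # the "3-hour" and "2-hour" jobs run at the same time
    results = list(pool.map(job, [3, 2]))
elapsed = time.perf_counter() - start
# elapsed is roughly max(3, 2)/10 = 0.3 seconds, not (3+2)/10 = 0.5
```

The total wall-clock time is governed by the longest job, not the sum of the jobs.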

You might be surprised at how easily this can be done, and at the same time how powerful it is. Let’s take a look.

SAS/CONNECT

MP CONNECT (Multi-Process CONNECT) parallel processing functionality was added to SAS/CONNECT, enabling you to execute multiple SAS sessions asynchronously. When a remote SAS session kicks off asynchronously, a portion of your SAS program is sent to the server session for execution and control is immediately returned to the client session. The client session can continue with its own processing or spawn one or more additional asynchronous remote server sessions.

Running programs in parallel on a single machine

Sometimes, what comes across as new is just well forgotten old. They used to be Central Processing Units (CPU), but now they are called just processors. Nowadays, practically every single computer is a “multi-machine” (or to be precise “multi-processor”) device. Even your laptop. Just open Task Manager (Ctrl-Alt-Delete), click on the Performance tab and you will see how many physical processors (or cores) and logical processors your laptop has:

Parallel processing on a single machine

That means that this laptop can run eight independent SAS processes (sessions) at the same time. All you need to do is ask nicely: “Dear Mr. & Mrs. SAS/CONNECT, my SAS program consists of several independent pieces. Would you please run each piece in its own SAS session, and run them all at the same time?” And believe me, SAS/CONNECT does not care how many logical processors you have, or whether those logical processors are on far-away “remote” machines or situated in a single laptop or even a single chip.
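If you prefer a programmatic check over Task Manager, a one-liner reports the same logical processor count; the example below uses Python purely as a convenient cross-platform illustration:

```python
import os

# Number of logical processors the operating system exposes
print(os.cpu_count())
```

On the laptop described above, this would print 8.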

Here is how you communicate your request to SAS/CONNECT in SAS language.

Spawning multiple SAS sessions using MP Connect

Suppose you have SAS code that consists of several pieces – DATA or PROC steps that are independent of each other, i.e. they do not need to run in any specific sequence. For example, each of two pieces generates its own data set.

Then we can create these two data sets in two separate “remote” SAS sessions that run in parallel. Here is how you do this.  (For illustration purposes, I just create two dummy data sets.)

options sascmd="sas";
 
/* Current datetime */
%let _start_dt = %sysfunc(datetime());
 
/* Process 1 */
signon task1;
rsubmit task1 wait=no;
 
   libname SASDL 'C:\temp';
 
   data SASDL.DATA_A (keep=str);
      length str $1000;
      do i=1 to 1150000;
         str = '';
         do j=1 to 1000;
            str = cats(str,'A');
         end;
         output;
      end;
   run;
 
endrsubmit;
 
/* Process 2 */
signon task2;
rsubmit task2 wait=no;
 
   libname SASDL 'C:\temp';
 
   data SASDL.DATA_B (keep=str);
      length str $1000;
      do i=1 to 750000;
         str = '';
         do j=1 to 1000;
            str = cats(str,'B');
         end;
         output;
      end;
   run;
 
endrsubmit;
 
waitfor _all_;
signoff _all_;
 
/* Print total duration */
data _null_;
   dur = datetime() - &_start_dt;
   put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
run;

In this code, the key elements are:

SIGNON statement - initiates a connection between a client session and a server session.

RSUBMIT statement - marks the beginning of a block of statements that a client session submits to a server session for execution; the WAIT=NO option makes the submission asynchronous, so control returns to the client session immediately.

ENDRSUBMIT statement - marks the end of a block of statements that a client session submits to a server session for execution.

WAITFOR statement - suspends the client session until the specified asynchronous tasks complete (_ALL_ waits for all of them).

SIGNOFF statement - ends the connection between a client session and a server session.

Parallel processing vs. threaded processing

There is a distinction between parallel processing described above and threaded processing (aka multithreading). Parallel processing is achieved by running several independent SAS sessions, each processing its own unit of SAS code.

Threaded processing, on the other hand, is achieved by developing special algorithms and implementing executable code that runs on multiple processors (threads) within the same SAS session. Amdahl's law provides the theoretical background for estimating the potential time savings achievable by parallel computing on multiple processors.
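For reference, Amdahl's law can be stated as follows: if a fraction p of a program's total work can be parallelized and the program runs on N processors, the best achievable speedup is

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
```

so, for example, even with unlimited processors a program that is 90% parallelizable (p = 0.9) cannot run more than 1/(1 − 0.9) = 10 times faster.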

Passing information to and from “remote” SAS sessions

Besides passing pieces of SAS code from client sessions to server sessions, MP CONNECT allows you to pass some other SAS objects.

Passing data library definitions

For example, if you have data libraries defined in your client session, you may pass those library definitions on to multiple server sessions without re-defining them in each server session.

Let’s say you have two data libraries defined in your client session:

libname SRCLIB oracle user=myusr1 password=mypwd1 path=mysrv1;
libname TGTLIB '/sas/data/datastore1';

In order to make these data libraries available in a server session, all you need to do is add the INHERITLIB= option to the RSUBMIT statement:

rsubmit task1 wait=no inheritlib=(SRCLIB TGTLIB);

This will allow libraries that are defined in the client session to be inherited by and available in the server session. As an option, each client libref can be associated with a libref that is named differently in the server session:

rsubmit task1 wait=no inheritlib=(SRCLIB=NEWSRC TGTLIB=NEWTGT);

Passing macro variables from client to server session

The %SYSLPUT macro statement creates a macro variable in a server session and assigns it the value of a macro variable from the client session:

options sascmd="sas";
%let run_dt = %sysfunc(datetime());
signon task1;
%syslput rem_run_dt=&run_dt / remote=task1;
rsubmit task1 wait=no;
   %put &=rem_run_dt;
endrsubmit;
 
waitfor task1;
signoff task1;

Passing macro variables from server to client session

Similarly, the %SYSRPUT macro statement assigns a value from a server session to a macro variable in the client session:

  • %SYSRPUT macro-variable=value;
  • %SYSRPUT _USER_ </LIKE=‘character-string’>; (copies all user-defined macro variables at once; the optional /LIKE=‘character-string’ specifies a subset of macro variables whose names match a user-specified character sequence, or pattern.)

Here is a code example that passes two macro variables, rem_start and rem_year, from the remote session and outputs them to the SAS log in the client session:

    options sascmd="sas";
    signon task1;
    rsubmit task1 wait=no;
       %let start_dt = %sysfunc(datetime());
       %sysrput rem_start=&start_dt;
       %sysrput rem_year=2021;
    endrsubmit;
     
    waitfor task1;
    signoff task1;
    %put &=rem_start &=rem_year;

    Summary

    SAS’ Multi-Process CONNECT (MP CONNECT) is a simple and efficient tool enabling parallel execution of independent programming units. Compared to sequential processing of time-intensive programs, it allows you to substantially reduce the overall duration of your program execution.

    Additional resources

    Running SAS programs in parallel using SAS/CONNECT® was published on SAS Users.

    December 17, 2020
     

    There’s nothing worse than being in the middle of a task and getting stuck. Being able to find quick tips and tricks to help you solve the task at hand, or simply entertain your curiosity, is key to maintaining your efficiency and building everyday skills. But how do you get quick information that’s ALSO engaging? By adding some personality to traditionally routine tutorials, you can learn and may even have fun at the same time. Cue the SAS Users YouTube channel.

    With more than 50 personality-packed videos published to date and over 10,000 hours watched, there’s no shortage of learning going on. Our team of experts loves to share their knowledge and passion (with personal flavor!) to give you solutions to those everyday tasks.

    What better way to round out the year than provide a roundup of our most popular videos from 2020? Check out these crowd favorites:

    Most viewed

    1. How to convert character to numeric in SAS
    2. How to import data from Excel to SAS
    3. How to export SAS data to Excel

    Most hours watched

    1. How to import data from Excel to SAS
    2. How to convert character to numeric in SAS
    3. Simple Linear Regression in SAS
    4. How to export SAS data to Excel
    5. How to Create Macro Variables and Use Macro Functions
    6. The SAS Exam Experience | See a Performance-Based Question in Action
    7. How to Import CSV files into SAS
    8. SAS Certification Exam: 4 tips for success
    9. SAS Date Functions FAQs
    10. Merging Data Sets in SAS Using SQL

    Latest hits

    1. Combining Data in SAS: DATA Step vs SQL
    2. How to Concatenate Values in SAS
    3. How to Market to Customers Based on Online Behavior
    4. How to Plan an Optimal Tour of London Using Network Optimization
    5. Multiple Linear Regression in SAS
    6. How to Build Customized Object Detection Models

    Looking forward to 2021

    We’ve got you covered! SAS will continue to publish videos throughout 2021. Subscribe now to the SAS Users YouTube channel, so you can be notified when we’re publishing new videos. Be on the lookout for some of the following topics:

    • Transforming variables in SAS
    • Tips for working with SAS Technical Support
    • How to use Git with SAS

    2020 roundup: SAS Users YouTube channel how to tutorials was published on SAS Users.

    December 7, 2020
     

    In my previous blog post, I talked about using PROC CAS to accomplish various data preparation tasks. Since then, my colleague Todd Braswell and I worked through some interesting challenges implementing an Extract, Transform, Load (ETL) process that continuously updates data in CAS. (Todd is really the brains behind getting this to work – I am just along for the ride.)

    In a nutshell, the process goes like this:

    1. PDF documents drop into a "receiving" directory on the server. The documents have a unique SubmissionID. Some documents are very large – thousands of pages long.
    2. A Python job runs and converts the PDFs into plain text. Python calls an API that performs Optical Character Recognition (OCR) and saves the output as a CSV file, one row per page of the PDF document.
    3. A SAS program, running in batch, loads the CSV file with new records into a CAS table. SubmissionID is passed to the batch program as a macro variable, which is used as part of the CAS table name.
    4. Records loaded from the CSV file are appended to the Main table. If records with the current SubmissionID already exist in the Main table, they are deleted and replaced with new records.
    5. The Main table is queried by downstream processes, which extract subsets of data, apply model score code, and generate results for the customer.

    Continuously update data process flow

    Due to the volume of data and the amount of time it takes to OCR large PDFs, the ETL process runs in multiple sessions simultaneously. And here is a key requirement: the Main table is always available, in a promoted state, with all up-to-date records, in order for the model score code to pick up the needed records.

    What does "promoted state" mean?

    The concept of table scope, which was introduced with the first release of SAS Viya, presents a challenge. CAS tables are in-memory tables that can have one of two "scopes":

    • Session scope – the table exists within the scope of your current CAS session and drops from memory as soon as the session ends. Functionally, this is somewhat similar to the data you write to the WORK library in SAS 9 – once you disconnect, the data drops from the WORK library.
    • Global scope – the table is available to all sessions. If your session ends, you will still have access to it when you start a new session. Your colleagues also maintain access, assuming they have the necessary permissions. For the table to assume global scope, it needs to be promoted.

    Common promotion techniques for a table are the DATA STEP, PROC CAS, or PROC CASUTIL. For example:

    /*promote a table using DATA STEP*/
    *this snippet copies a SAS 9 table into CAS and promotes in one step;
    data mycaslib.my_table (promote=yes);
         set mylib.my_table;
         run;
     
    /*promote using PROC CASUTIL*/
    *this snippet promotes a session-scope CAS table to global scope;
    proc casutil incaslib='mycaslib' outcaslib='mycaslib';
         promote casdata='my_table';
         quit;
     
    /*Promote using PROC CAS*/
    *same as above, this snippet promotes a session-scope table to global scope;
    proc cas;
    table.promote / 
         caslib='mycaslib'
         targetcaslib='mycaslib' 
         name='my_table' 
         target='my_table';
        quit;

    Fun Facts About Table Promotion

    You cannot promote a table that has already been promoted. If you need to promote a new version of the same table, you need to first drop the existing table and promote the new version.

    To discover the current table state, use the table.tableInfo action:

    proc cas;
         table.tableinfo / 
         caslib='mycaslib' 
         name='main';
         quit;

    If you append rows to a promoted table using the DATA STEP append option, the new rows are automatically promoted. For example, in this snippet the mycaslib.main table, which is promoted, remains promoted when the rows from mycaslib.new_rows are appended to it:

    data mycaslib.main(append=yes);
         set mycaslib.new_rows;
         run;

    When you manipulate a promoted table using the DATA step in any way other than appending rows, CAS creates a new, session-scope version of the same table. You will have two versions of the table: the global-scope table, which remains unchanged, and the session-scope version, which has the changes you implemented. Even if you don't change anything in the table and simply run:

    data mycaslib.my_table;
         set mycaslib.my_table;
         run;

    in which mycaslib.my_table is promoted, you end up with a promoted and an unpromoted version of this table in the mycaslib library – a somewhat unexpected and hardly desired result. Appendix 1 walks through a quick exercise you can try to verify this.

    As you probably guessed, this is where we ran into trouble with our ETL process: the key requirement was for the Main table to remain promoted, yet we needed to continuously update it. The task was simple if we just needed to append the rows; however, we also needed to replace the rows if they already existed. If we tried to delete the existing rows using the DATA STEP, we would have to deal with the changes applied to a session-scope copy of the global-scope table.

    Initially, we designed the flow to save off the session-scope table with changes, then drop the (original) global-scope version, and finally reload the up-to-date version. This was an acceptable workaround, but errors started to pop up when a session looked for the Main table to score data, while a different concurrent session reloaded the most up-to-date data. We were uncertain how this would scale as our data grew.

    PROC CAS to the rescue!

    After much research, we learned about the table.deleteRows action, which allows you to delete rows directly from a global-scope table. The data is never dropped to session-scope – exactly what we needed. Here's an example:

    proc cas;
         table.deleteRows /
         table={caslib="mycaslib", name="Main", where="SubmissionID = 12345"};
         quit;

    In case you are wondering, the Tables action set also has an update action for modifying rows in place. With deleteRows in hand, here is the full sequence of steps in our ETL process:

    /*1. Load new rows. SubmissionID macro variable is a parameter passed to the batch program*/
    /*New rows are written to the casuser library, but it does not really matter which caslib you choose – 
       we are not persisting them across sessions*/
    proc casutil;
           load file="/path_to_new_rows/New_rows_&SubmissionID..csv" outcaslib="casuser" casout="new_rows";
           quit;
    /*2. Delete rows with the current SubmissionID */
    proc cas;
           table.deleteRows /
           table={caslib="prod", name="Main", where="SubmissionID = &SubmissionID."};
           quit;
    /*3. Append new rows. MYCASLIB is assumed to be a libref assigned to the prod caslib;
       CASUSER is the libref for the casuser caslib, where the new rows were loaded in step 1*/
    data mycaslib.main(append=yes);
         set casuser.new_rows;
         run;
    /*4. Save the main table to ensure we have a disk backup of in-memory data*/
    proc casutil incaslib="prod" outcaslib="prod";
    	save casdata="main" replace;
    	quit;

    Conclusion

    We learned how to continuously update data in CAS while ensuring the data remains available to all sessions accessing it asynchronously. We learned that the append option in the DATA step automatically promotes new rows, but manipulating the data in a global-scope table through the DATA step in other ways leads to the data being copied to session-scope. Finally, we learned that to ensure the table remains promoted while it is updated, we can fall back on PROC CAS.

    Together, these techniques enabled implementation of a robust data flow that overcomes concurrency problems due to multiple processes updating and querying the data.

    Acknowledgement

    We thank Brian Kinnebrew for his generous help in investigating this topic and the technical review.

    Appendix 1

    Try the following exercise to verify that manipulating a promoted table in DATA STEP leads to two copies of the table – session- AND global-scope.

    /*copy and promote a sample SAS 9 table*/
    data casuser.cars(promote=yes);
         set sashelp.cars;
         run;
    /*check the number of rows and confirm that the table is promoted*/
    proc cas;
         table.tableinfo / caslib='casuser' name='cars';
         quit; /*The table is promoted and has 428 rows*/
     
    /*delete some rows in the promoted table*/
    data casuser.cars;
         set casuser.cars;
         if make='Acura' then delete;
         run;
    /*check again – how many rows does the table have? Is it promoted?*/
    proc cas;
         table.tableinfo / caslib='casuser' name='cars';
         quit;  /*The table has 421 rows but it is no longer promoted*/
     
    /*reset your CAS session*/
    /*kill your current CAS session */
    cas _all_ terminate;
    /*start a new CAS session and assign caslibs*/
    cas; 
    caslib _all_ assign;
     
    /*check again – how many rows does the table have? Is it promoted?*/
    proc cas;
         table.tableinfo / caslib='casuser' name='cars';
         quit;  /*The table is promoted and has 428 rows*/

    What we see here is that manipulating a global-scope table in a DATA step leads to duplication of data. CAS copies the table to session-scope and applies the changes there. The changes go away if you terminate the session. One way to get around this issue is, instead of trying to overwrite the promoted table, to create a new table, then drop the old table and promote the new table under the old table's name. Otherwise, use the table.update and table.deleteRows actions to update or delete rows in place, as described in this post.

    Append and Replace Records in a CAS Table was published on SAS Users.

    November 20, 2020
     

    If you’re like me and the rest of the conference team, you’ve probably attended more virtual events this year than you ever thought possible. You can see the general evolution of virtual events by watching the early ones from April or May and compare them to the recent ones. We at SAS Global Forum are studying the virtual event world, and we’re learning what works and what needs to be tweaked. We’re using that knowledge to plan the best possible virtual SAS Global Forum 2021.

    Everything is virtual these days, so what do we mean by virtual?

    Planning a good virtual event takes time, and we’re working through the process now. One thing is certain -- we know the importance of providing quality content and an engaging experience for our attendees. We want to provide attendees with the opportunity as always, but virtually, to continue to learn from other SAS users, hear about new and exciting developments from SAS, and connect and network with experts, peers, partners and SAS. Yes, I said network. We realize it won’t be the same as a live event, but we are hopeful we can provide attendees with an incredible experience where you connect, learn and share with others.

    Call for content is open

    One of the differences between SAS Global Forum and other conferences is that SAS users are front and center – the soul of the conference. We can’t have an event without user content. And that’s where you come in! The call for content opened November 17 and lasts through December 21, 2020. Selected presenters will be notified in January 2021. Presentations will be different in 2021; they will be 30 minutes in length, including time for Q&A when possible. And since everything is virtual, video is a key component of your content submission. We ask for a 3-minute video along with your title and abstract.

    The Student Symposium is back

    Calling all postsecondary students -- there’s still time to build a team for the Student Symposium. If you are interested in data science and want to showcase your skills, grab a teammate or two and a faculty advisor and put your thinking caps on. Applications are due by December 21, 2020.

    Learn more

    I encourage you to visit the SAS Global Forum website for up-to-date information, follow #SASGF on social channels and join the SAS communities group to engage with the conference team and other attendees.

    Connect, learn and share during virtual SAS Global Forum 2021 was published on SAS Users.

    July 23, 2020
     

    Splitting one into smaller pieces

    In his blog post, How to split one data set into many, Chris Hemedinger showed how to subset or split SAS data sets based on the values of categorical variables. For example, based on a value of variable REGION you may split a data set MARKETING into MARKETING_ASIA, MARKETING_AMERICA, MARKETING_EUROPE, and so on.

    In some cases, however, we need to split a large data set into many – not by the values of a subsetting variable, but by the number of observations – in order to produce smaller, more manageable data sets. Such an approach can be dictated by restrictions on data set size imposed by hardware (memory size, transmission channel bandwidth, etc.), processing time, or user interface convenience (e.g. search results displayed by pages).

    We might need to split a data set into smaller tables of K observations or less each; or to split a data set into S equal (or approximately equal) pieces.

    We might need to split a data set into sequentially selected subsets where the first K observations go into the first data set, the second K observations go into the second data set, and so on. Alternatively, we might need to randomly select observations from a data set while splitting it into smaller tables.

    This blog post provides possible coding solutions for such scenarios.

    Splitting a data set into smaller data sets sequentially

    Let’s say we need to split the data set SASHELP.CARS (number of observations N=428) into several smaller data sets. We will consider the following two sequential observation selection scenarios:

    1. Each smaller data set should have maximum of K observations.
    2. There should be S smaller data sets of approximately same size.

    Ideally, we would like to split the data set into pieces of exactly K observations each, but that is not always possible because the number of observations N in the original data set is not always evenly divisible by K. Therefore, we will split it into several smaller data sets of K observations each, except that the last smaller data set will have a number of observations equal to the remainder of dividing N by K.

    Similarly, with scenario 2, we will split the source data set into several smaller data sets of the same size, except that the last smaller data set will have a number of observations equal to the remainder of dividing N by K.
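A quick sanity check of this arithmetic, written in Python purely for illustration (the `split_sizes` helper is a name of my own choosing), reproduces the set counts and last-set sizes for SASHELP.CARS that appear in the SAS logs later in this post:

```python
import math

def split_sizes(n, k):
    """Sizes of the smaller data sets when splitting n observations
    into chunks of at most k observations each."""
    s = math.ceil(n / k)          # number of smaller data sets
    last = n - k * (s - 1)        # the last one gets the remainder
    return [k] * (s - 1) + [last]

# Scenario 1: SPLIT_DEF=NOBS with K=100 on SASHELP.CARS (N=428)
print(split_sizes(428, 100))                 # [100, 100, 100, 100, 28]

# Scenario 2: SPLIT_DEF=SETS with S=3 implies K=ceil(428/3)=143
print(split_sizes(428, math.ceil(428 / 3)))  # [143, 143, 142]
```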

    Below is a SAS macro code that covers both these scenarios.

    %macro split (SRC_DATASET=, OUT_PREFIX=, SPLIT_NUM=, SPLIT_DEF=);
    /* Parameters:
         SRC_DATASET - name of the source data set
         OUT_PREFIX  - prefix of the output data sets
         SPLIT_NUM   - split number
         SPLIT_DEF   - split definition (=SETS or =NOBS)
    */
     
       %local I K S TLIST;
       %let K = 0; %let S = 0; /* defaults so the check below terminates the macro if SPLIT_DEF is invalid */
     
       /* number of observations &K, number of smaller datasets &S */
       data _null_;
          if 0 then set &SRC_DATASET nobs=N;
          if upcase("&SPLIT_DEF")='NOBS' then
             do;
                call symputx('K',&SPLIT_NUM); 
                call symputx('S',ceil(N/&SPLIT_NUM));
                put "***MACRO SPLIT: Splitting into datasets of no more than &SPLIT_NUM observations";
             end;
             else if upcase("&SPLIT_DEF")='SETS' then
             do;
                call symputx('S',&SPLIT_NUM); 
                call symputx('K',ceil(N/&SPLIT_NUM));
                put "***MACRO SPLIT: Splitting into &SPLIT_NUM datasets";
            end;
             else put "***MACRO SPLIT: Incorrect SPLIT_DEF=&SPLIT_DEF value. Must be either SETS or NOBS.";
          stop; 
       run;
     
     
       /* terminate macro if nothing to split */
       %if (&K le 0) or (&S le 0) %then %return;
     
        /* generate list of smaller dataset names */
       %do I=1 %to &S;
          %let TLIST = &TLIST &OUT_PREFIX._&I;
       %end;
     
       /* split source dataset into smaller datasets */
       data &TLIST;
          set &SRC_DATASET;
          select;
             %do I=1 %to &S;
                when(_n_ <= &K * &I) output &OUT_PREFIX._&I; 
             %end;
          end;
       run;
     
    %mend split;

    The following are examples of the macro invocations:

    %split(SRC_DATASET=SASHELP.CARS, OUT_PREFIX=WORK.CARS, SPLIT_NUM=100, SPLIT_DEF=SET);
     
    %split(SRC_DATASET=SASHELP.CARS, OUT_PREFIX=WORK.CARS, SPLIT_NUM=100, SPLIT_DEF=NOBS);
     
    %split(SRC_DATASET=SASHELP.CARS, OUT_PREFIX=WORK.CARS, SPLIT_NUM=3, SPLIT_DEF=SETS);

    These invocations will produce the following SAS logs:

    %split(SRC_DATASET=SASHELP.CARS, OUT_PREFIX=WORK.CARS, SPLIT_NUM=100, SPLIT_DEF=SET);
    ***MACRO SPLIT: Incorrect SPLIT_DEF=SET value. Must be either SETS or NOBS.
     
    %split(SRC_DATASET=SASHELP.CARS, OUT_PREFIX=WORK.CARS, SPLIT_NUM=100, SPLIT_DEF=NOBS);
    ***MACRO SPLIT: Splitting into datasets of no more than 100 observations
    NOTE: There were 428 observations read from the data set SASHELP.CARS.
    NOTE: The data set WORK.CARS_1 has 100 observations and 15 variables.
    NOTE: The data set WORK.CARS_2 has 100 observations and 15 variables.
    NOTE: The data set WORK.CARS_3 has 100 observations and 15 variables.
    NOTE: The data set WORK.CARS_4 has 100 observations and 15 variables.
    NOTE: The data set WORK.CARS_5 has 28 observations and 15 variables.
     
    %split(SRC_DATASET=SASHELP.CARS, OUT_PREFIX=WORK.CARS, SPLIT_NUM=3, SPLIT_DEF=SETS);
    ***MACRO SPLIT: Splitting into 3 datasets
    NOTE: There were 428 observations read from the data set SASHELP.CARS.
    NOTE: The data set WORK.CARS_1 has 143 observations and 15 variables.
    NOTE: The data set WORK.CARS_2 has 143 observations and 15 variables.
    NOTE: The data set WORK.CARS_3 has 142 observations and 15 variables.

    Splitting a data set into smaller data sets randomly

    For randomly splitting a data set into many smaller data sets we can use the same approach as above with a slight modification. In essence, we are going to randomly shuffle observations of our source data set first, and then apply the sequential splitting.

    In order to implement this, we just need to replace the last data step in the above macro with the following 3 steps:

    /* generate random numbers, R */
       data;
          set &SRC_DATASET;
          call streaminit(1234);
          R = rand('uniform');
       run;
     
       /* sort data in R order */
       proc sort;
          by R;
       run;
     
       /* split source dataset into smaller datasets */
       data &TLIST (drop=R);
          set;
          select;
             %do I=1 %to &S;
                when(_n_ <= &K * &I) output &OUT_PREFIX._&I; 
             %end;
          end;
       run;

    This modified code will produce similar results (with the same information in the SAS log), however, smaller data sets will have their observations randomly selected from the source data set.

    DATAn naming convention

    You may have noticed that in this random splitting code I have not specified data set names in the DATA statement of the first DATA step, in PROC SORT, or in the SET statement of the last DATA step. Not only are these shortcuts possible due to SAS’ DATAn naming convention, it is also a very robust way of dynamically assigning temporary data set names. This method is especially useful and appropriate for SAS macros, as it guarantees that you do not accidentally overwrite a data set with the same name in the SAS program that invokes your macro. Think about it: if you are a macro developer, you need to make sure that whatever temporary data sets you create within your macro have names that are unique within a SAS session, in order not to interfere with any data sets that may be created in the calling SAS program outside of your macro.

    Here are defaults in SAS’ DATAn naming convention:

    • If you do not specify a name for the output data set in a DATA statement, SAS automatically assigns the default names WORK.DATA1, WORK.DATA2, and so on, to each successive data set that you create.
    • If you do not specify a name for the input data set in a SET statement, SAS automatically uses the last data set that was created. SAS keeps track of the most recently created data set through the reserved name _LAST_. When you execute a DATA or PROC step without specifying an input data set, by default, SAS uses the _LAST_ data set.

    For more information on this useful SAS coding technique see special data set names and examples and warning on using special data set names.

    Your thoughts?

    Do you find this post useful? Have you ever split data sets into smaller ones based on a number of observations? Do you use special data set names and DATAn naming convention in your SAS coding? Please share your thoughts in the comments section below.

    Splitting a data set into smaller data sets was published on SAS Users.

    June 4, 2020
     

    Learning never stops. When SAS had to change this year’s SAS Global Forum (SGF) to a virtual event, everyone was disappointed. I am, however, super excited about all of the papers and stream of video releases over the last month (and I encourage you to register for the upcoming live event in June). For now, I made a pact with myself to read or watch one piece of SGF related material per day. While I haven’t hit my goal 100%, I sure have learned a lot from all the reading and viewing. One particular paper, Using Jupyter to Boost Your Data Science Workflow, and its accompanying video by Hunter Glanz caught my eye this week. This post elaborates on one piece of his material: how to save Jupyter notebooks in other file formats.

    Hunter’s story

    Hunter is a professor who teaches multiple classes using SAS® University Edition, which comes equipped with an integrated Jupyter notebook. His focus is on SAS programming, and he requires his students to create notebooks to complete assignments; however, he wants to see the results of their work, not to run their raw code. The notebooks include text, code, images, reports, etc. Let's explore how the students can transform their native notebooks into other, more consumable formats. We'll also discuss other use cases in which SAS users may want to create a copy of their work from a notebook to, say, a .pdf, .html, or .py file, just to name a few.

    What you’ll find here and what you won’t

    This post will not cover how to use Jupyter notebooks with SAS or other languages. There is a multitude of other resources, starting with Hunter’s work, to explore those topics. This post will cover how to produce other file formats in SAS, Python, and R. I’ll outline multiple methods including a point-and-click method, how to write inline code directly in the notebook, and finally using the command line.

    Many of the processes discussed below are language agnostic. When there are distinct differences, I’ll make a note.

    A LITTLE about Jupyter notebooks

    A Jupyter notebook is a web application that lets users run commands, view responses, include images, and write inline text, all in one place. The all-encompassing notebook enables users to tell a complete story without having to juggle multiple apps. Jupyter notebooks were originally created for the Python language and are now available for many other programming languages. JupyterLab, the notebook’s cousin, is a later, more sophisticated version, but for this writing, we’ll focus on the notebook. The functionality in this use case is similar.

    Where do we start? First, we need to install the notebook, if you're not working in SAS University Edition.

    Install Anaconda

    The easiest way to get started with the Jupyter Notebook App is by installing Anaconda (this will also install JupyterLab). Anaconda is an open source distribution tool for the management and deployment of scientific computing. Out-of-the-box, the notebook from the Anaconda install includes the Python kernel. For use with other languages, you need to install additional kernels.

    Install additional language kernels

    In this post, we’ll focus on Python, R, and SAS. The Python kernel is readily available after the Anaconda install. For the R language, follow the instructions on the GitHub R kernel repository. I also found the instructions in How to Install R in Jupyter with IRKernel in 3 Steps quite straightforward and useful. Further, here are the official install instructions for the SAS kernel and a supporting SAS Community Library article.

    With the additional kernels in place, you should see all available languages when creating a new notebook, as pictured below.

    Available kernels list

    File conversion methods

    Now we’re ready to dive into the export process. Let’s look at three approaches in detail.

    Download (Export) option

    Once you’ve opened your notebook and run the code, select File-> Download As (appears as Export Notebook As… in JupyterLab).

    "Download As"  option in Jupyter notebook

    "Export Notebook As" option in JupyterLab

    HTML format output

    Notice the list of options, some more familiar than others. Select the HTML option and Jupyter converts your entire notebook: text, commands, figures, images, etc., into a file with a .html extension. The resulting file displays in a browser as expected. See the images below for a comparison of the .ipynb and .html files.

    SAS code in a Jupyter notebook

    Corresponding SAS code notebook in html form

    SAS (aka script) format output

    Using the Download As -> SAS option renders a .sas file, which is shown open in Enterprise Guide below. Note: when using a different kernel, say Python or R, you have the option to save in that language's script format.

    SAS code saved from a notebook displayed in Enterprise Guide

    One thing to note here is that only the code appears in the output file. The markdown text, figures, etc., from the original notebook are not display options in EG, so they are removed.

    PDF format output

    There is one special case (two, actually) I need to mention. If you want to create PDF (or LaTeX, which is used to create PDF files) output of your notebook, you need additional software. For converting to PDF, Jupyter uses the TeX document preparation ecosystem. If you attempt to download without TeX, the conversion fails, and you get a message to download TeX. Depending on your OS, the TeX software will have a different name, but it will include TeX in the name. You may also, in certain instances, need Pandoc for certain formats. I suggest installing both to be safe. Install TeX from its download site, and do the same for Pandoc.

    Once I’ve completed creating the files, the new files appear in my File Explorer.

    New SAS file in Windows File Explorer

    Cheaters may never win, but they can create a PDF quickly

    Well, now that we’ve covered how to properly convert and download a .pdf file, there may be an easier way. While in the notebook, press the Ctrl + P keys. In the Print window, select the Save to PDF option, choose a file destination, and save. It works, but I felt less accomplished afterward. Your choice.

    Inline code option

    Point-and-click is a perfectly valid option, but let’s say you want to introduce automation into your world. The jupyter nbconvert command provides the capability to transform the current notebook into any format mentioned earlier. All you have to do is run the command, with a couple of parameters, from within the notebook.

    In Python, you can invoke the nbconvert command through the os library's system function. The following lines are representative of the general structure.

    import os
    os.system("jupyter nbconvert myNotebook.ipynb --to html")
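    If you want to capture failures rather than a bare exit status, the same call can be made with the subprocess module, which can raise an error on a non-zero return code. This is a sketch of my own, not from the original post; the helper name build_nbconvert_cmd is hypothetical, and it assumes jupyter is on your PATH.

    ```python
    import subprocess

    def build_nbconvert_cmd(notebook, fmt):
        # Assemble the argument list for: jupyter nbconvert <notebook> --to <fmt>
        return ["jupyter", "nbconvert", notebook, "--to", fmt]

    cmd = build_nbconvert_cmd("myNotebook.ipynb", "html")
    # subprocess.run(cmd, check=True)  # uncomment to run; raises CalledProcessError on failure
    ```

    Passing the arguments as a list avoids shell-quoting issues with file names that contain spaces.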

    An example with Python

    The example below is from a Python notebook. The "0" return code indicates success.

    Code to create a PDF file from a Python notebook

    An example with SAS

    As you see with the Python example, the code is just that: Python. Generally, you cannot run Python code in a Jupyter notebook running the SAS kernel. Luckily, we have Jupyter magics, which allow us to write and run Python code inside a SAS kernel. The magics are a two-way street: you can also run SAS code inside a Python shell. See the SASPy documentation for more information.

    The code below is from a SAS notebook, but is running Python code (triggered by the %%python magic).

    Code to create a PDF file from a SAS notebook

    The EmployeeChurnSASCode.pdf file is created in the same directory as the original notebook file:

    Jupyter file system display in a web browser

    An example with R

    Things are fairly straightforward in an R notebook. However, you must install and load the nbconvert package.

    Code to create an HTML file from an R notebook

    The first line installs the package, the second line loads the package, and the third actually does the conversion. Double-check your paths if you run into trouble.

    The command line

    The last method we look at is the command line. This option is the same regardless of the language with which you’re working. The possibilities are endless for this option. You could include it in a script, use it in code to run and display in a web app, or create the file and email it to a colleague. The examples below were all run on a Windows OS machine using the Anaconda command prompt.
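    To illustrate the scripting possibilities, here is a small sketch of my own (not from the post): a Python script that finds every notebook in a folder and builds the corresponding nbconvert command for each. The function name conversion_commands is hypothetical, and actually running the commands assumes jupyter is installed.

    ```python
    import pathlib
    import subprocess

    def conversion_commands(folder, fmt="script"):
        # Build one 'jupyter nbconvert' command per notebook found in the folder
        notebooks = sorted(pathlib.Path(folder).glob("*.ipynb"))
        return [["jupyter", "nbconvert", str(nb), "--to", fmt] for nb in notebooks]

    # Example (commented out so the sketch has no side effects):
    # for cmd in conversion_commands(".", fmt="pdf"):
    #     subprocess.run(cmd, check=True)
    ```

    A loop like this could convert a whole course's worth of notebooks in one pass.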

    An example with a SAS notebook

    Convert sasNotebook.ipynb to a SAS file.

    >> ls -la |grep sasNotebook
    -rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
     
    >> jupyter nbconvert --to script sasNotebook.ipynb
    [NbConvertApp] Converting notebook sasNotebook.ipynb to script
    [NbConvertApp] Writing 351 bytes to sasNotebook.sas
     
    >> ls -la |grep sasNotebook
    -rw-r--r-- 1 jofurb 1049089  448185 May 29 14:34 sasNotebook.ipynb
    -rw-r--r-- 1 jofurb 1049089     369 May 29 14:57 sasNotebook.sas

    An example with a Python notebook

    Convert 1_load_data.ipynb to a PDF file

    >> ls -la |grep 1_load
    -rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
     
    >> jupyter nbconvert 1_load_data.ipynb --to pdf
    [NbConvertApp] Converting notebook 1_load_data.ipynb to pdf
    [NbConvertApp] Writing 27341 bytes to .\notebook.tex
    [NbConvertApp] Building PDF
    [NbConvertApp] Running xelatex 3 times: ['xelatex', '.\\notebook.tex', '-quiet']
    [NbConvertApp] Running bibtex 1 time: ['bibtex', '.\\notebook']
    [NbConvertApp] WARNING | b had problems, most likely because there were no citations
    [NbConvertApp] PDF successfully created
    [NbConvertApp] Writing 32957 bytes to 1_load_data.pdf
     
    >> ls -la |grep 1_load
    -rw-r--r-- 1 jofurb 1049089   6004 May 29 07:37 1_load_data.ipynb
    -rw-r--r-- 1 jofurb 1049089  32957 May 29 15:23 1_load_data.pdf

    An example with an R notebook

    Convert HR_R.ipynb to an R file.

    >> ls -la | grep HR
    -rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
     
    >> jupyter nbconvert HR_R.ipynb --to script
    [NbConvertApp] Converting notebook HR_R.ipynb to script
    [NbConvertApp] Writing 981 bytes to HR_R.r
     
    >> ls -la | grep HR
    -rw-r--r-- 1 jofurb 1049089   5253 Nov 19  2019 HR_R.ipynb
    -rw-r--r-- 1 jofurb 1049089   1021 May 29 15:44 HR_R.r

    Wrapping things up

    Whether you’re a student of Hunter’s, an analyst creating a report, or a data scientist monitoring streaming data models, you may have the need (or requirement) to transform your work from a Jupyter notebook into a more consumable asset. Regardless of the language of your notebook, you have multiple choices for saving your work, including menu options, inline code, and the command line. This is a great way to show off your creation in a very consumable form.

    How to save Jupyter notebooks in assorted formats was published on SAS Users.

    4月 232020
     

    SAS macro function

    SAS® Macro Language immensely empowers SAS programmers, adding versatility and efficiency to their code development. It allows SAS users to modularize programming code into “once written – many times used” components and, in many cases, to automatically generate data-driven SAS code.

    Macro language and macro processor

    Generally, SAS software processes your SAS program step by step, first scanning it for macro language objects - macro variables referenced as &somename, and macros referenced as %somename. If found, SAS software activates the macro processor, which resolves and substitutes those macro references according to the macro language syntax before SAS compiles and executes your programming steps.

    SAS macro language vs. SAS programming language

    A SAS program usually consists of two, often interwoven, layers – a macro layer and a non-macro layer – each with its own syntax and its own timing of compilation and execution. In other words, SAS code is a combination of two distinct languages:

    • SAS programming language (comprised of DATA steps, PROC steps and global statements such as LIBNAME, OPTIONS, TITLE etc.)
    • SAS macro language (comprised of %LET, %IF, %DO, macro functions etc.) which is processed separately from and before SAS compiler executes SAS programming language code.

    The difference between them is like the difference between cooking a meal and eating the meal. In this analogy, meal=code, cooking=SAS macro language, eating=SAS programming language. A clear understanding of this difference is the key to becoming a successful SAS programmer.

    Two types of SAS macros

    There are two distinct types of SAS macros:

    1. Macros that generate some SAS programming language code which can span across SAS statements or steps;
    2. Macros that generate some string values which can be used as part of SAS programming language code or data values, but they are not complete SAS statements or steps. This type does not generate any SAS executable code, just a value.
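    The distinction may be easier to see outside of SAS. As a rough Python analogy (my own, with hypothetical helper names, not from the post): a type 1 macro expands into complete executable statements, while a type 2 macro expands into nothing but a value.

    ```python
    # Rough Python analogy: the two macro types differ in what their
    # expansion hands back -- executable statements vs. a bare value.

    def type1_macro(dset):
        # Type 1 analogue: expands to a complete, executable SAS step (as text)
        return f"data _null_; set {dset}; run;"

    def type2_macro(values):
        # Type 2 analogue: expands to just a value, to be spliced into other code
        return str(len(values))
    ```

    In this analogy, type1_macro("SASHELP.CARS") yields a whole DATA step, while type2_macro([10, 20, 30]) yields only the text "3".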

    What is a SAS macro function?

    A SAS macro function is a SAS macro that generates a value. In other words, it is the type 2 macro described above. Like any SAS macro, a SAS macro function can have any number (zero or more) of positional and/or named parameters (arguments). SAS users may define their own macro functions, but in doing so you may not use any SAS language syntax; only SAS macro language syntax is allowed. You can use existing macro functions in your own macro function definition. Among others, one of the most powerful is the %SYSFUNC macro function, which brings a wealth of SAS language functions into the SAS macro language.

    Sources of SAS macro functions

    SAS macro functions may come from the following three sources.

    1. Pre-built macro functions

    Pre-built macro functions that are part of the macro processor. These are such macro functions as %eval, %length, %quote, %scan, %str, %sysfunc, %upcase, etc. Here is a complete list of the pre-built SAS macro functions.

    2. Auto-call macro functions

    Auto-call macros, some of which are type 1 (macros) and some type 2 (macro functions), such as %cmpres, %left, %lowcase, %trim, %verify, etc. These macro functions supplement the pre-built macro functions. The main difference from the pre-built macro functions is that the auto-call macro functions are, in essence, user-defined macro functions written in the SAS macro language and made available to you without your having to define or include them in your programs. The auto-call macro functions come with your SAS software installation and are usually pre-configured for you by setting the MAUTOSOURCE and SASAUTOS= macro system options. They may include several macro libraries depending on the SAS products licensed at your site. For example, for my SAS BASE installation the auto-call macro library is in the following folder:

    C:\Program Files\SASHome\SASFoundation\9.4\core\sasmacro

    Here is a selected list of auto-call macros provided with SAS software.

    From the usage standpoint, you will not notice any difference between the pre-built and the auto-call macro functions. For example, macro function %upcase() is pre-built, while macro function %lowcase() is auto-call macro function. They belong to entirely different families, but we use them as if they are complementary siblings.

    3. User-defined macro functions

    Finally, there are user-defined macro functions that do not come with SAS installation. These are the macro functions that you define on your own. Usually, they are kept separately from the auto-call macros mainly in order to distinguish them from SAS-supplied ones.

    To enable access to your own SAS macro library in addition to the auto-call macro library (or libraries), you can use the INSERT= system option:

    options insert=(sasautos="path_to_your_own_macro_library_folder");

    Instead of replacing the SASAUTOS value, this option inserts an additional value into the existing SASAUTOS option as the first value, thus allowing you to tap into your own macro library first, and then also into pre-set SAS auto-call libraries.

    Creating user-defined macro function

    Let’s consider the following example. Suppose, we want to create a macro function that takes a data set name as an argument and returns a value equal to the number of observations in that data set.

    We know that the following code calculates the number of observations in a data set:

    data _null_;
       set SASHELP.CARS (obs=0) nobs=n;
       call symputx('NOBS',n);
    run;
     
    %put &=NOBS;
    NOBS=428

    Can we create a SAS macro function by enclosing this code into macro? Something like this:

    %macro nobs(dset=,result=);
       data _null_;
          set &dset (obs=0) nobs=n;
          call symputx("&result",n);
       run;
    %mend nobs;

    The answer is “No”. Yes, we created a valid macro; we can invoke this macro to produce the result:

    %nobs(dset=SASHELP.CARS, result=NOBS);
    %put &=NOBS;
    NOBS=428

    But this is not a macro function. Remember, a type 2 macro does not generate any SAS programming language code, just a value. This macro, however, does generate SAS code (a DATA step) which assigns a value to the macro variable specified as the second argument (result=NOBS).

    In order to create a valid macro function, our macro should not have any SAS language code in it – neither a DATA step, nor a PROC step. It may only be comprised of the SAS macro language code. Here it is:

    %macro nobs(dset);
       %local dsid n rc;
       %let dsid = %sysfunc(open(&dset));
       %if &dsid %then
       %do;
          %let n = %sysfunc(attrn(&dsid,nlobs));
          %let dsid = %sysfunc(close(&dsid));
       %end;
       %else %put %sysfunc(sysmsg());
       &n
    %mend nobs;

    When the macro processor executes this macro, the only object that gets passed to the SAS language compiler is the value shown on the line right before %mend. This is the calculated number of observations (denoted by &n). It is the only thing visible to the SAS language compiler; the rest is macro language code visible to, and handled by, the SAS macro processor.

    IMPORTANT: When defining a SAS macro function, always use a %LOCAL statement to list ALL macro variables that are created in your macro, to ensure they do not accidentally overwrite same-named macro variables in the calling environment. You don’t need to declare macro parameters as %LOCAL; they are always local automatically.

    SAS macro functions usage examples

    When a macro function is defined this way, wherever you place its invocation %nobs(SASHELP.CARS) in your SAS code, it will be evaluated and replaced with the corresponding value (in this case, the number 428) by the SAS macro processor. That way you can avoid substandard hard-coding and make your SAS code dynamic and powerful. You can use macro functions in many SAS coding contexts. For example:

    • An assignment statement for a macro variable: %let NOBS=%nobs(SASHELP.CARS);
    • An assignment statement in a DATA step: x = %nobs(SASHELP.CARS);
    • As the stop value of an iterative DO loop: do i=1 to %nobs(SASHELP.CARS);
    • As part of the condition in an IF statement: if %nobs(SASHELP.CARS) > 500 then do;

    And so on.

    Your thoughts?

    Do you find this post useful? Do you use SAS macro functions? Can you suggest other usage examples? Please share with us in the Comments below.

    Additional resources

     

    How to create and use SAS macro functions was published on SAS Users.

    4月 072020
     

    As many of us are learning to navigate the changing world we are living in amid the COVID-19 outbreak, and as we care for our loved ones, friends, and our community, many of us now find ourselves working and studying from home much more than we did before. As an employee at SAS and an instructor at NC State University, I have found myself splitting my time between setting up my temporary home office and an at-home routine while also trying to help my students feel safe and comfortable as we move to a virtual classroom. But, as my commute and previous social time becomes Facetime calls and text messages, I’ve found myself with more downtime then I previously had, the time I want to dedicate to the training I’ve been wanting to do for the past year.

    At SAS, we are striving to care for our users during this time—in that spirit, I wanted to share with you some free SAS offerings, as well as coping techniques I am doing from home.

    Take care of yourself and your family

    First and foremost, make sure you and your family are taking time for self-care. Whether it be meditation, using a mobile app or YouTube video, or getting some exercise together. I am finding my daily walk something I need to help relax my mind and get myself back to focus on my tasks.

    Retrain on skills you haven’t touched in a while

    Sometimes we all need that reboot on tools or methods we use every day. I love SAS’ Statistics 1 course and SAS Programming 1, both of which are free. They’re a great refresher for those who haven’t taken a math course in a few years, or for those just starting out with data science using SAS. Understanding the fundamentals and getting refreshers on SAS language tricks is something I work on constantly. I am also preparing to take the SAS Certification Exam at the end of the year, so the practice exam is something I plan to use often during my study sessions.

    Learn a new skill

    It is also a great time to learn something you have always wanted to do. I started by taking some free online photography classes. I also have on my list enrolling in the SAS Academy for Data Science, which is free until the end of May 2020. The advanced analytics professional courses are something I have wanted to complete for a long time, so I am excited to get started on learning more about data modeling. The SAS e-books collection is now also free until April 30, 2020, so I’ve downloaded some great additional materials using code FREEBOOKS at checkout. 

    Be kind to others

    Being stuck inside can sometimes make you feel like you are spending more time with family than you are used to, or maybe you are spending more time alone. Using this time to connect with those I haven’t made time to talk to has been something I am really thankful for. I call my sister who is a nurse in Florida and check on her. As this outbreak affects us all differently, this is a great time to come together to connect. It is also a great time to think of others affected by the outbreak and who don’t have the ability to continue working. There are some great ways you can help others by getting involved in volunteer work or donating to a helpful cause.

    Do fun activities

    Though we are stuck at home, this time has been great for enjoying things I often don’t get to during my normal busy schedule. Besides taking free training, I’ve been playing some new video games (#ACNH) and some games I’ve neglected for far too long. I’ve also used this time to find what brings me joy from my home. Making time for reading is bringing great joy to my life right now.

    As we all move through this turbulent time, make sure to take care of yourself and others. I hope some of these free tools and training will come in handy as you work towards your personal goals while remaining safe and healthy.

    Working remotely? A list of 5 ways to spend your down time was published on SAS Users.

    4月 032020
     

    Whether you like it or not, Microsoft Excel is still a big hit in the data analysis world. From small to large customers, we still see it used for daily routines such as filtering, generating plots, calculating items in ad-hoc analyses, or even running statistical models. Whenever I talk to customers, there is always someone who asks: "Can this be exported to Excel?" or "Can we import data from Excel?" Recently, other questions have started to come up more often: "Can we run Python within SAS?", "How do I allow my team to choose their language of preference?", "How do I provide an interface that looks like Microsoft Excel, but has SAS functionality?"

    Well… the good news is: we can answer YES to all of these questions. With the increase in the number of users performing analytics and in the number of analytical tools available, it was clear to me that we would end up with lots of disparate processes. For a while this was a problem, but naturally, companies started developing ways to integrate these siloed teams.

    At the beginning of the last decade, SAS developed the SAS Add-in for Microsoft Office. The tool allows customers to run and embed SAS analytic capabilities inside Microsoft Office applications. More recently, SAS released a new version of PROC FCMP allowing users to write Python code and call it, as if it were a function, inside SAS programs.

    These advancements provide users the ability to run Python inside Excel. When I say inside, I really mean from within Excel's interface.

    Before we jump to how we can do it, you may ask yourself: Why is this relevant to me? If I know SAS, I import the dataset and work with the data in SAS; If I know Python, I open a Jupyter notebook, import the data set and do my thing. Well… you are kind of right, but let me tell you a story.

    The use case

    Recently I worked with a customer whose business process was like this: I have a team of data scientists that is highly technical and knowledgeable in Python and SAS. Additionally, I have a team of analysts with little Python knowledge who are always working with Excel to summarize data, create filters, graphs, etc. My teams need to communicate and collaborate. The normal chain of events follows:

    1. the Python team works on the data, and exports the results to Excel
    2. the analytics team picks up the data set, and runs SAS scripts and Excel formulas

    This process is inefficient for the customer. Why can't the data scientist simply pass the code to the analyst, who can then run it on the same project without waiting on the Python specialist?

    I know this sounds overly complicated, but as my SAS colleague Mike Zizzi concludes in his post SAS or Python? Why not use both? Using Python functions inside SAS programs, at the end of the day what matters is that you get your work done. No matter which language, software or IDE you are using. I highly recommend Mike's article if you want a deep dive on what PROC FCMP has to offer.

    The process

    Let's walk through a data scoring scenario similar to my customer's story. Imagine I am a SAS programmer using Excel to explore data. I am also part of a team that uses Python, creating scoring data code using analytical models developed in Python. My job is to score and analyze the data on Excel and pass the results to the service representative, so they can forward the response to the customer.

    Importing data

    The data set we'll work with in this example will help me analyze which customers are more likely to default on a loan. The data and all code used in this article are in the associated GitHub repository. The data dictionary for the data set is located here. First, I open the data set as seen on Sheet1 below in Excel.

    Upload data to SAS

    Before we jump to the coding part with SAS and Python, I need to send the data to SAS. We'll use the SAS add-in, in Excel to send data to the local server. I cover the steps in detail below.

    I start by selecting the cells I want to upload to the library.

    Next, I move to the SAS tab and select the Copy to SAS Server task.

    A popup shows up where I confirm the selected cells.

    After I click OK, I configure column, table, naming and location options.

    SAS uploads the table to the requested library. Additionally, a new worksheet with the library.table name displays the results. As you can see on the image below, the sheet created follows the name WORK.IMPORTED_DATA we set up in the previous step. This represents the table in the SAS library memory. Notice, however, we are still working in Excel.

    The next step is to incorporate the code sent from my teammate.

    The Python code

    The code our colleague sent is pure Python. I don't necessarily have to understand the code details, just what it does. The Python code below loads a saved model, scores the input values, and returns the score. Note: if you're attempting this in your own environment, make sure to update the hmeq_model.sav file location in the # Import model pickle file section.

    def score_predictions(CLAGE, CLNO, DEBTINC,DELINQ, DEROG, LOAN, MORTDUE, NINQ,VALUE, YOJ):
    	"Output: scored"
    	# Importing libraries
    	import pandas as pd
    	from sklearn.preprocessing import OneHotEncoder
    	from sklearn.compose import ColumnTransformer
    	from sklearn.externals import joblib
     
    	# Create pandas dataframe with input vars
    	dataset = pd.DataFrame({'CLAGE':CLAGE, 'CLNO':CLNO, 'DEBTINC':DEBTINC, 'DELINQ':DELINQ, 'DEROG':DEROG, 'LOAN':LOAN, 'MORTDUE':MORTDUE, 'NINQ':NINQ, 'VALUE':VALUE, 'YOJ':YOJ}, index=[0])
     
    	X = dataset.values
     
    	# Import model pickle file
    	loaded_model = joblib.load("C://assets/hmeq_model.sav")
     
    	# Score the input dataframe: round P(class 1) at 0.5 to get 0 or 1
    	scored = int(round(loaded_model.predict_proba(X)[0,1]))
     
    	# Return scored dataframe
    	return scored
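    If you want to see the save-and-load pattern this function relies on without SAS or a trained model at hand, here is a minimal, standard-library-only sketch. The ThresholdModel class and its toy DEBTINC rule are invented for illustration; the real code uses joblib and a scikit-learn model, but the round-trip and the rounding of the class-1 probability work the same way.

    ```python
    import os
    import pickle
    import tempfile

    # Stand-in for a trained model: anything picklable that exposes predict_proba.
    class ThresholdModel:
        def predict_proba(self, rows):
            # Toy rule: P(default) grows with the last input column (a DEBTINC stand-in).
            return [[1 - min(r[-1] / 100, 1), min(r[-1] / 100, 1)] for r in rows]

    # Persist the "model" once, the way joblib.dump would in a training script...
    path = os.path.join(tempfile.gettempdir(), "toy_model.sav")
    with open(path, "wb") as f:
        pickle.dump(ThresholdModel(), f)

    # ...then load and score it, mirroring score_predictions above.
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)

    proba = loaded_model.predict_proba([[94.4, 9, 80.0]])[0][1]
    scored = int(round(proba))  # round P(class 1) at 0.5 to get 0 or 1
    print(scored)
    ```

    The key point is that scoring code only needs the serialized model file and the same class definitions available at load time, which is why the colleague can hand over a .sav file plus a small script.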

    My SAS code calls this Python code from a SAS function defined in the next section.

    The SAS code

    Turning back to Excel, in the SAS Add-in side of the screen, I click on Programs. This displays a code editor which, as explained in this video, works like any other SAS code editor.

    We will use this code editor to write, run and view results from our code.

    The code below defines an FCMP function called Score_Python that imports the Python script from my colleague and calls it from a SAS data step. The output table, HMEQ_SCORED, is saved in the WORK library in SAS. Note: if you're attempting this in your own environment, make sure to update the script.py file location in the /* Getting Python file */ section.

    /* Defining FCMP function */
    proc fcmp outlib=work.fcmp.pyfuncs;
    	/* Defining name and arguments of the Python function to be called */
     
    	function Score_Python(CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
    		/* Python object */
    		declare object py(python);
     
    		/* Getting Python file  */
    		rc = py.infile("C:\assets\script.py");
     
    		/* Send code to Python interpreter */
    		rc = py.publish();
     
    		/* Call python function with arguments */
    		rc = py.call("score_predictions",CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
     
    		/* Pass Python results to SAS variable */
    		MyFCMPResult = py.results["scored"];
     
    		return(MyFCMPResult);
    	endsub;
    run;
     
    options cmplib=work.fcmp;
     
    /* Calling FCMP function from data step */
    data work.hmeq_scored;
    	set work.imported_data;
    	scored_bad = Score_Python(CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
    	put scored_bad=;
    run;

     

    I place my code in the editor.

    We're now ready to run the code. Select the 'Running man' icon in the editor.

    The output

    The result represents the outcome of the Python model scoring code. Once the code completes, the WORK.HMEQ_SCORED worksheet updates with a new column, scored_bad.

    The binary value represents if the customer is likely (a '1') or unlikely (a '0') to default on his or her loan. I could now use any built in Excel features to filter or further analyze the data. For instance, I could filter all the customers likely to default on their loans and pass a report on to the customer management team.
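    Outside of Excel, that same filter is a one-liner. The sketch below mimics it in Python on a few invented rows shaped like WORK.HMEQ_SCORED (the column names come from the example; the values are made up):

    ```python
    # Hypothetical rows as they might look after scoring in WORK.HMEQ_SCORED.
    scored_rows = [
        {"LOAN": 1100, "DEBTINC": 35.2, "scored_bad": 0},
        {"LOAN": 1500, "DEBTINC": 55.9, "scored_bad": 1},
        {"LOAN": 1700, "DEBTINC": 28.1, "scored_bad": 0},
    ]

    # Equivalent of filtering the scored_bad column in Excel: keep likely defaulters.
    likely_defaulters = [row for row in scored_rows if row["scored_bad"] == 1]
    print([row["LOAN"] for row in likely_defaulters])  # the loans to report on
    ```

    Whether the filter runs in Excel, SAS, or Python, the point of the workflow is that everyone works from the same scored table.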

    Final thoughts

    In this article we've explored how collaboration between teams with different skills can streamline processes for efficient data analysis. Each team focuses on what they're good at and all of the work is organized and completed in one place. It is a win-win for everyone.

    Related resources

    Extending Excel with Python and SAS Viya was published on SAS Users.