Data step programmers

10月 152020
 

SAS Microsoft partnershipYou might have heard about SAS - Microsoft partnership announced in June 2020 that officially joined the powers of SAS analytics with Microsoft’s cloud technology to further advance Artificial Intelligence (AI).

This partnership did not just happen out of nowhere. SAS has a long and deep history of integrating with Microsoft technologies. Examples include:

In this post we will look at a lesser known, but quite useful feature in SAS that allows SAS users to bring many Microsoft Excel functions right to their SAS programs. I hope that many SAS users (not just MS Excel aficionados) will love to discover this functionality within SAS.

Excel functions as SAS user-defined functions

SAS has a wide variety of built-in functions, however there are still many Microsoft Excel functions that are not intrinsically implemented in SAS. Luckily, many of them are made available in SAS via

proc fcmp inlib=SASHELP.SLKWXL listall;
run;

You can also capture the list of available Excel functions in a SAS data table using ODS OUTPUT with CODELIST= option:

ods noresults;
ods output codelist=WORK.EXCEL_FUNCTIONS_LIST (keep=COL1 COL2);
proc fcmp inlib=SASHELP.SLKWXL listall;
run;
ods output close;
ods results;

From this data table you can produce a nice looking HTML report listing all these functions:

data WORK.EXCEL_SAS_FUNCTIONS (keep=exc sas arg);
   label exc='Excel Function' sas='SAS Function' arg='Arguments';
   set WORK.EXCEL_FUNCTIONS_LIST (rename=(col2=arg));
   sas = tranwrd(col1,'Function ','');
   exc = tranwrd(sas,'_slk','');
run;
 
ods html path='c:\temp' file='excel_sas_functions.html';
title 'List of Excel functions available in SAS (via SASHELP.SLKWXL)';
proc print data=EXCEL_SAS_FUNCTIONS label;
run;
ods html close;

When you run this code, you should get the following list of Excel functions along with their SAS equivalents:

List of Excel functions available in SAS (via SASHELP.SLKWXL)
Obs Excel Function SAS Function Arguments
1 even even_slk ( x )
2 odd odd_slk ( x )
3 factdouble factdouble_slk ( x )
4 product product_slk ( nums )
5 multinomial multinomial_slk ( nums )
6 floor floor_slk ( n, sg )
7 datdif4 datdif4_slk ( start, end )
8 amorlinc amorlinc_slk ( cost, datep, fperiod, salvage, period, rate, basis )
9 amordegrc amordegrc_slk ( cost, datep, fperiod, salvage, period, rate, basis )
10 disc disc_slk ( settlement, maturity, pr, redemp, basis )
11 tbilleq tbilleq_slk ( settlement, maturity, discount )
12 tbillprice tbillprice_slk ( settlement, maturity, discount )
13 tbillyield tbillyield_slk ( settlement, maturity, par )
14 dollarde dollarde_slk ( fdollar, frac )
15 dollarfr dollarfr_slk ( ddollar, frac )
16 effect effect_slk ( nominal_rate, npery )
17 coupnum coupnum_slk ( settlement, maturity, freq, basis )
18 coupncd coupncd_slk ( settlement, maturity, freq, basis )
19 coupdaysnc coupdaysnc_slk ( settlement, maturity, freq, basis )
20 couppcd couppcd_slk ( settlement, maturity, freq, basis )
21 coupdays coupdays_slk ( settlement, maturity, freq, basis )
22 db db_slk ( cost, salvage, life, period, month )
23 yield yield_slk ( settlement, maturity, rate, pr, redemp, freq, basis )
24 yielddisc yielddisc_slk ( settlement, maturity, pr, redemp, basis )
25 coupdaybs coupdaybs_slk ( settlement, maturity, freq, basis )
26 oddfprice oddfprice_slk ( settlement, maturity, issue, fcoupon, rate, yield, redemp, freq, basis )
27 oddfyield oddfyield_slk ( settlement, maturity, issue, fcoupon, rate, pr, redemp, freq, basis )
28 oddlyield oddlyield_slk ( settlement, maturity, linterest, rate, pr, redemp, freq, basis )
29 oddlprice oddlprice_slk ( settlement, maturity, linterest, rate, yield, redemp, freq, basis )
30 price price_slk ( settlement, maturity, rate, yield, redemp, freq, basis )
31 pricedisc pricedisc_slk ( settlement, maturity, discount, redemp, basis )
32 pricemat pricemat_slk ( settlement, maturity, issue, rate, yld, basis )
33 yieldmat yieldmat_slk ( settlement, maturity, issue, rate, pr, basis )
34 received received_slk ( settlement, maturity, investment, discount, basis )
35 accrint accrint_slk ( issue, finterest, settlement, rate, par, freq, basis )
36 accrintm accrintm_slk ( issue, maturity, rate, par, basis )
37 duration duration_slk ( settlement, maturity, coupon, yld, freq, basis )
38 mduration mduration_slk ( settlement, maturity, coupon, yld, freq, basis )
39 avedev avedev_slk ( data )
40 devsq devsq_slk ( data )
41 varp varp_slk ( data )

 
NOTE: Excel functions that are made available in SAS are named from their Excel parent functions, suffixing them with _SLK to distinguish them from their Excel incarnations, as well as from native SAS functions.

Examples of Microsoft Excel functions usage in SAS

In order to use any of these Excel functions in your SAS code, all you need to do is to specify the functions definition data table in the CMPLIB= option:

options cmplib=SASHELP.SLKWXL;

Let’s consider several examples.

ODD function

This function returns number rounded up to the nearest odd integer:

options cmplib=SASHELP.SLKWXL;
data _null_;
   x = 6.4;
   y = odd_slk(x);
   put 'odd( ' x ') = ' y;
run;

SAS log:
odd( 6.4 ) = 7

EVEN function

This function returns number rounded up to the nearest even integer:

options cmplib=SASHELP.SLKWXL;
data _null_;
   x = 6.4;
   y = even_slk(x);
   put 'even( ' x ') = ' y;
run;

SAS log:
odd( 6.4 ) = 8

FACTDOUBLE function

This function returns the double factorial of a number. If number is not an integer, it is truncated.
Double factorial (or semifactorial) of a number n, denoted by n!!, is the product of all the integers from 1 up to n that have the same parity as n.
For even n, the double factorial is n!!=n(n-2)(n-4)…(4)(2), and for odd n, the double factorial is n!! = n(n-2)(n-4)…(3)(1).

Here is a SAS code example using the factdouble() Excel function:

options cmplib=SASHELP.SLKWXL;
data _null_;
   n = 6;
   m = 7;
   nn = factdouble_slk(n);
   mm = factdouble_slk(m);
   put n '!! = ' nn / m '!! = ' mm;
run;

It will produce the following SAS log:
6 !! = 48
7 !! = 105

Indeed, 6!! = 2 x 4 x 6 = 48 and 7!! = 1 x 3 x 5 x 7 = 105.

PRODUCT function

This function multiplies all elements of SAS numeric array given as its argument and returns the product:

options cmplib=SASHELP.SLKWXL;
data _null_;
   array x x1-x5 (5, 7, 1, 2, 2);
   p = product_slk(x);
   put 'x = ( ' x1-x5 ')';
   put 'product(x) = ' p;
run;

SAS log:
x = ( 5 7 1 2 2 )
product(x) = 140

Indeed 5*7*1*2*2 = 140.

MULTINOMIAL function

This function returns the ratio of the factorial of a sum of values to the product of factorials:

MULTINOMIAL(a1, a2, ... , an) = (a1 + a2 + ... + an)! : (a1! a2! ... an!)

In SAS, the argument to this function is specified as numeric array name:

options cmplib=SASHELP.SLKWXL;
data _null_;
   array a a1-a3 (1, 3, 2);
   m = multinomial_slk(a);
   put 'a = ( ' a1-a3 ')';
   put 'multinomial(a) = ' m;
run;

SAS log:
a = ( 1 3 2 )
multinomial(a) = 60

Indeed (1+3+2)!  :  (1! + 3! + 2!) = 720 : 12 = 60.

Other Microsoft Excel functions available in SAS

You can explore other Excel functions available in SAS via SASHELP.SLKWXL user-defined functions by cross-referencing them with the corresponding Microsoft Excel functions documentation (alphabetical or by categories) As you can see in the above List of Excel functions available in SAS, besides mathematical functions exemplified in the previous section, there are also many Excel financial functions related to securities trading that are made available in SAS.

Additional Resources on SAS user-defined functions

Your thoughts?

Have you found this blog post useful? Please share your use cases, thoughts and feedback in the comments below.

Using Microsoft Excel functions in SAS was published on SAS Users.

9月 022020
 

SAS offering free learning resources in celebration of programmers

For more than 40 years, SAS programmers have crafted software and solutions that transform the world. From statistics to data science, to analytics and artificial intelligence, people writing code have architected a new economy with incredible opportunities. SAS Programmer Week honors those people by offering free learning resources available for everyone, from students to early career professionals to SAS veterans.

Running from Sept. 7-11, SAS Programmer Week leads up to the international Day of the Programmer on Saturday, Sept. 12. Training resources will be available for free through a variety of YouTube and video tutorials, webinars, blogs and documentation.

There will be three different tracks for new, experienced and analytics-focused users, with new content released each day. The week culminates with SAS certification prep content that will have participants ready to pursue a valuable SAS credential.

For instance, Tech Republic named SAS as one of 7 data science certifications to boost your resume and salary. CIO Magazine puts SAS among the top 11 big data and analytics certifications for 2020.

Since SAS programmers are busy and may not have all day to engage with the materials, SAS Programmer Week is flexible. Participants can access the material when they want to learn a specific skill related to the day’s topic or consume the material in snippets when they have time.

Interested participants can visit the SAS Programmer Week website to register today, preview the materials and schedule, and jump-start their career journeys.

 

All hail the SAS programmer! was published on SAS Users.

4月 302012
 

Now, we all know by now that I'm not a programmer (that makes me very sad sometimes and may frustrate some of you at times), but I know a good paper and presentation when I see one. Christopher Bost knows how to teach a topic. I went to his Tuesday afternoon SAS Global Forum presentation because his paper sounded so interesting. (The room was packed, and no one left, so there must have been others who agreed with me.)

Bost's paper, Selecting All Observations When Any Observation Is of Interest, was written for those who work with medical claims data, fund balances, insurance claims data (you get it - any large data set that might contain multiple observations per person). His paper and presentation were about trading efficiencies - Can you sacrifice a little time processing the data, if the trade is a gain in a simpler way of coding?

Bost proposed three methods: the DATA step approach and then two different ways to produce the same results with PROC SQL.

DATA Step

According to Bost, a common method for selecting all observations when any observation is of interest is to use a match-merge. This requires three steps:

  1. Subset the observations of interest
  2. Keep one observation per person
  3. Match-merge the data
"By the way, I want to go on record as saying that I did this the most inefficient way possible because I wanted to show how efficient PROC SQL is," Bost told the audience with a grin. "You can do it more efficiently." (See the code in his paper.)
 
To put proof to Bost's pudding, he provided a list of pros and cons. First, he says that DATA step works fine and everybody should know how to do it, but what if the files were huge? It took three DATA steps and two PROC SORT steps on his tiny example data set.
 

PROC SQL

Here's the order when writing a PROC SQL clause (I tried to take a picture of this, so Bost sent me his slide):
 
Bost chose two methods in PROC SQL, but he says there are many PROC SQL methods that he could have chosen. He chose to show the audience a SUBQUERY and the GROUP BY and HAVING clauses. (Again, I won't include the code here. Bost has done a better job than I could at explaining it in his paper.)
 

Pros of PROC SQL methods used in his paper:

  • Subtle difference in sorting data - DATA step processing is done - almost always - sequentially. "In SQL, processing is not done from first to last. In fact, believe it or not, not doing things sequentially can be much more efficient - depending upon what you are doing," said Bost.
  • Single step instead of five steps
  • Flexibility

Cons:

  • Two passes through the data set - one for the subquery and one for the outer query. "As Kirk Lafler explained it to me," said Bost. "It is possible that the subquery gets executed for each observation." Possible.
Check out Bost's paper, and if you get a chance to hear him present (teach), don't pass it up!!
 
 
 

 

 

tags: data step, Data step programmers, paper & presentations, PROC SQL, SAS Global Forum
9月 132011
 
Kathleen Harkins, Carolyn Maass and Mary Anne Rutkowski, from Merck Sharp and Dohme, collaborated to write T.I.P.S: Techniques and information for programming in SAS® for NESUG 2011. These three women are highly experienced programmers: Harkins has more than 20 years of experience in the pharmaceutical and aerospace industries; Maass has [...]