SAS functions

2月 172022
 

This blog serves two purposes: the main purpose is to show you some useful SAS coding techniques, and the second is to show you an interesting method of creating a Beale cipher.

TJ Beale is famous in Virginia for leaving behind three ciphers, supposedly describing the location of hidden gold and treasures. (Most cryptologists and historians believe the whole set of ciphers and treasure was a hoax.) In one of the ciphers, he used a method based on the Declaration of Independence. His coding method was as follows:

  • Get a copy of the Declaration of Independence and number each word.
  • Take the first letter of each word and form a list.
  • Associate each number with that letter.

For example, consider this text:

“Four score and seven years ago, our fathers brought forth upon this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal. “

To create a Beale cipher, you would proceed as follows:

Four(1) score(2) and(3) seven(4) years(5) ago(6), our(7) fathers(8) brought(9) forth(10) upon(11) this(12) continent(13) a(14) new(15) nation(16), conceived(17) in(18) liberty(19) and(20) dedicated(21) to(22) the(23) proposition(24) that(25) all(26) men(27) are(28) created(29) equal(30).

Next, you would make a table like this:

Letter Numbers
F 1,8,10 (all the numbers of words that begin with 'F')
S 2,4
A 3,6,14,20,26,28
Y 5
…and so on

 

You would then want to put the list in alphabetical order like this:

A 3,6,14,20,26,28
B 9
C 13,17,29
D 21
E 30
F 1,8,10
…and so on

 

To create your cipher, select any number at random from the list of numbers, corresponding to the letter that you want to encode. The advantage of this over a simple substitution cipher is that you cannot use frequency analysis to guess what letter a particular number represents.

This blog explains how to create a Beale cipher; my next blog will explain how to decipher a Beale cipher.

You need to start out with a book or document that is accessible to the sender and recipient of the cipher. To offer some additional security, you could decide to start from a specific page in a book. For a simple demonstration of how to create a Beale cipher, I have entered part of the Declaration of Independence in a text file called Declare.txt.

A funny aside: I was teaching my Functions course in the UK, in a small town north of London on the Thames. One of the programs demonstrating several SAS character functions was the program I'm using here to demonstrate how to create a Beale cipher. I had completely forgotten that the document was the Declaration of Independence. Whoops! I asked the class, "I hope you're not still angry with us about that." Apparently not, and we all had a good laugh.

Back to the problem. I will break down the program into small steps and provide a partial listing of data sets along the way, so that you can see exactly how the program works. The first step is read the text file, extract the first letter from each word, change the letter to uppercase, and associate each letter with the count of words in the text.

Here is the first part of the program.

data Beale;
   length Letter $ 1;  
   infile 'c:\Books\Blogs\Declare.txt'; 
   input Letter : $upcase1. @@; ❶
   N + 1; ❷	
   output;
run;
 
title "First Step in the Beale Cipher (first 10 observations)";
proc print data=Beale(obs=10) noobs;
run;

❶ By using the $UPCASE1. informat, you are selecting the first letter of each word and converting it to uppercase. If you are unfamiliar with the $UPCASEn. informat, it is similar to the $n. informat with the additional task of converting the character(s) to uppercase.

❷ You use a SUM statement to associate each letter with the word count.

Here is the listing from this first step:

Next, you need to sort the data set by Letter so that all the words that start with As, Bs, and so forth are placed together.

proc sort data=Beale;
   by Letter;
run;
 
title "The list in sorted order (partial listing)";
proc print data=Beale(obs=10) noobs;
run;

Below is a partial listing of the sorted file:

Any of the numbers 24, 25, 27, and so forth can be used to code an 'A'.

The final step is to list all the letters from A to Z (Z is pronounced Zed in the UK and Canada) in a line, followed by all the possible numbers associated with each letter.

data Next;
   length List $ 40; ❸
   retain List; ❹
   set Beale;
   by Letter; ❺
   if first.Letter then List = ' '; ❻
   List = catx(',',List,N); ❼ 
   if last.Letter then output; ❽ 
run;
 
title "List of Beale Substitutions";
proc print data=next(obs=5) noobs;
   var Letter List;
run;

❸ The variable List will hold all the possible numbers that can be used to code any of the letters. In a real program, this list might be longer.

❹ You need to RETAIN this variable; otherwise, it would be set back to a missing value for each iteration of the DATA step.

❺ Following the SET statement with a BY statement creates the two temporary variables, First.Letter and Last.Letter. First.Letter is true when you are reading the first observation for each letter—Last.Letter is true when you are reading the last observation for a letter.

❻ For the first A, B, C, and so on, initialize the variable List to a missing value.

❼ Use the CATX function to concatenate all the numbers, separated by commas.

❽ When you are done reading the last A, B, C, and so on, output the string.

Below are a few lines generated by this program:

For more information about the CATX function and other SAS functions, please take a look at my book, SAS Functions by Example, Second Edition.

Fun with Ciphers (Part 1) was published on SAS Users.

12月 232021
 

As the leader in analytics, SAS provides a very rich set of SAS functions and CALL routines for programmers. There is a total of 693 uniquely named SAS functions besides the hundreds of SAS PROC steps. These SAS functions not only cover general functions such as character/string processing, mathematical, permutation & combination, probability, search and sorting, date and time, etc., but also includes a powerful domain-specific function set involving variable information and control, external files and system extensions, web services and tools, descriptive statistics, financial calculations, geodetic distance & ZIP code, Git and other fields. This article reviews the financial functions provided by the SAS system and demonstrates how to use them to do financial analysis and simplify your decisions. Functions with the same name can be invoked easily during programming with SAS DATA step, DS2, FedSQL and CAS environment, which will improve your programs' portability.

Overview

SAS provides 94 financial functions in total, which are mainly used to calculate financial values, such as interest, periodic payments, depreciation, and prices for European options on stocks and futures. The 53 financial functions conforming to industry standards are packaged in 1 general function called FINANCE. Each finance function has a unique string identifier for the first parameter of the FINANCE function. These functions have to be invoked via the FINANCE function. For example, we can use irr=finance("IRR", -400,100,200,300); to compute the internal rate of return (IRR) for a series of cash flows.

The 53 FINANCE functions are compatible with the 53 financial functions provided by Microsoft Excel for Office 365. In fact, the list of financial functions available in the latest Excel for Office 365 software contains 56 functions. Among the 3 extra functions, PDURATION and RRI are specific functions for Excel 2013, and STOCKHISTORY is a specific function for Office 365. These 53 financial functions are designed for financial calculation such as depreciation, maturation, accrued interest, net present value, periodic savings, internal rates of return, etc. They can be categorized as investment, depreciation, Internal rate of return, financial value, and yield calculations. The complete list of SAS FINANCE functions is shown in Table 1.

Category Subcategory Function
Investment Calculation [12] Future Value [2] FV, FVSCHEDULE
Payment [4] PMT, IPMT, PPMT, ISPMT
Present Value [3] PV, NPV, XNPV
Annual interest rate [2] EFFECT, NOMINAL
Number of periods [1] NPER
Depreciation [7] AMORDEGRC, AMORLINC, DB, DDB, SLN, SYD, VDB
Internal rate of return [4] IRR, RATE, MIRR, XIRR
Financial value and yield [30] Accrued/Cumulative Interest & Principal [4] ACCRINT, ACCRINTM, CUMIPMT, CUMPRINC
Coupon days/date and Number [6] COUPDAYBS, COUPDAYS, COUPDAYSNC, COUPNCD, COUPNUM, COUPPCD
Yield [6] INTRATE, ODDFYIELD, ODDLYIELD, YIELD, YIELDDISC, YIELDMAT
Treasury bill [3] TBILLEQ, TBILLPRICE, TBILLYIELD
Duration [2] DURATION, MDURATION
Security [7] DISC, ODDFPRICE, ODDLPRICE, PRICE, PRICEDISC, PRICEMAT, RECEIVED
Dollar Price [2] DOLLARDE, DOLLARFR

Table 1: 53 FINANCE XXX functions in one FINANCE function

Another 40 independently named SAS financial functions are used for investment or loan calculation, cash flow, depreciation, and pricing models. For example, we can use irr =irr(1,-400,100,200,300); to compute the internal rate of return (IRR) as a percentage for a series of cash flows.

There are 7 functions (IRR, NPV, PMT, PPMT, IPMT, CUMIPMT, CUMPRINC) whose names are the same as the industry standard FINANCE functions, but they have different sign-in results or even different calculation logic. For example, FINANCE PMT is used to calculate the periodic payment of an annuity, but the PMT function is used to calculate the periodic payment for a constant payment loan or the periodic savings for a future balance. Annuity and periodic savings are different in anchoring market interest rates, interest calculation, liquidity, and binding, so the same arguments return different results for them. The list is shown in Table 2.

Category Function
Investment/Loan [16] IRR/INTRR, NPV/NETPV, SAVING, SAVINGS,
NOMRATE, EFFRATE, COMPOUND,
PMT, PPMT, IPMT, CUMIPMT, CUMPRINC, MORT, TIMEVALUE
Depreciation [10] DACCDB, DACCDBSL, DACCSL, DACCSYD, DACCTAB
DEPDB, DEPDBSL, DEPSL, DEPSYD, DEPTAB
Cash flow [6] CONVX, CONVXP, DUR, DURP, PVP, YIELDP
Pricing Models [8] Stocks: BLKSHCLPRC/BLKSHPTPRC, GARKHCLPRC/GARKHPTPRC, MARGRCLPRC/MARGRPTPRC
Futures: BLACKCLPRC/BLACKPTPRC

Table 2: 40 uniquely named financial functions

To keep semantic consistency, these industry-standard financial functions usually have the same parameter names, and some of the most important parameters are:

  • Interest rate (rate): The interest rate or discount rate of an investment or loan.
  • Number of periods (nper): The total number of payment periods for investment or loan.
  • Payment (pmt): The regular payment that is made each period for investment or loan, the value cannot change over the life of the annuity. Typically, it contains principal and interest, but not other fees and taxes.
  • Present value (pv): The current value of an investment or loan. The present value of an investment is the value at the beginning of the investment period, and the present value of the loan is the value of the principal borrowed.
  • Future value (fv): The value of an investment or loan after all payments have occurred, or a cash balance that you want to attain after the last payment is made.
  • Payments due Type (type): A flag indicates if payments are due at the beginning of the period. The omitted type or 0 means payments are due at the end of the period, 1 indicates payments are due at the beginning of the period.
  • Period (period): Specify the period for interest rate or depreciation calculation. The period must be between 1 and the number of periods (nper).
  • Day count basis (basis): Optional, specify the type of day count basis to use. 0-represents US (NASD) 30/360, default; 1-represents Actual/actual; 2-represents Actual /360; 3-represents Actual /365; 4-represents European 30/360.
  • Frequency (frequency): Specify the number of coupon payments per year, 1-annual payment; 2-semiannual payment; 4-quarterly payment.

Below, we use some simple examples to show how to use these SAS financial functions to calculate and analyze mortgages.

Mortgage calculation

The average interest rate of first home mortgage in China is 5.46% since September 2021, and the average interest rate of second home mortgages is 5.83%, up 23 and 29 basis points respectively from 2020. Assuming that the first housing loan amount is 2 million yuan (CNY), the loan periods is 25 years. If the mortgage type is Constant Payment Mortgage (CPM), how much will the monthly payment and total payment be?

The FINANCE PMT function can be used to calculate the periodic payment of an annuity, it can be used to calculate the constant payment for Constant Payment Mortgage by default. That is, the payment is a constant for all periods, the same amount of the loan, including principal and interest, is repaid every month. The required parameters of the FINANCE PMT function are the interest rate (rate), the number of periods (nper), and the loan amount as the present value (pv). The optional parameters are the future value (fv) and the payment due type (type).

The FINANCE PPMT function can be used to calculate the payment on the principal during a given period for an investment or a loan based on a constant periodic payment and a constant interest rate. The required parameters are interest rate (rate), the payment period (period), number of periods (nper), and the loan amount as the present value (pv), optional parameters are future value (fv) and payment due type (type). Since FINANCE PPMT returns the monthly principal, we can calculate the monthly interest and the principal balance (pb) that need to be repaid. The complete SAS program is as follows:

 /*Mortgage: Constant Payment Mortgage*/
data Mortgage1;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  pmt = FINANCE('pmt', rate, nper, pv); /*Monthly payment*/
  pb=pv;           /*Principal Balance*/
  do period = 1 to nper;
    ppmt=FINANCE('ppmt', rate, period, nper, pv); /*Payment on the principal*/
    mi=pmt- ppmt; /*Payment on the interest*/
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
run;
 
title 'Mortgage: Constant Payment Mortgage';
proc print label;
run;

The results are as follows (Figure 1):

Figure 1: Details of Constant Payment Mortgage payment

To show the total payment and the total interest, we use PROC REPORT to generate a summary table as Figure 2. The total payment for the Constant Payment Mortgage is 3,670,206.07, the total interest is 1,670,206.07, and the ratio of interest to principal is 83.51%.

proc report;
  column period periodid pmt ppmt mi pb;
  format period 3.0 pmt ppmt  mi pb comma15.2;
  define period / order noprint;
  define pb / order;
  rbreak after / summarize style=[font_weight=bold];
 
  define periodid / computed "Period" center; 
  compute periodid / character length=20;
    periodid=period ;
  endcomp;
  compute after;
    if _break_ in ('_RBREAK_') then periodid='Total';
  endcomp;
run;

Figure 2: Constant Payment Mortgage payment schedule

If the mortgage type is Constant Amortization Mortgage (CAM), what will happen to its monthly payment and interest expenses? A Constant Amortization Mortgage divides the total amount of the loan into equal parts during the payment periods and the lender repays the same amount of principal every month and the interest incurred on the remaining loan of that month. As a result, the monthly payment amount is not constant, the initial payment is high, but the interest component of the payments declines rapidly over time. For this type of mortgage, the monthly payment on the principal (ppmt) is constant; we only need to calculate the current payment on the interest (mi) generated by the principal balance (pb) to get the current total payment (pmt) for the given period. The calculation can be accomplished without calling SAS FINANCE functions. The program is as follows:

/*Mortgage: Constant Amortization Mortgage*/
data Mortgage2;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  ppmt=-pv/nper;   /*Monthly Payment on the principal*/
  pb=pv;           /*Principal Balance*/
  do period=1 to nper;
    mi=-pb*rate;   /*Payment on the interest*/
    pmt=ppmt+ mi;
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
RUN;
 
title 'Mortgage: Constant Amortization Mortgage';
PROC PRINT;
RUN;

We use the same PROC REPORT program above to generate the summary table as Figure 3. In the table, the total payment for the Constant Amortization Mortgage is 3,369,550.00, the total interest is 1,369,550.00, and the ratio of interest to principal is 68.48%. This indicates that using this type of mortgage can save 300,656.07 in interest expenses.

Figure 3: Constant Amortization Mortgage payment schedule

To compare the monthly payment, monthly interest, and their cumulative value changes with the repayment periods for two mortgage types, we use PROC SGPANEL to draw the following figures. The code is as follows:

/*Merge two mortgage data with type=1/2 in long format*/
data Mortgage;
  set Mortgage1 Mortgage2;  
  retain type 1;
  if period=1 and _N_^=1 then type=type+1;
  pmt=abs(pmt);
  mi=abs(mi);
  ppmt=abs(ppmt);
run;
/*Create format for mortgage type*/
proc format;
  value typefmt 1="Constant Payment Mortgage"
2="Constant Amortization Mortgage";
run;
data Mortgage;
  set Mortgage;
  accuppmt+ppmt;
  accupmt+pmt;
  if first.type then do; 
	accuppmt=ppmt;
	accupmt=pmt;
  end;
  by type;
  label accuppmt="Cumulative monthly payment on principal" accupmt="Cumulative monthly payment";
  format type typefmt.;
run;
 
ods graphics / width=800px height=300px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2;
 
  vline period / response=pmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=pmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;
  vbar period / response=ppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;  
  colaxis label="Period" type=linear  values=(1 to 300 by 12  ) valueattrs=(color=gray size=12pt) grid;
run;
 
ods graphics / width=800px height=500px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2 ;
 
  vline period / response=accupmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=accupmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;;
  vbar period / response=accuppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;;
  colaxis label="Period" type=linear  values=(1 to 300 by 12  )  valueattrs=(color=gray size=11pt) grid;  
run;

In the following figures, the y-axis represents the monthly payment or the cumulative monthly payment, and the x-axis represents the payment period. The orange represents the principal component and the gray represents the interest component; the two constitute the monthly payment.

In Figure 4, the monthly payment of CPM (top), and the monthly payment on principal of CAM (bottom) are fixed values, which appear as a horizontal line and do not change over time. Correspondingly, in Figure 5, the cumulative monthly payment of CPM (top) and the cumulative monthly payment on principal (bottom) show linear growth.

Figure 4: Monthly payment of CPM vs CAM over period

Figure 5: Cumulative monthly payment of CPM vs CAM over period

Since the monthly payment of CPM is constant, the principal component increases month by month, but the interest component decreases accordingly. For CAM, the monthly payment and the interest component decreases rapidly month by month, but the principal component is constant.

CPM brings less stress to lenders and it is suitable for people whose income is expected to rise, and the lender is only willing to bear a lower monthly payment. CPM is also suitable for those people who have other channels to manage their own money to achieve greater benefits than mortgage expenditures. It is also suitable for those people who have a plan to sell real estate after a certain period. The disadvantage of CPM is that the overall interest expense is relatively high.

CAM is suitable for people who have higher income but whose income is flat or declining in the future. These people hope to put the payment stress on now rather than in the future. Declining payments may be appropriate to match a deflationary environment, but its high initial payment and rapid paydown of principal reduces leverage too fast. Furthermore, some people may complain the constantly changing payment is difficult to budget for and manage.

Summary

We have summarized the design and implementation of SAS functions in financial calculations. We then provided examples to calculate and analyze the monthly payment, interest, and principal for CPM/CAM mortgages. Finally, we compared the monthly payment, principal, interest, and cumulative value changes to a period for two mortgage types.

SAS financial functions review and mortgage payment analysis was published on SAS Users.

12月 232021
 

As the leader in analytics, SAS provides a very rich set of SAS functions and CALL routines for programmers. There is a total of 693 uniquely named SAS functions besides the hundreds of SAS PROC steps. These SAS functions not only cover general functions such as character/string processing, mathematical, permutation & combination, probability, search and sorting, date and time, etc., but also includes a powerful domain-specific function set involving variable information and control, external files and system extensions, web services and tools, descriptive statistics, financial calculations, geodetic distance & ZIP code, Git and other fields. This article reviews the financial functions provided by the SAS system and demonstrates how to use them to do financial analysis and simplify your decisions. Functions with the same name can be invoked easily during programming with SAS DATA step, DS2, FedSQL and CAS environment, which will improve your programs' portability.

Overview

SAS provides 94 financial functions in total, which are mainly used to calculate financial values, such as interest, periodic payments, depreciation, and prices for European options on stocks and futures. The 53 financial functions conforming to industry standards are packaged in 1 general function called FINANCE. Each finance function has a unique string identifier for the first parameter of the FINANCE function. These functions have to be invoked via the FINANCE function. For example, we can use irr=finance("IRR", -400,100,200,300); to compute the internal rate of return (IRR) for a series of cash flows.

The 53 FINANCE functions are compatible with the 53 financial functions provided by Microsoft Excel for Office 365. In fact, the list of financial functions available in the latest Excel for Office 365 software contains 56 functions. Among the 3 extra functions, PDURATION and RRI are specific functions for Excel 2013, and STOCKHISTORY is a specific function for Office 365. These 53 financial functions are designed for financial calculation such as depreciation, maturation, accrued interest, net present value, periodic savings, internal rates of return, etc. They can be categorized as investment, depreciation, Internal rate of return, financial value, and yield calculations. The complete list of SAS FINANCE functions is shown in Table 1.

Category Subcategory Function
Investment Calculation [12] Future Value [2] FV, FVSCHEDULE
Payment [4] PMT, IPMT, PPMT, ISPMT
Present Value [3] PV, NPV, XNPV
Annual interest rate [2] EFFECT, NOMINAL
Number of periods [1] NPER
Depreciation [7] AMORDEGRC, AMORLINC, DB, DDB, SLN, SYD, VDB
Internal rate of return [4] IRR, RATE, MIRR, XIRR
Financial value and yield [30] Accrued/Cumulative Interest & Principal [4] ACCRINT, ACCRINTM, CUMIPMT, CUMPRINC
Coupon days/date and Number [6] COUPDAYBS, COUPDAYS, COUPDAYSNC, COUPNCD, COUPNUM, COUPPCD
Yield [6] INTRATE, ODDFYIELD, ODDLYIELD, YIELD, YIELDDISC, YIELDMAT
Treasury bill [3] TBILLEQ, TBILLPRICE, TBILLYIELD
Duration [2] DURATION, MDURATION
Security [7] DISC, ODDFPRICE, ODDLPRICE, PRICE, PRICEDISC, PRICEMAT, RECEIVED
Dollar Price [2] DOLLARDE, DOLLARFR

Table 1: 53 FINANCE XXX functions in one FINANCE function

Another 40 independently named SAS financial functions are used for investment or loan calculation, cash flow, depreciation, and pricing models. For example, we can use irr =irr(1,-400,100,200,300); to compute the internal rate of return (IRR) as a percentage for a series of cash flows.

There are 7 functions (IRR, NPV, PMT, PPMT, IPMT, CUMIPMT, CUMPRINC) whose names are the same as the industry standard FINANCE functions, but they have different sign-in results or even different calculation logic. For example, FINANCE PMT is used to calculate the periodic payment of an annuity, but the PMT function is used to calculate the periodic payment for a constant payment loan or the periodic savings for a future balance. Annuity and periodic savings are different in anchoring market interest rates, interest calculation, liquidity, and binding, so the same arguments return different results for them. The list is shown in Table 2.

Category Function
Investment/Loan [16] IRR/INTRR, NPV/NETPV, SAVING, SAVINGS,
NOMRATE, EFFRATE, COMPOUND,
PMT, PPMT, IPMT, CUMIPMT, CUMPRINC, MORT, TIMEVALUE
Depreciation [10] DACCDB, DACCDBSL, DACCSL, DACCSYD, DACCTAB
DEPDB, DEPDBSL, DEPSL, DEPSYD, DEPTAB
Cash flow [6] CONVX, CONVXP, DUR, DURP, PVP, YIELDP
Pricing Models [8] Stocks: BLKSHCLPRC/BLKSHPTPRC, GARKHCLPRC/GARKHPTPRC, MARGRCLPRC/MARGRPTPRC
Futures: BLACKCLPRC/BLACKPTPRC

Table 2: 40 uniquely named financial functions

To keep semantic consistency, these industry-standard financial functions usually have the same parameter names, and some of the most important parameters are:

  • Interest rate (rate): The interest rate or discount rate of an investment or loan.
  • Number of periods (nper): The total number of payment periods for investment or loan.
  • Payment (pmt): The regular payment that is made each period for investment or loan, the value cannot change over the life of the annuity. Typically, it contains principal and interest, but not other fees and taxes.
  • Present value (pv): The current value of an investment or loan. The present value of an investment is the value at the beginning of the investment period, and the present value of the loan is the value of the principal borrowed.
  • Future value (fv): The value of an investment or loan after all payments have occurred, or a cash balance that you want to attain after the last payment is made.
  • Payments due Type (type): A flag indicates if payments are due at the beginning of the period. The omitted type or 0 means payments are due at the end of the period, 1 indicates payments are due at the beginning of the period.
  • Period (period): Specify the period for interest rate or depreciation calculation. The period must be between 1 and the number of periods (nper).
  • Day count basis (basis): Optional, specify the type of day count basis to use. 0-represents US (NASD) 30/360, default; 1-represents Actual/actual; 2-represents Actual /360; 3-represents Actual /365; 4-represents European 30/360.
  • Frequency (frequency): Specify the number of coupon payments per year, 1-annual payment; 2-semiannual payment; 4-quarterly payment.

Below, we use some simple examples to show how to use these SAS financial functions to calculate and analyze mortgages.

Mortgage calculation

The average interest rate of first home mortgage in China is 5.46% since September 2021, and the average interest rate of second home mortgages is 5.83%, up 23 and 29 basis points respectively from 2020. Assuming that the first housing loan amount is 2 million yuan (CNY), the loan periods is 25 years. If the mortgage type is Constant Payment Mortgage (CPM), how much will the monthly payment and total payment be?

The FINANCE PMT function can be used to calculate the periodic payment of an annuity, it can be used to calculate the constant payment for Constant Payment Mortgage by default. That is, the payment is a constant for all periods, the same amount of the loan, including principal and interest, is repaid every month. The required parameters of the FINANCE PMT function are the interest rate (rate), the number of periods (nper), and the loan amount as the present value (pv). The optional parameters are the future value (fv) and the payment due type (type).

The FINANCE PPMT function can be used to calculate the payment on the principal during a given period for an investment or a loan based on a constant periodic payment and a constant interest rate. The required parameters are interest rate (rate), the payment period (period), number of periods (nper), and the loan amount as the present value (pv), optional parameters are future value (fv) and payment due type (type). Since FINANCE PPMT returns the monthly principal, we can calculate the monthly interest and the principal balance (pb) that need to be repaid. The complete SAS program is as follows:

 /*Mortgage: Constant Payment Mortgage*/
data Mortgage1;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  pmt = FINANCE('pmt', rate, nper, pv); /*Monthly payment*/
  pb=pv;           /*Principal Balance*/
  do period = 1 to nper;
    ppmt=FINANCE('ppmt', rate, period, nper, pv); /*Payment on the principal*/
    mi=pmt- ppmt; /*Payment on the interest*/
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
run;
 
title 'Mortgage: Constant Payment Mortgage';
proc print label;
run;

The results are as follows (Figure 1):

Figure 1: Details of Constant Payment Mortgage payment

To show the total payment and the total interest, we use PROC REPORT to generate a summary table as Figure 2. The total payment for the Constant Payment Mortgage is 3,670,206.07, the total interest is 1,670,206.07, and the ratio of interest to principal is 83.51%.

proc report;
  column period periodid pmt ppmt mi pb;
  format period 3.0 pmt ppmt  mi pb comma15.2;
  define period / order noprint;
  define pb / order;
  rbreak after / summarize style=[font_weight=bold];
 
  define periodid / computed "Period" center; 
  compute periodid / character length=20;
    periodid=period ;
  endcomp;
  compute after;
    if _break_ in ('_RBREAK_') then periodid='Total';
  endcomp;
run;

Figure 2: Constant Payment Mortgage payment schedule

If the mortgage type is Constant Amortization Mortgage (CAM), what will happen to its monthly payment and interest expenses? A Constant Amortization Mortgage divides the total amount of the loan into equal parts during the payment periods and the lender repays the same amount of principal every month and the interest incurred on the remaining loan of that month. As a result, the monthly payment amount is not constant, the initial payment is high, but the interest component of the payments declines rapidly over time. For this type of mortgage, the monthly payment on the principal (ppmt) is constant; we only need to calculate the current payment on the interest (mi) generated by the principal balance (pb) to get the current total payment (pmt) for the given period. The calculation can be accomplished without calling SAS FINANCE functions. The program is as follows:

/*Mortgage: Constant Amortization Mortgage*/
data Mortgage2;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  ppmt=-pv/nper;   /*Monthly Payment on the principal*/
  pb=pv;           /*Principal Balance*/
  do period=1 to nper;
    mi=-pb*rate;   /*Payment on the interest*/
    pmt=ppmt+ mi;
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
RUN;
 
title 'Mortgage: Constant Amortization Mortgage';
PROC PRINT;
RUN;

We use the same PROC REPORT program above to generate the summary table as Figure 3. In the table, the total payment for the Constant Amortization Mortgage is 3,369,550.00, the total interest is 1,369,550.00, and the ratio of interest to principal is 68.48%. This indicates that using this type of mortgage can save 300,656.07 in interest expenses.

Figure 3: Constant Amortization Mortgage payment schedule

To compare the monthly payment, monthly interest, and their cumulative value changes with the repayment periods for two mortgage types, we use PROC SGPANEL to draw the following figures. The code is as follows:

/*Merge two mortgage data with type=1/2 in long format*/
data Mortgage;
  set Mortgage1 Mortgage2;  
  retain type 1;
  if period=1 and _N_^=1 then type=type+1;
  pmt=abs(pmt);
  mi=abs(mi);
  ppmt=abs(ppmt);
run;
/*Create format for mortgage type*/
proc format;
  value typefmt 1="Constant Payment Mortgage"
2="Constant Amortization Mortgage";
run;
data Mortgage;
  set Mortgage;
  accuppmt+ppmt;
  accupmt+pmt;
  if first.type then do; 
	accuppmt=ppmt;
	accupmt=pmt;
  end;
  by type;
  label accuppmt="Cumulative monthly payment on principal" accupmt="Cumulative monthly payment";
  format type typefmt.;
run;
 
ods graphics / width=800px height=300px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2;
 
  vline period / response=pmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=pmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;
  vbar period / response=ppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;  
  colaxis label="Period" type=linear  values=(1 to 300 by 12  ) valueattrs=(color=gray size=12pt) grid;
run;
 
ods graphics / width=800px height=500px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2 ;
 
  vline period / response=accupmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=accupmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;;
  vbar period / response=accuppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;;
  colaxis label="Period" type=linear  values=(1 to 300 by 12  )  valueattrs=(color=gray size=11pt) grid;  
run;

In the following figures, the y-axis represents the monthly payment or the cumulative monthly payment, and the x-axis represents the payment period. The orange represents the principal component and the gray represents the interest component; the two constitute the monthly payment.

In Figure 4, the monthly payment of CPM (top), and the monthly payment on principal of CAM (bottom) are fixed values, which appear as a horizontal line and do not change over time. Correspondingly, in Figure 5, the cumulative monthly payment of CPM (top) and the cumulative monthly payment on principal (bottom) show linear growth.

Figure 4: Monthly payment of CPM vs CAM over period

Figure 5: Cumulative monthly payment of CPM vs CAM over period

Since the monthly payment of CPM is constant, the principal component increases month by month, but the interest component decreases accordingly. For CAM, the monthly payment and the interest component decreases rapidly month by month, but the principal component is constant.

CPM brings less stress to lenders and it is suitable for people whose income is expected to rise, and the lender is only willing to bear a lower monthly payment. CPM is also suitable for those people who have other channels to manage their own money to achieve greater benefits than mortgage expenditures. It is also suitable for those people who have a plan to sell real estate after a certain period. The disadvantage of CPM is that the overall interest expense is relatively high.

CAM is suitable for people who have higher income but whose income is flat or declining in the future. These people hope to put the payment stress on now rather than in the future. Declining payments may be appropriate to match a deflationary environment, but its high initial payment and rapid paydown of principal reduces leverage too fast. Furthermore, some people may complain the constantly changing payment is difficult to budget for and manage.

Summary

We have summarized the design and implementation of SAS functions in financial calculations. We then provided examples to calculate and analyze the monthly payment, interest, and principal for CPM/CAM mortgages. Finally, we compared the monthly payment, principal, interest, and cumulative value changes to a period for two mortgage types.

SAS financial functions review and mortgage payment analysis was published on SAS Users.

7月 272021
 

In the past, the COMPRESS function was useful. Since SAS version 9, it has become a blockbuster, and you might not have noticed. The major change was the addition of a new optional parameter called MODIFIERS.

The traditional use of the COMPRESS function was to remove blanks or a list of selected characters from a character string. The addition of a MODIFIER argument does two things. First, you can specify classes of characters to remove, such as all letters, all punctuation marks, or all digits. That is extremely useful, but the addition of the 'k' modifier is why I used the term blockbuster in my description. The 'k' modifier flips the function from one that removes characters from a string to one that keeps a list of characters and removes everything else. Let me show you some examples.

This first example stems from a real problem I encountered while trying to read values that contained units. My data looked something like this:

ID     Weight 
001    100lbs.
002     59Kgs.
003    210LBS
004    83kg

My goal was to create a variable called Wt that represented the person's weight in pounds as a numeric value.

First, let’s look at the code. Then, I’ll give an explanation.

data Convert;
   length ID $3 Weight $8;
   input ID Weight;
 
   Wt = input(compress(Weight,,'kd'),8.);
   /* The COMPRESS function uses two modifiers, 'k' and 'd'.  This means
      keep the digits, remove anything else.  The INPUT function does the
      character-to-numeric conversion.
   */
 
   If findc(Weight,'k','i') then Wt = Wt * 2.2;
 
   /* the FINDC function is looking for an upper or lowercase 'k' in the
      original character string.  If found, it converts the value in
      kilograms to pounds (note: 1 kg = 2.2 pounds).
   */
 
datalines;
001    100lbs.
002     59Kgs.
003    210LBS
004    83kg
;
title "Listing of Data Set Convert";
footnote "This program was run using SAS OnDemand for Academics";
proc print data=Convert noobs;
run;

The program reads the value of Weight as a character string. The COMPRESS function uses 'k' and 'd' as modifiers. Notice the two commas in the list of arguments. A single comma would interpret 'kd' as the second argument (the list of characters to remove). Including two commas notifies the function that 'kd' is the third argument (modifiers). You can list these modifiers in any order, but I like to use 'kd', and I think of it as "keep the digits." What remains is the string of digits. The INPUT function does the character-to-numeric conversion.

Your next step is to figure out if the original value of Weight contained an upper or lowercase 'k'. The FINDC function can take three arguments: the first is the string that you are examining, the second is a list of characters that you are searching for, and the third argument is the 'i' modifier that says, "ignore case" (very useful).

If the original character string (Weight) contains an uppercase or lowercase 'k', you convert from kilograms to pounds.

Here is the output:

There is one more useful application of the COMPRESS function that I want to discuss. Occasionally, you might have a text file in ASCII or EBCDIC that contains non-printing characters (usually placed there in error). Suppose you want just the digits, decimal points (periods), blanks, and commas. You need to read the original value as a text string. Let's call the original string Contains_Junk. All you need to convert these values is one line of code like this:

Valid = compress(Contains_Junk,'.,','kdas');

In this example, you are using all three arguments of the COMPRESS function. As in pre-9 versions of SAS, the second argument is a list of characters that you want to remove. However, because the third argument (modifiers) contains a 'k', the second argument is a list of characters that you want to keep. In addition to periods and commas, you use modifiers to include all digits, uppercase and lowercase letters (the 'a' modifier - 'a' for alpha), and space characters (these include spaces, tabs, and a few others such as carriage returns and linefeeds). If you did not want to include tabs and other "white space" characters, you could rewrite this line as:

Valid = compress(Contains_Junk,'., ','kd');

Here you are including a blank in the second argument and omitting the 's' in the modifier list.

You can read more about the COMPRESS function in any of the following books, available from SAS Press as an e-book or from Amazon in print form:

Or my latest programming book:

 

Questions and/or comments are welcome.

The Amazing COMPRESS Function was published on SAS Users.

11月 042020
 

Removing duplicate charactersIn this blog post we are going to tackle a data cleansing task of removing unwanted repeated characters in SAS character variables.

Character repetition can stem from various stages of data life cycle: from data collection, to data transmission, to data transformation. It can be accidental or intentional by design. It can be sporadic or consistent. In either case, it needs to be addressed by robust data cleansing processes to ensure adequate data quality that is imperative for the data usability.

Character repetition examples

Example 1. Data entry, especially manual data entry, can be a high-risk factor for accidental character duplication. Have you ever pressed a key on your computer keyboard for a bit longer than intended, so it started automatically typing multiple characters???????????????

Keyboard properties adjustment

 Tip: You can adjust your Keyboard Properties to control “Repeat delay” and “Repeat rate” settings (on Windows computer, search for “Keyboard” and click on Keyboard in Control Panel).

Example 2. Recently, I had to deal with the data that contained multiple consecutive double quotation marks all over the character string values. Even though we don’t know the exact cause of it, still for each of these duplicated quotation marks occurrences we needed to replace them with a single quotation mark.

Removing repeated blanks

There is a very useful Removing unwanted characters from text strings by Amadeus Software we developed a prototype using

data D;
   c = ','; *<- character to un-duplicate;
   cc = c||c; *<- double character;
   string = 'Many,,,,,, commas,,,,, in,,, this,, sentence.,'; *<- source string;
   put 'BEFORE:' string=; *<- output initial string;
   do while (find(string,cc)); *<- loop through while there are doubles;
      string = tranwrd(string,cc,c); *<- replace double with a single character;
   end;
   put 'AFTER: ' string=; *<- output unduplicated string;
run;

This code will produce the following in the SAS log:

BEFORE:string=Many,,,,,, commas,,,,, in,,, this,, sentence.,
AFTER: string=Many, commas, in, this, sentence.,

which shows that this approach correctly un-duplicates the source string removing and replacing all repeated characters (commas in our example) with a single one.

User-defined SAS function for removing any repeated characters

Let’s use

libname funclib 'c:\projects\functions';
proc fcmp outlib=funclib.userfuncs.package1;
   function undupc(str $, clist $) $;
      length x $32767 c $1 cc $2;
      x = str; 
      do i=1 to length(clist);
         c = char(clist,i);
         cc = c||c;
         do while (find(trim(x),cc));
            x = tranwrd(x,cc,c);
         end;
      end;
      return (x); 
   endfunc; 
run;

Code highlights

  • We introduce an interim variable x to which we will iteratively apply replacing double characters with a single one.
  • We assign length attribute of this variable to be maximum allowable character length of 32767 bytes to accommodate any character length used in the calling program.
  • Outer do-loop loops through the clist containing characters we want to unduplicate.
  • Variable c is assigned a single character from clist, variable cc is assigned double of the cc value.
  • Inner do-loop iterates through trimmed characters in x while doubles are found; using trim(x) is essential as it not only speeds up processing while searching through a shorter string (without trailing blanks), it also prevents from falling into an infinite loop in case clist contains blank character to unduplicate (cc equals to double blanks which are always going to be found among trailing blanks).

Let’s test our newly minted UNDUPC function on the following data:

data SOURCE;
   infile datalines truncover;
   input str $50.;
   datalines;
"""Repeated "double quotes""""
Repeated,,,,,commas,,,,,,,,,,,
[[[""Mixed""]]   characters,,,
;

Since our user-defined function is permanently stored in the location specified in the

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length new_str $50;
   new_str = undupc(str, ' ,"][');
run;

This code will remove and replace all repeated sequences of characters ' ',',', '"', ']', and '['. The order of these characters listed in the second argument doesn’t matter. Here is what we get:

Duplicate characters removal results
As you can see, we get what we wanted including the functionality of the COMPBL function.

User-defined CALL routine for removing any repeated characters

As much as I love user-defined functions, I have an issue with the above undupc user-defined function implementation. It has to do with how the PROC FCMP handles interim character variables length attribute assignment. It does not implicitly inherit their length attribute from another variable as SAS data step does. For example, if you run the following data step:

data a;
   length x $99;
   y = x;
run;

variable y will have the length attribute $99 implicitly inherited from the x variable.

In PROC CFMP function, you can either assign the length attribute to a character variable explicitly with LENGTH or ATTRIB statement (as we did by using length x $32767 ), or it will be set to $33 if you use any other way of implicit assignment. (I leave it up to you guessing why 33 and why not any other number.) Since we wanted to accommodate SAS character strings of any length, we had to explicitly assign our interim variable x length attribute the maximum valid value of $32767. This will inevitably take tall on the function performance as we will have to process longer strings.

However, we can avoid this issue by using CALL routine instead:

libname funclib 'c:\projects\functions';
proc fcmp outlib=funclib.usercalls.package1;
   subroutine undupc(str $, clist $, x $);
      outargs x;
      length c $1 cc $2;
      x = str;
      do i=1 to length(clist);
         c = char(clist,i);
         cc = c||c;
         do while (find(trim(x),cc));
            x = tranwrd(x,cc,c);
         end;
      end;
   endsub; 
run;

This code is very similar to the user-defined function above with a slight difference. Here, x variable is listed as an argument in the subroutine definition and refers to a SAS data step variable whose length attribute is assigned in the calling data step. Unlike SAS function, SAS subroutine does not return a value; instead, it uses

options cmplib=funclib.usercalls;
data TARGET;
   set SOURCE;
   length new_str $50;
   call undupc(str, ' ,"][', new_str);
run;

And we will get the same results as with the UNDUPC function above.

Store user-defined functions and subroutines separately

You can create and have both, user-defined function and call routine with the same name. However, to avoid confusion (and errors) do not store their definitions in the same data table (outlib= option of the PROC FCMP). If they are stored in the same data table, then when used in a DATA step, SAS will pull the latest definition by its name only and that may not be the entity you want.

Performance benchmarking

To compare performances of the UNDUPC function vs UNDUPC subroutine we created a rather large data table (1 Million observations) with randomly generated strings (1000 characters long):

libname SASDL 'C:\PROJECTS\TESTDATA';
 
data SASDL.TESTDATA (keep=str);
   length str $1000;
   do i=1 to 1000000;
      str = '';
      do j=1 to 1000;
         str = cats(str,byte(int(rank(' ')+38*rand('UNIFORM'))));
      end;
      output;
   end;
run;

Then we ran the following 2 data steps, one using the undupc() function, and the other using undupc() call routine:

options cmplib=funclib.userfuncs;
 
data SASDL.TESTDATA_UNDUPC_FUNC;
   set SASDL.TESTDATA;
   length new_str $1000;
   new_str = undupc(str, '#+');
run;
 
options cmplib=subrlib.usercalls;
 
data SASDL.TESTDATA_UNDUPC_CALL;
   set SASDL.TESTDATA;
   length new_str $1000;
   call undupc(str, '#+', new_str);
run;

A quick SAS log inspection reveals that CALL UNDUPC works as much as 3 times faster than UNDUPC function (10 seconds vs. 30 seconds). The time savings may vary depending on your data composition and computing environment, but in any case, if you process high volumes of data you may consider using CALL routine over function. This is not a blanket statement, as it only pertains to this particular algorithm of eliminating character repetitions where we had to explicitly assign the highest possible length attribute value to the interim variable in the function, but not in the CALL routine.

When we reduced declared length of x from $32767 to $1000 within the user-defined function definition its performance became on par with the CALL routine.

Additional Resources for SAS character strings processing

Your thoughts?

Have you found this blog post useful? Would you vote for implementing UNDUPC as a native built-in SAS function? Please share your thoughts and feedback in the comments section below.

Removing repeated characters in SAS strings was published on SAS Users.

10月 152020
 

SAS Microsoft partnershipYou might have heard about SAS - Microsoft partnership announced in June 2020 that officially joined the powers of SAS analytics with Microsoft’s cloud technology to further advance Artificial Intelligence (AI).

This partnership did not just happen out of nowhere. SAS has a long and deep history of integrating with Microsoft technologies. Examples include:

In this post we will look at a lesser known, but quite useful feature in SAS that allows SAS users to bring many Microsoft Excel functions right to their SAS programs. I hope that many SAS users (not just MS Excel aficionados) will love to discover this functionality within SAS.

Excel functions as SAS user-defined functions

SAS has a wide variety of built-in functions, however there are still many Microsoft Excel functions that are not intrinsically implemented in SAS. Luckily, many of them are made available in SAS via

proc fcmp inlib=SASHELP.SLKWXL listall;
run;

You can also capture the list of available Excel functions in a SAS data table using ODS OUTPUT with CODELIST= option:

ods noresults;
ods output codelist=WORK.EXCEL_FUNCTIONS_LIST (keep=COL1 COL2);
proc fcmp inlib=SASHELP.SLKWXL listall;
run;
ods output close;
ods results;

From this data table you can produce a nice looking HTML report listing all these functions:

data WORK.EXCEL_SAS_FUNCTIONS (keep=exc sas arg);
   label exc='Excel Function' sas='SAS Function' arg='Arguments';
   set WORK.EXCEL_FUNCTIONS_LIST (rename=(col2=arg));
   sas = tranwrd(col1,'Function ','');
   exc = tranwrd(sas,'_slk','');
run;
 
ods html path='c:\temp' file='excel_sas_functions.html';
title 'List of Excel functions available in SAS (via SASHELP.SLKWXL)';
proc print data=EXCEL_SAS_FUNCTIONS label;
run;
ods html close;

When you run this code, you should get the following list of Excel functions along with their SAS equivalents:

List of Excel functions available in SAS (via SASHELP.SLKWXL)
Obs Excel Function SAS Function Arguments
1 even even_slk ( x )
2 odd odd_slk ( x )
3 factdouble factdouble_slk ( x )
4 product product_slk ( nums )
5 multinomial multinomial_slk ( nums )
6 floor floor_slk ( n, sg )
7 datdif4 datdif4_slk ( start, end )
8 amorlinc amorlinc_slk ( cost, datep, fperiod, salvage, period, rate, basis )
9 amordegrc amordegrc_slk ( cost, datep, fperiod, salvage, period, rate, basis )
10 disc disc_slk ( settlement, maturity, pr, redemp, basis )
11 tbilleq tbilleq_slk ( settlement, maturity, discount )
12 tbillprice tbillprice_slk ( settlement, maturity, discount )
13 tbillyield tbillyield_slk ( settlement, maturity, par )
14 dollarde dollarde_slk ( fdollar, frac )
15 dollarfr dollarfr_slk ( ddollar, frac )
16 effect effect_slk ( nominal_rate, npery )
17 coupnum coupnum_slk ( settlement, maturity, freq, basis )
18 coupncd coupncd_slk ( settlement, maturity, freq, basis )
19 coupdaysnc coupdaysnc_slk ( settlement, maturity, freq, basis )
20 couppcd couppcd_slk ( settlement, maturity, freq, basis )
21 coupdays coupdays_slk ( settlement, maturity, freq, basis )
22 db db_slk ( cost, salvage, life, period, month )
23 yield yield_slk ( settlement, maturity, rate, pr, redemp, freq, basis )
24 yielddisc yielddisc_slk ( settlement, maturity, pr, redemp, basis )
25 coupdaybs coupdaybs_slk ( settlement, maturity, freq, basis )
26 oddfprice oddfprice_slk ( settlement, maturity, issue, fcoupon, rate, yield, redemp, freq, basis )
27 oddfyield oddfyield_slk ( settlement, maturity, issue, fcoupon, rate, pr, redemp, freq, basis )
28 oddlyield oddlyield_slk ( settlement, maturity, linterest, rate, pr, redemp, freq, basis )
29 oddlprice oddlprice_slk ( settlement, maturity, linterest, rate, yield, redemp, freq, basis )
30 price price_slk ( settlement, maturity, rate, yield, redemp, freq, basis )
31 pricedisc pricedisc_slk ( settlement, maturity, discount, redemp, basis )
32 pricemat pricemat_slk ( settlement, maturity, issue, rate, yld, basis )
33 yieldmat yieldmat_slk ( settlement, maturity, issue, rate, pr, basis )
34 received received_slk ( settlement, maturity, investment, discount, basis )
35 accrint accrint_slk ( issue, finterest, settlement, rate, par, freq, basis )
36 accrintm accrintm_slk ( issue, maturity, rate, par, basis )
37 duration duration_slk ( settlement, maturity, coupon, yld, freq, basis )
38 mduration mduration_slk ( settlement, maturity, coupon, yld, freq, basis )
39 avedev avedev_slk ( data )
40 devsq devsq_slk ( data )
41 varp varp_slk ( data )

 
NOTE: Excel functions that are made available in SAS are named from their Excel parent functions, suffixing them with _SLK to distinguish them from their Excel incarnations, as well as from native SAS functions.

Examples of Microsoft Excel functions usage in SAS

In order to use any of these Excel functions in your SAS code, all you need to do is to specify the functions definition data table in the CMPLIB= option:

options cmplib=SASHELP.SLKWXL;

Let’s consider several examples.

ODD function

This function returns number rounded up to the nearest odd integer:

options cmplib=SASHELP.SLKWXL;
data _null_;
   x = 6.4;
   y = odd_slk(x);
   put 'odd( ' x ') = ' y;
run;

SAS log:
odd( 6.4 ) = 7

EVEN function

This function returns number rounded up to the nearest even integer:

options cmplib=SASHELP.SLKWXL;
data _null_;
   x = 6.4;
   y = even_slk(x);
   put 'even( ' x ') = ' y;
run;

SAS log:
odd( 6.4 ) = 8

FACTDOUBLE function

This function returns the double factorial of a number. If number is not an integer, it is truncated.
Double factorial (or semifactorial) of a number n, denoted by n!!, is the product of all the integers from 1 up to n that have the same parity as n.
For even n, the double factorial is n!!=n(n-2)(n-4)…(4)(2), and for odd n, the double factorial is n!! = n(n-2)(n-4)…(3)(1).

Here is a SAS code example using the factdouble() Excel function:

options cmplib=SASHELP.SLKWXL;
data _null_;
   n = 6;
   m = 7;
   nn = factdouble_slk(n);
   mm = factdouble_slk(m);
   put n '!! = ' nn / m '!! = ' mm;
run;

It will produce the following SAS log:
6 !! = 48
7 !! = 105

Indeed, 6!! = 2 x 4 x 6 = 48 and 7!! = 1 x 3 x 5 x 7 = 105.

PRODUCT function

This function multiplies all elements of SAS numeric array given as its argument and returns the product:

options cmplib=SASHELP.SLKWXL;
data _null_;
   array x x1-x5 (5, 7, 1, 2, 2);
   p = product_slk(x);
   put 'x = ( ' x1-x5 ')';
   put 'product(x) = ' p;
run;

SAS log:
x = ( 5 7 1 2 2 )
product(x) = 140

Indeed 5*7*1*2*2 = 140.

MULTINOMIAL function

This function returns the ratio of the factorial of a sum of values to the product of factorials:

MULTINOMIAL(a1, a2, ... , an) = (a1 + a2 + ... + an)! : (a1! a2! ... an!)

In SAS, the argument to this function is specified as numeric array name:

options cmplib=SASHELP.SLKWXL;
data _null_;
   array a a1-a3 (1, 3, 2);
   m = multinomial_slk(a);
   put 'a = ( ' a1-a3 ')';
   put 'multinomial(a) = ' m;
run;

SAS log:
a = ( 1 3 2 )
multinomial(a) = 60

Indeed (1+3+2)!  :  (1! + 3! + 2!) = 720 : 12 = 60.

Other Microsoft Excel functions available in SAS

You can explore other Excel functions available in SAS via SASHELP.SLKWXL user-defined functions by cross-referencing them with the corresponding Microsoft Excel functions documentation (alphabetical or by categories) As you can see in the above List of Excel functions available in SAS, besides mathematical functions exemplified in the previous section, there are also many Excel financial functions related to securities trading that are made available in SAS.

Additional Resources on SAS user-defined functions

Your thoughts?

Have you found this blog post useful? Please share your use cases, thoughts and feedback in the comments below.

Using Microsoft Excel functions in SAS was published on SAS Users.

4月 212018
 

Have you ever been working in the macro facility and needed a macro function, but you could not locate one that would achieve your task? With the %SYSFUNC macro function, you can access most SAS® functions. In this blog post, I demonstrate how %SYSFUNC can help in your programming needs when a macro function might not exist. I also illustrate the formatting feature that is built in to %SYSFUNC. %SYSFUNC also has a counterpart called %QSYSFUNC that masks the returned value, in case special characters are returned.
%SYSFUNC enables the execution of SAS functions and user-written functions, such as those created with the FCMP procedure. Within the DATA step, arguments to the functions require quotation marks, but because %SYSFUNC is a macro function, you do not enclose the arguments in quotation marks. The examples here demonstrate this.

%SYSFUNC has two possible arguments. The first argument is the SAS function, and the second argument (which is optional) is the format to be applied to the value returned from the function. Suppose you had a report and within the title you wanted to issue today’s date in word format:

   title "Today is %sysfunc(today(),worddate20.)";

The title appears like this:

   "Today is               July 4, 2018"

Because the date is right-justified, there are leading blanks before the date. In this case, you need to introduce another function to remove the blank spaces. Luckily %SYSFUNC enables the nesting of functions, but each function that you use must have its own associated %SYSFUNC. You can rewrite the above example by adding the STRIP function to remove any leading or trailing blanks in the value:

   title "Today is %sysfunc(strip(%sysfunc(today(),worddate20.)))";

The title now appears like this:

    "Today is July 4, 2018"

The important thing to notice is the use of two separate functions. Each function is contained within its own %SYSFUNC.

Suppose you had a macro variable that contained blank spaces and you wanted to remove them. There is no macro COMPRESS function that removes all blanks. However, with %SYSFUNC, you have access to one. Here is an example:

   %let list=a    b    c; 
   %put %sysfunc(compress(&list));

The value that is written to the log is as follows:

   abc

In this last example, I use %SYSFUNC to work with SAS functions where macro functions do not exist.

The example checks to see whether an external file is empty. It uses the following SAS functions: FILEEXIST, FILENAME, FOPEN, FREAD, FGET, and FCLOSE. There are other ways to accomplish this task, but this example illustrates the use of SAS functions within %SYSFUNC.

   %macro test(outf);
   %let filrf=myfile;
 
   /* The FILEEXIST function returns a 1 if the file exists; else, a 0
   is returned. The macro variable &OUTF resolves to the filename
   that is passed into the macro. This function is used to determine
   whether the file exists. In this case you want to find the file
   that is contained within &OUTF. Notice that there are no quotation
   marks around the argument, as you will see in all cases below. If
   the condition is false, the %ELSE portion is executed, and a
   message is written to the log stating that the file does not
   exist.*/
 
   %if %sysfunc(fileexist(&outf)) %then %do;
 
   /* The FILENAME function returns 0 if the operation was successful; 
   else, a nonzero is returned. This function can assign a fileref
   for the external file that is located in the &OUTF macro 
   variable. */
 
   %let rc=%sysfunc(filename(filrf,&outf));
 
   /* The FOPEN function returns 0 if the file could not be opened; 
   else, a nonzero is returned. This function is used to open the
   external file that is associated with the fileref from &FILRF. */
 
   %let fid=%sysfunc(fopen(&filrf));
 
   /* The %IF macro checks to see whether &FID has a value greater
   than zero, which means that the file opened successfully. If the
   condition is true, we begin to read the data in the file. */
 
   %if &fid > 0 %then %do;
 
   /* The FREAD function returns 0 if the read was successful; else, a
   nonzero is returned. This function is used to read a record from
   the file that is contained within &FID. */
 
   %let rc=%sysfunc(fread(&fid));
 
   /* The FGET function returns a 0 if the operation was successful. A
   returned value of -1 is issued if there are no more records
   available. This function is used to copy data from the file data 
   buffer and place it into the macro variable, specified as the
   second argument in the function. In this case, the macro variable
   is MYSTRING. */   
 
   %let rc=%sysfunc(fget(&fid,mystring));
 
   /* If the read was successful, the log will write out the value
   that is contained within &MYSTRING. If nothing is returned, the
   %ELSE portion is executed. */
 
   %if &rc = 0 %then %put &mystring;
   %else %put file is empty;
 
   /* The FCLOSE function returns a 0 if the operation was successful;
   else, a nonzero value is returned. This function is used to close
   the file that was referenced in the FOPEN function. */
 
   %let rc=%sysfunc(fclose(&fid));
   %end;
 
   /* The FILENAME function is used here to deassign the fileref 
   FILRF. */
 
   %let rc=%sysfunc(filename(filrf));
   %end;
   %else %put file does not exist;
   %mend test;
   %test(c:\testfile.txt)

There are times when the value that is returned from the function used with %SYSFUNC contains special characters. Those characters then need to be masked. This can be done easily by using %SYSFUNC’s counterpart, %QSYSFUNC. Suppose we run the following example:

   %macro test(dte);
   %put &dte;
   %mend test;
 
   %test(%sysfunc(today(), worddate20.))

The above code would generate an error in the log, similar to the following:

   1  %macro test(dte);
   2  %put &dte;
   3  %mend test;
   4
   5  %test(%sysfunc(today(), worddate20.))
   MLOGIC(TEST):  Beginning execution.
   MLOGIC(TEST):  Parameter DTE has value July 20
   ERROR: More positional parameters found than defined.
   MLOGIC(TEST):  Ending execution.

The WORDDATE format would return the value like this: July 20, 2017. The comma, to a parameter list, represents a delimiter, so this macro call is pushing two positional parameters. However, the definition contains only one positional parameter. Therefore, an error is generated. To correct this problem, you can rewrite the macro invocation in the following way:

   %test(%qsysfunc(today(), worddate20.))

The %QSYSFUNC macro function masks the comma in the returned value so that it is seen as text rather than as a delimiter.

For a list of the functions that are not available with %SYSFUNC, see the “How to expand the number of available SAS functions within the macro language was published on SAS Users.

4月 052018
 

Sir Tim Berners-Lee is famous for inventing the World Wide Web and for the construction of URLs -- a piece of syntax that every 8-year-old is now familiar with. According to the lore, when Sir Tim invented URLs he did not imagine that Internet surfers of all ages and backgrounds would be expected to type these cryptic schemes into their browser windows...but here we are. Today's young people can navigate the Web with URLs in the same way that migratory birds can find their way South for the winter -- by pure instinct.

URLs are syntax, the language of Web navigation. And HTML is syntax too, the language of the Web page. Each of these represent instructions to our web browser, telling it to how to go somewhere or display something. As we navigate the web programmatically, it is sometimes necessary to encode information in a URL or in HTML in a way that it won't be mistakenly interpreted as syntax. And of course, SAS has some functions for that.

Here's a table with links to the SAS documentation for these functions. You'll find they are intuitively named. The names are the same or similar to corresponding functions in other programming languages -- you can get only so creative with basic functions like these.

HTMLDECODE Function   Decodes a string that contains HTML numeric character references or HTML character entity references, and returns the decoded string.
HTMLENCODE Function   Encodes characters using HTML character entity references, and returns the encoded string.
URLDECODE Function   Returns a string that was decoded using the URL escape syntax.
URLENCODE Function   Returns a string that was encoded using the URL escape syntax.

Of these four functions, I use URLENCODE the most often. I need it when I need to pass syntax for instructions to a REST API, in which the API call itself is a URL. Here's an example from my paper about accessing Google Analytics with SAS:

%let workdate='01Oct2017'd;
%let urldate=%sysfunc(putn(&workdate.,yymmdd10.));
%let metrics=%sysfunc(urlencode(%str(ga:pageviews,ga:sessions)));
%let id=%sysfunc(urlencode(%str(ga:XXXXXX)));
filename garesp temp;
proc http
  url="https://www.googleapis.com/analytics/v3/data/ga?ids=&id.%str(&)start-date=&urldate.%str(&)end-date=&urldate.%str(&)metrics=&metrics.%str(&)max-results=20000"
  method="GET" out=garesp;
  headers 
    "Authorization"="Bearer &access_token."
    "client-id:"="&client_id.";
 run;

In the previous example, I need to include "ga:pageviews,ga:sessions" as an instruction on the Google Analytics API, but the colon character has special meaning within the URL syntax. I need to "escape" the colon character so that the URL parser ignores it and simply passes the value to the API. The URLENCODE function converts this segment to "ga%3Apageviews,ga%3Asessions". The colon has been replaced by its hexadecimal code, set off by a percent sign: %3A.

An aside: the percent (%) and ampersand (&) characters have special meaning in URLs. They also have special meaning in the SAS macro language. We can claim that the SAS macro language preceded URL syntax by decades, but there are only so many characters on the keyboard that syntax designers can use to set off code instructions. I use the

HTMLENCODE is useful when you need to represent HTML syntax in your output, but prevent the web browser from interpreting the HTML as code. Here's a simple example.

filename out temp;
ods html5 file=out;
data link;
  site = "sas.com";
  link = "<a href='https://www.sas.com'>sas.com</a>";
  code_as = htmlencode(link);
run;
 
proc print data=link;
run;
ods html5 close;

When produced using the HTML5 destination, the link variable is formatted as a live link, while the code_as variable shows the syntax that went into it.

As you might expect, the URLDECODE function "unescapes" the URL hex characters and restores the original URL syntax. HTMLDECODE does the same for HTML content. If you are writing code that implements an API endpoint in SAS (as you might do with a SAS stored process on the back end of a web service), you'll find these functions useful to unpack the information that was encoded on an API call.

HTMLENCODE and URLENCODE are not interchangeable. More than once, I have written programs that mistakenly use HTMLENCODE when it was URLENCODE that was needed. Those mistakes can be tricky to debug, so pay attention!

Cover image by Fabio Lanari, Internet2, CC BY-SA 4.0

The post SAS functions to encode and decode data for the Web appeared first on The SAS Dummy.

11月 022017
 

The purpose of this blog post is to demonstrate a SAS coding technique that allows for calculations with multiple variables across a SAS dataset, whether or not their values belong to the same or different observations.

calculations across observations of a SAS data table

What do we want?

As illustrated in the picture on the right, we want to be able to hop, or jump, back and forth, up and down across observations of a data table in order to implement calculations not just on different variables, but with their values from different observations of a data table.

In essence, we want to access SAS dataset variable values similar to accessing elements of a matrix (aij), where rows represent dataset observations, and columns represent dataset variables.

Combine and Conquer

In the spirit of my earlier post Combine and Conquer with SAS, this technique combines the functionality of the LAG function, which allows us to retrieve the variable value of a previous observation from a queue, with an imaginary, non-existent in SAS, LEAD function that reads a subsequent observation in a data set while processing the current observation during the same iteration of the DATA step.

LAG function

LAG<n> function in SAS is not your usual, ordinary function. While it provides a mechanism of retrieving previous observations in a data table, it does not work “on demand” to arbitrarily read a variable value from a previous observation n steps back. If you want to use it conditionally in some observations of a data step, you still need to call it in every iteration of that data step. That is because it retrieves values from a queue that is built sequentially for each invocation of the LAG<n> function. In essence, in order to use the LAG function even just once in a data step, you need to call it every time in each data step iteration until that single use.

Moreover, if you need to use each of the LAG1, LAG2, . . . LAGn functions just once, in order to build these queues, you have to call each of them in every data step iteration even if you are going to use them in some subsequent iterations.

LEAD function

The LEAD function is implemented in Oracle SQL and it returns data from the next or subsequent row of a data table. It allows you to query more than one row in a table at a time without having to join the table to itself.

There is no such function in SAS. However, the POINT= option of the SET statement in a SAS data step allows retrieving any observation by its number from a data set using random (direct) access to read a SAS data set. This will allow us to simulate a LEAD function in SAS.

HOP function

But why do we need two separate functions like LAG and LEAD in order to retrieve non-current observations. In essence, these two functions do the same thing, just in opposite directions. Why can’t we get by with just one function that does both backwards and forwards “hopping?”

Let’s combine and conquer.

Ideally, we would like to construct a new single function - let’s call it HOP(x, j) - that combines the best qualities of both LAG and LEAD functions. The two arguments of the HOP function would be as follows:

x – SAS variable name (numeric or character) the value of we are retrieving;

j – hop distance (numeric) – an offset from the current observation; negative values being lagging (hopping back), positive values being leading (hopping forward), and a zero-value meaning staying within the current observation.

The sign of the second argument defines whether we lag (minus) or lead (plus). The absolute value of this second argument defines how far from the current observation we hop.

Alternatively, we could have the first argument as a column number, and the second argument as a row/observation number, if we wanted this function to deal with the data table more like with a matrix. But relatively speaking, the method doesn’t really matter as long as we can unambiguously identify a data element or a cell. To stay within the data step paradigm, we will stick with the variable name and offset from the current observation (_n_) as arguments.

Let’s say we have a data table SAMPLE, where for each event FAIL_FLAG=1 we want to calculate DELTA as the difference between DATE_OUT, one observation after the event, and DATE_IN, two observations before the event:

Calculations across observations of a SAS data table

That is, we want to calculate DELTA in the observation where FAIL_FLAG = 1 as

26MAR2017 18JAN2017 = 67 (as shown in light-blue highlighting in the above figure).

With the HOP() function, that calculation in the data step would look like this:

data SAMPLE;
   set SAMPLE;
   if FAIL_FLAG then DELTA = hop(DATE_OUT,1) - hop(DATE_IN,-2);
run;

It would be reasonable to suggest that the hop() function should return a missing value when the second argument produces an observation number outside of the dataset boundary, that is when

_n_ + j < 0 or _n_ + j > num, where _n_ is the current observation number of the data step iteration; num is the number of observations in the dataset; j is the offset argument value.

Do you see anything wrong with this solution? I don’t. Except that the HOP function exists only in my imagination. Hopefully, it will be implemented soon if enough SAS users ask for it. But until then, we can use its surrogate in the form of a %HOP macro.

%HOP macro

The HOP macro grabs the value of a specified variable in an offset observation relative to the current observation and assigns it to another variable. It is used within a SAS data step, but it cannot be used in an expression; each invocation of the HOP macro can only grab one value of a variable across observations and assign it to another variable within the current observation.

If you need to build an expression to do calculations with several variables from various observations, you would need to first retrieve all those values by invoking the %hop macro as many times as the number of the values involved in the expression.

Here is the syntax of the %HOP macro, which has four required parameters:

%HOP(d,x,y,j)

d – input data table name;

x – source variable name;

y – target variable name;

j – integer offset relative to the current observation. As before, a negative value means a previous observation, a positive value means a subsequent observation, and zero means the current observation.

Using this %HOP macro we can rewrite our code for calculating DELTA as follows:

 data SAMPLE (drop=TEMP1 TEMP2);
   set SAMPLE;
   if FAIL_FLAG then
   do;
      %hop(SAMPLE,DATE_OUT,TEMP1, 1)
      %hop(SAMPLE,DATE_IN, TEMP2,-2)
      DELTA = TEMP1 - TEMP2;
   end;
run;

Note that we should not have temporary variables TEMP1 and TEMP2 listed in a RETAIN statement, as this could mess up our calculations if the j-offset throws an observation number out of the dataset boundary.

Also, the input data table name (d parameter value) is the one that is specified in the SET statement, which may or may not be the same as the name specified in the DATA statement.

In case you are wondering where you can download the %HOP macro from, here it is in its entirety:

%macro hop(d,x,y,j);
   _p_ = _n_ + &j;
   if (1 le _p_ le _o_) then set &d(keep=&x rename=(&x=&y)) point=_p_ nobs=_o_;
%mend hop;

Of course, it is “free of charge” and “as is” for your unlimited use.

Your turn

Please provide your feedback and share possible use cases for the HOP function/macro in the Comment section below. This is your chance for your voice to be heard!

Hopping for the best - calculations across SAS dataset observations was published on SAS Users.

5月 082017
 

In his recent article Perceptions of probability, Rick Wicklin explores how vague statements about "likeliness" translate into probabilities that we can express numerically. It's a fun, informative post -- I recommend it! You'll "Almost Certainly" enjoy it.

To prepare the article, Rick first had to download the source data from the study he cited. The data was shared as a CSV file on GitHub. Rick also had to rename the variables (column names) from the data table so that they are easier to code within SAS. Traditionally, SAS variable names must adhere to a few common programming rules: they must be alphanumeric, begin with a letter, and contain no spaces or special characters. The complete rules are documented in the this method for reading data from a cloud service like DropBox and GitHub. It's still my favorite technique for reading data from the Internet. You'll find lots of papers and examples that use FILENAME URL for the same job in fewer lines of code, but PROC HTTP is more robust. It runs faster, and it allows you to separate the step of fetching the file from the subsequent steps of processing that file.

You can see the contents of the CSV file at this friendly URL: https://github.com/zonination/perceptions/blob/master/probly.csv. But that's not the URL that I need for PROC HTTP or any programmatic access. To download the file via a script, I need the "Raw" file URL, which I can access via the Raw button on the GitHub page.

GitHub preview

In this case, that's https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv. Here's the PROC HTTP step to download the CSV file into a temporary fileref.

/* Fetch the file from the web site */
filename probly temp;
proc http
 url="https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv"
 method="GET"
 out=probly;
run;

A note for SAS University Edition users: this step won't work for you, as the free software does not support access to secure (HTTPS) sites. You'll have to manually download the file via your browser and then continue with the remaining steps.

Step 2. Import the data into SAS with PROC IMPORT

SAS can process data with nonstandard variable names, including names that contain spaces and special characters. You simply have to use the VALIDVARNAME= system option to put SAS into the right mode (oops, almost wrote "mood" there but it's sort of the same thing).

With 'crime against nature'n.)

For this step, I'll set VALIDVARNAME=ANY to allow PROC IMPORT to retain the original column names from the CSV file. The same trick would work if I was importing from an Excel file, or any other data source that was a little more liberal in its naming rules.

/* Tell SAS to allow "nonstandard" names */
options validvarname=any;
 
/* import to a SAS data set */
proc import
  file=probly
  out=work.probly replace
  dbms=csv;
run;

Step 3. Create RENAME and LABEL statements with PROC SQL

This is one of my favorite SAS tricks. You can use PROC SQL SELECT INTO to create SAS programming statements for you, based on the data you're processing. Using this technique, I can build the parts of the LABEL statement and the RENAME statement dynamically, without knowing the variable names ahead of time.

The LABEL statement is simple. I'm going to build a series of assignments that look like this:

  'original variable name'n = 'original variable name'

I used the SELECT INTO clause to build a label assignment for each variable name. I used the CAT function to assemble the label assignment piece-by-piece, including the special literal syntax, the variable name, the assignment operator, and the label value within quotes. I'm fetching the variable names from SASHELP.VCOLUMN, one of the built-in dictionary tables that SAS provides to surface table and column metadata.

  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";

Here's part of the value of &labelStmt:

'Almost Certainly'n='Almost Certainly' 
'Highly Likely'n='Highly Likely' 
'Very Good Chance'n='Very Good Chance' 
'Probable'n='Probable' 
'Likely'n='Likely' 
'Probably'n='Probably' 
'We Believe'n='We Believe' 

The RENAME statement is a little trickier, because I have to calculate a new valid variable name. For this specific data source that's easy, because the only SAS "rule" that these column names violate is the ban on space characters. I can create a new name by using the COMPRESS function to remove the spaces. To be a little safer, I used the "kn" modifier on the COMPRESS function to keep only English letters, numbers, and underscores. That should cover all cases except for variable names that are too long (greater than 32 characters) or that begin with a number (or that don't contain any valid characters to begin with).

Some of the column names are one-word names that are already valid. If I include those in the RENAME statement, SAS will generate an error (you cannot "rename" a variable to its current name). I used the

/* Generate new names to comply with SAS rules.                          */
/* Assumes names contain spaces, and can fix with COMPRESS               */
/* Other deviations (like special chars, names that start with a number) */
/* would need different adjustments                                      */
/* NVALID() function can check that a name is a valid V7 name           */
proc sql noprint;
 
  /* retain original names as labels */
  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";
 
  select cat("'",trim(name),"'n","=",compress(name,,'kn')) 
     into :renameStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK"
  /* exclude those varnames that are already valid */
  AND not NVALID(trim(name),'V7');
quit;

Step 4. Modify the data set with new names and labels using PROC DATASETS

With the body of the LABEL and RENAME statements built, it's time to plug them into a PROC DATASETS step. PROC DATASETS can change data set attributes such as variable names, labels, and formats without requiring a complete rewrite of the data -- it's a very efficient operation.

I include the LABEL statement first, since it references the original variable names. Then I include the RENAME statement, which changes the variable names to their new V7-compliant values.

Finally, I reset the VALIDVARNAME= option to the normal V7 sanity. (Unless you're running in SAS Enterprise Guide, in which case the option is already set to ANY by default. Check this blog post for a less disruptive method of setting/restoring options.)

proc datasets lib=work nolist ;
  modify probly / memtype=data;
  label &labelStmt.;
  rename &renameStmt.;
  /* optional: report on the var names/labels */
  contents data=probly nodetails;
quit;
 
/* reset back to the old rules */
options validvarname=v7;

Here's the CONTENTS output from the PROC DATASETS step, which shows the final variable attributes. I now have easy-to-code variable names, and they still have their descriptive labels. My data dictionary dreams are coming true!

DATASETS rename output

Download the entire program example from my public Gist: import_renameV7.sas.

The post How to download and convert CSV files for use in SAS appeared first on The SAS Dummy.