Tech

1月 132022
 

3 ways of calculating length of overlapping of two integer intervalsIn this post, we tackle the problem of calculating the length of overlap of two date/time intervals. For example, we might have two events — each lasting several days — and need to calculate the number of shared days between both events. In other words, we need to calculate the length of the events’ overlap.

Such tasks are typical in clinical trials programming, health (and non-health) insurance claim processing, project management, planning and scheduling, etc.

The length of the overlap can be measured not only in days, but also in any other units of the date/time dimension: hours, minutes, seconds, weeks, months, years and so on. Moreover, the date/time application is just one special case of a broader task of calculating the overlap length of two integer intervals. An integer interval [x .. y] is a set of all consecutive integer numbers beginning with x and ending with y (boundaries included and x ≤ y).

Instead of suggesting a single “best” way of solving this problem, I offer three different strategies. This allows you to compare and weigh their pros and cons and decide for yourself which approach is most suitable for your circumstances. All three solutions presented in this blog post are applicable to the date/time use case, as well as its superset of integer intervals’ overlap length calculation.

Problem description

Suppose we have two events A [A1 .. A2] and B [B1 .. B2]. Event A lasts from date A1 until date A2 (A1 ≤ A2), and event B lasts from date B1 until date B2 (B1 ≤ B2).  Both starting and ending dates are included in the corresponding event intervals.

We need to calculate the overlap between these two events, defined as the number of days that belong to both events A and B.

Overlapping intervals: sample data

Before solving the problem, let’s create sample data to which we are going to apply our possible solutions.

The following data step creates data set EVENTS representing such a data sample:

data EVENTS;
   input A1 A2 B1 B2;
   format A1 A2 B1 B2 date9.;
   informat A1 A2 B1 B2 mmddyy10.;
   lines;
01/02/2022 01/05/2022 01/06/2022 01/10/2022
01/22/2022 01/30/2022 01/16/2022 01/18/2022
01/02/2022 01/05/2022 01/03/2022 01/10/2022
01/02/2022 01/05/2022 01/03/2022 01/04/2022
01/10/2022 01/15/2022 01/06/2022 01/14/2022
01/01/2022 01/05/2022 01/05/2022 01/09/2022
01/07/2022 01/13/2022 01/10/2022 01/13/2022
;

Now let’s go over the following three distinct possible solutions, each with its own merit and downside. At the end, you decide which solution you like the most.

Solution 1: Brute force

In general, brute force algorithms solve problems by exhaustive iteration through all possible choices until a solution is found. It rarely results in a clever, efficient solution. However, it is a straightforward, valid and important algorithm design strategy that is applicable to a wide variety of problems.

For integer intervals overlap calculation, a brute force approach would involve the following steps:

  1. Determine the earliest date in the definitions of our two intervals
  2. Determine the latest date in the definitions of our two intervals
  3. Create an iterative loop with index variable ranging from the earliest to the latest date
  4. Within this loop, increment a counter by one if index variable falls within both intervals
  5. At the end of this loop, the counter will equal the desired value for the overlap.

Here is the code implementation of the described above algorithm:

data RESULTS;
   set EVENTS;
   OVERLAP = 0;
   do i=min(A1,B1) to max(A2,B2);
      if (A1<=i<=A2) and (B1<=i<=B2) then OVERLAP + 1;
   end;
run;

As you can see, the code implementation of the brute force solution is quite simple. However, it is hardly efficient as it involves extensive looping.

Solution 2: Exhaustive logic

Exhaustive logic algorithms solve problems by splitting them into several smaller problems (logical units or classes). This way, instead of working on a problem in its entirety, you can work separately on smaller and simpler problems that are easier to comprehend, digest and solve.

Here is an illustration of various arrangements of integer intervals representing separate logical units (numbered 1...5) when calculating the intervals overlaps:

Different arrangements of date/time integer intervals

The following SAS code demonstrates an implementation of the exhaustive logic algorithm (each IF-THEN statement corresponds to the numbered logical unit in the above figure):

data RESULTS;
   set EVENTS;
   if A1<=B1 and A2>=B2    then OVERLAP = B2 - B1 + 1; else
   if A1>=B1 and A2<=B2    then OVERLAP = A2 - A1 + 1; else
   if A1<=B1<=A2 and B2>A2 then OVERLAP = A2 - B1 + 1; else
   if B1<=A1<=B2 and A2>B2 then OVERLAP = B2 - A1 + 1; else
   if B1>A2 or A1>B2       then OVERLAP = 0;
run;

Notice that we need to add 1 to each of the two days difference. That is because the number of days spanned is always 1 less than the number of days contained in an interval. Here is an illustration:

Days spanned vs. number of days

Solution 3: Holistic Math

Finally, a holistic math method looks at the problem in its entirety and condenses it to a comprehensive mathematical formula. While it is usually not the most obvious approach, and often requires some brain stretching and internalization, the effort is often rewarded by achieving the best possible solution.

The exhaustive logic solution may help in better understanding the problem and arriving at the generalized mathematical formula.

If you dwell on the exhaustive logic solution for a while, you may come to realization that all the variety of the logical units boils down to the following single simple formula:

Overlap = min(A2, B2) - max(A1, B1) + 1.

In other words, the overlap of two integer intervals is a difference between the minimum value of the two upper boundaries and the maximum value of the two lower boundaries, plus 1.

Positive OVERLAP value represents actual overlap days, while zero or negative value means that intervals do not overlap at all. The higher absolute value of the negative result – the farther apart the intervals are.  To correct this “negativity” effect we can just set all negative values to zero.

The following SAS code implements the described holistic math approach:

data RESULTS;
   set EVENTS;
   OVERLAP = min(A2,B2) - max(A1, B1) + 1;
   if OVERLAP<0 then OVERLAP = 0;
run;

As you can see, this in the most concise, elegant, and at the same time the most efficient solution with no iterative looping and complicated exhaustive (and exhausting) logic.

Output

Here is the output data table RESULTS showing our sample intervals (columns A1, A2, B1 and B2) and the resulting column OVERLAP (all three solutions described above should produce identical OVERLAP values):

Output data set with calculated days overlaps

Questions? Thoughts? Comments?

Which of the presented here three methods do you like most? Which one would you prefer under different circumstances? Can you come up with yet another way of calculating date intervals overlaps? Do you have questions, concerns, comments? Please share with us below.

Additional resources

Calculating the overlap of date/time intervals was published on SAS Users.

1月 112022
 

Last year, I wrote a blog demonstrating how to use the %Auto_Outliers macro to automatically identify possible data errors. This blog demonstrates a different approach—one that is useful for variables for which you can identify reasonable ranges of values for each variable. For example, you would not expect resting heart rates below 40 or over 100 or adult heights below 45 inches or above 84 inches. Although values outside those ranges might be valid values, it would be a good idea to check those out-of-range values to see if they are real or data errors.

In the third edition of my book, Cody's Data Cleaning Techniques, I present two macros: %Errors and %Report. These two macros provide a consolidated error report.

To demonstrate these two macros, I created a data set called Patients. A listing of the first 10 observations is shown below:

Notice that the unique identifier is called Patno, and you see three variables HR (heart rate), SBP (systolic blood pressure), and DBP (diastolic blood pressure).

The calling arguments for the %Errors macro are:

  • VAR=     A variable for which you have pre-defined bounds
  • Low=     The lowest reasonable value for this variable
  • High=    The highest reasonable value for this variable
  • Missing=Error or Missing=Ignore. The default for this argument is IGNORE, but it is still good practice to include it in the call so that anyone reading your program understands how missing values are being handled.

Because you might be calling this macro for many variables, the values of two macro variables &Dsn (data set name) and &IDVar (the identifying variable such as Patno or Subj) are assigned values once, using two %Let statements. You can then call the %Errors macro for each variable of interest. When you are finished, call the %Report macro to see a consolidated report of your possible errors.

Here is an example:

/* Set values for &amp;Dsn and %IDVar with %LET statements */
%let Dsn = Clean.Patients;   
%let Idvar = Patno;
 
%Errors(Var=HR, Low=40, High=100, Missing=error)
%Errors(Var=SBP, Low=80, High=200, Missing=ignore)
%Errors(Var=DBP, Low=60, High=120)   
 
/* When you are finished selecting variables, create the report */
%Report

You are reporting all heart rates below 40 or above 100 (and considering missing values as errors); values of SBP below 80 or above 200 (and ignoring missing values); and values of DBP below 60 or above 120 (also ignoring missing values—using the default value of IGNORE).

Here is the result:

Notice that several patients with missing values for HR are flagged as errors. I should point out that I have violated a cardinal rule of macro programming: never write a macro that has no arguments—it should be a program. However, I liked the idea of calling %Errors and then %Report. Shown below are the two macros:

/****************************************************************
| PROGRAM NAME: ERRORS.SAS  in c:\Books\Cleans\Patients         |
| PURPOSE: Accumulates errors for numeric variables in a SAS    |
|          data set for later reporting.                        |
|          This macro can be called several times with a        |
|          different variable each time. The resulting errors   |
|          are accumulated in a temporary SAS data set called   |
|          Errors.                                              |
| ARGUMENTS: Dsn=    - SAS data set name (assigned with a %LET) |
|            Idvar=  - Id variable (assigned with a %LET)       |
|                                                               |
|            Var     = The variable name to test                |
|            Low     = Lowest valid value                       |
|            High    = Highest valid value                      |
|            Missing = IGNORE (default) Ignore missing values   |
|                      ERROR Missing values flagged as errors   |
|                                                               |
| EXAMPLE: %let Dsn = Clean.Patients;                           |
|          %let Idvar = Patno;                                  |
|                                                               |
|          %Errors(Var=HR, Low=40, High=100, Missing=error)     |
|          %Errors(Var=SBP, Low=80, High=200, Missing=ignore)   |
|          %Errors(Var=DBP, Low=60, High=120)                   |
|          Test the numeric variables HR, SBP, and DBP in data  |
|          set Clean.patients for data outside the ranges       |
|          40 to 100, 80 to 200, and 60 to 120 respectively.    |
|          The ID variable is PATNO and missing values are to   |
|          be flagged as invalid for HR but not for SBP or DBP. |
****************************************************************/
%macro Errors(Var=,    /* Variable to test     */
              Low=,    /* Low value            */
              High=,   /* High value           */
              Missing=IGNORE 
                       /* How to treat missing values         */
                       /* Ignore is the default.  To flag     */
                       /* missing values as errors set        */
                       /* Missing=error                       */);
data Tmp;
   set &amp;Dsn(keep=&amp;Idvar &amp;Var);
   length Reason $ 10 Variable $ 32;
   Variable = "&amp;Var";
   Value = &amp;Var;
   if &amp;Var lt &amp;Low and not missing(&amp;Var) then do;
      Reason='Low';
      output;
   end;
 
   %if %upcase(&amp;Missing) ne IGNORE %then %do;
      else if missing(&amp;Var) then do;
         Reason='Missing';
         output;
      end;
   %end;
 
   else if &amp;Var gt &amp;High then do;
      Reason='High';
      output;
      end;
      drop &amp;Var;
   run;
 
   proc append base=Errors data=Tmp;
   run;
 
%mend Errors;

The basic idea for the %Errors macro is to test each variable and, if it is a possible error, use PROC APPEND to add it to a data set called Errors. When the first error is detected, PROC APPEND creates the data set Errors. From then on, each observation in data set Tmp is added to data set Errors.

Most of this macro is straightforward. For those readers who are not that comfortable with macro programming, the third section (beginning with %if %upcase(&Missing)) is executed only when the value of the macro variable &Missing is not equal to IGNORE.

Below is a listing of the %Report macro:

%macro Report;
   proc sort data=Errors;
      by &amp;Idvar;
   run;
 
   proc print data=Errors;
      title "Error Report for Data Set &amp;Dsn";
      id &amp;Idvar;
      var Variable Value Reason;
   run;
 
   proc delete data=Errors Tmp;
   run;
 
%mend Report;

The %Report macro is mainly a PROC PRINT of the temporary data set Errors. I added PROC DELETE to delete the two temporary data sets Error and Tmp.

You can cut and paste these macros, or you can download all of the macros, programs, and data sets from Cody's Data Cleaning Techniques Using SAS®, Third Edition, by going to support.sas.com/Cody, search for the book, then click Example Code and Data. You do not have to buy the book to download all the files (although I would be delighted if you did). This is true for all of my books published by SAS Press.

Comments and/or corrections are always welcome.

Two macros for detecting data errors was published on SAS Users.

1月 072022
 

Welcome to the sixth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to work with the results. Now it's time to generate simple descriptive statistics of a CAS table.

Let's begin by confirming the cars table is loaded into memory. With a connection to CAS established, execute the tableInfo action to view available in-memory tables. If necessary, you can execute the following code in SAS Studio to load the sashelp.cars table into memory.

conn.tableinfo(caslib="casuser")

The results show the cars table is loaded into memory and available for processing. Next, reference the cars table in the variable tbl. Then use the print function to show the value of the variable.

tbl = conn.CASTable('cars', caslib='casuser')
print(tbl)
CASTable('cars', caslib='casuser')

The results show that the tbl variable references the cars table in the CAS server.

Preview the CAS Table

First things first. Remember, the SWAT package blends the world of Pandas and CAS into one. So you can begin with the traditional head method to preview the CAS table.

tbl.head()

The SWAT head method returns five rows from the CAS server to the client as expected.

The Describe Method

Next, let's retrieve descriptive statistics of all numeric columns by using the familiar describe method on the CAS table.

tbl.describe()

The SWAT describe method returns the same descriptive statistics as the Pandas describe method. The only difference is that the SWAT version uses the CAS API to convert the describe method into CAS actions behind the scenes to process the data on the distributed CAS server. CAS processes the data and returns summarized results back to the client as a SASDataFrame, which is a subclass of the Pandas DataFrame. You can now work with the results as you would a Pandas DataFrame.

Summary CAS Action

Instead of using the familiar describe method, let's use a CAS action to do something similar. Here I'll use the summary CAS action.

tbl.summary()

Summary CAS Action

The results of the summary action return a CASResults object (Python dictionary) to the client. The CASResults object contains a single key named Summary with a SASDataFrame as the value. The SASDataFrame shows a variety of descriptive statistics.  While the summary action does not return exactly the same statistics as the describe method, it can provide additional insights into your data.

What if we don't want all the statistics for all of the data?

Selecting Columns and Summary Statistics with the Summary Action

Let's add additional parameters to the summary action. I'll add the inputs parameter to specify the columns to analyze in the CAS server.

tbl.summary(inputs = ['MPG_City','MPG_Highway'])

The results show only the MPG_City and MPG_Highway columns were analyzed.

Next, I'll use the subSet parameter to specify the summary statistics to produce. Here I'll obtain the MEAN, MIN and MAX.

tbl.summary(inputs = ['MPG_City','MPG_Highway'],
                       subSet = ['mean','min','max'])

The results processed only the MPG_City and MPG_Highway columns, and returned only the specified summary statistics to the client.

Creating a Calculated Column

Lastly, let's create a calculated column within the summary action. There are a variety of ways to do this. I like to add it as a parameter to the CASTable object. You can do that by specifying the tbl object, then computedVarsProgram parameter. Within computedVarsProgram you can use SAS assignment statements with most SAS functions. Here we will create a new column name MPG_Avg that takes the mean of MPG_City and MPG_Highway. Lastly, add the new column to the inputs parameter.

tbl.computedVarsProgram = 'MPG_Avg = mean(MPG_City, MPG_Highway);'
tbl.summary(inputs = ['MPG_City','MPG_Highway', 'MPG_Avg'],
                       subSet = ['mean','min','max'])

In the results I see the calculated column and requested summary statistics.

Summary

The SWAT package blends the world of Pandas and CAS. You can use many of the familiar Pandas methods within the SWAT package, or the flexible, highly optimized CAS actions like summary to easily obtain summary statistics of your data in the massively parallel processing CAS engine.

Additional and related resources

Getting Started with Python Integration to SAS® Viya® - Index
SWAT API Reference
CAS Action Documentation
SAS® Cloud Analytic Services: Fundamentals
SAS Scripting Wrapper for Analytics Transfer (SWAT)
CAS Action! - a series on fundamentals
Execute the following code in SAS Studio to load the sashelp.cars table into memory

Getting Started with Python Integration to SAS® Viya® - Part 6 - Descriptive Statistics was published on SAS Users.

1月 062022
 

This article was co-written by Nick Johnson, Product Marketing Manager, Microsoft Partnership. Check out his blog profile for more information.

As employees continue to adapt to the reality of remote work — collaborating with teams near and far — it is vital that organizations have the right collaboration and productivity tools at their fingertips to support teams remotely. With over 250 million monthly active users, Microsoft Teams has become the collaboration tool of choice for thousands of organizations, changing the way meetings are conducted and how teams access the documents and data that support their business operations.

SAS and Microsoft are partnering to inspire greater trust and confidence in every decision, by driving innovation and proven AI in the cloud. With a combined product roadmap, SAS and Microsoft are working tirelessly to improve offerings and connectivity between SAS Viya and Microsoft. That’s why we’re especially excited to announce SAS Conversation Designer is now generally available in Microsoft Teams.

Conversational AI enables humans to interact with machines using natural language – text or voice – and instantly get a human-like, intelligent response. And ChatOps – a way of collaborating that connects people with process, tools and automation into a transparent workflow – can enable your teams to work together on complex analytics processes without writing a single line of code. Conversational AI is creating new opportunities for finding insights in your data by simply asking a question in natural language to a chatbot.

Now, you can ask questions of your SAS and open-source data directly from the Microsoft Teams toolbar and share insights directly with your teammates without jumping between application interfaces. Chat-enabled analytics does the work for you by providing data, reports and visualizations through a chat interface in Microsoft Teams.

With SAS Conversation Designer in Teams you can:

    • Build and deploy a chatbot with ease using a low-code visual interface.
    • Get answers and complete tasks using SAS’ industry leading natural language processing.
    • Access data, reports and visualizations via chat – even run advanced analytics and AI.

Follow the quick start guide below to see how easy it is to build and deploy natural language chatbots in your Microsoft Teams environment.

Get started:

Step 1: To get started, log onto SAS Viya through a web browser using Azure AD for simplified access.

Step 2: SAS Conversation Designer’s visual interface is where you can build a chatbot. You can see where key words and phrases, intents, and dialog shown on a visual pipeline create a structure for the chatbot.

Step 3: Now that the critical elements of the chatbot are in place, the chatbot can be published and is ready for interaction.

Step 4: Let’s put this bot into Microsoft Teams. Gather the information within SAS Viya for configuring a manifest file, then enter the information into the App Studio in Microsoft Teams.

Step 5: Start a conversation! Your chatbot is ready to provide insights and accelerate collaboration for you and your colleagues.

Learn more

Looking for more on chatbots and our partnership with Microsoft? Check out the resources below:

SAS Conversation Designer

Microsoft Partnership

Make collaboration your superpower with SAS Conversation Designer in Microsoft Teams was published on SAS Users.

1月 062022
 

This article was co-written by Nick Johnson, Product Marketing Manager, Microsoft Partnership. Check out his blog profile for more information.

As employees continue to adapt to the reality of remote work — collaborating with teams near and far — it is vital that organizations have the right collaboration and productivity tools at their fingertips to support teams remotely. With over 250 million monthly active users, Microsoft Teams has become the collaboration tool of choice for thousands of organizations, changing the way meetings are conducted and how teams access the documents and data that support their business operations.

SAS and Microsoft are partnering to inspire greater trust and confidence in every decision, by driving innovation and proven AI in the cloud. With a combined product roadmap, SAS and Microsoft are working tirelessly to improve offerings and connectivity between SAS Viya and Microsoft. That’s why we’re especially excited to announce SAS Conversation Designer is now generally available in Microsoft Teams.

Conversational AI enables humans to interact with machines using natural language – text or voice – and instantly get a human-like, intelligent response. And ChatOps – a way of collaborating that connects people with process, tools and automation into a transparent workflow – can enable your teams to work together on complex analytics processes without writing a single line of code. Conversational AI is creating new opportunities for finding insights in your data by simply asking a question in natural language to a chatbot.

Now, you can ask questions of your SAS and open-source data directly from the Microsoft Teams toolbar and share insights directly with your teammates without jumping between application interfaces. Chat-enabled analytics does the work for you by providing data, reports and visualizations through a chat interface in Microsoft Teams.

With SAS Conversation Designer in Teams you can:

    • Build and deploy a chatbot with ease using a low-code visual interface.
    • Get answers and complete tasks using SAS’ industry leading natural language processing.
    • Access data, reports and visualizations via chat – even run advanced analytics and AI.

Follow the quick start guide below to see how easy it is to build and deploy natural language chatbots in your Microsoft Teams environment.

Get started:

Step 1: To get started, log onto SAS Viya through a web browser using Azure AD for simplified access.

Step 2: SAS Conversation Designer’s visual interface is where you can build a chatbot. You can see where key words and phrases, intents, and dialog shown on a visual pipeline create a structure for the chatbot.

Step 3: Now that the critical elements of the chatbot are in place, the chatbot can be published and is ready for interaction.

Step 4: Let’s put this bot into Microsoft Teams. Gather the information within SAS Viya for configuring a manifest file, then enter the information into the App Studio in Microsoft Teams.

Step 5: Start a conversation! Your chatbot is ready to provide insights and accelerate collaboration for you and your colleagues.

Learn more

Looking for more on chatbots and our partnership with Microsoft? Check out the resources below:

SAS Conversation Designer

Microsoft Partnership

Make collaboration your superpower with SAS Conversation Designer in Microsoft Teams was published on SAS Users.

12月 232021
 

As the leader in analytics, SAS provides a very rich set of SAS functions and CALL routines for programmers. There is a total of 693 uniquely named SAS functions besides the hundreds of SAS PROC steps. These SAS functions not only cover general functions such as character/string processing, mathematical, permutation & combination, probability, search and sorting, date and time, etc., but also includes a powerful domain-specific function set involving variable information and control, external files and system extensions, web services and tools, descriptive statistics, financial calculations, geodetic distance & ZIP code, Git and other fields. This article reviews the financial functions provided by the SAS system and demonstrates how to use them to do financial analysis and simplify your decisions. Functions with the same name can be invoked easily during programming with SAS DATA step, DS2, FedSQL and CAS environment, which will improve your programs' portability.

Overview

SAS provides 94 financial functions in total, which are mainly used to calculate financial values, such as interest, periodic payments, depreciation, and prices for European options on stocks and futures. The 53 financial functions conforming to industry standards are packaged in 1 general function called FINANCE. Each finance function has a unique string identifier for the first parameter of the FINANCE function. These functions have to be invoked via the FINANCE function. For example, we can use irr=finance("IRR", -400,100,200,300); to compute the internal rate of return (IRR) for a series of cash flows.

The 53 FINANCE functions are compatible with the 53 financial functions provided by Microsoft Excel for Office 365. In fact, the list of financial functions available in the latest Excel for Office 365 software contains 56 functions. Among the 3 extra functions, PDURATION and RRI are specific functions for Excel 2013, and STOCKHISTORY is a specific function for Office 365. These 53 financial functions are designed for financial calculation such as depreciation, maturation, accrued interest, net present value, periodic savings, internal rates of return, etc. They can be categorized as investment, depreciation, Internal rate of return, financial value, and yield calculations. The complete list of SAS FINANCE functions is shown in Table 1.

Category Subcategory Function
Investment Calculation [12] Future Value [2] FV, FVSCHEDULE
Payment [4] PMT, IPMT, PPMT, ISPMT
Present Value [3] PV, NPV, XNPV
Annual interest rate [2] EFFECT, NOMINAL
Number of periods [1] NPER
Depreciation [7] AMORDEGRC, AMORLINC, DB, DDB, SLN, SYD, VDB
Internal rate of return [4] IRR, RATE, MIRR, XIRR
Financial value and yield [30] Accrued/Cumulative Interest & Principal [4] ACCRINT, ACCRINTM, CUMIPMT, CUMPRINC
Coupon days/date and Number [6] COUPDAYBS, COUPDAYS, COUPDAYSNC, COUPNCD, COUPNUM, COUPPCD
Yield [6] INTRATE, ODDFYIELD, ODDLYIELD, YIELD, YIELDDISC, YIELDMAT
Treasury bill [3] TBILLEQ, TBILLPRICE, TBILLYIELD
Duration [2] DURATION, MDURATION
Security [7] DISC, ODDFPRICE, ODDLPRICE, PRICE, PRICEDISC, PRICEMAT, RECEIVED
Dollar Price [2] DOLLARDE, DOLLARFR

Table 1: 53 FINANCE XXX functions in one FINANCE function

Another 40 independently named SAS financial functions are used for investment or loan calculation, cash flow, depreciation, and pricing models. For example, we can use irr =irr(1,-400,100,200,300); to compute the internal rate of return (IRR) as a percentage for a series of cash flows.

There are 7 functions (IRR, NPV, PMT, PPMT, IPMT, CUMIPMT, CUMPRINC) whose names are the same as the industry standard FINANCE functions, but they have different sign-in results or even different calculation logic. For example, FINANCE PMT is used to calculate the periodic payment of an annuity, but the PMT function is used to calculate the periodic payment for a constant payment loan or the periodic savings for a future balance. Annuity and periodic savings are different in anchoring market interest rates, interest calculation, liquidity, and binding, so the same arguments return different results for them. The list is shown in Table 2.

Category Function
Investment/Loan [16] IRR/INTRR, NPV/NETPV, SAVING, SAVINGS,
NOMRATE, EFFRATE, COMPOUND,
PMT, PPMT, IPMT, CUMIPMT, CUMPRINC, MORT, TIMEVALUE
Depreciation [10] DACCDB, DACCDBSL, DACCSL, DACCSYD, DACCTAB
DEPDB, DEPDBSL, DEPSL, DEPSYD, DEPTAB
Cash flow [6] CONVX, CONVXP, DUR, DURP, PVP, YIELDP
Pricing Models [8] Stocks: BLKSHCLPRC/BLKSHPTPRC, GARKHCLPRC/GARKHPTPRC, MARGRCLPRC/MARGRPTPRC
Futures: BLACKCLPRC/BLACKPTPRC

Table 2: 40 uniquely named financial functions

To keep semantic consistency, these industry-standard financial functions usually have the same parameter names, and some of the most important parameters are:

  • Interest rate (rate): The interest rate or discount rate of an investment or loan.
  • Number of periods (nper): The total number of payment periods for investment or loan.
  • Payment (pmt): The regular payment that is made each period for investment or loan, the value cannot change over the life of the annuity. Typically, it contains principal and interest, but not other fees and taxes.
  • Present value (pv): The current value of an investment or loan. The present value of an investment is the value at the beginning of the investment period, and the present value of the loan is the value of the principal borrowed.
  • Future value (fv): The value of an investment or loan after all payments have occurred, or a cash balance that you want to attain after the last payment is made.
  • Payments due Type (type): A flag indicates if payments are due at the beginning of the period. The omitted type or 0 means payments are due at the end of the period, 1 indicates payments are due at the beginning of the period.
  • Period (period): Specify the period for interest rate or depreciation calculation. The period must be between 1 and the number of periods (nper).
  • Day count basis (basis): Optional, specify the type of day count basis to use. 0-represents US (NASD) 30/360, default; 1-represents Actual/actual; 2-represents Actual /360; 3-represents Actual /365; 4-represents European 30/360.
  • Frequency (frequency): Specify the number of coupon payments per year, 1-annual payment; 2-semiannual payment; 4-quarterly payment.

Below, we use some simple examples to show how to use these SAS financial functions to calculate and analyze mortgages.

Mortgage calculation

The average interest rate of first home mortgage in China is 5.46% since September 2021, and the average interest rate of second home mortgages is 5.83%, up 23 and 29 basis points respectively from 2020. Assuming that the first housing loan amount is 2 million yuan (CNY), the loan periods is 25 years. If the mortgage type is Constant Payment Mortgage (CPM), how much will the monthly payment and total payment be?

The FINANCE PMT function can be used to calculate the periodic payment of an annuity, it can be used to calculate the constant payment for Constant Payment Mortgage by default. That is, the payment is a constant for all periods, the same amount of the loan, including principal and interest, is repaid every month. The required parameters of the FINANCE PMT function are the interest rate (rate), the number of periods (nper), and the loan amount as the present value (pv). The optional parameters are the future value (fv) and the payment due type (type).

The FINANCE PPMT function can be used to calculate the payment on the principal during a given period for an investment or a loan based on a constant periodic payment and a constant interest rate. The required parameters are interest rate (rate), the payment period (period), number of periods (nper), and the loan amount as the present value (pv), optional parameters are future value (fv) and payment due type (type). Since FINANCE PPMT returns the monthly principal, we can calculate the monthly interest and the principal balance (pb) that need to be repaid. The complete SAS program is as follows:

 /*Mortgage: Constant Payment Mortgage*/
data Mortgage1;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  pmt = FINANCE('pmt', rate, nper, pv); /*Monthly payment*/
  pb=pv;           /*Principal Balance*/
  do period = 1 to nper;
    ppmt=FINANCE('ppmt', rate, period, nper, pv); /*Payment on the principal*/
    mi=pmt- ppmt; /*Payment on the interest*/
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
run;
 
title 'Mortgage: Constant Payment Mortgage';
proc print label;
run;

The results are as follows (Figure 1):

Figure 1: Details of Constant Payment Mortgage payment

To show the total payment and the total interest, we use PROC REPORT to generate a summary table as Figure 2. The total payment for the Constant Payment Mortgage is 3,670,206.07, the total interest is 1,670,206.07, and the ratio of interest to principal is 83.51%.

proc report;
  column period periodid pmt ppmt mi pb;
  format period 3.0 pmt ppmt  mi pb comma15.2;
  define period / order noprint;
  define pb / order;
  rbreak after / summarize style=[font_weight=bold];
 
  define periodid / computed "Period" center; 
  compute periodid / character length=20;
    periodid=period ;
  endcomp;
  compute after;
    if _break_ in ('_RBREAK_') then periodid='Total';
  endcomp;
run;

Figure 2: Constant Payment Mortgage payment schedule

If the mortgage type is Constant Amortization Mortgage (CAM), what will happen to its monthly payment and interest expenses? A Constant Amortization Mortgage divides the total amount of the loan into equal parts during the payment periods and the lender repays the same amount of principal every month and the interest incurred on the remaining loan of that month. As a result, the monthly payment amount is not constant, the initial payment is high, but the interest component of the payments declines rapidly over time. For this type of mortgage, the monthly payment on the principal (ppmt) is constant; we only need to calculate the current payment on the interest (mi) generated by the principal balance (pb) to get the current total payment (pmt) for the given period. The calculation can be accomplished without calling SAS FINANCE functions. The program is as follows:

/*Mortgage: Constant Amortization Mortgage*/
data Mortgage2;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  ppmt=-pv/nper;   /*Monthly Payment on the principal*/
  pb=pv;           /*Principal Balance*/
  do period=1 to nper;
    mi=-pb*rate;   /*Payment on the interest*/
    pmt=ppmt+ mi;
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
RUN;
 
title 'Mortgage: Constant Amortization Mortgage';
PROC PRINT;
RUN;

We use the same PROC REPORT program above to generate the summary table as Figure 3. In the table, the total payment for the Constant Amortization Mortgage is 3,369,550.00, the total interest is 1,369,550.00, and the ratio of interest to principal is 68.48%. This indicates that using this type of mortgage can save 300,656.07 in interest expenses.

Figure 3: Constant Amortization Mortgage payment schedule

To compare the monthly payment, monthly interest, and their cumulative value changes with the repayment periods for two mortgage types, we use PROC SGPANEL to draw the following figures. The code is as follows:

/*Merge two mortgage data with type=1/2 in long format*/
data Mortgage;
  set Mortgage1 Mortgage2;  
  retain type 1;
  if period=1 and _N_^=1 then type=type+1;
  pmt=abs(pmt);
  mi=abs(mi);
  ppmt=abs(ppmt);
run;
/*Create format for mortgage type*/
proc format;
  value typefmt 1="Constant Payment Mortgage"
2="Constant Amortization Mortgage";
run;
data Mortgage;
  set Mortgage;
  accuppmt+ppmt;
  accupmt+pmt;
  if first.type then do; 
	accuppmt=ppmt;
	accupmt=pmt;
  end;
  by type;
  label accuppmt="Cumulative monthly payment on principal" accupmt="Cumulative monthly payment";
  format type typefmt.;
run;
 
ods graphics / width=800px height=300px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2;
 
  vline period / response=pmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=pmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;
  vbar period / response=ppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;  
  colaxis label="Period" type=linear  values=(1 to 300 by 12  ) valueattrs=(color=gray size=12pt) grid;
run;
 
ods graphics / width=800px height=500px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2 ;
 
  vline period / response=accupmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=accupmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;;
  vbar period / response=accuppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;;
  colaxis label="Period" type=linear  values=(1 to 300 by 12  )  valueattrs=(color=gray size=11pt) grid;  
run;

In the following figures, the y-axis represents the monthly payment or the cumulative monthly payment, and the x-axis represents the payment period. The orange represents the principal component and the gray represents the interest component; the two constitute the monthly payment.

In Figure 4, the monthly payment of CPM (top), and the monthly payment on principal of CAM (bottom) are fixed values, which appear as a horizontal line and do not change over time. Correspondingly, in Figure 5, the cumulative monthly payment of CPM (top) and the cumulative monthly payment on principal (bottom) show linear growth.

Figure 4: Monthly payment of CPM vs CAM over period

Figure 5: Cumulative monthly payment of CPM vs CAM over period

Since the monthly payment of CPM is constant, the principal component increases month by month, but the interest component decreases accordingly. For CAM, the monthly payment and the interest component decreases rapidly month by month, but the principal component is constant.

CPM brings less stress to lenders and it is suitable for people whose income is expected to rise, and the lender is only willing to bear a lower monthly payment. CPM is also suitable for those people who have other channels to manage their own money to achieve greater benefits than mortgage expenditures. It is also suitable for those people who have a plan to sell real estate after a certain period. The disadvantage of CPM is that the overall interest expense is relatively high.

CAM is suitable for people who have higher income but whose income is flat or declining in the future. These people hope to put the payment stress on now rather than in the future. Declining payments may be appropriate to match a deflationary environment, but its high initial payment and rapid paydown of principal reduces leverage too fast. Furthermore, some people may complain the constantly changing payment is difficult to budget for and manage.

Summary

We have summarized the design and implementation of SAS functions in financial calculations. We then provided examples to calculate and analyze the monthly payment, interest, and principal for CPM/CAM mortgages. Finally, we compared the monthly payment, principal, interest, and cumulative value changes to a period for two mortgage types.

SAS financial functions review and mortgage payment analysis was published on SAS Users.

12月 232021
 

As the leader in analytics, SAS provides a very rich set of SAS functions and CALL routines for programmers. There is a total of 693 uniquely named SAS functions besides the hundreds of SAS PROC steps. These SAS functions not only cover general functions such as character/string processing, mathematical, permutation & combination, probability, search and sorting, date and time, etc., but also includes a powerful domain-specific function set involving variable information and control, external files and system extensions, web services and tools, descriptive statistics, financial calculations, geodetic distance & ZIP code, Git and other fields. This article reviews the financial functions provided by the SAS system and demonstrates how to use them to do financial analysis and simplify your decisions. Functions with the same name can be invoked easily during programming with SAS DATA step, DS2, FedSQL and CAS environment, which will improve your programs' portability.

Overview

SAS provides 94 financial functions in total, which are mainly used to calculate financial values, such as interest, periodic payments, depreciation, and prices for European options on stocks and futures. The 53 financial functions conforming to industry standards are packaged in 1 general function called FINANCE. Each finance function has a unique string identifier for the first parameter of the FINANCE function. These functions have to be invoked via the FINANCE function. For example, we can use irr=finance("IRR", -400,100,200,300); to compute the internal rate of return (IRR) for a series of cash flows.

The 53 FINANCE functions are compatible with the 53 financial functions provided by Microsoft Excel for Office 365. In fact, the list of financial functions available in the latest Excel for Office 365 software contains 56 functions. Among the 3 extra functions, PDURATION and RRI are specific functions for Excel 2013, and STOCKHISTORY is a specific function for Office 365. These 53 financial functions are designed for financial calculation such as depreciation, maturation, accrued interest, net present value, periodic savings, internal rates of return, etc. They can be categorized as investment, depreciation, Internal rate of return, financial value, and yield calculations. The complete list of SAS FINANCE functions is shown in Table 1.

Category Subcategory Function
Investment Calculation [12] Future Value [2] FV, FVSCHEDULE
Payment [4] PMT, IPMT, PPMT, ISPMT
Present Value [3] PV, NPV, XNPV
Annual interest rate [2] EFFECT, NOMINAL
Number of periods [1] NPER
Depreciation [7] AMORDEGRC, AMORLINC, DB, DDB, SLN, SYD, VDB
Internal rate of return [4] IRR, RATE, MIRR, XIRR
Financial value and yield [30] Accrued/Cumulative Interest & Principal [4] ACCRINT, ACCRINTM, CUMIPMT, CUMPRINC
Coupon days/date and Number [6] COUPDAYBS, COUPDAYS, COUPDAYSNC, COUPNCD, COUPNUM, COUPPCD
Yield [6] INTRATE, ODDFYIELD, ODDLYIELD, YIELD, YIELDDISC, YIELDMAT
Treasury bill [3] TBILLEQ, TBILLPRICE, TBILLYIELD
Duration [2] DURATION, MDURATION
Security [7] DISC, ODDFPRICE, ODDLPRICE, PRICE, PRICEDISC, PRICEMAT, RECEIVED
Dollar Price [2] DOLLARDE, DOLLARFR

Table 1: 53 FINANCE XXX functions in one FINANCE function

Another 40 independently named SAS financial functions are used for investment or loan calculation, cash flow, depreciation, and pricing models. For example, we can use irr =irr(1,-400,100,200,300); to compute the internal rate of return (IRR) as a percentage for a series of cash flows.

There are 7 functions (IRR, NPV, PMT, PPMT, IPMT, CUMIPMT, CUMPRINC) whose names are the same as the industry standard FINANCE functions, but they have different sign-in results or even different calculation logic. For example, FINANCE PMT is used to calculate the periodic payment of an annuity, but the PMT function is used to calculate the periodic payment for a constant payment loan or the periodic savings for a future balance. Annuity and periodic savings are different in anchoring market interest rates, interest calculation, liquidity, and binding, so the same arguments return different results for them. The list is shown in Table 2.

Category Function
Investment/Loan [16] IRR/INTRR, NPV/NETPV, SAVING, SAVINGS,
NOMRATE, EFFRATE, COMPOUND,
PMT, PPMT, IPMT, CUMIPMT, CUMPRINC, MORT, TIMEVALUE
Depreciation [10] DACCDB, DACCDBSL, DACCSL, DACCSYD, DACCTAB
DEPDB, DEPDBSL, DEPSL, DEPSYD, DEPTAB
Cash flow [6] CONVX, CONVXP, DUR, DURP, PVP, YIELDP
Pricing Models [8] Stocks: BLKSHCLPRC/BLKSHPTPRC, GARKHCLPRC/GARKHPTPRC, MARGRCLPRC/MARGRPTPRC
Futures: BLACKCLPRC/BLACKPTPRC

Table 2: 40 uniquely named financial functions

To keep semantic consistency, these industry-standard financial functions usually have the same parameter names, and some of the most important parameters are:

  • Interest rate (rate): The interest rate or discount rate of an investment or loan.
  • Number of periods (nper): The total number of payment periods for investment or loan.
  • Payment (pmt): The regular payment that is made each period for investment or loan, the value cannot change over the life of the annuity. Typically, it contains principal and interest, but not other fees and taxes.
  • Present value (pv): The current value of an investment or loan. The present value of an investment is the value at the beginning of the investment period, and the present value of the loan is the value of the principal borrowed.
  • Future value (fv): The value of an investment or loan after all payments have occurred, or a cash balance that you want to attain after the last payment is made.
  • Payments due Type (type): A flag indicates if payments are due at the beginning of the period. The omitted type or 0 means payments are due at the end of the period, 1 indicates payments are due at the beginning of the period.
  • Period (period): Specify the period for interest rate or depreciation calculation. The period must be between 1 and the number of periods (nper).
  • Day count basis (basis): Optional, specify the type of day count basis to use. 0-represents US (NASD) 30/360, default; 1-represents Actual/actual; 2-represents Actual /360; 3-represents Actual /365; 4-represents European 30/360.
  • Frequency (frequency): Specify the number of coupon payments per year, 1-annual payment; 2-semiannual payment; 4-quarterly payment.

Below, we use some simple examples to show how to use these SAS financial functions to calculate and analyze mortgages.

Mortgage calculation

The average interest rate of first home mortgage in China is 5.46% since September 2021, and the average interest rate of second home mortgages is 5.83%, up 23 and 29 basis points respectively from 2020. Assuming that the first housing loan amount is 2 million yuan (CNY), the loan periods is 25 years. If the mortgage type is Constant Payment Mortgage (CPM), how much will the monthly payment and total payment be?

The FINANCE PMT function can be used to calculate the periodic payment of an annuity, it can be used to calculate the constant payment for Constant Payment Mortgage by default. That is, the payment is a constant for all periods, the same amount of the loan, including principal and interest, is repaid every month. The required parameters of the FINANCE PMT function are the interest rate (rate), the number of periods (nper), and the loan amount as the present value (pv). The optional parameters are the future value (fv) and the payment due type (type).

The FINANCE PPMT function can be used to calculate the payment on the principal during a given period for an investment or a loan based on a constant periodic payment and a constant interest rate. The required parameters are interest rate (rate), the payment period (period), number of periods (nper), and the loan amount as the present value (pv), optional parameters are future value (fv) and payment due type (type). Since FINANCE PPMT returns the monthly principal, we can calculate the monthly interest and the principal balance (pb) that need to be repaid. The complete SAS program is as follows:

 /*Mortgage: Constant Payment Mortgage*/
data Mortgage1;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  pmt = FINANCE('pmt', rate, nper, pv); /*Monthly payment*/
  pb=pv;           /*Principal Balance*/
  do period = 1 to nper;
    ppmt=FINANCE('ppmt', rate, period, nper, pv); /*Payment on the principal*/
    mi=pmt- ppmt; /*Payment on the interest*/
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
run;
 
title 'Mortgage: Constant Payment Mortgage';
proc print label;
run;

The results are as follows (Figure 1):

Figure 1: Details of Constant Payment Mortgage payment

To show the total payment and the total interest, we use PROC REPORT to generate a summary table as Figure 2. The total payment for the Constant Payment Mortgage is 3,670,206.07, the total interest is 1,670,206.07, and the ratio of interest to principal is 83.51%.

proc report;
  column period periodid pmt ppmt mi pb;
  format period 3.0 pmt ppmt  mi pb comma15.2;
  define period / order noprint;
  define pb / order;
  rbreak after / summarize style=[font_weight=bold];
 
  define periodid / computed "Period" center; 
  compute periodid / character length=20;
    periodid=period ;
  endcomp;
  compute after;
    if _break_ in ('_RBREAK_') then periodid='Total';
  endcomp;
run;

Figure 2: Constant Payment Mortgage payment schedule

If the mortgage type is Constant Amortization Mortgage (CAM), what will happen to its monthly payment and interest expenses? A Constant Amortization Mortgage divides the total amount of the loan into equal parts during the payment periods and the lender repays the same amount of principal every month and the interest incurred on the remaining loan of that month. As a result, the monthly payment amount is not constant, the initial payment is high, but the interest component of the payments declines rapidly over time. For this type of mortgage, the monthly payment on the principal (ppmt) is constant; we only need to calculate the current payment on the interest (mi) generated by the principal balance (pb) to get the current total payment (pmt) for the given period. The calculation can be accomplished without calling SAS FINANCE functions. The program is as follows:

/*Mortgage: Constant Amortization Mortgage*/
data Mortgage2;
  rate=0.0546/12; /*Interest rate: convert yearly 5.46% to monthly*/
  nper=25*12;     /*Number of periods: Term in month, convert 25 years to 300 months*/
  pv=2000000;     /*Present value: Loan amount - the value of the principal borrowed 2M*/
 
  ppmt=-pv/nper;   /*Monthly Payment on the principal*/
  pb=pv;           /*Principal Balance*/
  do period=1 to nper;
    mi=-pb*rate;   /*Payment on the interest*/
    pmt=ppmt+ mi;
    pb=pb+ppmt;
    output;
  end;
 
  label rate="Interest Rate (monthly)" nper="Number of periods (month)" pv="Loan amount" 
        period="Period" pmt = "Monthly payment" ppmt="Payment on principal" mi="Payment on interest" pb="Principal balance";
RUN;
 
title 'Mortgage: Constant Amortization Mortgage';
PROC PRINT;
RUN;

We use the same PROC REPORT program above to generate the summary table as Figure 3. In the table, the total payment for the Constant Amortization Mortgage is 3,369,550.00, the total interest is 1,369,550.00, and the ratio of interest to principal is 68.48%. This indicates that using this type of mortgage can save 300,656.07 in interest expenses.

Figure 3: Constant Amortization Mortgage payment schedule

To compare the monthly payment, monthly interest, and their cumulative value changes with the repayment periods for two mortgage types, we use PROC SGPANEL to draw the following figures. The code is as follows:

/*Merge two mortgage data with type=1/2 in long format*/
data Mortgage;
  set Mortgage1 Mortgage2;  
  retain type 1;
  if period=1 and _N_^=1 then type=type+1;
  pmt=abs(pmt);
  mi=abs(mi);
  ppmt=abs(ppmt);
run;
/*Create format for mortgage type*/
proc format;
  value typefmt 1="Constant Payment Mortgage"
2="Constant Amortization Mortgage";
run;
data Mortgage;
  set Mortgage;
  accuppmt+ppmt;
  accupmt+pmt;
  if first.type then do; 
	accuppmt=ppmt;
	accupmt=pmt;
  end;
  by type;
  label accuppmt="Cumulative monthly payment on principal" accupmt="Cumulative monthly payment";
  format type typefmt.;
run;
 
ods graphics / width=800px height=300px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2;
 
  vline period / response=pmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=pmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;
  vbar period / response=ppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;  
  colaxis label="Period" type=linear  values=(1 to 300 by 12  ) valueattrs=(color=gray size=12pt) grid;
run;
 
ods graphics / width=800px height=500px;
title "Mortgage: CPM vs CAM";
proc sgpanel data=Mortgage;
  panelby type / novarname rows=2 ;
 
  vline period / response=accupmt lineattrs=(color=Cxa0a0a0);
  vbar period / response=accupmt nooutline fillattrs=(color=cxE0E0E0) FILLTYPE=SOLID transparency= 0.5;;
  vbar period / response=accuppmt nooutline fillattrs=(color=DarkOrange) FILLTYPE=SOLID transparency= 0.5;;
  colaxis label="Period" type=linear  values=(1 to 300 by 12  )  valueattrs=(color=gray size=11pt) grid;  
run;

In the following figures, the y-axis represents the monthly payment or the cumulative monthly payment, and the x-axis represents the payment period. The orange represents the principal component and the gray represents the interest component; the two constitute the monthly payment.

In Figure 4, the monthly payment of CPM (top), and the monthly payment on principal of CAM (bottom) are fixed values, which appear as a horizontal line and do not change over time. Correspondingly, in Figure 5, the cumulative monthly payment of CPM (top) and the cumulative monthly payment on principal (bottom) show linear growth.

Figure 4: Monthly payment of CPM vs CAM over period

Figure 5: Cumulative monthly payment of CPM vs CAM over period

Since the monthly payment of CPM is constant, the principal component increases month by month, but the interest component decreases accordingly. For CAM, the monthly payment and the interest component decreases rapidly month by month, but the principal component is constant.

CPM brings less stress to lenders and it is suitable for people whose income is expected to rise, and the lender is only willing to bear a lower monthly payment. CPM is also suitable for those people who have other channels to manage their own money to achieve greater benefits than mortgage expenditures. It is also suitable for those people who have a plan to sell real estate after a certain period. The disadvantage of CPM is that the overall interest expense is relatively high.

CAM is suitable for people who have higher income but whose income is flat or declining in the future. These people hope to put the payment stress on now rather than in the future. Declining payments may be appropriate to match a deflationary environment, but its high initial payment and rapid paydown of principal reduces leverage too fast. Furthermore, some people may complain the constantly changing payment is difficult to budget for and manage.

Summary

We have summarized the design and implementation of SAS functions in financial calculations. We then provided examples to calculate and analyze the monthly payment, interest, and principal for CPM/CAM mortgages. Finally, we compared the monthly payment, principal, interest, and cumulative value changes to a period for two mortgage types.

SAS financial functions review and mortgage payment analysis was published on SAS Users.

12月 222021
 

It feels like yesterday that we were signing a contract with a new vendor, Acclaim, to provide digital badging services for our SAS Certification candidates. That was over five years ago. Since then, Acclaim joined forces with another market leader and became part of Credly. SAS has expanded our badging program to include more than just certifications; and COVID-19 accelerated our already increasingly digital lives. When NFTs are selling for 150 Ethereum ($650,000) and the Metaverse is emerging, why would you be hanging a paper certificate on your office wall? I mean who has been in your office lately?

The digital badge has replaced the paper certificate with a living, breathing, skills-based currency. Let’s unpack that. Validated skills in the form of certifications and credentials are becoming more prevalent - and possibly even preferred – to college degrees. These skills have become a currency in the job market that is facing shortages in many skill areas including data analytics. An estimated 11.5 million data science jobs will become available by 2026. Your digital badge is not only a representation of an accomplishment, but an indicator of the important skills you possess that hold value in the labor market.

It’s time to put your badge to work.

What's in a Digital Badge?

When you receive a digital badge from SAS, it is made up of underlying metadata. This includes:

    • A description of the accomplishment being recognized and a link to more detailed information.
    • A list of the criteria required to earn the badge.
    • A series of attributes describing the type of accomplishment, amount of time invested, costs, and skill level (such as intermediate vs. expert level).
    • A list of recommended next actions. For example, if you earn the SAS Certified Specialist Badge, it points you towards the information on the next logical step, passing the Advanced exam to earn the SAS Certified Professional Certification.
    • A series of skills tags (and this is where the fun begins).

What should I do with the badge?

We’ve all heard the phrase, “Sharing is caring.” In this case, it’s self-care. Accept your badge and share it on LinkedIn or other social media platforms to showcase your accomplishment and associated skills. Once you’ve done this, you’ve established yourself as a trader in the skills economy.

Credly has a wealth of Labor Market Insights (LMI) and skills-data housed in their platform. Through a recent partnership with Emsi, Credly is utilizing a vast skills library to normalize skills being earned through digital badges and more tightly align them to skills being requested in open job requisitions. The bottom line, posting your badge makes it easier to find jobs that closely align with your skill set.

Here's how to leverage the power of your badge

Are you looking for your first job or maybe a new job? You can open your badge, see associated skills and click the skills tags to view more information. Each skills tag will lead to a page containing:

    • Job openings sourced from over 25,000 job boards.
    • Top locations where those jobs are located.
    • Salary ranges.
    • The top companies looking for those skills.
    • A series of related skills you may be interested in adding to your “skills wallet.”
    • And some others related badges you may wish to earn.

From here, click the jobs that interest you, get connected with that employer and apply for a job immediately. For step-by-step instructions, watch this short video. For additional information on features and options, visit Credly Support.

What does this mean for SAS users? For those with a Machine Learning using SAS Viya Learn badge, there are more than 125,000 active job openings. For those with a SAS Programming badge, there are more than 2,900 job openings.

What's next for Credly?

Credly is currently building out additional solutions for talent acquisition.

    1. Corporations hiring can send their job requisitions to Credly.
    2. Credly will map the requirements to relevant skills and credentials.
    3. Credly will produce a curated list of top candidates meeting the required skills.

This sounds great if you are a company or a recruiting firm, but why is this important as a badge earner? The answer is that by simply earning badges and validating the skills you possess, with little effort, you will be front and center for employers looking for candidates who meet your skills profile.

Invest in a rapidly growing currency…skills was published on SAS Users.

12月 162021
 

Because it is near the end of the year, I thought a blog about "Summarizing" data might be in order.

For these examples, I am going to use a simulated data set called Drug_Study, containing some categorical and numerical variables. For those interested readers, the SAS code that I used to generate the data set Drug_Study is shown below. (Please refer to two earlier blogs that describe how to create simulated data sets.)

*Program to create data set Drug_Study';
 
proc format;
   value $Gender 'M' = 'Male'
                 'F' = 'Female'; 
   run;
 
   data Drug_Study;
   call streaminit(13579);
   length Chol_Group $ 6;
   do i = 1 to 1000;
      do Drug = 'Placebo','A','B';
         Subject + 1;
         if rand('Bernoulli',.5) = 1 then Gender = 'F';
         else Gender = 'M';
 
         HR = rand('Normal',80,10) - 10*(Drug = 'A') + 10*(Drug = 'B') 
             - 5*(Gender = 'F');
         HR = round(HR);
 
         Cholesterol = rand('Normal',200,20) - 20*(Drug = 'A') -10*(Drug = 'B') 
                       - 10*(Gender = 'F');
         Cholesterol = round(Cholesterol);
 
         if Cholesterol lt 180 then Chol_Group = 'Low';
         else if Cholesterol lt 200 then Chol_Group = 'Normal';
         else Chol_Group = 'High';
 
         output;
      end;
   end;
   drop i;
   format Gender $Gender.;
run;
 
title "Listing of Drug_Study - first 9 observations";
proc print data=Drug_Study(obs=9);
   id Subject;
   var Drug Gender Cholesterol Chol_Group;
run;

The first nine observations from this data set are shown below.

Let's start out­ with the most basic summarization—computing statistics for all numeric variables for the entire data set. You can write a program as simple as:

proc means data=Drug_Study;
run;

However, this program will compute default statistics for every numeric variable in the data set (including Subject). You will probably want to include PROC MEANS options to select what statistics you want and a VAR statement to list the variables that you want to summarize. Here is an example:

title "Summarizing the Entire Drug_Study Data Set";
proc means data=Drug_Study n nmiss median mean std clm maxdec=3;
   var HR Cholesterol;
run;

The procedure options selected here are some of my favorites. This is what they represent:

n The number of nonmissing observations
nmiss The number of observations with missing values
median The median value (the 50th percentile)
mean The mean
std The standard deviation
clm The 95% confidence limits for the mean. (You are 95% confident that the mean from which this sample was taken is between these two limits.)
maxdec The maximum number of places to print following the decimal point

Here is the output:

You can tell this is simulated (fake) data because there are no missing values. This would be very unusual in a real-life situation.

Most researchers would like to break down the two variables of interest (HR and Cholesterol) by some of the classification variables, such as Gender and Drug. You use a CLASS statement to do this. Let's start out by requesting statistics on HR and Cholesterol, broken down by Drug.

title "Variables Broken Down by Drug";
proc means data=Drug_Study n nmiss median mean std clm maxdec=3;
   class Drug;
   var HR Cholesterol;
run;

 

Here is the output:

You could repeat this program, substituting Gender instead of Drug as the CLASS variable, yielding the following:

Finally, you can include a CLASS statement, listing those two class variables. The program, modified to show statistics broken down by Drug and Gender is shown next, followed by the output.

title "Variables Broken Down by Drug and Gender";
proc means data=Drug_Study n nmiss median mean std clm maxdec=3;
   class Drug Gender;
   var HR Cholesterol;
run;


It is tedious having to run PROC MEANS four times to produce all of the outputs shown thus far. Instead, you can use the two CLASS variables and add the PROC MEANS option PRINTALLTYPES when you run the program. It looks like this:

title "Variables Broken Down by Drug and Gender";
footnote "Including the PROC MEANS option PRINTALLTYPES";
proc means data=Drug_Study n nmiss median mean std clm maxdec=3
   printalltypes;
   class Drug Gender;
   var HR Cholesterol;
run;

In one run, you now see the two variables, HR and Cholesterol, for all combinations of the class variables as shown below.

All of the programs used in this blog were run using SAS OnDemand for Academics using the SAS Studio editor. You could have also produced all of the outputs using built-in SAS Studio tasks. You can read more about how to do this in my book A Gentle Introduction to Statistics Using SAS Studio in the Cloud.

As always, comments and/or corrections are welcome.

Summarizing data was published on SAS Users.

12月 142021
 

The recent news of log4j vulnerabilities is still top-of-mind for many SAS customers. We want to share with you the latest activities and findings from SAS.

For SAS customers:

Security and integrity of SAS software and Cloud hosted environments is always our top priority. In the early days of the log4j news (mid-December), the SAS teams responded swiftly. For our on-premises customers, we provided information on SAS products that were affected as well as recommended actions. SAS Viya 2021.2.2 includes an updated version of log4j. Additionally, we released a free tool (called loguccino) that you can use to detect and patch vulnerable log4j files within your SAS 9.4 and SAS Viya 3.x environments.

For our SAS Cloud hosted customers, we immediately hardened aspects of the environment that such a vulnerability could potentially exploit, including tighter network-based policy filters and increased surveillance. We have detected no evidence of attacks related to SAS software specific to these published vulnerabilities.

Upon further research, we determined that while the log4j vulnerability itself is severe, the log4j configuration and use within our SAS-hosted systems presents very limited exposure. No unauthenticated users (that is, users without existing privilege to access) can trigger the remote code execution vulnerability. Given these findings and the preventative measures already in place, we feel confident that your SAS applications and data remain secure in SAS Cloud.

For additional details please refer to the SAS Security Bulletin.

Thank you for your continued partnership in keeping your SAS environments productive and safe and thank you for being a SAS customer!

Highlights from the security bulletin

  • To receive notifications about bulletin updates, subscribe to the Updates on log4j Remote Code Execution Vulnerability (CVE-2021-44228) topic on SAS Support Communities or follow this RSS feed.
  • In an effort to provide an audited and automated approach for customers, SAS has developed a vulnerability patch script called loguccino. Loguccino is a tool that is similar to logpresso but customized for SAS software. The tool is specifically designed to remediate 9.4 and SAS Viya 3 environments and recursively searches for vulnerable log4j jar files, removes the JndiLookup class, and repackages the JAR without the vulnerability.
  • The bulletin describes the plans and timelines for SAS to deliver updated versions of log4j in its software.
  • The continuous and ongoing investigation SAS has made into the use of Log4j within the SAS Viya 2020.1, SAS Viya 3.5, SAS Viya 3.4 platform and the SAS 9 SAS Logon process has concluded that, given the community understanding of CVE-2021-44228, unauthenticated remote code execution (RCE) exploits are not possible at this time.
  • The major vulnerability scanning vendors (QualysRapid7, and Tenable) have all released updated signatures to check for the most common attack vectors related to this vulnerability.

As a reminder, you can always find the latest security bulletins from SAS on our Support website.

This post was originally published on December 13, 2021. It has been updated to include new messages and recent highlights from the SAS security bulletin.

Updates on the Apache Log4j CVE-2021-44228 vulnerability was published on SAS Users.