Tech

12月 102019
 
The DATA step has been around for many years and regardless of how many new SAS® products and solutions are released, the DATA step remains a popular way to create and manipulate SAS data sets. The SAS language contains a wide variety of approaches that provide an endless number of ways to accomplish a goal. Whether you are reshaping a data set entirely or simply assigning values to a new variable, there are numerous tips and tricks that you can use to save time and keystrokes. Here are a few that Technical Support offers to customers on a regular basis.

Writing file contents to the SAS® log

Perhaps you are reading a file and seeing “invalid data” messages in the log or you are not sure which delimiter is used between variables. You can use the null INPUT statement to read a small sample of the data. Specify the amount of data that you want to read by adjusting options in the INFILE statement. This example reads one record with 500 bytes:

   data _null_;
      infile 'path' recfm=f lrecl=500 obs=1;
      input;
      list;
   run;

The LIST statement writes the current record to the log. In addition, it includes a ruled line above the record so that it is easier to see the data for each column of the file. If the data contains at least one unprintable character, you will see three lines of output for each record written.

For example:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+
 
1   CHAR  this is a test.123.dog.. 24
    ZONE  766726726276770333066600
    NUMR  48930930104534912394F7DA

The line beginning with CHAR describes the character that is represented by the two hexadecimal characters underneath. When you see a period in that line, it might actually be a period, but it often means that the hexadecimal characters do not map to a specific key on the keyboard. Also, column 15 for this data is a tab ('09'x). By looking at this record in the log, you can see that the first variable contains spaces and that the file is tab delimited.

Writing hexadecimal output for all input

Since SAS® 9.4M6 (TS1M6), the HEXLISTALL option in the LIST statement enables all lines of input data to be written in hexadecimal format to the SAS log, regardless of whether there are unprintable characters in the data:

   data mylib.new / hexlistall;

Removing “invalid data” messages from the log

When SAS reads data into a data set, it is common to encounter values that are invalid based on the variable type or the informat that is specified for a variable. You should examine the SAS log to assess whether the informat is incorrect or if there are some records that are actually invalid. After you ensure that the data is being read correctly, you can dismiss any “invalid data” messages by using double question mark (??) modifiers after the variable name in question:

   input date ?? mmddyy10.;

You can also use the question mark modifiers when you convert a character variable to numeric with the INPUT function:

   newdate=input(olddate,?? mmddyy10.);

The above syntax reads all values with the MMDDYY10. informat and then dismisses the notes to the log when some values for the OLDDATE variable are invalid.

Sharing files between UNIX and Microsoft Windows operating systems

The end-of-record markers in text files are different for UNIX and Windows operating systems. If you needed to share a file between these systems, the file used to need preprocessing in order to change the markers to the desired type or perhaps specify a delimiter. Now, the TERMSTR= option in the INFILE statement enables you to specify the end-of-record marker in the incoming file.

If you are working in a UNIX environment and you need to read a file that was created in a Windows environment, use the TERMSTR=CRLF option:

   infile 'file-specification'  termstr=crlf ;

If you are in a Windows environment and you need to read a file that was created in a UNIX environment, use this syntax:

   infile 'file-specification'  termstr=lf ;

Adapting array values from an ARRAY statement

The VNAME function makes it very convenient to use the variable names from an ARRAY statement. You most often use an ARRAY statement to obtain the values from numerous variables. In this example, the SQL procedure makes it easy to store the unique values of the variable Product into the macro variable &VARLIST and that number of values into the macro variable &CT (another easy tip). Within the DO loop, you obtain the names of the variables from the array, and those values from the array then become variable names.

   proc sql noprint;
      select distinct(product) into :varlist separated by ' '
      from one;
      select count(distinct product) into :ct
      from one;
   quit;
 
   …more DATA step statements…
   array myvar(&ct) $ &varlist;
      do i=1 to &ct;
         if product=vname(myvar(i)) then do;
         myvar(i)=left(put(contract,8.));
   end;
   …more DATA step statements…

Using a mathematical equation in a DO loop

A typical DO loop has a beginning and an end, both represented by integers. Did you know you can use an equation to process data more easily? Suppose that you want to process every four observations as one unit. You can run code similar to the following:

   …more DATA step statements…
   J+1;
   do i=((j*4)-3) to (j*4);
      set data-set-name point=I;
      …more DATA step statements…
   end;

Using an equation to point to an array element

With small data sets, you might want to put all values for all observations into a single observation. Suppose that you have a data set with four variables and six observations. You can create an array to hold the existing variables and also create an array for the new variables. Here is partial code to illustrate how the equation dynamically points to the correct new variable name in the array:

   array old(4) a b c d;
   array test (24) var1-var24;
   retain var1-var24;
   do i=1 to nobs;
      set w nobs=nobs point=i;
      do j=1 to 4;
      test(((i-1)*4)+j)=old(j);
      end;
   end;

Creating a hexadecimal reference chart

How often have you wanted to know the hexadecimal equivalent for a character or vice versa? Sure, you can look up a reference chart online, but you can also create one with a short program. The BYTE function returns values in your computer’s character set. The PUT statement writes the decimal value, hexadecimal value, and equivalent character to the SAS log:

   data a;
      do i=0 to 255;
      x=byte(i);
      put i=z3. i=hex2. x=;
   end;
   run;

When the resulting character is unprintable, such as a carriage return and line feed (CRLF) character, the X value is blank.

Conclusion

Hopefully, these new tips will be useful in your future programming. If you know some additional tips, please comment them below so that readers of this blog can add even more DATA step tips to their arsenal! If you would like to learn more about DATA step, check out these other blogs:

Old reliable: DATA step tips and tricks was published on SAS Users.

12月 102019
 
The DATA step has been around for many years and regardless of how many new SAS® products and solutions are released, the DATA step remains a popular way to create and manipulate SAS data sets. The SAS language contains a wide variety of approaches that provide an endless number of ways to accomplish a goal. Whether you are reshaping a data set entirely or simply assigning values to a new variable, there are numerous tips and tricks that you can use to save time and keystrokes. Here are a few that Technical Support offers to customers on a regular basis.

Writing file contents to the SAS® log

Perhaps you are reading a file and seeing “invalid data” messages in the log or you are not sure which delimiter is used between variables. You can use the null INPUT statement to read a small sample of the data. Specify the amount of data that you want to read by adjusting options in the INFILE statement. This example reads one record with 500 bytes:

   data _null_;
      infile 'path' recfm=f lrecl=500 obs=1;
      input;
      list;
   run;

The LIST statement writes the current record to the log. In addition, it includes a ruled line above the record so that it is easier to see the data for each column of the file. If the data contains at least one unprintable character, you will see three lines of output for each record written.

For example:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+
 
1   CHAR  this is a test.123.dog.. 24
    ZONE  766726726276770333066600
    NUMR  48930930104534912394F7DA

The line beginning with CHAR describes the character that is represented by the two hexadecimal characters underneath. When you see a period in that line, it might actually be a period, but it often means that the hexadecimal characters do not map to a specific key on the keyboard. Also, column 15 for this data is a tab ('09'x). By looking at this record in the log, you can see that the first variable contains spaces and that the file is tab delimited.

Writing hexadecimal output for all input

Since SAS® 9.4M6 (TS1M6), the HEXLISTALL option in the LIST statement enables all lines of input data to be written in hexadecimal format to the SAS log, regardless of whether there are unprintable characters in the data:

   data mylib.new / hexlistall;

Removing “invalid data” messages from the log

When SAS reads data into a data set, it is common to encounter values that are invalid based on the variable type or the informat that is specified for a variable. You should examine the SAS log to assess whether the informat is incorrect or if there are some records that are actually invalid. After you ensure that the data is being read correctly, you can dismiss any “invalid data” messages by using double question mark (??) modifiers after the variable name in question:

   input date ?? mmddyy10.;

You can also use the question mark modifiers when you convert a character variable to numeric with the INPUT function:

   newdate=input(olddate,?? mmddyy10.);

The above syntax reads all values with the MMDDYY10. informat and then dismisses the notes to the log when some values for the OLDDATE variable are invalid.

Sharing files between UNIX and Microsoft Windows operating systems

The end-of-record markers in text files are different for UNIX and Windows operating systems. If you needed to share a file between these systems, the file used to need preprocessing in order to change the markers to the desired type or perhaps specify a delimiter. Now, the TERMSTR= option in the INFILE statement enables you to specify the end-of-record marker in the incoming file.

If you are working in a UNIX environment and you need to read a file that was created in a Windows environment, use the TERMSTR=CRLF option:

   infile 'file-specification'  termstr=crlf ;

If you are in a Windows environment and you need to read a file that was created in a UNIX environment, use this syntax:

   infile 'file-specification'  termstr=lf ;

Adapting array values from an ARRAY statement

The VNAME function makes it very convenient to use the variable names from an ARRAY statement. You most often use an ARRAY statement to obtain the values from numerous variables. In this example, the SQL procedure makes it easy to store the unique values of the variable Product into the macro variable &VARLIST and that number of values into the macro variable &CT (another easy tip). Within the DO loop, you obtain the names of the variables from the array, and those values from the array then become variable names.

   proc sql noprint;
      select distinct(product) into :varlist separated by ' '
      from one;
      select count(distinct product) into :ct
      from one;
   quit;
 
   …more DATA step statements…
   array myvar(&ct) $ &varlist;
      do i=1 to &ct;
         if product=vname(myvar(i)) then do;
         myvar(i)=left(put(contract,8.));
   end;
   …more DATA step statements…

Using a mathematical equation in a DO loop

A typical DO loop has a beginning and an end, both represented by integers. Did you know you can use an equation to process data more easily? Suppose that you want to process every four observations as one unit. You can run code similar to the following:

   …more DATA step statements…
   J+1;
   do i=((j*4)-3) to (j*4);
      set data-set-name point=I;
      …more DATA step statements…
   end;

Using an equation to point to an array element

With small data sets, you might want to put all values for all observations into a single observation. Suppose that you have a data set with four variables and six observations. You can create an array to hold the existing variables and also create an array for the new variables. Here is partial code to illustrate how the equation dynamically points to the correct new variable name in the array:

   array old(4) a b c d;
   array test (24) var1-var24;
   retain var1-var24;
   do i=1 to nobs;
      set w nobs=nobs point=i;
      do j=1 to 4;
      test(((i-1)*4)+j)=old(j);
      end;
   end;

Creating a hexadecimal reference chart

How often have you wanted to know the hexadecimal equivalent for a character or vice versa? Sure, you can look up a reference chart online, but you can also create one with a short program. The BYTE function returns values in your computer’s character set. The PUT statement writes the decimal value, hexadecimal value, and equivalent character to the SAS log:

   data a;
      do i=0 to 255;
      x=byte(i);
      put i=z3. i=hex2. x=;
   end;
   run;

When the resulting character is unprintable, such as a carriage return and line feed (CRLF) character, the X value is blank.

Conclusion

Hopefully, these new tips will be useful in your future programming. If you know some additional tips, please comment them below so that readers of this blog can add even more DATA step tips to their arsenal! If you would like to learn more about DATA step, check out these other blogs:

Old reliable: DATA step tips and tricks was published on SAS Users.

12月 092019
 

Building on my last post, How to create checklist tables in SAS®, this one shows you how to compare SAS data Check mark and cross mark sets that include common and uncommon columns. You'll learn how to visualize side-by-side columns commonalities and differences in data tables.

As before, we're working with a comparison matrix (aka checklist table) where check-marks / x-marks indicate included / excluded columns.

Data tables will be comparable products while their columns (variables) will represent product features. We'll add background color to highlight which attributes are different in the common columns. Since there might be several different attributes for a given column, we will use a hierarchy typelengthlabel to indicate only the highest mismatched level of hierarchy. For example:

  • If same-named columns have different type (Numeric vs. Character), their corresponding check-mark will be shown on a light-red background, which indicates the highest degree of mismatch.
  • If same-named columns have the same type, a yellow background will indicate any difference in variables length.
  • When same-named variables type and length match, a light-blue background marks any difference in variables label.

SAS code to create color-enhanced comparison matrix

Let’s compare variable attributes in two data tables: one is SAS-supplied SASHELP.CARS, and another WORK.NEWCARS that I derive from the first one, slightly scrambling its column definitions:

data WORK.NEWCARS (drop=temp:);
   set SASHELP.CARS (rename=(Origin=Region EngineSize=temp1 Make=temp2));
   length EngineSize $3 Make $20;
   EngineSize = put(temp1,3.1);
   Make = temp2; 
   label Type='New Car Type';
run;

In this NEWCARS data table, I did the following:

  • Replaced column name Origin with Region
  • Changed type of column EngineSize from Numeric to Character
  • Changed length of column Make from $13 to $20
  • Changed label of column Type from blank to “New Car Type”

Now let’s build the comparison matrix:

proc contents data=SASHELP.CARS noprint out=DS1(keep=Name Type Length Label);
run;
 
proc contents data=WORK.NEWCARS noprint out=DS2(keep=Name Type Length Label);
run;
 
data comparison_matrix;
   merge
      DS1(in=in1 rename=(Type=Typ1 Length=Len1 Label=Lab1))
      DS2(in=in2 rename=(Type=Typ2 Length=Len2 Label=Lab2));
   by Name;
 
   /* set symbol shape: 1=V; 0=X */
   ds1 = 1; ds2 = 1;
   if in1 and not in2 then ds2 = 0; else
   if in2 and not in1 then ds1 = 0;
 
   /* add background color */
   if ds1=ds2=1 then
   select;
      when(Typ1^=Typ2) do; ds1=2; ds2=2; end;
      when(Len1^=Len2) do; ds1=3; ds2=3; end;
      when(Lab1^=Lab2) do; ds1=4; ds2=4; end;
      otherwise; 
   end;
 
   label
      Name = 'Column Name'
      ds1 = 'SASHELP.CARS'
      ds2 = 'WORK.NEWCARS'
      ;
run;
 
proc format;
   value chmark
      0   = '(*ESC*){unicode "2718"x}'
      1-4 = '(*ESC*){unicode "2714"x}'
      ;
   value chcolor
      0   = red
      1-4 = green
      ;
   value bgcolor
      2 = 'cxffccbb'
      3 = 'cxffe177'
      4 = 'cxd4f8d4' 
      ;
run;
 
ods html path='c:\temp' file='comp_marix.html' style=Seaside;
ods escapechar='^';
title 'Data set columns comparison matrix';
 
proc odstext;
   p '<div align="center">Mismatch Legend:'||
     '<span style="background-color:#ffccbb;margin-left:17px">^_^_^_^_</span> Type'||
     '<span style="background-color:#ffe177;margin-left:17px">^_^_^_^_</span> Length'||
     '<span style="background-color:#d4f8d4;margin-left:17px">^_^_^_^_</span> Label</div>'
   / style=[fontsize=9pt];
run;
 
title; 
proc print data=comparison_matrix label noobs;
   var Name / style={fontweight=bold width=100px};
   var ds1 ds2 / style={color=chcolor. backgroundcolor=bgcolor. just=center fontweight=bold width=120px};
   format ds1 ds2 chmark.;
run;
 
ods html close;

Here is a brief explanation of the code:

  1. Two PROC CONTENTS produce alphabetical lists (as datasets) of the data table column names, as well as their attributes (type, length, label)
  2. The DATA STEP merges these 2 lists and creates DS1 and DS2 variables indicating common name (values 1, 2, 3, 4) or uncommon name (value 0).
  3. PROC FORMAT creates 3 user-defined formats chmark, chcolor, bgcolor responsible for checkmark shape, checkmark color, and background color respectively. For checkmark shape, we use Unicode characters, and for colors we use both, color names (e.g. red, green) and hexadecimal RGB color notations (e.g. 'cxFFCCBB').
  4. PROC ODSTEXT’s P statement is used to display color legend for the comparison matrix.
  5. Finally, PROC PRINT with user-defined formats produces our color-enhanced comparison matrix.

Data tables comparison matrix – OUTPUT

The above code will generate the following HTML output with the comparison matrix for variables in two data sets:

Comparison matrix for common/uncommon variables in 2 datasets

Adding more detail to the comparison matrix chart

We can further enhance our output comparison matrix by adding detailed descriptive information about differences between variable attributes. For comprehensive view, we can add a COMMENTS column that spells out differences (attributes mismatches). In addition to the hierarchical logic defining only one mismatch of the highest degree indicated by color highlighting above, comments can include all found discrepancies. Simply add the following two pieces of SAS code:

1. Add the following group of statements to the above DATA Step (right after SELECT statement):

 length Comments $200;
   if ds1>1 then
   do;
      if Typ1^=Typ2 then Comments = catx(' ', Comments, 'Type1=',   Typ1, '; Type2=',   Typ2, ';');
      if Len1^=Len2 then Comments = catx(' ', Comments, 'Length1=', Len1, '; Length2=', Len2, ';');
      if Lab1^=Lab2 then Comments = catx(' ', Comments, 'Label1=',  Lab1, '; Label2=',  Lab2, ';');
   end;

Depending on your needs this Comments can be added unconditionally – you would just need to remove IF-THEN logic keeping only:

length Comments $200;
Comments = catx(' ', Comments, 'Type1=',   Typ1, '; Type2=',   Typ2, ';');
Comments = catx(' ', Comments, 'Length1=', Len1, '; Length2=', Len2, ';');
Comments = catx(' ', Comments, 'Label1=',  Lab1, '; Label2=',  Lab2, ';');

2. Add the following statement to the above PROC PRINT (right before the FORMAT statement):

var comments / style={width=250px};

Then your HTML output will look as follows:

Detailed comparison matrix for common/uncommon variables in 2 datasets

Conclusion

Comparison matrix charts are a convenient tool for data development and metadata validation when you're comparing a data table’s metadata against requirements descriptions.

It allows us to quickly identify tables’ common and uncommon variables, as well as common variable inconsistencies by type, length and other attributes, such as labels and formats.

We can easily add detailed descriptive information when needed.

On a related note

While this post focused on visualizing SAS data sets comparison vis-à-vis common and uncommon columns, it's worth noting SAS websites have plenty of info on finding common variables (or columns) in data sets. For example:

Your thoughts?

Do you find this material useful? What other usages of the checklist tables and color-enhanced comparison matrices can you suggest?

How to compare SAS data tables for common/uncommon columns was published on SAS Users.

12月 042019
 

If you’re like me, you struggle to buy gifts. Most folks in my inner circle already have everything they need and most of what they want. Most folks, that is, except the tech-lovers. That’s because there’s always something new on the horizon. There’s always a new gadget or program. Or a new way to learn those things. If you know folks who fall into this category, and who love SAS, boy do I have some gift ideas for you:

Try SAS for free

If you know someone who is interested in SAS but doesn’t work with it on a daily basis, or someone looking to experiment in a different area of SAS, let them know about our free software trials. Who doesn’t love free? And you’ll look like a rock star when you point out that they can explore areas like SAS Data Preparation, SAS Visual Forecasting, SAS Visual Statistics and lots more, for free.

SAS University Edition

Let’s keep things in the vein of free, shall we? SAS University Edition includes SAS Studio, Base SAS, SAS/STAT, SAS/IML, SAS/ACCESS, and several time series forecasting procedures from SAS/ETS. It’s the same software used by sites around the world; that means it’s the most up-to-date statistical and quantitative methods. And it’s free for academic, noncommercial use.

Registration to SAS Global Forum 2020

Oh my gosh, you would be your loved one’s favorite person! I’ve been to many SAS Global Forums and I love seeing the excitement on someone’s face as they explore the booths, sit in on sessions, and network to their heart’s content. SAS Global Forum is the place to be for SAS professionals, thought leaders, decision makers, partners, students, and academics. There truly is something for everyone at this event for analytics enthusiasts. And who wouldn’t want to be in Washington, D.C. in the Spring? It’s cherry blossom time! Be sure to register before Jan. 29, 2020 to get early bird pricing and save $600 off the on-site registration fee.

SAS books

I worked for many years in SAS Press and saw, firsthand, how excited folks get over a new book. I mean, who doesn’t want the 6th edition of the Little SAS Book or the latest from Ron Cody? Browse more than 100 titles and find the perfect gift. Don’t forget, SAS Press is offering 25% off everything in the bookstore for the month of December! Use promo code HOLIDAYS25 at checkout when placing your order. Offer expires at midnight Tuesday, December 31, 2019 (US).

A SAS Learning subscription

Give the gift of unlimited learning with access to our selection of e-learning and video tutorials. You can give a monthly or yearly subscription. Help the special people in your life get a head start on a new career. The gift of education is one of the best gifts you can give.

SAS OnDemand for Academics

For those who don't want to install anything, but run SAS in the cloud, we offer SAS OnDemand for Academics. Users get free access to powerful SAS software for statistical analysis, data mining and forecasting. Point-and-click functionality means there’s no need to program. But if you like to program, you do can do that, too! It’s really the best of both worlds.

SAS blogs

I began this post with something free, and I’ll end it with something free: learning from our SAS blogs. Pick from areas such as customer intelligence, operations research, data science and more. You can also read content from specific regions such as Korea, Latin America, Japan and others. To get you started, here are a few posts from three of our most popular blog authors:

Chris Hemedinger, author of The SAS Dummy:

Rick Wicklin, author of The DO Loop:

Robert Allison, major contributor to Graphically Speaking:

Happy holidays and happy learning. Now go help someone geek out. And I won’t tell if you purchase a few of these things for yourself!

Gifts to give the SAS fan in your life was published on SAS Users.

12月 042019
 

If you’re like me, you struggle to buy gifts. Most folks in my inner circle already have everything they need and most of what they want. Most folks, that is, except the tech-lovers. That’s because there’s always something new on the horizon. There’s always a new gadget or program. Or a new way to learn those things. If you know folks who fall into this category, and who love SAS, boy do I have some gift ideas for you:

Try SAS for free

If you know someone who is interested in SAS but doesn’t work with it on a daily basis, or someone looking to experiment in a different area of SAS, let them know about our free software trials. Who doesn’t love free? And you’ll look like a rock star when you point out that they can explore areas like SAS Data Preparation, SAS Visual Forecasting, SAS Visual Statistics and lots more, for free.

SAS University Edition

Let’s keep things in the vein of free, shall we? SAS University Edition includes SAS Studio, Base SAS, SAS/STAT, SAS/IML, SAS/ACCESS, and several time series forecasting procedures from SAS/ETS. It’s the same software used by sites around the world; that means it’s the most up-to-date statistical and quantitative methods. And it’s free for academic, noncommercial use.

Registration to SAS Global Forum 2020

Oh my gosh, you would be your loved one’s favorite person! I’ve been to many SAS Global Forums and I love seeing the excitement on someone’s face as they explore the booths, sit in on sessions, and network to their heart’s content. SAS Global Forum is the place to be for SAS professionals, thought leaders, decision makers, partners, students, and academics. There truly is something for everyone at this event for analytics enthusiasts. And who wouldn’t want to be in Washington, D.C. in the Spring? It’s cherry blossom time! Be sure to register before Jan. 29, 2020 to get early bird pricing and save $600 off the on-site registration fee.

SAS books

I worked for many years in SAS Press and saw, firsthand, how excited folks get over a new book. I mean, who doesn’t want the 6th edition of the Little SAS Book or the latest from Ron Cody? Browse more than 100 titles and find the perfect gift. Don’t forget, SAS Press is offering 25% off everything in the bookstore for the month of December! Use promo code HOLIDAYS25 at checkout when placing your order. Offer expires at midnight Tuesday, December 31, 2019 (US).

A SAS Learning subscription

Give the gift of unlimited learning with access to our selection of e-learning and video tutorials. You can give a monthly or yearly subscription. Help the special people in your life get a head start on a new career. The gift of education is one of the best gifts you can give.

SAS OnDemand for Academics

For those who don't want to install anything, but run SAS in the cloud, we offer SAS OnDemand for Academics. Users get free access to powerful SAS software for statistical analysis, data mining and forecasting. Point-and-click functionality means there’s no need to program. But if you like to program, you do can do that, too! It’s really the best of both worlds.

SAS blogs

I began this post with something free, and I’ll end it with something free: learning from our SAS blogs. Pick from areas such as customer intelligence, operations research, data science and more. You can also read content from specific regions such as Korea, Latin America, Japan and others. To get you started, here are a few posts from three of our most popular blog authors:

Chris Hemedinger, author of The SAS Dummy:

Rick Wicklin, author of The DO Loop:

Robert Allison, major contributor to Graphically Speaking:

Happy holidays and happy learning. Now go help someone geek out. And I won’t tell if you purchase a few of these things for yourself!

Gifts to give the SAS fan in your life was published on SAS Users.

12月 042019
 

Site relaunches with improved content, organization and navigation.

In 2016, a cross-divisional SAS team created developer.sas.com. Their mission: Build a bridge between SAS (and our software) and open source developers.

The initial effort made available basic information about SAS® Viya® and integration with open source technologies. In June 2018, the Developer Advocate role was created to build on that foundation. Collaborating with many of you, the SAS Communities team has improved the site by clarifying its scope and updating it consistently with helpful content.

Design is an iterative process. One idea often builds on another.

-- businessman Mark Parker

The team is happy to report that recently developer.sas.com relaunched, with marked improvements in content, organization and navigation. Please check it out and share with others.

New overview page on developer.sas.com

The developer experience

The developer experience goes beyond the developer.sas.com portal. The Q&A below provides more perspective and background.

What is the developer experience?

Think of the developer experience (DX) as equivalent to the user experience (UX), only the developer interacts with the software through code, not points and clicks. Developers expect and require an easy interface to software code, good documentation, support resources and open communication. All this interaction occurs on the developer portal.

What is a developer portal?

The white paper Developer Portal Components captures the key elements of a developer portal. Without going into detail, the portal must contain (or link to) these resources: an overview page, onboarding pages, guides, API reference, forums and support, and software development kits (SDKs). In conjunction with the Developers Community, the site’s relaunch includes most of these items.

Who are these developers?

Many developers fit somewhere in these categories:

  • Data scientists and analysts who code in open source languages (mainly Python and R in this case).
  • Web application developers who create apps that require data and processing from SAS.
  • IT service admins who manage customer environments.

All need to interact with SAS but may not have written SAS code. We want this population to benefit from our software.

What is open source and how is SAS involved?

Simply put, open source software is just what the name implies: the source code is open to all. Many of the programs in use every day are based on open source technologies: operating systems, programming languages, web browsers and servers, etc. Leveraging open source technologies and integrating them with commercial software is a popular industry trend today. SAS is keeping up with the market by providing tools that allow open source developers to interact with SAS software.

What is an API?

All communications between open source and SAS are possible through APIs, or application programming interfaces. APIs allow software systems to communicate with one another. Software companies expose their APIs so developers can incorporate functionality and send or request data from the software.

Why does SAS care about APIs?

APIs allow the use of SAS analytics outside of SAS software. By allowing developers to communicate with SAS through APIs, customer applications easily incorporate SAS functions. SAS has created various libraries to aid in open source integration. These tools allow developers to code in the language of their choice, yet still interface with SAS. Most of these tools exist on github.com/sassoftware or on the REST API guides page.

A use case for SAS APIs

A classic use of SAS APIs is for a loan default application. A bank creates a model in SAS that determines the likelihood of a customer defaulting on a loan based on multiple factors. The bank also builds an application where a bank representative enters the information for a new potential customer. The bank application code uses APIs to communicate this information to the SAS model and return a credit decision.

What is a developer advocate?

A developer advocate is someone who helps developers succeed with a platform or technology. Their role is to act as a bridge between the engineering team and the developer community. At SAS, the developer advocate fields questions and comments on the Developers Community and works with R&D to provide answers. The administration of developer.sas.com also falls under the responsibility of the developer advocate.

We’re not done

The site will continue to evolve, with additions of other SAS products and offerenings, and other initiatives. Check back often to see what’s new.
Now that you are an open source and SAS expert, please check out the new developer.sas.com. We encourage feedback and suggestions for content. Leave comments and questions on the site or contact Joe Furbee: joe.furbee@sas.com.

developer.sas.com 2.0: More than just a pretty interface was published on SAS Users.

11月 282019
 

With time series data analysis, we can apply moving average methods to predict data points without seasonality. This includes Simple Average (SA), Simple Moving Average (SMA), Weighted Moving Average (WMA), Exponential Moving Average (EMA), etc. For series with a trend but without seasonality, we can use linear, non-linear and autoregressive prediction models. In practice, we often use these kinds of moving average methods for series data with large variance, e.g. financial and stock market data.

Since introducing the window concept, SMA reflects historical data with a lag problem, and all points in the same window have the same importance to the current point. Weights are introduced to reflect the different importances of points in a window, and it leads to WMA and EMA. WMA gives greater weight to recent observations while less weight to the longer-term observations. The weights decrease in a linear fashion. If the sequence fluctuations are not large, the same weight is given to the observation and WMA degenerates into a SMA and the WMA becomes an EMA when weights decrease exponentially. However, these traditional moving average methods ave a lag problem due to introducing the smooth window concept and its results are less responsive to the current value. To gain a better balance between lag reduction and curve smoothing in a moving average, we must get help from some smarter mathematics algorithm.

What is Hull Moving Average?

Hull Moving Average (HMA) is a fast moving average method with low lag developed by Australia investor Alan Hull in 2005. It’s known to almost eliminate lag altogether and manages to improve smoothing at the same time. Longer period HMAs can be used as an indicator to identify trend, and shorter period HMAs can be used as entry signals in the direction of the prevailing trend. Due to lag problems in all moving average methods, it’s not suggested to be used as a reverse signal alone. Anyway, we can use different window HMAs to build a more robust trading strategy. E.g., 52-week HMA for trend indicator, and 13-week HMA for entry signal. Here is how it works:

  • Long period HMA identifies the trend: rising HMA indicates the prevailing trend is rising, it’s better to enter long positions; falling HMA indicate the prevailing trend is falling, it’s better to enter short positions.
  • Short period HMA identifies the entry signals in the direction of the prevailing trend: When the prevailing trend is rising, HMA goes up to indicate a long entry signal. When the prevailing trend is falling, HMA goes down to indicate a short entry signal.

Both long & short period HMAs in bullish mode generate Long Signal, and both in bearish mode generate Short Signal. So Long Trades get a buy signal after a Long Signal occurs, while Short Trades get a sell signal after a Short Signal occurs. Long Trades get a Sell Signal when Long or Short Trend is no longer bullish and Short Trades get a buy signal when Long or Short trend is no longer bearish. Figure 1 is a sample for this HMA based trading strategy.

In fact, HMA is quite simple. Its calculation only includes three WMA steps:

  1. Calculate WMA of original sequence X using window length n ;
  2. Calculate WMA of X using window length n/2;
  3. Calculate WMA of derived sequence Y using window length Sqrt(n), while Y is the new series of 2 * WMA(x, n/2) – WMA(x, n). So HMA(x, n) formula can be defined as below:

HMA(x, n) = WMA( 2*WMA(x, n/2) − WMA(x, n), sqrt(n) )

HMA makes the moving average line more responsive to current value and keeps the smoothing of the curve line. It almost eliminates lag altogether and manages to improve smoothing at the same time.

How does HMA achieve the perfect balance?

We know SMA can gain the curve line of an average value for historical data falling in the current window, but it has the constant lag of at least a half window. The output may also have poor smoothness. If we embed SMA multiple times to get the average of the averages, it would be smoother but would increase lag at the same time.

Suppose we have this series: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The SMA at the last point is 5.5 with window length n=10, i.e. SUM[1...10]/10
is much different than the real value 10. If we reduce the window length to n=5, the SMA at the last point is 8, i.e. SUM[6…10]/5. If we have the second SMA with half size window and add the difference between these two SMAs, then we can get 8 + (8-5.5)=10.5, which is very close to the real value 10 and will eliminate lag. Then we can use SMA with specific window length again to reduce that slight overcompensation and improve smoothness. HMA uses linear WMA instead of SMA, and the last window length is sqrt(n). This is the key trick with using HMA.

Figure 2 below shows the difference between HMA(x, 4) and embed SMA twice SMA( SMA(x,4),4).

Figure 2: Comparison of SMA( SMA(x, 4), 4) and HMA(x, 4).

In stock trading, there are lots of complicated technical indicators derived from stock historical data, such as Ease of Movement indicators (EMV), momentum indicators (MTM), MACD indicators, Energy indicator CR, KDJ indicators, Bollinger (BOL) and so on. Its essence is like feature engineering extraction before neural network training. The fundamental purpose is to find some intrinsic property from price fluctuation. The Hull moving average itself was discovered and applied to real trading practices, which gives us a glimpse to the complication of building an automatic financial trading strategy under the veil. It’s just the beginning, the reflex in trading and aging of models may both bring uncertainty, so the effectiveness of quantitative trading strategies is a very complex topic which needs to be validated through real practice.

HMA in SAS Code

SAS was created for data analysis however, and my colleagues go into detail about how data analysis connects with HMA. Check out Cindy Wang's How to Calculate Hull Moving Average in SAS Visual Analytics and Rick Wicklin's The Hull moving average: Implement a custom time series smoother in SAS. I would like to share how to implement both HMA/WMA with only 25 lines of code in this article. You can use the macro anywhere to build HMA series with specific window lengths.

First, implement Weighted Moving Average (WMA) with SAS macro as shown below (only 11 lines). The macro has four arguments:

  • WMA_N       WMA window length
  • INPUT          Input variable name
  • OUT             Output variable name, default is input name with _WMA_n suffix
  • OUTLABEL The label of output variable, default is “input name WMA(n)
%macro AddWMA(WMA_N=4, INPUT=close, OUT=&INPUT._WMA_&WMA_N., OUTLABEL=&INPUT. WMA(&WMA_N.));
  retain _sumx_&INPUT._WMA_&WMA_N.;
  _sumx_&INPUT._WMA_&WMA_N.=sum (_sumx_&INPUT._WMA_&WMA_N., &INPUT., -lag&WMA_N.( &INPUT. )) ;
 
  retain _sum_&INPUT._WMA_&WMA_N.;
  _sum_&INPUT._WMA_&WMA_N.=sum (_sum_&INPUT._WMA_&WMA_N., &WMA_N * &INPUT., -lag( _sumx_&INPUT._WMA_&WMA_N.  )) ;
 
  retain _sumw_&INPUT._WMA_&WMA_N.;
  if _N_ <= &WMA_N then _sumw_&INPUT._WMA_&WMA_N. = sum (_sumw_&INPUT._WMA_&WMA_N., &WMA_N, -_N_, 1);
 
  &OUT=_sum_&INPUT._WMA_&WMA_N. / _sumw_&INPUT._WMA_&WMA_N.;
 
  drop _sumx_&INPUT._WMA_&WMA_N. _sumw_&INPUT._WMA_&WMA_N. _sum_&INPUT._WMA_&WMA_N.;
  label &OUT="&OUTLABEL"; 
%mend;

We must watch out for the trick in the upper implementation of WMA with linear weights. In fact, there is no do-loop in the code, and it also doesn’t perform repeat computations for weights and values. We just keep the last sum of all values in the window (See Ln2-3), and the sum for WMA is the last sum for WMA plus window length times the value subtracted from it (See Ln4-5). For weight sum computation, we just need to pay attention to the points less than the window length(See Ln6-7).

Second, implement HMA with three times WMA invocation as below, (only 9 lines). The macro has four arguments:

  • HMA_N       HMA window length
  • INPUT         Input variable name
  • OUT            Output variable name, default is input name with _HMA_n suffix
  • OUTLABEL The label of output variable, default is “input name HMA(n)
%macro AddHMA(HMA_N=5, INPUT=close, OUT=&INPUT._HMA_&HMA_N., outlabel=&INPUT. HMA(&HMA_N.));
  %AddWMA(WMA_N=&HMA_N, INPUT=&INPUT.);
 
  %AddWMA(WMA_N=%sysfunc(round(&HMA_N./2)), INPUT=&INPUT.);
 
  &INPUT._HMA_&HMA_N._DELTA=2 * &INPUT._WMA_%sysfunc(round(&HMA_N./2))  - &INPUT._WMA_&HMA_N;
  %AddWMA(WMA_N=%sysfunc(round(%sysfunc(sqrt(&HMA_N)))), INPUT=&INPUT._HMA_&HMA_N._DELTA);
 
  rename &INPUT._HMA_&HMA_N._DELTA_WMA_%sysfunc(round(%sysfunc(sqrt(&HMA_N.))))=&OUT;
  label &INPUT._HMA_&HMA_N._DELTA_WMA_%sysfunc(round(%sysfunc(sqrt(&HMA_N.))))="&outlabel"; 
 
  drop &INPUT._WMA_&HMA_N &INPUT._WMA_%sysfunc(round(&HMA_N./2)) &INPUT._HMA_&HMA_N._DELTA;
%mend;

The upper code is very intuitive as the first two lines perform WMA with window n and n/2 for X, then the next two lines generate new series Y and performs WMA with window sqrt(n). Other codes are just to improve macro flexibility and to drop temp variables, etc.

To compare HMA with SMA result, we must also implement SMA with the 7 lines of code as shown below. It has the same argument as WMA/HMA above, but SMA is an unnecessary part of HMA macro %ADDHMA implmentation.

 
%macro AddMA(MA_N=5, INPUT=close, out=&INPUT._MA_&MA_N., outlabel=&INPUT. MA(&MA_N.));
  retain _sum_&INPUT._MA_&MA_N.;
  _sum_&INPUT._MA_&MA_N.=sum (_sum_&INPUT._MA_&MA_N., &INPUT., -lag&MA_N.( &INPUT. )) ;
  &out=_sum_&INPUT._MA_&MA_N. / min(_n_, &MA_N.);
 
  drop _sum_&INPUT._MA_&MA_N.;
  label &out="&outlabel";
%mend;

Now we can test all the above implementations as shown below. First is a simple case we described above:

data a;
  input x @@;
  %AddMA( input=x, MA_N=5, out=SMA, outlabel=%str(SMA)); 
  %AddHMA( input=x, HMA_N=5, out=HMA, outlabel=%str(HMA))  
datalines;
1 2 3 4 5 6 7 8 9 10
;
proc print data=a label;
  format _all_ 8.2;
run;

The output is listed below, the HMA result is much more close to the real value x than SMA. It’s more responsive and less in lag.

Figure 3: HMA output sample

Now we can use the HMA macro to check IBM stock data in the SAS system dataset SASHELP.Stocks and draw trends in the plot to compare HMA(x, 5) and SMA(x, 5) output.

data Stocks_HMA;
  set sashelp.Stocks(where=(stock='IBM'));
  %AddHMA( HMA_N=5, out=HMA, outlabel=%str(HMA))
  %AddMA(MA_N=5, out=SMA, outlabel=%str(SMA));
run; 
ods graphics / width=800px height=300px;
title "Comparison of Simple and Hull Moving Average";
proc sgplot data=Stocks_HMA noautolegend;
   highlow x=Date high=High low=Low / lineattrs=(color=LightGray);
   scatter x=Date y=Close / markerattrs=(color=LightGray);
   series x=Date y=SMA / curvelabel curvelabelattrs=(color=DarkGreen) lineattrs=(color=DarkGreen);
   series x=Date y=HMA / curvelabel curvelabelattrs=(color=DarkOrange) lineattrs=(color=DarkOrange);
   refline '01JUN1992'd / axis=x label='June 1992' labelloc=inside;
   xaxis grid display=(nolabel); 
   yaxis grid max=200 label="IBM Closing Price";
run;

Figure 4: Comparison of SMA and HMA

Summary

We have talked about what Hull Moving Average is, how it works, and worked through a super simple SAS Macro implementation for HMA, WMA and SMA with only 25 lines of SAS code. This HMA implementation makes it quite simple and efficient to calculate HMA in SAS Data step without SAS/IML and Visual Analytics. In fact, it shows how easy process rolling-window computation for time series data can be in SAS, and opens a new window to build complicated technical indicators for stock trading systems yourself.

Learn more

Implementing HMA, WMA & SMA with 25 lines of SAS code was published on SAS Users.

11月 282019
 

With time series data analysis, we can apply moving average methods to predict data points without seasonality. This includes Simple Average (SA), Simple Moving Average (SMA), Weighted Moving Average (WMA), Exponential Moving Average (EMA), etc. For series with a trend but without seasonality, we can use linear, non-linear and autoregressive prediction models. In practice, we often use these kinds of moving average methods for series data with large variance, e.g. financial and stock market data.

Since introducing the window concept, SMA reflects historical data with a lag problem, and all points in the same window have the same importance to the current point. Weights are introduced to reflect the different importances of points in a window, and it leads to WMA and EMA. WMA gives greater weight to recent observations while less weight to the longer-term observations. The weights decrease in a linear fashion. If the sequence fluctuations are not large, the same weight is given to the observation and WMA degenerates into a SMA and the WMA becomes an EMA when weights decrease exponentially. However, these traditional moving average methods ave a lag problem due to introducing the smooth window concept and its results are less responsive to the current value. To gain a better balance between lag reduction and curve smoothing in a moving average, we must get help from some smarter mathematics algorithm.

What is Hull Moving Average?

Hull Moving Average (HMA) is a fast moving average method with low lag developed by Australia investor Alan Hull in 2005. It’s known to almost eliminate lag altogether and manages to improve smoothing at the same time. Longer period HMAs can be used as an indicator to identify trend, and shorter period HMAs can be used as entry signals in the direction of the prevailing trend. Due to lag problems in all moving average methods, it’s not suggested to be used as a reverse signal alone. Anyway, we can use different window HMAs to build a more robust trading strategy. E.g., 52-week HMA for trend indicator, and 13-week HMA for entry signal. Here is how it works:

  • Long period HMA identifies the trend: rising HMA indicates the prevailing trend is rising, it’s better to enter long positions; falling HMA indicate the prevailing trend is falling, it’s better to enter short positions.
  • Short period HMA identifies the entry signals in the direction of the prevailing trend: When the prevailing trend is rising, HMA goes up to indicate a long entry signal. When the prevailing trend is falling, HMA goes down to indicate a short entry signal.

Both long & short period HMAs in bullish mode generate Long Signal, and both in bearish mode generate Short Signal. So Long Trades get a buy signal after a Long Signal occurs, while Short Trades get a sell signal after a Short Signal occurs. Long Trades get a Sell Signal when Long or Short Trend is no longer bullish and Short Trades get a buy signal when Long or Short trend is no longer bearish. Figure 1 is a sample for this HMA based trading strategy.

In fact, HMA is quite simple. Its calculation only includes three WMA steps:

  1. Calculate WMA of original sequence X using window length n ;
  2. Calculate WMA of X using window length n/2;
  3. Calculate WMA of derived sequence Y using window length Sqrt(n), while Y is the new series of 2 * WMA(x, n/2) – WMA(x, n). So HMA(x, n) formula can be defined as below:

HMA(x, n) = WMA( 2*WMA(x, n/2) − WMA(x, n), sqrt(n) )

HMA makes the moving average line more responsive to current value and keeps the smoothing of the curve line. It almost eliminates lag altogether and manages to improve smoothing at the same time.

How does HMA achieve the perfect balance?

We know SMA can gain the curve line of an average value for historical data falling in the current window, but it has the constant lag of at least a half window. The output may also have poor smoothness. If we embed SMA multiple times to get the average of the averages, it would be smoother but would increase lag at the same time.

Suppose we have this series: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The SMA at the last point is 5.5 with window length n=10, i.e. SUM[1...10]/10
is much different than the real value 10. If we reduce the window length to n=5, the SMA at the last point is 8, i.e. SUM[6…10]/5. If we have the second SMA with half size window and add the difference between these two SMAs, then we can get 8 + (8-5.5)=10.5, which is very close to the real value 10 and will eliminate lag. Then we can use SMA with specific window length again to reduce that slight overcompensation and improve smoothness. HMA uses linear WMA instead of SMA, and the last window length is sqrt(n). This is the key trick with using HMA.

Figure 2 below shows the difference between HMA(x, 4) and embed SMA twice SMA( SMA(x,4),4).

Figure 2: Comparison of SMA( SMA(x, 4), 4) and HMA(x, 4).

In stock trading, there are lots of complicated technical indicators derived from stock historical data, such as Ease of Movement indicators (EMV), momentum indicators (MTM), MACD indicators, Energy indicator CR, KDJ indicators, Bollinger (BOL) and so on. Its essence is like feature engineering extraction before neural network training. The fundamental purpose is to find some intrinsic property from price fluctuation. The Hull moving average itself was discovered and applied to real trading practices, which gives us a glimpse to the complication of building an automatic financial trading strategy under the veil. It’s just the beginning, the reflex in trading and aging of models may both bring uncertainty, so the effectiveness of quantitative trading strategies is a very complex topic which needs to be validated through real practice.

HMA in SAS Code

SAS was created for data analysis however, and my colleagues go into detail about how data analysis connects with HMA. Check out Cindy Wang's How to Calculate Hull Moving Average in SAS Visual Analytics and Rick Wicklin's The Hull moving average: Implement a custom time series smoother in SAS. I would like to share how to implement both HMA/WMA with only 25 lines of code in this article. You can use the macro anywhere to build HMA series with specific window lengths.

First, implement Weighted Moving Average (WMA) with SAS macro as shown below (only 11 lines). The macro has four arguments:

  • WMA_N       WMA window length
  • INPUT          Input variable name
  • OUT             Output variable name, default is input name with _WMA_n suffix
  • OUTLABEL The label of output variable, default is “input name WMA(n)
%macro AddWMA(WMA_N=4, INPUT=close, OUT=&INPUT._WMA_&WMA_N., OUTLABEL=&INPUT. WMA(&WMA_N.));
  retain _sumx_&INPUT._WMA_&WMA_N.;
  _sumx_&INPUT._WMA_&WMA_N.=sum (_sumx_&INPUT._WMA_&WMA_N., &INPUT., -lag&WMA_N.( &INPUT. )) ;
 
  retain _sum_&INPUT._WMA_&WMA_N.;
  _sum_&INPUT._WMA_&WMA_N.=sum (_sum_&INPUT._WMA_&WMA_N., &WMA_N * &INPUT., -lag( _sumx_&INPUT._WMA_&WMA_N.  )) ;
 
  retain _sumw_&INPUT._WMA_&WMA_N.;
  if _N_ <= &WMA_N then _sumw_&INPUT._WMA_&WMA_N. = sum (_sumw_&INPUT._WMA_&WMA_N., &WMA_N, -_N_, 1);
 
  &OUT=_sum_&INPUT._WMA_&WMA_N. / _sumw_&INPUT._WMA_&WMA_N.;
 
  drop _sumx_&INPUT._WMA_&WMA_N. _sumw_&INPUT._WMA_&WMA_N. _sum_&INPUT._WMA_&WMA_N.;
  label &OUT="&OUTLABEL"; 
%mend;

We must watch out for the trick in the upper implementation of WMA with linear weights. In fact, there is no do-loop in the code, and it also doesn’t perform repeat computations for weights and values. We just keep the last sum of all values in the window (See Ln2-3), and the sum for WMA is the last sum for WMA plus window length times the value subtracted from it (See Ln4-5). For weight sum computation, we just need to pay attention to the points less than the window length(See Ln6-7).

Second, implement HMA with three times WMA invocation as below, (only 9 lines). The macro has four arguments:

  • HMA_N       HMA window length
  • INPUT         Input variable name
  • OUT            Output variable name, default is input name with _HMA_n suffix
  • OUTLABEL The label of output variable, default is “input name HMA(n)
%macro AddHMA(HMA_N=5, INPUT=close, OUT=&INPUT._HMA_&HMA_N., outlabel=&INPUT. HMA(&HMA_N.));
  %AddWMA(WMA_N=&HMA_N, INPUT=&INPUT.);
 
  %AddWMA(WMA_N=%sysfunc(round(&HMA_N./2)), INPUT=&INPUT.);
 
  &INPUT._HMA_&HMA_N._DELTA=2 * &INPUT._WMA_%sysfunc(round(&HMA_N./2))  - &INPUT._WMA_&HMA_N;
  %AddWMA(WMA_N=%sysfunc(round(%sysfunc(sqrt(&HMA_N)))), INPUT=&INPUT._HMA_&HMA_N._DELTA);
 
  rename &INPUT._HMA_&HMA_N._DELTA_WMA_%sysfunc(round(%sysfunc(sqrt(&HMA_N.))))=&OUT;
  label &INPUT._HMA_&HMA_N._DELTA_WMA_%sysfunc(round(%sysfunc(sqrt(&HMA_N.))))="&outlabel"; 
 
  drop &INPUT._WMA_&HMA_N &INPUT._WMA_%sysfunc(round(&HMA_N./2)) &INPUT._HMA_&HMA_N._DELTA;
%mend;

The upper code is very intuitive as the first two lines perform WMA with window n and n/2 for X, then the next two lines generate new series Y and performs WMA with window sqrt(n). Other codes are just to improve macro flexibility and to drop temp variables, etc.

To compare HMA with SMA result, we must also implement SMA with the 7 lines of code as shown below. It has the same argument as WMA/HMA above, but SMA is an unnecessary part of HMA macro %ADDHMA implmentation.

 
%macro AddMA(MA_N=5, INPUT=close, out=&INPUT._MA_&MA_N., outlabel=&INPUT. MA(&MA_N.));
  retain _sum_&INPUT._MA_&MA_N.;
  _sum_&INPUT._MA_&MA_N.=sum (_sum_&INPUT._MA_&MA_N., &INPUT., -lag&MA_N.( &INPUT. )) ;
  &out=_sum_&INPUT._MA_&MA_N. / min(_n_, &MA_N.);
 
  drop _sum_&INPUT._MA_&MA_N.;
  label &out="&outlabel";
%mend;

Now we can test all the above implementations as shown below. First is a simple case we described above:

data a;
  input x @@;
  %AddMA( input=x, MA_N=5, out=SMA, outlabel=%str(SMA)); 
  %AddHMA( input=x, HMA_N=5, out=HMA, outlabel=%str(HMA))  
datalines;
1 2 3 4 5 6 7 8 9 10
;
proc print data=a label;
  format _all_ 8.2;
run;

The output is listed below, the HMA result is much more close to the real value x than SMA. It’s more responsive and less in lag.

Figure 3: HMA output sample

Now we can use the HMA macro to check IBM stock data in the SAS system dataset SASHELP.Stocks and draw trends in the plot to compare HMA(x, 5) and SMA(x, 5) output.

data Stocks_HMA;
  set sashelp.Stocks(where=(stock='IBM'));
  %AddHMA( HMA_N=5, out=HMA, outlabel=%str(HMA))
  %AddMA(MA_N=5, out=SMA, outlabel=%str(SMA));
run; 
ods graphics / width=800px height=300px;
title "Comparison of Simple and Hull Moving Average";
proc sgplot data=Stocks_HMA noautolegend;
   highlow x=Date high=High low=Low / lineattrs=(color=LightGray);
   scatter x=Date y=Close / markerattrs=(color=LightGray);
   series x=Date y=SMA / curvelabel curvelabelattrs=(color=DarkGreen) lineattrs=(color=DarkGreen);
   series x=Date y=HMA / curvelabel curvelabelattrs=(color=DarkOrange) lineattrs=(color=DarkOrange);
   refline '01JUN1992'd / axis=x label='June 1992' labelloc=inside;
   xaxis grid display=(nolabel); 
   yaxis grid max=200 label="IBM Closing Price";
run;

Figure 4: Comparison of SMA and HMA

Summary

We have talked about what Hull Moving Average is, how it works, and worked through a super simple SAS Macro implementation for HMA, WMA and SMA with only 25 lines of SAS code. This HMA implementation makes it quite simple and efficient to calculate HMA in SAS Data step without SAS/IML and Visual Analytics. In fact, it shows how easy process rolling-window computation for time series data can be in SAS, and opens a new window to build complicated technical indicators for stock trading systems yourself.

Learn more

Implementing HMA, WMA & SMA with 25 lines of SAS code was published on SAS Users.

11月 282019
 

With time series data analysis, we can apply moving average methods to predict data points without seasonality. This includes Simple Average (SA), Simple Moving Average (SMA), Weighted Moving Average (WMA), Exponential Moving Average (EMA), etc. For series with a trend but without seasonality, we can use linear, non-linear and autoregressive prediction models. In practice, we often use these kinds of moving average methods for series data with large variance, e.g. financial and stock market data.

Since introducing the window concept, SMA reflects historical data with a lag problem, and all points in the same window have the same importance to the current point. Weights are introduced to reflect the different importances of points in a window, and it leads to WMA and EMA. WMA gives greater weight to recent observations while less weight to the longer-term observations. The weights decrease in a linear fashion. If the sequence fluctuations are not large, the same weight is given to the observation and WMA degenerates into a SMA and the WMA becomes an EMA when weights decrease exponentially. However, these traditional moving average methods ave a lag problem due to introducing the smooth window concept and its results are less responsive to the current value. To gain a better balance between lag reduction and curve smoothing in a moving average, we must get help from some smarter mathematics algorithm.

What is Hull Moving Average?

Hull Moving Average (HMA) is a fast moving average method with low lag developed by Australia investor Alan Hull in 2005. It’s known to almost eliminate lag altogether and manages to improve smoothing at the same time. Longer period HMAs can be used as an indicator to identify trend, and shorter period HMAs can be used as entry signals in the direction of the prevailing trend. Due to lag problems in all moving average methods, it’s not suggested to be used as a reverse signal alone. Anyway, we can use different window HMAs to build a more robust trading strategy. E.g., 52-week HMA for trend indicator, and 13-week HMA for entry signal. Here is how it works:

  • Long period HMA identifies the trend: rising HMA indicates the prevailing trend is rising, it’s better to enter long positions; falling HMA indicate the prevailing trend is falling, it’s better to enter short positions.
  • Short period HMA identifies the entry signals in the direction of the prevailing trend: When the prevailing trend is rising, HMA goes up to indicate a long entry signal. When the prevailing trend is falling, HMA goes down to indicate a short entry signal.

Both long & short period HMAs in bullish mode generate Long Signal, and both in bearish mode generate Short Signal. So Long Trades get a buy signal after a Long Signal occurs, while Short Trades get a sell signal after a Short Signal occurs. Long Trades get a Sell Signal when Long or Short Trend is no longer bullish and Short Trades get a buy signal when Long or Short trend is no longer bearish. Figure 1 is a sample for this HMA based trading strategy.

In fact, HMA is quite simple. Its calculation only includes three WMA steps:

  1. Calculate WMA of original sequence X using window length n ;
  2. Calculate WMA of X using window length n/2;
  3. Calculate WMA of derived sequence Y using window length Sqrt(n), while Y is the new series of 2 * WMA(x, n/2) – WMA(x, n). So HMA(x, n) formula can be defined as below:

HMA(x, n) = WMA( 2*WMA(x, n/2) − WMA(x, n), sqrt(n) )

HMA makes the moving average line more responsive to current value and keeps the smoothing of the curve line. It almost eliminates lag altogether and manages to improve smoothing at the same time.

How does HMA achieve the perfect balance?

We know SMA can gain the curve line of an average value for historical data falling in the current window, but it has the constant lag of at least a half window. The output may also have poor smoothness. If we embed SMA multiple times to get the average of the averages, it would be smoother but would increase lag at the same time.

Suppose we have this series: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The SMA at the last point is 5.5 with window length n=10, i.e. SUM[1...10]/10
is much different than the real value 10. If we reduce the window length to n=5, the SMA at the last point is 8, i.e. SUM[6…10]/5. If we have the second SMA with half size window and add the difference between these two SMAs, then we can get 8 + (8-5.5)=10.5, which is very close to the real value 10 and will eliminate lag. Then we can use SMA with specific window length again to reduce that slight overcompensation and improve smoothness. HMA uses linear WMA instead of SMA, and the last window length is sqrt(n). This is the key trick with using HMA.

Figure 2 below shows the difference between HMA(x, 4) and embed SMA twice SMA( SMA(x,4),4).

Figure 2: Comparison of SMA( SMA(x, 4), 4) and HMA(x, 4).

In stock trading, there are lots of complicated technical indicators derived from stock historical data, such as Ease of Movement indicators (EMV), momentum indicators (MTM), MACD indicators, Energy indicator CR, KDJ indicators, Bollinger (BOL) and so on. Its essence is like feature engineering extraction before neural network training. The fundamental purpose is to find some intrinsic property from price fluctuation. The Hull moving average itself was discovered and applied to real trading practices, which gives us a glimpse to the complication of building an automatic financial trading strategy under the veil. It’s just the beginning, the reflex in trading and aging of models may both bring uncertainty, so the effectiveness of quantitative trading strategies is a very complex topic which needs to be validated through real practice.

HMA in SAS Code

SAS was created for data analysis however, and my colleagues go into detail about how data analysis connects with HMA. Check out Cindy Wang's How to Calculate Hull Moving Average in SAS Visual Analytics and Rick Wicklin's The Hull moving average: Implement a custom time series smoother in SAS. I would like to share how to implement both HMA/WMA with only 25 lines of code in this article. You can use the macro anywhere to build HMA series with specific window lengths.

First, implement Weighted Moving Average (WMA) with SAS macro as shown below (only 11 lines). The macro has four arguments:

  • WMA_N       WMA window length
  • INPUT          Input variable name
  • OUT             Output variable name, default is input name with _WMA_n suffix
  • OUTLABEL The label of output variable, default is “input name WMA(n)
%macro AddWMA(WMA_N=4, INPUT=close, OUT=&INPUT._WMA_&WMA_N., OUTLABEL=&INPUT. WMA(&WMA_N.));
  retain _sumx_&INPUT._WMA_&WMA_N.;
  _sumx_&INPUT._WMA_&WMA_N.=sum (_sumx_&INPUT._WMA_&WMA_N., &INPUT., -lag&WMA_N.( &INPUT. )) ;
 
  retain _sum_&INPUT._WMA_&WMA_N.;
  _sum_&INPUT._WMA_&WMA_N.=sum (_sum_&INPUT._WMA_&WMA_N., &WMA_N * &INPUT., -lag( _sumx_&INPUT._WMA_&WMA_N.  )) ;
 
  retain _sumw_&INPUT._WMA_&WMA_N.;
  if _N_ <= &WMA_N then _sumw_&INPUT._WMA_&WMA_N. = sum (_sumw_&INPUT._WMA_&WMA_N., &WMA_N, -_N_, 1);
 
  &OUT=_sum_&INPUT._WMA_&WMA_N. / _sumw_&INPUT._WMA_&WMA_N.;
 
  drop _sumx_&INPUT._WMA_&WMA_N. _sumw_&INPUT._WMA_&WMA_N. _sum_&INPUT._WMA_&WMA_N.;
  label &OUT="&OUTLABEL"; 
%mend;

We must watch out for the trick in the upper implementation of WMA with linear weights. In fact, there is no do-loop in the code, and it also doesn’t perform repeat computations for weights and values. We just keep the last sum of all values in the window (See Ln2-3), and the sum for WMA is the last sum for WMA plus window length times the value subtracted from it (See Ln4-5). For weight sum computation, we just need to pay attention to the points less than the window length(See Ln6-7).

Second, implement HMA with three times WMA invocation as below, (only 9 lines). The macro has four arguments:

  • HMA_N       HMA window length
  • INPUT         Input variable name
  • OUT            Output variable name, default is input name with _HMA_n suffix
  • OUTLABEL The label of output variable, default is “input name HMA(n)
%macro AddHMA(HMA_N=5, INPUT=close, OUT=&INPUT._HMA_&HMA_N., outlabel=&INPUT. HMA(&HMA_N.));
  %AddWMA(WMA_N=&HMA_N, INPUT=&INPUT.);
 
  %AddWMA(WMA_N=%sysfunc(round(&HMA_N./2)), INPUT=&INPUT.);
 
  &INPUT._HMA_&HMA_N._DELTA=2 * &INPUT._WMA_%sysfunc(round(&HMA_N./2))  - &INPUT._WMA_&HMA_N;
  %AddWMA(WMA_N=%sysfunc(round(%sysfunc(sqrt(&HMA_N)))), INPUT=&INPUT._HMA_&HMA_N._DELTA);
 
  rename &INPUT._HMA_&HMA_N._DELTA_WMA_%sysfunc(round(%sysfunc(sqrt(&HMA_N.))))=&OUT;
  label &INPUT._HMA_&HMA_N._DELTA_WMA_%sysfunc(round(%sysfunc(sqrt(&HMA_N.))))="&outlabel"; 
 
  drop &INPUT._WMA_&HMA_N &INPUT._WMA_%sysfunc(round(&HMA_N./2)) &INPUT._HMA_&HMA_N._DELTA;
%mend;

The upper code is very intuitive as the first two lines perform WMA with window n and n/2 for X, then the next two lines generate new series Y and performs WMA with window sqrt(n). Other codes are just to improve macro flexibility and to drop temp variables, etc.

To compare HMA with SMA result, we must also implement SMA with the 7 lines of code as shown below. It has the same argument as WMA/HMA above, but SMA is an unnecessary part of HMA macro %ADDHMA implmentation.

 
%macro AddMA(MA_N=5, INPUT=close, out=&INPUT._MA_&MA_N., outlabel=&INPUT. MA(&MA_N.));
  retain _sum_&INPUT._MA_&MA_N.;
  _sum_&INPUT._MA_&MA_N.=sum (_sum_&INPUT._MA_&MA_N., &INPUT., -lag&MA_N.( &INPUT. )) ;
  &out=_sum_&INPUT._MA_&MA_N. / min(_n_, &MA_N.);
 
  drop _sum_&INPUT._MA_&MA_N.;
  label &out="&outlabel";
%mend;

Now we can test all the above implementations as shown below. First is a simple case we described above:

data a;
  input x @@;
  %AddMA( input=x, MA_N=5, out=SMA, outlabel=%str(SMA)); 
  %AddHMA( input=x, HMA_N=5, out=HMA, outlabel=%str(HMA))  
datalines;
1 2 3 4 5 6 7 8 9 10
;
proc print data=a label;
  format _all_ 8.2;
run;

The output is listed below, the HMA result is much more close to the real value x than SMA. It’s more responsive and less in lag.

Figure 3: HMA output sample

Now we can use the HMA macro to check IBM stock data in the SAS system dataset SASHELP.Stocks and draw trends in the plot to compare HMA(x, 5) and SMA(x, 5) output.

data Stocks_HMA;
  set sashelp.Stocks(where=(stock='IBM'));
  %AddHMA( HMA_N=5, out=HMA, outlabel=%str(HMA))
  %AddMA(MA_N=5, out=SMA, outlabel=%str(SMA));
run; 
ods graphics / width=800px height=300px;
title "Comparison of Simple and Hull Moving Average";
proc sgplot data=Stocks_HMA noautolegend;
   highlow x=Date high=High low=Low / lineattrs=(color=LightGray);
   scatter x=Date y=Close / markerattrs=(color=LightGray);
   series x=Date y=SMA / curvelabel curvelabelattrs=(color=DarkGreen) lineattrs=(color=DarkGreen);
   series x=Date y=HMA / curvelabel curvelabelattrs=(color=DarkOrange) lineattrs=(color=DarkOrange);
   refline '01JUN1992'd / axis=x label='June 1992' labelloc=inside;
   xaxis grid display=(nolabel); 
   yaxis grid max=200 label="IBM Closing Price";
run;

Figure 4: Comparison of SMA and HMA

Summary

We have talked about what Hull Moving Average is, how it works, and worked through a super simple SAS Macro implementation for HMA, WMA and SMA with only 25 lines of SAS code. This HMA implementation makes it quite simple and efficient to calculate HMA in SAS Data step without SAS/IML and Visual Analytics. In fact, it shows how easy process rolling-window computation for time series data can be in SAS, and opens a new window to build complicated technical indicators for stock trading systems yourself.

Learn more

Implementing HMA, WMA & SMA with 25 lines of SAS code was published on SAS Users.

11月 252019
 

The SAS Global Certification Program started in 1999 and has issued over 150,000 credentials to SAS users. Today, the program offers 23 different credentials across seven categories. According to Medium.com, “SAS Certification gives recognition of competency, shows commitment to the profession, and helps with job advancement.”

The SAS Global Certification Program and SAS Documentation have partnered together to work on SAS® Certified Specialist Prep Guide: Base Programming Using SAS® 9.4 and SAS® Certified Professional Prep Guide: Advanced Programming Using SAS® 9.4. This partnership has improved the prep guides and offers additional assets to support their use.

Changes to the Prep Guides

We have implemented changes to the certification guide based on the changes to the exam. We have taken the opportunity to make numerous general improvements to the certification guides. Both prep guides have been streamlined, shortened (significantly), and include more easy-to-use examples. For example, through reorganization and removing redundant topics, we have shortened SAS® Certified Professional Prep Guide: Advanced Programming Using SAS® 9.4 by more than 400 pages (428 pages to be exact).

Other changes to the prep guides include:

  • streamlining the examples and making them easier to read
  • aligning the quiz questions so that they are more like what the user will see on the exam
  • developing a workbook to provide hands-on practice for the programming portion of the exam

Tip Sheets


In 2018, I attended my first SAS Global Forum and I spoke with professors who use the certification guides in their class. They expressed interest in having a downloadable document with syntax that students can use for quick reference. Based on this feedback, I worked with SAS Training to develop tip sheets that summarize key concepts and syntax covered in the certification guides.

You can download the tip sheets from each guide’s book page. The tip sheets are expected to be incorporated into the extended learning pages for the related classes offered through Training.

We also made changes to how you can download sample data.

Sample Data Changes

In the past, we received feedback that the sample data that users were trying to download and import into SAS was generating errors. This was happening because of the copy and paste functionality that we had before. Due to some changes internal to SAS, we decided to streamline the process of downloading sample data.

We wanted to make the sample data easy to access.

The sample data now is “environment agnostic” and can run on any SAS environment. These changes are particularly helpful to customers who use SAS University Edition, SAS Studio, and SAS On-Demand. Incorporating these changes required modification to virtually all the previous examples in both prep guides.

The sample data now lives on the sas-cert-prep-data repository, which is on the SAS Software page in the GitHub repository. The repository is organized by the name of each prep guide. This enables our users to provide feedback by simply creating an issue on GitHub. The GitHub repository itself contains a lot of additional resources such as other SAS Press books, links to SAS documentation, Base SAS Glossary, and the SAS User YouTube channel with How-To videos.

With all of these changes and more, we hope that you find the new prep guides easier to use on the path to becoming SAS Certified.

Are you a SAS certified professional? was published on SAS Users.