3月 222021
 

This article shows how to use PROC SGPLOT in SAS to create the scatter plot shown to the right. The scatter plot has the following features:

  • The colors of markers are determined by the value of a third variable.
  • The outline of each marker is the same color (such as black).
  • The X axis is reversed because the X coordinate increases from east to west.

The purpose of this article is twofold. The primary purpose is to show how you can use the FILLEDOUTLINEDMARKERS option to separately control the fill colors and outline colors of markers. A secondary purpose is to remind longtime SAS programmers that a program that was written 10-15 years ago can sometimes benefit from a "facelift" by using more recent features in SAS procedures.

Is your program taking advantage of recent features?

Sometimes, I see an old SAS program and notice that it can be written more compactly by using modern features of the SAS language. I've noticed that there are even examples in the SAS documentation that can be simplified! In most cases, the documentation examples were written for an earlier version of SAS, before the modern features were implemented. These examples persist because some SAS customers are still using older versions of SAS, and because of the maxim, "if it ain't broke, don't fix it!"

When I see an old SAS program that uses the Graph Template Language (GTL), I study it to determine whether it could be rewritten to use PROC SGPLOT. In the early days of ODS graphics, PROC SGPLOT did not support as many statements and options as it does today. It was not uncommon to want to create a graph that required using GTL. As SAS released new versions (especially the early releases of SAS 9.4), many features in GTL made their way into the syntax of PROC SGPLOT.

An example of a feature that once required GTL is coloring scatter plot markers by the values of a third variable. In modern versions of PROC SGPLOT, you can use the COLORRESPONSE= option to specify a variable whose values determine the color of markers. Furthermore, you can use the COLORMODEL= option to specify a palette of colors to use for the markers. In a 2016 article, I wrote that the COLORRESPONSE= option is "one of my favorite new features in PROC SGPLOT in SAS 9.4m2."

Another useful feature is the FILLEDOUTLINEDMARKERS option (introduced in SAS 9.4 [M0]), which enables you to control the fill and outline attributes of markers separately. In this article, the fill colors are determined by using the values of a third variable. The outline color of the markers is specified by using the MARKEROUTLINEATTRS= option.

A remake of a classic scatter plot

The LOESS procedure in SAS/STAT was released in SAS 8 and was one of the first SAS/STAT procedures to incorporate ODS graphics. An example in the PROC LOESS documentation uses GTL to create a scatter plot of 179 locations and measurements of sulfate (SO4). The SAS 9.3 documentation (circa 2011) was the first to use the red-blue three-color ramp to assign the fill colors of markers based on the S04 values. The filled markers are overlaid with a second scatter plot that uses outline-only markers to create a boundary outline for the filled markers. The graph looks similar to the graph at the top of the page. You can click the documentation link to see the GTL and the 9.3 scatter plot. In contrast, the following call to PROC SGPLOT requires only two statements. (The documentation shows how to generate the SO4 data set.)

title "Sulfate Measurements";
proc sgplot data=SO4;
   scatter x=longitude y=latitude / colorresponse = SO4
               colormodel  = ThreeColorRamp
               filledoutlinedmarkers
               markerattrs = (symbol=circleFilled size=9)
               markeroutlineattrs=(color=black); /* omit this option if you want the GraphData1 color */
    xaxis reverse label="West Longitude";
run;

The graph is shown at the top of the article. The locations of these points are in the 48 contiguous US states. From the scatter plot, you can make out New England, the Great Lakes region, Texas, Florida, and other geographic features. (With more work, you can overlay the points on a map of the 48 states.) The REVERSE option in the XAXIS statement is necessary because the locations of the measurements are "west longitude," which increases from east to west.

Summary

This article shows how to use the FILLEDOUTLINEDMARKERS option in PROC SGPLOT to control the fill color and outline color of markers in a scatter plot. In this example, the COLORRESPONSE= option specifies a variable to use for color-coding. The MARKEROUTLINEATTRS= option specifies the outline color of the markers.

These features are useful, but it is equally useful to recognize that ODS statistical graphics (in particular, PROC SGPLOT) has evolved a lot since SAS 9.2 and 9.3. If you see an old program that uses GTL to create a plot, think about whether you can create the graph by using newer features in PROC SGPLOT. Because PROC SGPLOT code is easier to read and to maintain, it might be worth the effort.

What do you think? Is it a waste of time to rewrite a program that produces the correct output? Or is it beneficial to simplify and modernize old programs? Let me know your opinion in the comments.

The post Control the fill and outline colors of scatter plot markers in SAS appeared first on The DO Loop.

3月 172021
 

Here is an interesting math question: How many reduced fractions in the interval (0, 1) have a denominator less than 100? The question is difficult is because of the word "reduced." If we only care about the total number of fractions in (0,1) whose denominator is less than 100, we could simply count the number of integer pairs (p,q), where 1 ≤ p < q < 100. For example, the table at the right shows all fractions whose denominators are less than 6. The green triangle encloses all n(n+1)/2 fractions in (0,1) for n=6. But only the fractions that have a gold background are in reduced form.

You can answer the question by generating a Farey sequence, which is the ordered set of reduced fractions (in reduced form) on the closed interval [0,1]. The Farey sequence has a surprising number of applications in pure and applied math. I first encountered it while studying the dynamics near resonant frequencies of forced coupled oscillators.

The Farey sequence of order n

The Farey sequence is defined for fractions on the closed interval [0,1] and includes the endpoints as the reduced fractions 0/1 and 1/1. The Farey sequence of order n is the sequence of all reduced fractions on [0,1] whose denominators do not exceed n, listed in increasing order.

One algorithm for generating a Farey sequence would be to mimic the table at the top of this article. That is, generate all fractions with denominators up to n, then use a "sieve" to exclude fractions for which the numerator and denominator have a common factor. You would then have to sort the final set of results by the size of the fractions.

In fact, there is a better algorithm. The Wikipedia article about Farey sequences describes the algorithm as "surprisingly simple." The algorithm uses the "mediant property" of the Farey sequence to generate successive terms in order. The mediant property is the name given to the fact that each term in a Farey sequence is the mediant of its neighbors. Given two reduced fractions a/b and c/d, the mediant is the fraction (a+c)/(b+d). The mediant is sometimes called "freshman addition" (or, less kindly, "fool's addition") because the formula looks like a common mistake that children make when they learn to add fractions.

You can use the mediant property to generate the Farey sequence of order n from the Farey sequence of order n-1. However, as the Wikipedia article explains, you can also use the property to generate the Farey sequence of order n directly. The following DATA step generates the Farey sequences of order n for n ≤ 25.

/* Farey sequence: http://en.wikipedia.org/wiki/Farey_sequence
   For each n, generate the Farey sequence of order n. 
   This algorithm uses the mediant property of the Farey sequence to generate the terms.
*/
%let MaxOrder = 25;
data Farey;
do n = 1 to &MaxOrder;  /* for each n, generate Farey sequence of order n */
   Term=1; Numer = 0; Denom = 1; output;
   Term=2; Numer = 1; Denom = n; output;
   a=0; b=1; c=1; d=n;  /* fractions are a/b and c/d */
   do while (d > 1);
     Term = Term + 1;
     k = floor((n + b)/d);      /* integer division */
     a_prev = a;  b_prev = b;
     a = c; b = d;                        /* current c/d becomes new a/b */
     c = k*c - a_prev; d = k*d - b_prev;  /* new c/d */
     Numer = c; Denom = d; output;
   end;
end;
keep n Term Numer Denom;
label n="Order";
run;

Let's print out the terms of the Farey sequence of order n=8. The output shows the 21 fractions in the table at the top of this article, plus the trivial fractions 0/1 and 1/1 that are part of every Farey sequence. Notice the use of a DATA step view to compute the decimal values for the fractions, and the use of the FRACTw. format to display the decimals as fractions in their reduced form:

data Fractions / view=Fractions;
set Farey;
Fraction = Numer / Denom;
run;
 
proc print data=Fractions noobs;
   where n=8;
   format Fraction FRACT32.;
run;

Visualize the Farey sequences

The Wikipedia article does not display my favorite visualization of the Farey sequences. It is interesting to plot the Farey sequence for order n for many values of n, as follows:

ods graphics / width=500px height=400px;
Title "Farey Sequences: Order 1 to 40";
proc sgplot data=Fractions;
   where n <= 40;
   scatter x=Fraction y=n / markerattrs=(symbol=CircleFilled) transparency=0.75;
   yaxis reverse grid integer values=(1, 5 to 40 by 5);
   xaxis grid display=(nolabel) values=(0 to 1 by 0.1);
run;

The graph shows that there is a gap around each rational number in the Farey sequence. It turns out that the width of the gap (for each n) is larger for fractions that appear earlier in the Farey sequence. That is, if you look at the width of the gaps around 0 and 1, they are larger than the gap around 1/2, which is larger than the gaps around 1/3 and 2/3, and so forth. This turns out to be important in certain applications.

The length of Farey sequences

In the previous graph, you can see that there are two markers on the first row, three on the second row, five on the third, and so forth. You can use PROC MEANS to find out how many terms are in the Farey sequence of order n. The following statements display the number of terms for n ≤ 10:

proc means data=Farey noNObs nolabels N;
   where n <= 10;
   class n;
   var n;
run;

You can use this technique to answer the question at the top of this article: how many reduced fractions in the interval (0, 1) have a denominator less than 100? The length of the Farey sequence of order 99 answers the question. You can use PROC MEANS to find the length of the Farey sequence of order 99:

proc means data=Farey noNObs nolabels N;
   where n = 99;
   var n;
run;

There are 3005 terms in the Farey sequence of order 99, which includes the fractions 0/1 and 1/1. If you exclude the endpoints, you get the answer to the question: there are 3003 reduced fractions in the interval (0,1) that have a denominator less than 100.

Further reading

This problem was asked in The Women's Almanack, 1747, (p. 34), a publication printed in annually London that included riddles and mathematical problems "adapted for the use and diversion of the fair-sex." For more information about The Women's Almanack, including the correct proof that was published in 1751, see Ainsworth, Dawson, Pianta, and Warwick, 2012, "The Farey Sequence." The same paper contains many other interesting facts about Farey sequences.

The post The Farey sequence appeared first on The DO Loop.

3月 162021
 

When organizations move to the SAS Cloud, SAS takes care of the design and delivery of software, infrastructure and services so that our customers can focus on using analytics to solve business challenges and see a quick return on investment. That’s where Michael Watson and his team of Technical Architects come in. Michael leads the Architecture [...]

Designing the perfect SAS Cloud solution for you was published on SAS Voices by Lindsay Marshall

3月 162021
 

SAS global statementsSAS IF-THEN/ELSE statement that executes DATA step statements depending on specified conditions:

IF expression THEN executable-statement1;
<ELSE executable-statement2;>

Try sticking it in there and SAS will slap you with an ERROR:

data _null_;
   set SASHELP.CARS nobs=n;
   if n=0 then libname outlib 'c:\temp';
run;

SAS log will show:

3       if n=0 then libname outlib 'c:\temp';
                    -------
                    180
ERROR 180-322: Statement is not valid or it is used out of proper order.

But global statements’ “not executable” status only means that they cannot be executed as part of a DATA step execution. Otherwise, “they take effect” (in my mind that equates to “they execute”) right after the compilation phase but before DATA step executes (or processes) its data reads, writes, logic and iterations.

Here is another illustration. Let’s get a little creative and tack a LIBNAME global statement within conditionally executed DO-group of the IF-THEN statement:

data OUTLIB.CARS;
   set SASHELP.CARS nobs=n;
   if n=0 then
   do;
      libname OUTLIB 'c:\temp';
   end;
run;
 
In this case, SAS log will show:
NOTE: Libref OUTLIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\temp
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set OUTLIB.CARS has 428 observations and 15 variables.

As you can see, not only our LIBNAME statement “executed” (or “took effect”) despite the IF-THEN condition was FALSE, it successfully assigned the OUTLIB library and applied it to the data OUTLIB.CARS; statement that appears earlier in the code. That is because the LIBNAME global statement took effect (executed) right after the DATA step compilation before its execution.

For the same reason, you can place global statement TITLE either in open code before PROC that produces output with a title or within that PROC. In the first case, the stand-alone TITLE statement is compiled on its own and immediately executed thus setting the title for the PROCs that follow. In the latter case, it is compiled with the PROC step, then immediately executed before PROC step’s execution.

Now, when we have a solid grasp of the global statements timing habits, let’s look at the coding techniques allowing us to take full control of when and whether global statements take effect (executed).

Macro language to conditionally execute SAS global statements

Since

%let dsname = SASHELP.CARS;
/*%let dsname = SASHELP.CLASS;*/
 
%let name = %scan(&dsname,2);
 
%if (&name eq CARS) or (&name eq CLASS) %then
%do;
   options DLCREATEDIR;
   libname outlib "c:\temp\&name";
%end;
%else
%do;
   libname outlib "c:\temp";
%end;
 
data OUTLIB.&name;
   set &dsname;
run;

In this code, if name is either CARS or CLASS the following global statements will be generated and passed on to the SAS compiler:

   options DLCREATEDIR;
   libname outlib "c:\temp\&name";

This will create a directory c:\temp\&name (if it does not exist) and assign libref OUTLIB to that directory.

Otherwise, the following global statement will be generated and passed on to the SAS compiler:

   libname outlib "c:\temp";

The DATA step then creates data set OUTLIB.&name in the corresponding dynamically assigned library. Using this technique, you can conditionally generate global statements for SAS system options, librefs, filerefs, titles, footnotes, etc. SAS compiler will pick up those generated global statements and execute (activate, put in effect) them.

CALL EXECUTE to conditionally execute SAS global statements

Sometimes, it is necessary to conditionally execute global statements based on values contained in data, whether in raw data or SAS data sets. Such a data-driven approach can be easily implemented using CALL EXECUTE routine in a DATA step.

data _null_;
   set SASHELP.CARS;
   by MAKE;
   if first.MAKE then
   do;
      call execute('title "'||trim(MAKE)||' models";');
      call execute('proc print noobs data=SASHELP.CARS(where=(MAKE="'||trim(MAKE)||'"));');
      call execute('   var MAKE MODEL TYPE;');
      call execute('run;');
   end;
run;

In this code, for every block of unique MAKE values (identified by first.MAKE) we have CALL EXECUTE generating lines of SAS code and pushing them outside the DATA step boundary where they compile and execute. The code snippets for TITLE and WHERE clause are data-driven and generated dynamically. The SAS log will show a series of the generated statements:

NOTE: CALL EXECUTE generated line.
1   + title "Acura models";
2   + proc print noobs data=SASHELP.CARS(where=(MAKE="Acura"));
3   +    var MAKE MODEL TYPE;
4   + run;
 
5   + title "Audi models";
6   + proc print noobs data=SASHELP.CARS(where=(MAKE="Audi"));
7   +    var MAKE MODEL TYPE;
8   + run;

. . . and so forth.

In this implementation, global statement TITLE is prepared (“pre-cooked”) conditionally (if first.MAKE is TRUE) within the DATA step in a form of a character value. It’s still not a global statement until CALL EXECUTE pushes it out of the DATA step. There it becomes a global statement as part of SAS code stream. There it gets compiled and executed, setting a nice data-driven title for the PROC PRINT output (individually for each Make):

PROC PRINT outputs with dynamically generated titles

Additional resources

Your thoughts?

Have you found this blog post useful? Do you have any questions? Please feel free to ask and share your thoughts and feedback in the comments section below.

How to conditionally execute SAS global statements was published on SAS Users.

3月 152021
 

A SAS customer wanted to compute the cumulative distribution function (CDF) of the generalized gamma distribution. For any continuous distribution, the CDF is the integral of the probability density function (PDF), which usually has an explicit formula. Accordingly, he wanted to compute the CDF by using the QUAD function in PROC IML to integrate the PDF.

I have previously shown how to integrate the PDF to obtain the CDF. I have also shown how to use the trapezoid rule of numerical integration to obtain the entire CDF curve from a curve of the PDF. But before I sent him those links, I looked closer at the distribution that he wanted to compute. It turns out that you don't have to perform an integral to get the CDF of the generalized gamma distribution.

What is the generalized gamma distribution

I was not familiar with the generalized gamma distribution, so I looked at an article on Wikipedia. The original formulation is due to Stacy, E. (1962), "A Generalization of the Gamma Distribution." The distribution is "generalized" in the sense that it contains many other familiar distributions for certain parameter values, including the gamma, chi-square, exponential, half-normal, and Weibull distributions.

The following definition is from Wikipedia, but I changed the notation for the incomplete gamma function to agree with my previous article. The generalized gamma has three parameters: a>0, d>0, and p>0. For non-negative x, the probability density function of the generalized gamma is
\(f(x;a,d,p)=(p/a^{d})x^{d-1}e^{-(x/a)^{p}} / \,\Gamma (d/p)\)
where \(\Gamma(\cdot)\) denotes the complete gamma function.

The cumulative distribution function is
\(F(x;a,d,p)=\gamma ((x/a)^{p}, d/p) / \,\Gamma (d/p)\),
where \(\gamma (\cdot )\) denotes the lower incomplete gamma function.

Compute the generalized gamma distribution in SAS

Evaluating the PDF is usually easy because it has an explicit formula. It is the CDF that requires thought and effort. In this case, however, the CDF is given in terms of the lower incomplete gamma function, and I recently showed how to compute the lower incomplete gamma function in SAS. As a rule of thumb, if a distribution has 'gamma' as part of its name, you might be able to use this trick to evaluate the CDF.

In my previous article, I showed that the lower incomplete gamma function is nothing more than the "unstandardized" CDF of the gamma distribution, as represented by this SAS macro:

%macro LowerIncGamma(x, alpha); /* lower incomplete gamma function */
   (Gamma(&alpha) * CDF('Gamma', &x, &alpha))
%mend;

But the formula for the CDF of the generalized gamma distribution divides by the complete gamma function, which just leaves the call to the CDF function. Instead of the parameter x, you use (x/a)p. For the parameter α, you substitute α=d/p. This means that you can compute the CDF for the generalized gamma distribution by using the CDF('GAMMA') function in SAS. The following SAS DATA step evaluates the PDF and CDF for five combinations of the a, d, and p parameters. These are the same parameter values that are used in the Wikipedia article.

data GenGamma;
label PDF = "Density" CDF = "Cumulative Probability";
array _a[5] _temporary_ (2,   1,   2, 5, 7);
array _d[5] _temporary_ (0.5, 1,   1, 1, 1);
array _p[5] _temporary_ (0.5, 0.5, 2, 5, 7);
do i = 1 to dim(_a);
   a = _a[i]; d = _d[i]; p = _p[i];
   params = cats('a=',a,', d=',d,', p=',p);
   do x = 0.0001, 0.001, 0.01 to 7 by 0.01;
      PDF = (p/a**d) * x**(d-1) * exp(-(x/a)**p) / Gamma(d/p);
      CDF = CDF('Gamma', (x/a)**p, d/p);
      output;
   end;
end;
run;

The following call to PROC SGPLOT creates a graph of the PDF for the five sets of parameter values:

title "PDF of the Generalized Gamma Distribution";
proc sgplot data=GenGamma;
   series x=x y=PDF / group=params;
   yaxis max = 0.7 grid;
   xaxis grid;
   keylegend / location=inside position=NE across=1 title="" opaque;
run;

The PDF curves are shown in the graph in the previous section. You can use almost the same statements to create a graph of the five CDF functions:

title "CDF of the Generalized Gamma Distribution";
proc sgplot data=GenGamma;
   series x=x y=CDF / group=params;
   yaxis grid;
   xaxis grid;
   keylegend / location=inside position=SE across=1 title="" opaque;
run;

Quantiles of the generalized gamma distribution

Although the SAS programmer did not request the quantile function for the generalized gamma distribution, it can be evaluated by using the QUANTILE('GAMMA") function in SAS. The details are in the Wikipedia article. For completeness, the following DATA step computes the quantiles. The graph is not shown, since it is the same as the graph of the cumulative distribution, but with the axes exchanged.

data GenGammaQuantile;
label prob = "Cumulative Probability" q = "Quantile";
array _a[5] _temporary_ (2,   1,   2, 5, 7);
array _d[5] _temporary_ (0.5, 1,   1, 1, 1);
array _p[5] _temporary_ (0.5, 0.5, 2, 5, 7);
do i = 1 to dim(_a);
   a = _a[i]; d = _d[i]; p = _p[i];
   params = cats('a=',a,', d=',d,', p=',p);
   do prob = 0.001, 0.01 to 0.99 by 0.01;
      y = quantile("Gamma", prob, d/p);
      q = a * y**(1/p);
      output;
   end;
end;
run;
 
title "Quantiles of the Generalized Gamma Distribution";
proc sgplot data=GenGammaQuantile;
   series x=prob y=q / group=params;
   yaxis grid max=7;
   xaxis grid;
   keylegend / location=inside position=NW across=1 title="" opaque;
run;

Summary

Although you can integrate the PDF to obtain the CDF for a continuous probability distribution, sometimes it is possible to evaluate the CDF in an easier way. In the case of the generalized gamma distribution, the CDF can be evaluated by using the CDF of the ordinary gamma distribution. You just need to be clever about how to pass in the parameter values.

The post The generalized gamma distribution appeared first on The DO Loop.

3月 132021
 

As an American child growing up on a military base in Germany, I learned many important life lessons. One of the most important was about language and communication. I realized at age 12 that, even though I could speak German, I thought in English -- and it led me to [...]

3 ways new SAS, Microsoft solutions can change the way manufacturers think was published on SAS Voices by Marcia Elaine Walker

3月 132021
 

As a programmer, you might be accustomed to how SAS® stores date, time, and datetime values as 8-byte floating-point numeric values. A SAS date is the number of days since January 1, 1960; a SAS time is the number of seconds since midnight; and a SAS datetime is the number of seconds since midnight, January 1, 1960. Are you aware that date and time values are handled differently if you are programming in DS2? Because DS2 can process databases, it has access to ANSI data types, which have greater precision. The ANSI data types that are relevant for processing dates and times include DOUBLE, TIME, DATE, and TIMESTAMP. However, these data types are not comparable to SAS date and time values, so they are not automatically converted.

Dates and times in DS2

Declaring DS2 DATE, TIME, and TIMESTAMP constants requires a specific structure. The syntax is shown here:

  • DATE 'yyyy-mm-dd'
  • TIME 'hh:mm:ss[.fraction]'
  • TIMESTAMP 'yyyy-mm-dd hh:mm:ss[.fraction]'

Here is an example:

proc ds2;
data _null_;
method run();
dcl date ds2dt;
dcl time ds2tm;
dcl timestamp ds2dtm;
ds2dt = date '2017-01-31';
ds2tm = time '20:44:59';
ds2dtm = timestamp '2017-02-07 07:00:00.7569';
put ds2dt=;
put ds2tm=;
put ds2dtm=;
end;
enddata;
run;
quit;

Results:

ds2dt=2017-01-31
ds2tm=20:44:59
ds2dtm=2017-02-07 07:00:00.756900000

Using functions to help with conversion

When date, time, or timestamp values are read into DS2 from a database, they are processed in their native data types. Variables in a SAS data set that are formatted with a date, time, or datetime format and read into DS2 must be converted to the equivalent ANSI DATE, TIME, or TIMESTAMP data types. To successfully process ANSI values in DS2 using SAS interval functions, such as INTCK or INTNX, you must first explicitly convert them to the appropriate SAS double-precision numeric value. The following functions can assist with the conversion between ANSI and SAS:

  • TO_DOUBLE—converts any ANSI date, time, or timestamp value to the appropriate SAS numeric date, time, or datetime value
  • TO_DATE—converts an unformatted SAS date to an ANSI date
  • TO_TIME—converts an unformatted SAS time to an ANSI time
  • TO_TIMESTAMP—converts an unformatted SAS datetime to an ANSI timestamp

Example 1: Convert an unformatted SAS date, time, or datetime value in DS2

You can convert unformatted SAS date, time, or datetime values in DS2 by using the TO_DATE, TO_TIME, and TO_TIMESTAMP functions. The HAVING FORMAT option in the DECLARE (DCL) statement assigns a format to the new variables.

data dates;
sas_time=time();
sas_date=today();
sas_datetime=datetime();
run;
 
proc ds2;
data _null_;
dcl date ds2dt having format YYMMDD10.;
dcl time ds2tm having format TIME18.5;
dcl timestamp ds2dtm having format DATETIME28.5;
method run();
set dates;
ds2dt=to_date(sas_date);
ds2tm=to_time(sas_time);
ds2dtm=to_timestamp(sas_datetime);
put 'SAS: ' sas_date sas_time sas_datetime;
put 'ANSI: ' ds2dt ds2tm ds2dtm;
end;
enddata;
run;
quit;

Results:

SAS: 22326 30565.0650000572 1928996965.065
ANSI: 2021-02-15 8:29:25.06500 15FEB2021:08:29:25.06500

Example 2: Convert a formatted SAS date, time, or datetime value in DS2

If you have applied formats to the SAS variables, you must first convert the variables by using the TO_DOUBLE function. If you do not do this conversion in advance and then try to use a SAS function, such as INTNX, you see messages like the following in the log:

ERROR: Compilation error.
ERROR: Line 402: Invalid conversion for date or time type.
WARNING: Implicit conversion of time(4) type to double type. Statement 402.
WARNING: Implicit conversion of int type to double type. Statement 402.

Here is the syntax that you can use for conversion:

data dates;
sas_time=time();
sas_date=today();
sas_datetime=datetime();
format sas_time time. sas_date mmddyy10. sas_datetime datetime.;
run;
 
proc ds2;
data _null_;
dcl double SAS_Date_New having format mmddyy10.;
dcl double SAS_Time_New having format time.;
dcl double SAS_Datetime_New having format datetime.;
method run();
set work.dates;
SAS_Date_New=INTNX('week',to_double(sas_date),26);
SAS_Time_New=INTNX('hour',to_double(sas_time),3);
SAS_Datetime_New=INTNX('dtweek',to_double(sas_datetime),2);
put SAS_Date_New=;
put SAS_Time_New=;
put SAS_Datetime_New=;
end;
enddata;
run;
quit;

Results:

SAS_Date_New=08/15/2021
SAS_Time_New=11:00:00
SAS_Datetime_New=15AUG21:00:00:00

Example 3: Convert an ANSI date, time, or timestamp value to a SAS date in DS2

If you want to convert an ANSI value into a SAS value, use the TO_DOUBLE function. In the example below, the TO_DOUBLE function converts the ANSI timestamp value to a SAS datetime. After you have a SAS datetime, you can then use the SAS functions DATEPART and TIMEPART to extract the date and time values.

proc ds2;
data _null_;
method run();
dcl timestamp ds2dtm;
dcl double sas_date having format mmddyy10.;
dcl double sas_time having format time8.2;
dcl double sas_datetime having format datetime18.2;
ds2dtm = timestamp '2020-02-01 01:23:45.7569';
sas_datetime=to_double(ds2dtm);
sas_date=datepart(sas_datetime);
sas_time=timepart(sas_datetime);
put ds2dtm=;
put sas_date=;
put sas_time=;
put sas_datetime= ;
end;
enddata;
run;
quit;

Results:

ds2dtm=2020-02-01 01:23:45.756900000
sas_date=02/01/2020
sas_time= 1:23:46
sas_datetime=01FEB20:01:23:45.7

Working with both ANSI and SAS date, time, and datetime values in a DS2 program is easily managed with the TO_DOUBLE, TO_DATE, TO_TIME, and TO_DATETIME functions. These functions make the conversion between ANSI and SAS values seamless and efficient.

Thank you for reading and thanks to Mark Jordan for his expertise and help in reviewing this blog before publication.

Additional references

Tips for working with date, time, and datetime values in DS2 was published on SAS Users.

3月 122021
 

Utilities all over the country are facing multiple disruptions, from climate change to distributed energy generation to a growing need to embrace digital transformation. These challenges are more pronounced for midsize public power utilities, which are community-owned, not-for-profit and often more vulnerable to economic challenges than investor-owned utilities. As new [...]

A unique partnership helps identify unmet needs in the energy and utilities industry was published on SAS Voices by Philip Jones

3月 102021
 

This is my Pi Day post for 2021. Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. Most years I write about lower-case pi (π), which is the ratio of a circle's circumference to its diameter. But this year I thought it would be fun to write about upper-case pi (\(\Pi\)), which is the mathematical notation for a product of terms.

Just as \(\Sigma\) is used to express a summation, so \(\Pi\) is used to express a product. Did you know that there are infinite products that converge to pi? A famous one is the Wallis product, which converges to π/2:
\(\pi / 2 = \prod_{n=1}^{\infty} \left( \frac{2n}{2n-1} \cdot \frac{2n}{2n+1} \right) = \left( \frac{2}{1}\cdot \frac{2}{3} \right) \cdot \left( \frac{4}{3}\cdot \frac{4}{5} \right) \cdot \left( \frac{6}{5}\cdot \frac{6}{7} \right) \cdot \left( \frac{8}{7}\cdot \frac{8}{9} \right) \cdot \;\cdots \)

Implement Wallis's formula for pi

Suppose you wanted to implement Wallis's formula on a computer. How would you do it? The simplest way is to literally translate the formula into a loop that multiplies the first N terms in the product. You can plot the partial products against the number of terms to see how quickly (or slowly!) the product converges, as shown in the following SAS DATA step:

ods graphics / reset;
%let N = 1000;
data WallisProd;
prod = 1;
do n = 1 to &N;
   term = (2*n/(2*n-1)) * (2*n/(2*n+1));
   prod = prod * term;                  
   output;                     
end;
run;
 
title "Convergence of Wallis Product to pi/2";
proc sgplot data=WallisProd;
   where n >= 100;
   scatter x=n y=prod;
   refline &pi2 / axis=y noclip lineattrs=(color=red) label="pi/2";
   xaxis grid; yaxis grid;
run;

The graph indicates that the convergence of the Wallis product is very slow. In fact, you can prove that the n_th term of the product has an asymptotic error of size π/(8n) as n → ∞. Because of this, the Wallis formula is not a good choice for computing the digits of pi. It takes about 400,000 terms to get six-digit accuracy!

But let's not focus on π, let's focus on the product, \(\Pi\). Although a direct translation of the product formula is simple, it is not necessarily the best way to compute the product, computationally speaking. Pi Day seems like a great time to review an alternative computational method that is often useful when you compute with products.

Products and log transformations in statistics

The Wallis product is well behaved because the individual terms are all close to 1. However, many products that you encounter in computational statistics are not so well behaved. For example, in statistical modeling, you encounter the likelihood function. Given a set of data (x1, x2,..., xn) and a probability density function (f) that you believe models the data, the likelihood of the data is the product \(\prod_{i=1}^{n} f(x_{i})\). In this formulation, every term in the product is less than 1. If the data contain outliers, there might be individual probabilities that are tiny. Therefore, a naive computation of the product can result in a very small number. In fact, the computation might encounter an underflow error, especially for large data sets.

A computational technique that avoids this potential problem is to apply a log transformation, which converts the product into a summation of the log of the terms. After you have computed the sum, you exponentiate it to obtain the product. This technique requires that all terms in the product be positive, which is true in most applications.

In symbols, if you have a product of positive values, \(y = \prod_{i} a_{i}\), you can log-transform it to \(\log(y) = \sum_{i} \log(a_{i})\). The product is mathematically equivalent to \(y = \exp\left( \sum_{i} \log(a_{i}) \right)\).

Let's apply the log transformation to the Wallis formula for π/2. The following SAS DATA step computes a sum of log-transformed terms. After each step, the EXP function gives the value of the product at that step.

data WallisSum;
do n = 1 to &N;
   logterm = 2*log(2*n) - log(2*n-1) - log(2*n+1);  /* log transform of terms */
   logprod + logterm;                               /* summation */
   prod = exp(logprod);                             /* exp(sum) */
   output;
end;
run;

The graph of the partial products is the same graph as in the previous section. The numerical difference between the two computations is less than 3E-14. However, in general, it is usually safer to sum the log-transformed terms because the method avoids numerical overflow and underflow issues that are common when you compute a product in a naive way.

So, this Pi Day, don't just celebrate lower-case pi, π = 3.14159265.... Take a moment to think about upper-case pi (\(\Pi\)), which stands for a product of numbers. For many products, a numerically stable way to compute the product to log-transform the problem, compute a summation, and then exponentiate the sum. And that is a numerical technique that is worth celebrating, not just on Pi Day but every day!

Further reading

The post Pi and products appeared first on The DO Loop.