SAS functions

11月 022017
 

The purpose of this blog post is to demonstrate a SAS coding technique that allows for calculations with multiple variables across a SAS dataset, whether or not their values belong to the same or different observations.

calculations across observations of a SAS data table

What do we want?

As illustrated in the picture on the right, we want to be able to hop, or jump, back and forth, up and down across observations of a data table in order to implement calculations not just on different variables, but with their values from different observations of a data table.

In essence, we want to access SAS dataset variable values similar to accessing elements of a matrix (aij), where rows represent dataset observations, and columns represent dataset variables.

Combine and Conquer

In the spirit of my earlier post Combine and Conquer with SAS, this technique combines the functionality of the LAG function, which allows us to retrieve the variable value of a previous observation from a queue, with an imaginary, non-existent in SAS, LEAD function that reads a subsequent observation in a data set while processing the current observation during the same iteration of the DATA step.

LAG function

LAG<n> function in SAS is not your usual, ordinary function. While it provides a mechanism of retrieving previous observations in a data table, it does not work “on demand” to arbitrarily read a variable value from a previous observation n steps back. If you want to use it conditionally in some observations of a data step, you still need to call it in every iteration of that data step. That is because it retrieves values from a queue that is built sequentially for each invocation of the LAG<n> function. In essence, in order to use the LAG function even just once in a data step, you need to call it every time in each data step iteration until that single use.

Moreover, if you need to use each of the LAG1, LAG2, . . . LAGn functions just once, in order to build these queues, you have to call each of them in every data step iteration even if you are going to use them in some subsequent iterations.

LEAD function

The LEAD function is implemented in Oracle SQL and it returns data from the next or subsequent row of a data table. It allows you to query more than one row in a table at a time without having to join the table to itself.

There is no such function in SAS. However, the POINT= option of the SET statement in a SAS data step allows retrieving any observation by its number from a data set using random (direct) access to read a SAS data set. This will allow us to simulate a LEAD function in SAS.

HOP function

But why do we need two separate functions like LAG and LEAD in order to retrieve non-current observations. In essence, these two functions do the same thing, just in opposite directions. Why can’t we get by with just one function that does both backwards and forwards “hopping?”

Let’s combine and conquer.

Ideally, we would like to construct a new single function - let’s call it HOP(x, j) - that combines the best qualities of both LAG and LEAD functions. The two arguments of the HOP function would be as follows:

x – SAS variable name (numeric or character) the value of we are retrieving;

j – hop distance (numeric) – an offset from the current observation; negative values being lagging (hopping back), positive values being leading (hopping forward), and a zero-value meaning staying within the current observation.

The sign of the second argument defines whether we lag (minus) or lead (plus). The absolute value of this second argument defines how far from the current observation we hop.

Alternatively, we could have the first argument as a column number, and the second argument as a row/observation number, if we wanted this function to deal with the data table more like with a matrix. But relatively speaking, the method doesn’t really matter as long as we can unambiguously identify a data element or a cell. To stay within the data step paradigm, we will stick with the variable name and offset from the current observation (_n_) as arguments.

Let’s say we have a data table SAMPLE, where for each event FAIL_FLAG=1 we want to calculate DELTA as the difference between DATE_OUT, one observation after the event, and DATE_IN, two observations before the event:

Calculations across observations of a SAS data table

That is, we want to calculate DELTA in the observation where FAIL_FLAG = 1 as

26MAR2017 18JAN2017 = 67 (as shown in light-blue highlighting in the above figure).

With the HOP() function, that calculation in the data step would look like this:

data SAMPLE;
   set SAMPLE;
   if FAIL_FLAG then DELTA = hop(DATE_OUT,1) - hop(DATE_IN,-2);
run;

It would be reasonable to suggest that the hop() function should return a missing value when the second argument produces an observation number outside of the dataset boundary, that is when

_n_ + j < 0 or _n_ + j > num, where _n_ is the current observation number of the data step iteration; num is the number of observations in the dataset; j is the offset argument value.

Do you see anything wrong with this solution? I don’t. Except that the HOP function exists only in my imagination. Hopefully, it will be implemented soon if enough SAS users ask for it. But until then, we can use its surrogate in the form of a %HOP macro.

%HOP macro

The HOP macro grabs the value of a specified variable in an offset observation relative to the current observation and assigns it to another variable. It is used within a SAS data step, but it cannot be used in an expression; each invocation of the HOP macro can only grab one value of a variable across observations and assign it to another variable within the current observation.

If you need to build an expression to do calculations with several variables from various observations, you would need to first retrieve all those values by invoking the %hop macro as many times as the number of the values involved in the expression.

Here is the syntax of the %HOP macro, which has four required parameters:

%HOP(d,x,y,j)

d – input data table name;

x – source variable name;

y – target variable name;

j – integer offset relative to the current observation. As before, a negative value means a previous observation, a positive value means a subsequent observation, and zero means the current observation.

Using this %HOP macro we can rewrite our code for calculating DELTA as follows:

 data SAMPLE (drop=TEMP1 TEMP2);
   set SAMPLE;
   if FAIL_FLAG then
   do;
      %hop(SAMPLE,DATE_OUT,TEMP1, 1)
      %hop(SAMPLE,DATE_IN, TEMP2,-2)
      DELTA = TEMP1 - TEMP2;
   end;
run;

Note that we should not have temporary variables TEMP1 and TEMP2 listed in a RETAIN statement, as this could mess up our calculations if the j-offset throws an observation number out of the dataset boundary.

Also, the input data table name (d parameter value) is the one that is specified in the SET statement, which may or may not be the same as the name specified in the DATA statement.

In case you are wondering where you can download the %HOP macro from, here it is in its entirety:

%macro hop(d,x,y,j);
   _p_ = _n_ + &j;
   if (1 le _p_ le _o_) then set &d(keep=&x rename=(&x=&y)) point=_p_ nobs=_o_;
%mend hop;

Of course, it is “free of charge” and “as is” for your unlimited use.

Your turn

Please provide your feedback and share possible use cases for the HOP function/macro in the Comment section below. This is your chance for your voice to be heard!

Hopping for the best - calculations across SAS dataset observations was published on SAS Users.

5月 082017
 

In his recent article Perceptions of probability, Rick Wicklin explores how vague statements about "likeliness" translate into probabilities that we can express numerically. It's a fun, informative post -- I recommend it! You'll "Almost Certainly" enjoy it.

To prepare the article, Rick first had to download the source data from the study he cited. The data was shared as a CSV file on GitHub. Rick also had to rename the variables (column names) from the data table so that they are easier to code within SAS. Traditionally, SAS variable names must adhere to a few common programming rules: they must be alphanumeric, begin with a letter, and contain no spaces or special characters. The complete rules are documented in the this method for reading data from a cloud service like DropBox and GitHub. It's still my favorite technique for reading data from the Internet. You'll find lots of papers and examples that use FILENAME URL for the same job in fewer lines of code, but PROC HTTP is more robust. It runs faster, and it allows you to separate the step of fetching the file from the subsequent steps of processing that file.

You can see the contents of the CSV file at this friendly URL: https://github.com/zonination/perceptions/blob/master/probly.csv. But that's not the URL that I need for PROC HTTP or any programmatic access. To download the file via a script, I need the "Raw" file URL, which I can access via the Raw button on the GitHub page.

GitHub preview

In this case, that's https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv. Here's the PROC HTTP step to download the CSV file into a temporary fileref.

/* Fetch the file from the web site */
filename probly temp;
proc http
 url="https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv"
 method="GET"
 out=probly;
run;

A note for SAS University Edition users: this step won't work for you, as the free software does not support access to secure (HTTPS) sites. You'll have to manually download the file via your browser and then continue with the remaining steps.

Step 2. Import the data into SAS with PROC IMPORT

SAS can process data with nonstandard variable names, including names that contain spaces and special characters. You simply have to use the VALIDVARNAME= system option to put SAS into the right mode (oops, almost wrote "mood" there but it's sort of the same thing).

With 'crime against nature'n.)

For this step, I'll set VALIDVARNAME=ANY to allow PROC IMPORT to retain the original column names from the CSV file. The same trick would work if I was importing from an Excel file, or any other data source that was a little more liberal in its naming rules.

/* Tell SAS to allow "nonstandard" names */
options validvarname=any;
 
/* import to a SAS data set */
proc import
  file=probly
  out=work.probly replace
  dbms=csv;
run;

Step 3. Create RENAME and LABEL statements with PROC SQL

This is one of my favorite SAS tricks. You can use PROC SQL SELECT INTO to create SAS programming statements for you, based on the data you're processing. Using this technique, I can build the parts of the LABEL statement and the RENAME statement dynamically, without knowing the variable names ahead of time.

The LABEL statement is simple. I'm going to build a series of assignments that look like this:

  'original variable name'n = 'original variable name'

I used the SELECT INTO clause to build a label assignment for each variable name. I used the CAT function to assemble the label assignment piece-by-piece, including the special literal syntax, the variable name, the assignment operator, and the label value within quotes. I'm fetching the variable names from SASHELP.VCOLUMN, one of the built-in dictionary tables that SAS provides to surface table and column metadata.

  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";

Here's part of the value of &labelStmt:

'Almost Certainly'n='Almost Certainly' 
'Highly Likely'n='Highly Likely' 
'Very Good Chance'n='Very Good Chance' 
'Probable'n='Probable' 
'Likely'n='Likely' 
'Probably'n='Probably' 
'We Believe'n='We Believe' 

The RENAME statement is a little trickier, because I have to calculate a new valid variable name. For this specific data source that's easy, because the only SAS "rule" that these column names violate is the ban on space characters. I can create a new name by using the COMPRESS function to remove the spaces. To be a little safer, I used the "kn" modifier on the COMPRESS function to keep only English letters, numbers, and underscores. That should cover all cases except for variable names that are too long (greater than 32 characters) or that begin with a number (or that don't contain any valid characters to begin with).

Some of the column names are one-word names that are already valid. If I include those in the RENAME statement, SAS will generate an error (you cannot "rename" a variable to its current name). I used the

/* Generate new names to comply with SAS rules.                          */
/* Assumes names contain spaces, and can fix with COMPRESS               */
/* Other deviations (like special chars, names that start with a number) */
/* would need different adjustments                                      */
/* NVALID() function can check that a name is a valid V7 name           */
proc sql noprint;
 
  /* retain original names as labels */
  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";
 
  select cat("'",trim(name),"'n","=",compress(name,,'kn')) 
     into :renameStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK"
  /* exclude those varnames that are already valid */
  AND not NVALID(trim(name),'V7');
quit;

Step 4. Modify the data set with new names and labels using PROC DATASETS

With the body of the LABEL and RENAME statements built, it's time to plug them into a PROC DATASETS step. PROC DATASETS can change data set attributes such as variable names, labels, and formats without requiring a complete rewrite of the data -- it's a very efficient operation.

I include the LABEL statement first, since it references the original variable names. Then I include the RENAME statement, which changes the variable names to their new V7-compliant values.

Finally, I reset the VALIDVARNAME= option to the normal V7 sanity. (Unless you're running in SAS Enterprise Guide, in which case the option is already set to ANY by default. Check this blog post for a less disruptive method of setting/restoring options.)

proc datasets lib=work nolist ;
  modify probly / memtype=data;
  label &labelStmt.;
  rename &renameStmt.;
  /* optional: report on the var names/labels */
  contents data=probly nodetails;
quit;
 
/* reset back to the old rules */
options validvarname=v7;

Here's the CONTENTS output from the PROC DATASETS step, which shows the final variable attributes. I now have easy-to-code variable names, and they still have their descriptive labels. My data dictionary dreams are coming true!

DATASETS rename output

Download the entire program example from my public Gist: import_renameV7.sas.

The post How to download and convert CSV files for use in SAS appeared first on The SAS Dummy.

5月 082017
 

In his recent article Perceptions of probability, Rick Wicklin explores how vague statements about "likeliness" translate into probabilities that we can express numerically. It's a fun, informative post -- I recommend it! You'll "Almost Certainly" enjoy it.

To prepare the article, Rick first had to download the source data from the study he cited. The data was shared as a CSV file on GitHub. Rick also had to rename the variables (column names) from the data table so that they are easier to code within SAS. Traditionally, SAS variable names must adhere to a few common programming rules: they must be alphanumeric, begin with a letter, and contain no spaces or special characters. The complete rules are documented in the this method for reading data from a cloud service like DropBox and GitHub. It's still my favorite technique for reading data from the Internet. You'll find lots of papers and examples that use FILENAME URL for the same job in fewer lines of code, but PROC HTTP is more robust. It runs faster, and it allows you to separate the step of fetching the file from the subsequent steps of processing that file.

You can see the contents of the CSV file at this friendly URL: https://github.com/zonination/perceptions/blob/master/probly.csv. But that's not the URL that I need for PROC HTTP or any programmatic access. To download the file via a script, I need the "Raw" file URL, which I can access via the Raw button on the GitHub page.

GitHub preview

In this case, that's https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv. Here's the PROC HTTP step to download the CSV file into a temporary fileref.

/* Fetch the file from the web site */
filename probly temp;
proc http
 url="https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv"
 method="GET"
 out=probly;
run;

A note for SAS University Edition users: this step won't work for you, as the free software does not support access to secure (HTTPS) sites. You'll have to manually download the file via your browser and then continue with the remaining steps.

Step 2. Import the data into SAS with PROC IMPORT

SAS can process data with nonstandard variable names, including names that contain spaces and special characters. You simply have to use the VALIDVARNAME= system option to put SAS into the right mode (oops, almost wrote "mood" there but it's sort of the same thing).

With 'crime against nature'n.)

For this step, I'll set VALIDVARNAME=ANY to allow PROC IMPORT to retain the original column names from the CSV file. The same trick would work if I was importing from an Excel file, or any other data source that was a little more liberal in its naming rules.

/* Tell SAS to allow "nonstandard" names */
options validvarname=any;
 
/* import to a SAS data set */
proc import
  file=probly
  out=work.probly replace
  dbms=csv;
run;

Step 3. Create RENAME and LABEL statements with PROC SQL

This is one of my favorite SAS tricks. You can use PROC SQL SELECT INTO to create SAS programming statements for you, based on the data you're processing. Using this technique, I can build the parts of the LABEL statement and the RENAME statement dynamically, without knowing the variable names ahead of time.

The LABEL statement is simple. I'm going to build a series of assignments that look like this:

  'original variable name'n = 'original variable name'

I used the SELECT INTO clause to build a label assignment for each variable name. I used the CAT function to assemble the label assignment piece-by-piece, including the special literal syntax, the variable name, the assignment operator, and the label value within quotes. I'm fetching the variable names from SASHELP.VCOLUMN, one of the built-in dictionary tables that SAS provides to surface table and column metadata.

  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";

Here's part of the value of &labelStmt:

'Almost Certainly'n='Almost Certainly' 
'Highly Likely'n='Highly Likely' 
'Very Good Chance'n='Very Good Chance' 
'Probable'n='Probable' 
'Likely'n='Likely' 
'Probably'n='Probably' 
'We Believe'n='We Believe' 

The RENAME statement is a little trickier, because I have to calculate a new valid variable name. For this specific data source that's easy, because the only SAS "rule" that these column names violate is the ban on space characters. I can create a new name by using the COMPRESS function to remove the spaces. To be a little safer, I used the "kn" modifier on the COMPRESS function to keep only English letters, numbers, and underscores. That should cover all cases except for variable names that are too long (greater than 32 characters) or that begin with a number (or that don't contain any valid characters to begin with).

Some of the column names are one-word names that are already valid. If I include those in the RENAME statement, SAS will generate an error (you cannot "rename" a variable to its current name). I used the

/* Generate new names to comply with SAS rules.                          */
/* Assumes names contain spaces, and can fix with COMPRESS               */
/* Other deviations (like special chars, names that start with a number) */
/* would need different adjustments                                      */
/* NVALID() function can check that a name is a valid V7 name           */
proc sql noprint;
 
  /* retain original names as labels */
  select cat("'",trim(name),"'n","=","'",trim(name),"'") 
     into :labelStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK";
 
  select cat("'",trim(name),"'n","=",compress(name,,'kn')) 
     into :renameStmt separated by ' '  
  from sashelp.vcolumn where memname="PROBLY" and libname="WORK"
  /* exclude those varnames that are already valid */
  AND not NVALID(trim(name),'V7');
quit;

Step 4. Modify the data set with new names and labels using PROC DATASETS

With the body of the LABEL and RENAME statements built, it's time to plug them into a PROC DATASETS step. PROC DATASETS can change data set attributes such as variable names, labels, and formats without requiring a complete rewrite of the data -- it's a very efficient operation.

I include the LABEL statement first, since it references the original variable names. Then I include the RENAME statement, which changes the variable names to their new V7-compliant values.

Finally, I reset the VALIDVARNAME= option to the normal V7 sanity. (Unless you're running in SAS Enterprise Guide, in which case the option is already set to ANY by default. Check this blog post for a less disruptive method of setting/restoring options.)

proc datasets lib=work nolist ;
  modify probly / memtype=data;
  label &labelStmt.;
  rename &renameStmt.;
  /* optional: report on the var names/labels */
  contents data=probly nodetails;
quit;
 
/* reset back to the old rules */
options validvarname=v7;

Here's the CONTENTS output from the PROC DATASETS step, which shows the final variable attributes. I now have easy-to-code variable names, and they still have their descriptive labels. My data dictionary dreams are coming true!

DATASETS rename output

Download the entire program example from my public Gist: import_renameV7.sas.

The post How to download and convert CSV files for use in SAS appeared first on The SAS Dummy.

1月 272015
 

SAS Technical Support Problem Solvers“Here’s Johnny!!!” and well sometimes John and sometimes Jonathan and sometimes Jon.

In the real world, you sometimes need to make matching character strings more flexible. This is especially common when merging data sets. Variables, especially names, are not always exactly the same in all sources of your data. When matching data, you need to be able to programmatically determine if ‘John Doe’ is the same as ‘Johnny Doe’. The term most often associated with this type of matching is ‘fuzzy matching’. Fortunately within SAS, there are several functions that allow you to perform a fuzzy match. I’ll show you the most common of these functions and then I will show you an example that uses my favorite from this list.

COMPARE Function

The COMPARE function returns the position of the leftmost character by which two strings differ, or returns 0 if there is no difference. This function is most useful when comparing two strings that should be the same, but may only differ in case or due to leading or trailing blanks. This function takes the place of the following code:

if strip(upcase(name1))=strip(upcase(name2)) then do;

The COMPARE function can be used to make the same comparison.

if compare(name1,name2,’il’)=0 then do;

COMPGED Function

The COMPGED function returns the generalized edit distance between two strings. Specifically, the COMPGED function returns a generalization of the Levenshtein edit distance, which is a measure of dissimilarity between two strings. The Levenshtein edit distance is the number of operations (deletions, insertions, or replacement) of a single characters that are required to transform string-1 into string-2. 

Each operation basically ‘costs’ a certain value. For example, if string-1 is the result of inserting a space into string-2, this has a cost of 10. The more dramatic the operation, the greater the cost.  The COMPGED will return the total cost for all operations that occur. The costs returned by COMPGED can be altered by using CALL COMPCOST so that the cost are specific to your needs. A common use I have seen for using the COMPGED function is using it to compare email addresses.

email1='JohnDoe@abc.com';                                                                                                        
email2='John_Doe@abc.com';                                                                                                          
cost=compged(email1,email2);

The value of COST will be 30 which is the cost of adding punctuation to a string.

COMPLEV Function

The COMPLEV function is very similar to the COMPGED function. The difference is that the Levenshtein edit distance that is computed by COMPLEV is a special case of the generalized edit distance that is computed by COMPGED. The result is the COMPLEV executes much more quickly than COMPGED. However, the COMPLEV function is not as powerful or versatile as the COMPGED function. The COMPLEV function is generally most useful when comparing simple strings and when speed of comparison is important.

SPEDIS Function

The SPEDIS function is the oldie-but-a-goodie of the bunch. The SPEDIS function determines the likelihood of two words matching, expressed as the asymmetric spelling distance between the two words. SPEDIS is similar to COMPGED in that it assigns a cost to the each operation such as swap, append and delete. SPEDIS will sum the costs and then divide the sum by the length of the first argument. It is important to remember this because this makes the order of the arguments important. This means that SPEDIS(X,Y) does not always equal SPEDIS(Y,X).

Summary: Review your data

Any process that is described as ‘fuzzy’ is obviously not an exact science. With each of these functions using different algorithms, each is going to have its own strengths.

  • If you are comparing complex strings and need the most control, then I would recommend looking at the COMPGED function.
  • If you are comparing fairly simple strings and within large data sets, then COMPLEV may be a better choice.

The bottom line is that you will have to review your data and what differences are important to you. The results returned by these functions are subjective. You have to determine what an acceptable difference is. I suggest that you simply test out each function to see which one works best for you.

COMPGED and SOUNDEX Example

Of the above functions, COMPGED is the one I tend to use most often because, for the scenarios brought to me by various customers, COMPGED has produced the most precise results. I have been able to get the best results by combining the COMPGED function with the SOUNDEX function. Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

The below example encodes each of the first names that is to be compared with the SOUNDEX function and then evaluates the results of using SOUNDEX with the COMPGED function. The goal of this example is to determine if the value of FNAME is the same as ‘Johnathan’ or ‘William’.

/* Sample data that contains names and the class being taken */                                                                         
data class;                                                                                                                             
  input fname : $12. lname $ class : $9.;                                                                                               
  datalines;                                                                                                                            
Jon Smith Math                                                                                                                          
John Smith Math                                                                                                                         
Johnny Smith Math                                                                                                                       
James Smith Math                                                                                                                        
Will Miller Chemistry                                                                                                                   
Willy Miller Chemistry                                                                                                                  
Willie Miller Chemistry                                                                                                                 
Bonny Davis Gym                                                                                                                         
Milly Wilson Biology                                                                                                                    
;                                                                                                                                       
                                                                                                                                        
/*  Data set that contains the name and the grade for the class */                                                                      
data grade;                                                                                                                             
  input fname : $12. lname $ grade $;                                                                                                   
  datalines;                                                                                                                            
Johnathan Smith A                                                                                                                       
William Miller B                                                                                                                        
;                                                                                                                                       
                                                                                                                                        
                                                                                                                                        
/* This code uses both the SOUNDEX and COMPGED functions.  I have found */                                                              
/* that by comparing the strings produced by SOUNDEX, I have been able */                                                               
/* to get a tighter comparison.                                        */                                                               
data c;                                                                                                                                 
  set class;                                                                                                                            
  tmp1=soundex(fname);                                                                                                                  
  do i=1 to nobs;                                                                                                                       
    set grade(rename=(fname=fname2)) point=i nobs=nobs;                                                                                 
    tmp2=soundex(fname2);                                                                                                               
    dif=compged(tmp1,tmp2);                                                                                                             
    if dif<=50 then do;                                                                                                                 
      possible_match='Yes';                                                                                                             
      drop  i tmp1 tmp2  fname2;                                                                                                        
      output;                                                                                                                           
    end;                                                                                                                                
  end;                                                                                                                                  
run;                                                                                                                                    
proc print; run;

One of the great things about SAS is the number of different ways you can accomplish a given goal.  If you have a technique to make fuzzy comparisons that you like to use, please share.  I would love to see some of the techniques being used!

tags: character data, fuzzy match, Problem Solvers, SAS functions, soundex
9月 222011
 

SAS programming is taught in schools all over the world, including in high schools.  Occasionally, I receive questions via my blog such as this one:

Can somebody help me on this?
Write a short DATA _NULL_ step to determine the largest integer you can store on your computer in 3, 4, 5, 6, and 7 bytes.

This sounds like a homework assignment to me, not a typical "how do I" programming question from a person trying to get a job done.  I'm pretty sure that the professor who assigned it would not sanction a lazy web approach.

Besides that, given the brief problem statement above, how do we know what the professor wants?  Is the student supposed to write a program to calculate the largest integer that you can safely store within SAS?  Or is it okay to be clever and simply use built-in SAS functions to tell you the answer?  The first approach is an exercise in programming logic and math.  The second approach is a test of your resourcefulness -- can you find the answer without a "brute force" approach, perhaps by becoming familiar with SAS documentation?  Both approaches are valid, and will provide you with practice in skills that will help you with your ongoing SAS programming endeavors.

At the risk of rewarding lazy behavior, I will present one method to find the answer.  I'm sharing it only because it sheds light on the useful CONSTANT function, and other readers might find that helpful in their own work.

In this example, we'll use the EXACTINT constant. From the SAS documentation:

The exact integer is the largest integer k such that all integers less than or equal to k in absolute value have an exact representation in a SAS numeric variable of length nbytes. This information can be useful to know before you trim a SAS numeric variable from the default 8 bytes of storage to a lower number of bytes to save storage.

Here's the program:

data _null_;
  array len{8} _numeric_ BYTES_1-BYTES_8;
  do i=3 to 8;
   len{i} = constant('exactint',i);
  end;
  pi = constant('pi');
  put "Decimal:" (BYTES_3-BYTES_8) (=/comma32.0);
  put "Hexidecimal:" (BYTES_3-BYTES_8) (=/hex14.);
  put "Binary:" (BYTES_3-BYTES_8) (=/binary64.);
  put "and a slice of Pi:" (pi) (=/16.14);
run;

You'll have to run the program in SAS to see the results. As my high school history teacher used to say, "I'm not going to spoon-feed the answers to you." (Although I suppose that's what I just did...)

Bonus (even though you don't deserve it)

Here are a couple of resources that may be helpful in future assignments:

tags: CONSTANT function, homework, SAS functions, SAS programming