Problem Solvers

1月 112020
 
In SAS 9.3 and earlier, the default value of the YEARCUTOFF= option is 1920. This default setting could trigger data integrity issues because any 2-digit years of "20" in dates will be assumed to occur in 1920 instead of 2020. If the intended year in the date is 2020, you must set the YEARCUTOFF= option to a value larger than 1920. Of course, the best alternative is to always specify date values with 4-digit years.

Luckily, SAS makes it easy to change the YEARCUTOFF= option so that it works best for your data. The default value for the YEARCUTOFF= option changed in SAS 9.4 to 1926. This change makes it easier for customers who are still using 2-digit years of "20" to make sure that the date is assigned to 2020. Let's review some of the frequently asked questions that customers ask about how SAS works with 2-digit years.

What is the YEARCUTOFF= option?

The YEARCUTOFF= option lets you specify which century SAS software should assign to dates with 2-digit years.

How do I specify the YEARCUTOFF= option in my SAS programs?

The option is specified in an OPTIONS statement. Here is an example:

options yearcutoff=1930;

You can also specify the option in an autoexec file or a config file. If you don't specify the YEARCUTOFF= option, the SAS system default is used. Remember that 1920 is the default for SAS 9.3 and earlier releases and 1926 is the default for SAS 9.4. (For reference, SAS 9.3 was released in 2011. The first release of SAS 9.4 was released in 2013.)

How does the YEARCUTOFF= option work?

The YEARCUTOFF= option specifies the first year of a 100-year window in which all 2-digit years are assumed to occur. For example, if the YEARCUTOFF= option is set to 1920, all 2-digit years are assumed to occur between 1920 and 2019. This means that two-digit years from 20 - 99 are assigned a century prefix of "19" and all 2-digit years from 00 - 19 have a century prefix of "20."

Which types of date values are affected by the YEARCUTOFF= option?

The YEARCUTOFF= option affects the interpretation of 2-digit years in the following cases:

  • Reading date values from external files
  • Specifying dates or year values in SAS functions
  • Specifying SAS date literals

The YEARCUTOFF= option does not influence the following cases:

  • Processing dates with 4-digit years
  • Processing dates already stored as SAS date values (the number of days since January 1, 1960)
  • Displaying dates with SAS date formats

Which value should I set the YEARCUTOFF= option to?

The optimal value depends on the range of dates in your data. The YEARCUTOFF= option should be set so that the 100-year range encompasses the range of your data values. In general, SAS recommends setting the YEARCUTOFF= option to a value equal to or slightly less than the first year in your data. For example, if the range of dates that you are processing is from 1930 - 2010, a YEARCUTOFF value of 1925 or 1930 would be appropriate. If you set YEARCUTOFF=1925, then all 2-digit years are assumed to be in the 100-year period from 1925 to 2024. If all the dates in your data fall within that range, they will be interpreted correctly.

What do I do if my dates with 2-digit years span more than 100 years?

The YEARCUTOFF= option cannot reliably assign centuries to 2-digit years if the range of dates for a variable is greater than 100 years. If the date ranges for a variable span more than 100 years, you must either specify the dates with 4-digit years or use DATA step logic to assign a century to each year (perhaps based on the value of another variable).

But why does the YEARCUTOFF= option allow only a 100-year span? If the YEARCUTOFF= option allowed for more than a 100-year span, there would be no way to determine which century a 2-digit year should have. For example, let’s assume that YEARCUTOFF=1950 with a 150-year span and your external data file had 2-digit years. In this scenario, your 150-year span would be from 1950 to 2100. Since you have 2-digit years, there would be no way to determine if the year 00 was meant to occur during 2000 or 2100.

How do I change the default setting for all the SAS users at my site?

Setting system default option values is usually done by a site SAS Installation Representative. The recommended method for setting a system default YEARCUTOFF= value is to specify the desired value in the system SAS configuration file. Note that even if you set a default value for all the users at your site, they can override the default value in their SAS programs, in personal autoexec files, in config files, by setting an environment variable, or when invoking SAS software.

How do I change the default setting for my own programs if I want a default that is different from the rest of the users at my site?

You can specify personal default values either in a personal configuration file or in an autoexec file. If you specify the value in a personal configuration file, the syntax depends on your operating system and is the same as that for setting the value in the system-wide configuration file on each system. If you use an autoexec file, you can specify the YEARCUTOFF= option in an OPTIONS statement.  Here is an example:

options yearcutoff=1930;

Additional Resources

  • YEARCUTOFF= System Option section in SAS® 9.3 System Options: Reference, Second Edition
  • YEARCUTOFF= System Option section in SAS® 9.4 System Options: Reference, Fifth Edition
  • SAS Note 46368, "The default value for the YEARCUTOFF= system option has changed in SAS® 9.4 and beyond"
  • SAS Note 65307, "You might encounter an issue in which 2-digit year dates have the wrong century in SAS® 9.3 and earlier releases"

Why does my SAS date have the wrong century? was published on SAS Users.

12月 102019
 
The DATA step has been around for many years and regardless of how many new SAS® products and solutions are released, the DATA step remains a popular way to create and manipulate SAS data sets. The SAS language contains a wide variety of approaches that provide an endless number of ways to accomplish a goal. Whether you are reshaping a data set entirely or simply assigning values to a new variable, there are numerous tips and tricks that you can use to save time and keystrokes. Here are a few that Technical Support offers to customers on a regular basis.

Writing file contents to the SAS® log

Perhaps you are reading a file and seeing “invalid data” messages in the log or you are not sure which delimiter is used between variables. You can use the null INPUT statement to read a small sample of the data. Specify the amount of data that you want to read by adjusting options in the INFILE statement. This example reads one record with 500 bytes:

   data _null_;
      infile 'path' recfm=f lrecl=500 obs=1;
      input;
      list;
   run;

The LIST statement writes the current record to the log. In addition, it includes a ruled line above the record so that it is easier to see the data for each column of the file. If the data contains at least one unprintable character, you will see three lines of output for each record written.

For example:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+
 
1   CHAR  this is a test.123.dog.. 24
    ZONE  766726726276770333066600
    NUMR  48930930104534912394F7DA

The line beginning with CHAR describes the character that is represented by the two hexadecimal characters underneath. When you see a period in that line, it might actually be a period, but it often means that the hexadecimal characters do not map to a specific key on the keyboard. Also, column 15 for this data is a tab ('09'x). By looking at this record in the log, you can see that the first variable contains spaces and that the file is tab delimited.

Writing hexadecimal output for all input

Since SAS® 9.4M6 (TS1M6), the HEXLISTALL option in the LIST statement enables all lines of input data to be written in hexadecimal format to the SAS log, regardless of whether there are unprintable characters in the data:

   data mylib.new / hexlistall;

Removing “invalid data” messages from the log

When SAS reads data into a data set, it is common to encounter values that are invalid based on the variable type or the informat that is specified for a variable. You should examine the SAS log to assess whether the informat is incorrect or if there are some records that are actually invalid. After you ensure that the data is being read correctly, you can dismiss any “invalid data” messages by using double question mark (??) modifiers after the variable name in question:

   input date ?? mmddyy10.;

You can also use the question mark modifiers when you convert a character variable to numeric with the INPUT function:

   newdate=input(olddate,?? mmddyy10.);

The above syntax reads all values with the MMDDYY10. informat and then dismisses the notes to the log when some values for the OLDDATE variable are invalid.

Sharing files between UNIX and Microsoft Windows operating systems

The end-of-record markers in text files are different for UNIX and Windows operating systems. If you needed to share a file between these systems, the file used to need preprocessing in order to change the markers to the desired type or perhaps specify a delimiter. Now, the TERMSTR= option in the INFILE statement enables you to specify the end-of-record marker in the incoming file.

If you are working in a UNIX environment and you need to read a file that was created in a Windows environment, use the TERMSTR=CRLF option:

   infile 'file-specification'  termstr=crlf ;

If you are in a Windows environment and you need to read a file that was created in a UNIX environment, use this syntax:

   infile 'file-specification'  termstr=lf ;

Adapting array values from an ARRAY statement

The VNAME function makes it very convenient to use the variable names from an ARRAY statement. You most often use an ARRAY statement to obtain the values from numerous variables. In this example, the SQL procedure makes it easy to store the unique values of the variable Product into the macro variable &VARLIST and that number of values into the macro variable &CT (another easy tip). Within the DO loop, you obtain the names of the variables from the array, and those values from the array then become variable names.

   proc sql noprint;
      select distinct(product) into :varlist separated by ' '
      from one;
      select count(distinct product) into :ct
      from one;
   quit;
 
   …more DATA step statements…
   array myvar(&ct) $ &varlist;
      do i=1 to &ct;
         if product=vname(myvar(i)) then do;
         myvar(i)=left(put(contract,8.));
   end;
   …more DATA step statements…

Using a mathematical equation in a DO loop

A typical DO loop has a beginning and an end, both represented by integers. Did you know you can use an equation to process data more easily? Suppose that you want to process every four observations as one unit. You can run code similar to the following:

   …more DATA step statements…
   J+1;
   do i=((j*4)-3) to (j*4);
      set data-set-name point=I;
      …more DATA step statements…
   end;

Using an equation to point to an array element

With small data sets, you might want to put all values for all observations into a single observation. Suppose that you have a data set with four variables and six observations. You can create an array to hold the existing variables and also create an array for the new variables. Here is partial code to illustrate how the equation dynamically points to the correct new variable name in the array:

   array old(4) a b c d;
   array test (24) var1-var24;
   retain var1-var24;
   do i=1 to nobs;
      set w nobs=nobs point=i;
      do j=1 to 4;
      test(((i-1)*4)+j)=old(j);
      end;
   end;

Creating a hexadecimal reference chart

How often have you wanted to know the hexadecimal equivalent for a character or vice versa? Sure, you can look up a reference chart online, but you can also create one with a short program. The BYTE function returns values in your computer’s character set. The PUT statement writes the decimal value, hexadecimal value, and equivalent character to the SAS log:

   data a;
      do i=0 to 255;
      x=byte(i);
      put i=z3. i=hex2. x=;
   end;
   run;

When the resulting character is unprintable, such as a carriage return and line feed (CRLF) character, the X value is blank.

Conclusion

Hopefully, these new tips will be useful in your future programming. If you know some additional tips, please comment them below so that readers of this blog can add even more DATA step tips to their arsenal! If you would like to learn more about DATA step, check out these other blogs:

Old reliable: DATA step tips and tricks was published on SAS Users.

12月 102019
 
The DATA step has been around for many years and regardless of how many new SAS® products and solutions are released, the DATA step remains a popular way to create and manipulate SAS data sets. The SAS language contains a wide variety of approaches that provide an endless number of ways to accomplish a goal. Whether you are reshaping a data set entirely or simply assigning values to a new variable, there are numerous tips and tricks that you can use to save time and keystrokes. Here are a few that Technical Support offers to customers on a regular basis.

Writing file contents to the SAS® log

Perhaps you are reading a file and seeing “invalid data” messages in the log or you are not sure which delimiter is used between variables. You can use the null INPUT statement to read a small sample of the data. Specify the amount of data that you want to read by adjusting options in the INFILE statement. This example reads one record with 500 bytes:

   data _null_;
      infile 'path' recfm=f lrecl=500 obs=1;
      input;
      list;
   run;

The LIST statement writes the current record to the log. In addition, it includes a ruled line above the record so that it is easier to see the data for each column of the file. If the data contains at least one unprintable character, you will see three lines of output for each record written.

For example:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+
 
1   CHAR  this is a test.123.dog.. 24
    ZONE  766726726276770333066600
    NUMR  48930930104534912394F7DA

The line beginning with CHAR describes the character that is represented by the two hexadecimal characters underneath. When you see a period in that line, it might actually be a period, but it often means that the hexadecimal characters do not map to a specific key on the keyboard. Also, column 15 for this data is a tab ('09'x). By looking at this record in the log, you can see that the first variable contains spaces and that the file is tab delimited.

Writing hexadecimal output for all input

Since SAS® 9.4M6 (TS1M6), the HEXLISTALL option in the LIST statement enables all lines of input data to be written in hexadecimal format to the SAS log, regardless of whether there are unprintable characters in the data:

   data mylib.new / hexlistall;

Removing “invalid data” messages from the log

When SAS reads data into a data set, it is common to encounter values that are invalid based on the variable type or the informat that is specified for a variable. You should examine the SAS log to assess whether the informat is incorrect or if there are some records that are actually invalid. After you ensure that the data is being read correctly, you can dismiss any “invalid data” messages by using double question mark (??) modifiers after the variable name in question:

   input date ?? mmddyy10.;

You can also use the question mark modifiers when you convert a character variable to numeric with the INPUT function:

   newdate=input(olddate,?? mmddyy10.);

The above syntax reads all values with the MMDDYY10. informat and then dismisses the notes to the log when some values for the OLDDATE variable are invalid.

Sharing files between UNIX and Microsoft Windows operating systems

The end-of-record markers in text files are different for UNIX and Windows operating systems. If you needed to share a file between these systems, the file used to need preprocessing in order to change the markers to the desired type or perhaps specify a delimiter. Now, the TERMSTR= option in the INFILE statement enables you to specify the end-of-record marker in the incoming file.

If you are working in a UNIX environment and you need to read a file that was created in a Windows environment, use the TERMSTR=CRLF option:

   infile 'file-specification'  termstr=crlf ;

If you are in a Windows environment and you need to read a file that was created in a UNIX environment, use this syntax:

   infile 'file-specification'  termstr=lf ;

Adapting array values from an ARRAY statement

The VNAME function makes it very convenient to use the variable names from an ARRAY statement. You most often use an ARRAY statement to obtain the values from numerous variables. In this example, the SQL procedure makes it easy to store the unique values of the variable Product into the macro variable &VARLIST and that number of values into the macro variable &CT (another easy tip). Within the DO loop, you obtain the names of the variables from the array, and those values from the array then become variable names.

   proc sql noprint;
      select distinct(product) into :varlist separated by ' '
      from one;
      select count(distinct product) into :ct
      from one;
   quit;
 
   …more DATA step statements…
   array myvar(&ct) $ &varlist;
      do i=1 to &ct;
         if product=vname(myvar(i)) then do;
         myvar(i)=left(put(contract,8.));
   end;
   …more DATA step statements…

Using a mathematical equation in a DO loop

A typical DO loop has a beginning and an end, both represented by integers. Did you know you can use an equation to process data more easily? Suppose that you want to process every four observations as one unit. You can run code similar to the following:

   …more DATA step statements…
   J+1;
   do i=((j*4)-3) to (j*4);
      set data-set-name point=I;
      …more DATA step statements…
   end;

Using an equation to point to an array element

With small data sets, you might want to put all values for all observations into a single observation. Suppose that you have a data set with four variables and six observations. You can create an array to hold the existing variables and also create an array for the new variables. Here is partial code to illustrate how the equation dynamically points to the correct new variable name in the array:

   array old(4) a b c d;
   array test (24) var1-var24;
   retain var1-var24;
   do i=1 to nobs;
      set w nobs=nobs point=i;
      do j=1 to 4;
      test(((i-1)*4)+j)=old(j);
      end;
   end;

Creating a hexadecimal reference chart

How often have you wanted to know the hexadecimal equivalent for a character or vice versa? Sure, you can look up a reference chart online, but you can also create one with a short program. The BYTE function returns values in your computer’s character set. The PUT statement writes the decimal value, hexadecimal value, and equivalent character to the SAS log:

   data a;
      do i=0 to 255;
      x=byte(i);
      put i=z3. i=hex2. x=;
   end;
   run;

When the resulting character is unprintable, such as a carriage return and line feed (CRLF) character, the X value is blank.

Conclusion

Hopefully, these new tips will be useful in your future programming. If you know some additional tips, please comment them below so that readers of this blog can add even more DATA step tips to their arsenal! If you would like to learn more about DATA step, check out these other blogs:

Old reliable: DATA step tips and tricks was published on SAS Users.

11月 052019
 

After you start to use arrays, it is easy to see their usefulness and power. In this post, I will share tips and techniques that can increase the functionality of arrays and enable you to use programming shortcuts to take maximum advantage of them.

These tips will also streamline your programming, even if you are not using arrays. I'll provide complete code examples for each tip at the end of each section, as well as explain how the code is constructed along the way.

Use arrays to zoom out for greater perspective

One of the biggest uses of arrays, of course, is to reshape your data from one observation per identifying variable per data point, to one observation per ID containing all the data points for that ID. For example, consider the following data set:

Figure 1

Now, you want the values of the EVENT variable to become the names of the variables in the output data set, as shown here:

Figure 2

How do you know which variables to create if your data set is more complicated than shown above?

Create a macro variable

You can use the SQL procedure to create a macro variable (called EVENTLIST here) that contains all the unique values of EVENT. You then can use that macro variable in your array definition statement:

proc sql noprint;
   select distinct event into :eventlist
   separated by ' '
   from work.events;
quit;

When you run the following statement, the resulting log shows that &EVENTLIST contains the unique values for the EVENT variable:

%put &eventlist;

Here is the log information:

3365 %put &eventlist;
aaa bbb ccc ddd

You can then use the EVENTLIST macro variable in an ARRAY statement to define the variables that you are including:

array events_[*] &eventlist;

Use CALL MISSING to set all variables

However, you do not want the values for the previous ID to contaminate the next ID, so you need to reset the array variables to missing with each new ID. You can use a DO loop that uses FIRST.variable and LAST.variable processing in order to set each value to missing:

if first.id then do;
   do i = 1 to dim(events_);
      events_[i] = .;
   end;
end;

DO loops were commonly used in the past to initialize all elements of an array to missing. And it still needs to be used if you want to set all the elements to a non-missing value, such as 0. However, you can replace this code with one function call. The CALL MISSING routine sets all the variables in the array to missing in one statement:

if first.id then call missing(of events_[*]);

Compare via OF variable-list syntax

The OF variable-list syntax is another helpful feature. You can use OF with a list of variables or an array reference.

So, now you have the EVENT values as variable names. How are you going to compare the value of a variable in an observation with a variable name in the array? You can reference each element of the array that you created with EVENTS_[i], but that will return the value of that element. For this comparison, you need to obtain the variable name for each element in the array.

To return the name of each variable in the array, you can use the VNAME function:

if event = vname(events_[i]) then do;

Now you can find when the value of the EVENT variable in the original data set matches the name of the variable in the array.

Additional ways to extract variables information in arrays

Other variable information functions can also be used in the same way to extract information about each of the variables in the array.

In this example, a value that is read from each observation can match only one variable name in the array, so you want to stop when you achieve that instead of continuing the DO loop.

To stop the DO loop at that point, use the LEAVE statement, which stops processing the current loop and resumes with the next statement in the DATA step.

Because you want to output only one observation per ID, you must explicitly use the OUTPUT statement at the end of each ID to override the default output that occurs at the end of each iteration of the DATA step:

if last.id then output;

Combine above methods with DATA Step

Putting all these methods together, here is a short DATA Step that will reshape the original data set into the output data set that you want:

data events_out (drop =i event);
   set events;
   by id;
   array events_[*] &eventlist;
   retain &eventlist;
   if first.id then call missing(of events_[*]);
   do i = 1 to dim(events_);
      if event = vname(events_[i]) then do;
	 events_[i] = 1;
	 leave;
   end;
end;
if last.id then output;
run;

Bonus shortcuts using arrays

Want to use even more shortcuts with arrays? Here, your original data has a numeric variable called RATING, which is included in every observation:

Figure 3

You want to find the lowest rating and corresponding event and also the highest rating and corresponding event for each ID. You can start off by defining the arrays and reading each observation in the same way as in the previous example. In this case, you have two arrays, so you need to read both the EVENTS and RATINGS variables into their own arrays:


array ratings_[5];
array events_[5] $3.;
…more SAS statements…
ratings_[count]=rating;
events_[count]=event;

After you have read all the observations for an ID, how do you find the lowest rating? Of course, you can use a DO loop to loop through the array and check each value to see whether it is the lowest and keep track of which one it is. Or you can use the OF array syntax with the MIN and MAX functions:


lowest_rating=min(of ratings_[*]);
highest_rating=max(of ratings_[*]);

This code returns the value of the lowest and highest ratings for ID 1, which are 3 and 9, respectively. But you also want to know which events were rated the highest and lowest for this ID. First, you have to determine which elements in the RATINGS_ array had the lowest and highest values. You know the values, but you do not yet know which specific elements match those values. Use the WHICHN function to determine the indexes for those values in the array. Keep in mind that if there is more than one element that matches the value using the WHICHN function, the function returns only the index of the first element in the array to match:


min_index=whichn(lowest_rating, of ratings_[*]);
max_index=whichn(highest_rating, of ratings_[*]);

For ID 1, these functions return 3 and 5, respectively. Now you can use the indexes to retrieve the corresponding elements in the EVENTS_ array:


lowest_rated_event=events_[min_index];
highest_rated_event=events_[max_index];

Putting all these methods together, here is the complete code for returning the desired results:

data rating (keep = id lowest_rated_event lowest_rating  
             highest_rated_event highest_rating);
   set events;
   by id;
   array ratings_[5];
   array events_[5] $3.;
   retain ratings: events:;
   if first.id then call missing(of ratings_[*],of events_[*], count);
   count + 1;
   ratings_[count] = rating;
   events_[count]=event;
   if last.id then do;
      lowest_rating = min(of ratings_[*]);
	 highest_rating = max(of ratings_[*]);
	 min_index =whichn(lowest_rating,of ratings_[*]);
	 max_index=whichn(highest_rating, of ratings_[*]);
	 lowest_rated_event=events_[min_index];
	 highest_rated_event=events_[max_index];
	 output;
   end;
run;

Here is the resulting output:

Figure 4

You can expand the functionality and capability of arrays by incorporating other features of the SAS programming language in your DATA Step code. I hope these tips and links improve your array use and lead you to explore even more ways to work with arrays. The resource below complements this post and provides additional tips and tricks.

Adventures in Arrays | Learn more about array processing!

How to make the most of arrays with SAS® software and streamline your programming was published on SAS Users.

10月 162019
 

Introduction

Generating a word cloud (also known as a tag cloud) is a good way to mine internet text. Word (or tag) clouds visually represent the occurrence of keywords found in internet data such as Twitter feeds. In the visual representation, the importance of each keyword is denoted by the font size or font color.

You can easily generate Word clouds by using the Python language. Now that Python has been integrated into the SAS® System (via the SASPy package), you can take advantage of the capabilities of both languages. That is, you create the word cloud with Python. Then you can use SAS to analyze the data and create reports. You must have SAS® 9.4 and Python 3 or later in order to connect to SAS from Python with SASPy. Developed by SAS, SASPy a Python package that contains methods that enable you to connect to SAS from Python and to generate analysis in SAS.

Configuring SASPy

The first step is to configure SASPy. To do so, see the instructions in the SASPy Installation and configuration document. For additional details, see also the SASPy Getting started document and the API Reference document.

Generating a word cloud with Python

The example discussed in this blog uses Python to generate a word cloud by reading an open table from the data.world website that is stored as a CSV file. This file is from a simple Twitter analysis job where contributors commented via tweets as to how they feel about self-driving cars. (For this example, we're using data that are already scored for sentiment. SAS does offer text analytics tools that can score text for sentiment too -- see this example about rating conference presentations.) The sentiments were classified as very positive, slightly positive, neutral, slightly negative, very negative, and not relevant. (In the frequency results that are shown later, these sentiments are specified, respectively, as 1, 2, 3, 4, 5, and not_relevant.) This information is important to automakers as they begin to -design more self-driving vehicles and as transportation companies such as Uber and Lyft are already adding self- driving cars to the road. Along with understanding the sentiments that people expressed, we are also interested in exactly what is contributors said. The word cloud gives you a quick visual representation of both. If you do not have the wordcloud package installed, you need to do that by submitting the following command:

pip install wordcloud

After you install the wordcloud package, you can obtain a list of required and optional parameters by submitting this command:

?wordcloud

Then, follow these steps:

  1. First, you import the packages that you need in Python that enable you to import the CSV file and to create and save the word-cloud image, as shown below.

  2. Create a Python Pandas dataframe from the twitter sentiment data that is stored as CSV data in the data file. (The data in this example is a cleaned-up subset of the original CSV file on the data.world website.)

  3. Use the following code, containing the HEAD() method, to display the first five records of the Sentiment and Text columns. This step enables you to verify that the data was imported correctly.

  4. Create a variable that holds all of the text in a single row of data that can be used in the generation of the word cloud.

  5. Generate the word cloud from the TEXTVAR variable that you create in step 4. Include any parameters that you want. For example, you might want to change the background color from black to white (as shown below) to enable you to see the values better. This step includes the STOPWORDS= parameter, which enables you to supply a list of words that you want to eliminate. If you do not specify a list of words, the parameter uses the built-in default list.

  6. Create the word-cloud image and modify it, as necessary.

Analyzing the data with SAS®

After you create the word cloud, you can further analyze the data in Python. However, you can actually connect to SAS from Python (using the SASPy API package), which enables you to take advantage of SAS software's powerful analytics and reporting capabilities. To see a list of all available APIs, see the API Reference.

The following steps explain how to use SASPy to connect to SAS.

  1. Import the SASPy package (API) . Then create and generate a SAS session, as shown below. The code below creates a SAS session object.

  2. Create a SAS data set from the Python dataframe by using the DATAFRAME2SASDATA method. In the code below, that method is shown as the alias DF2DS.

  3. Use the SUBMIT() method to include SAS code that analyzes the data with the FREQ procedure. The code also uses the GSLIDE procedure to add the word cloud to an Adobe PDF file.

    When you submit the code, SAS generates the PDF file that contains the word-cloud image and a frequency analysis, as shown in the following output:

Summary

As you can see from the responses in the word cloud, it seems that the contributors are quite familiar with Google driverless cars. Some contributors are also familiar with the work that Audi has done in this area. However, you can see that after further analysis (based on a subset of the data), most users are still unsure about this technology. That is, 74 percent of the users responded with a sentiment frequency of 3, which indicates a neutral view about driverless cars. This information should alert automakers that more education and marketing is required before they can bring self-driving cars to market. This analysis should also signal companies such as Uber Technologies Inc. and Lyft, Inc. that perhaps consumers need more information in order to feel secure with such technology.

Creating a word cloud using Python and SAS® software was published on SAS Users.

8月 162019
 

The Output Delivery System (ODS) Graphics procedures provide many options to give you control over the look of your output. However, there are times when your output does not look like you thought it would.

This blog discusses how to solve some common output-related problems that we hear about in Technical Support.

All of the examples in this blog relate to creating scatter plots and bar charts from the same data set, SASHELP.CLASS. This data set, included in your SAS® installation, provides information about heights and ages for both male and female students.

Colors in the output are not as desired

Using the STYLEATTRS statement

The STYLEATTRS statement enables you to define attributes, such as color, for graphical elements.

In the following SGPLOT procedure example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink:

proc sgplot data=sashelp.class;
   styleattrs datacolors=(blue pink);
   scatter x=age y=height / group=sex
   markerattrs=(symbol=circlefilled);
run;

However, after you submit the code, the resulting plot does not use the specified colors. Instead, you see blue and red:

When defining colors for graphical elements, the DATACOLORS= option defines the colors for filled areas, and the DATACONTRASTCOLORS= option defines the colors for marker symbols and lines.

Because the scatter plot is creating marker symbols, you have to change the STYLEATTRS statement to use the DATACONTRASTCOLORS= option instead of the DATACOLORS= option. Here is the revised code:

proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink);
   scatter x=age y=height / group=sex 
   markerattrs=(symbol=circlefilled);
run;

Now, when you submit the updated code, the correct colors appear:

You can find more information about the STYLEATTRS statement in the STYLEATTRS section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Using an attribute map

An attribute map enables you to associate specific values for your plot GROUP= variable with specific graphical attributes.

The attribute map is defined in a data set that includes the following:

  • an ID variable that contains the name of the attribute map definition
  • a VALUE variable that contains the value of the plot statement GROUP= variable
  • any other variables for the attributes that you want to define

In the following example, the attribute map BARCOLORS is defined to associate the group value F with the color pink and the group value M with the color blue. Note that the FILLCOLOR variable is also used to define the colors for the bars of the bar chart.

data attrmap;
   id='barcolors';
      input value $ fillcolor $;
      datalines;
F pink
M blue
;
run;
proc format;
   value $ genderfmt
      'F'='Female'
      'M'='Male';
run;
proc sgplot data=sashelp.class dattrmap=attrmap;
   vbar age / response=height group=sex groupdisplay=cluster 
   nooutline attrid=barcolors;
   format sex $genderfmt.;
run;

However, the output shows blue and red bars, instead of using the pink and blue values that you specified:

In this case, a format is defined to display the group values as Female and Male. The attribute map associates the group values of F and M with the pink and blue bar colors that you want, so the values do not match.

You need to change the attribute map VALUE variable so that it contains the formatted value of the GROUP= variable. Here is the first part of the code again, highlighted to show where it has changed:

With the updated values, the output now displays the correct colors:

You can find more information about attribute maps in the SG Attribute Maps section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Symbols in the output are not as desired

In the following PROC SGPLOT example, the STYLEATTRS statement defines the colors for the marker symbols on a scatter plot as either blue or pink. Also, the marker symbols should be either filled circles or filled squares:

ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

You submit the code using the STYLES.HTMLBLUE style. The output shows all the symbols as circles, and none are squares:

The ATTRPRIORITY ODS Graphics option determines how attributes are cycled. The default value for the ATTRPRIORITY option is defined in the style that is being used.

The STYLES.HTMLBLUE style sets the default value COLOR for the ATTRPRIORITY option. This COLOR value cycles the symbols through your specified colors before the second symbol is generated.

You want to set the ATTRPRIORITY ODS Graphics option to NONE in an ODS GRAPHICS statement. That ODS GRAPHICS statement then prevents the symbols from cycling through the colors list:

ods graphics /attrpriority=none;
ods html style=styles.htmlblue;
proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(blue pink) 
   datasymbols=(circlefilled squarefilled);
   scatter x=age y=height / group=sex;
run;

In the updated output, squares are now seen in the correct color, pink:

You can read more about how attributes are cycled in the following blog post by Rick Wicklin:
Attrs, attrs, everywhere: The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics

Annotation is not placed in the output as desired

Adding an oval

In this example, you want the resulting scatter plot to contain an oval around the circle that represents the tallest student:

proc sql;
   create table maxheight as
   select height, age
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x=age;
       y=height;
       width=6;
       height=6;
       linecolor='red';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

After you submit the code, you notice that the resulting plot does not include the oval:

If you have used annotation in SAS/GRAPH® software, you might be accustomed to using the X and Y variables in the annotation data set to indicate the location of the annotation. However, ODS Statistical Graphics (SG) annotation uses X1 and Y1 variables for the location of the annotation.

Therefore, you need to change the X and Y variables in the annotation data set to X1 and Y1 instead:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='oval';
       x1=age;
       y1=height;
       width=6;
       height=6;
       linecolor='red';
run;

In the next version of the scatter plot, the oval now appears:

Adding a text label

In this example, you want to place a text label next to the circle that represents the tallest student.

proc sql;
   create table maxheight as
   select height, age, name
   from sashelp.class
        having height=max(height);
quit;
 
data anno;
   set maxheight;
       drawspace='datavalue';
       function='label';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;
 
proc sgplot data=sashelp.class sganno=anno;
   scatter x=age y=height;
      xaxis offsetmax=0.1 offsetmin=0.1;
      yaxis offsetmax=0.1 offsetmin=0.1;
run;

However, after you run this code, the output does not include the text label:

Again, if you are a SAS/GRAPH user, you might assume that the LABEL function can place text on a plot. However, ODS SG annotation needs to use the TEXT function instead.

In the previous DATA step, you need to change the FUNCTION variable so that it contains the value TEXT:

data anno;
   set maxheight;
       drawspace='datavalue';
       function='text';
       x1=age;
       y1=height;
       label=name;
       textsize=10;
       textcolor='red';
       anchor='bottom';
run;

After you revise the DATA step and resubmit your code, you then see that the text label appears where intended:

You can find more information about SG annotation in the SG Annotation section of the SAS® 9.4 ODS Graphics: Procedures Guide, Sixth Edition documentation.

Summary

These are just a few examples to demonstrate some of the common output-related problems that we hear about in Technical Support. If your graphical output does not appear as you wanted, consider the options that you are using and make sure that you are using the correct option.

Learn More

How to fix common problems in output from ODS Graphics procedures was published on SAS Users.

7月 192019
 
The RANK procedure (PROC RANK) is useful for ranking numeric variables in a data set across observations. You often see PROC RANK used to rank data into quartiles, deciles, or percentiles. This action requires that you use the GROUPS= option in the PROC RANK statement.

This blog answers three questions related to using PROC RANK with groups and ties. Note that question two actually provide an alternative for using the DATA step when PROC RANK cannot provide what you need.

  1. What does PROC RANK do behind the code when you use the GROUPS= option in the PROC RANK statement?
  2. What do you do if you want equal groups, regardless of tied values?
  3. What do you do if you want groups to start at 1 rather than 0?

What does PROC RANK do when you use the GROUPS= option?

Most of us just accept the ranked values that are calculated by PROC RANK. But have you ever tried to figure out the calculation that happens behind the code? When you use the GROUPS= option in the PROC RANK statement, the values are assigned to groups that range from 0 to the number-of-groups minus 1, based on tied values. If you have many tied values, you might not obtain the number of groups that you request because observations with the same value are assigned to the same group.

The formula for calculating group values is as follows:

FLOOR(rank*k/(n+1))

In this formula:

  • rank is the data value's rank order
  • k is the value of the GROUPS= option
  • n is the number of nonmissing values

Consider the following example. If you want to see the original value as well as the ranking, use both the VAR and RANKS statements in your PROC RANK step. The RANKS variable contains the rank value. If you use a BY statement, the raw rankings start over in each BY group.

Example 1

data;
input x;
cards;
1
1
1
1
1
1
8
8
;
run;
proc rank data=test groups=5 out=test_rank 
 ties=mean /* low and high */;
var x;
ranks rank_x;
run;
 
proc print data=test_rank;
run;

Output

x rank_x
1 1
1 1
1 1
1 1
1 1
1 1
8 4
8 4

 
The following table shows the variable values, the raw ranks, the ties values, and the resulting ranks:

X RAW_RANK TIES=MEAN TIES=LOW TIES=HIGH RANK_MEAN RANK_LOW RANK_HIGH
1 1 3.5 1 6 1 0 3
1 2 3.5 1 6 1 0 3
1 3 3.5 1 6 1 0 3
1 4 3.5 1 6 1 0 3
1 5 3.5 1 6 1 0 3
1 6 3.5 1 6 1 0 3
8 7 7.5 7 8 4 3 4
8 8 7.5 7 8 4 3 4

 
Using the formula that is shown previously, k=5 and n=8. Since TIES=MEAN, you sum the raw ranks of the tied values of X and divide by the number of observations. For X=1, the rank is (1+2+3+4+5+6)=21/6=3.5. For X=2, the rank is (7+8)=15/2=7.5. Similarly, if you use TIES=LOW, for X=1, the rank is 1; for X=2, the rank is 7. Finally, if you use TIES=HIGH, for X=1, the rank is 6; for X=2, the rank is 8. When you insert those values into the formula for the first observation, you obtain the following results:

TIES=MEAN: Floor(3.5*5/9)=Floor(1.9)=1
TIES=LOW: Floor(1*5/9)=Floor(0.5)=0
TIES=HIGH: Floor(6*5/9)=Floor(3.3)=3

What do you do if you want equal groups, regardless of tied values?

Suppose that you want to create groups that have the same number of observations in each one, regardless of tied values. PROC RANK cannot do this. However, you can use the DATA step to accomplish this task.

You need to sort the data set by the ranking variable and then use the same formula in the DATA step, as shown below.

Example 2

proc sort data=test;
by x;
run;
 
data ranks;
set test nobs=numobs;
group=floor(_n_*5/(numobs+1));
run;
 
proc print data=ranks;
run;

Output

x group
1 0
1 1
1 1
1 2
1 2
1 3
8 3
8 4

 

What do you do if you want the groups to start at 1 rather than 0?

When you use the GROUPS= option, the values that are assigned to the groups start at 0. There is no way to indicate for PROC RANK to start the groups at 1. However, once you have the data set with the ranked values, you can add 1 using DATA step logic, as shown in this example:

Example 3

data test;
input y; 
cards;
11
22
10
15
25
;
run;
 
proc sort data=test; 
by y;
run;
 
proc rank data=test out=test_rank1 groups=3;
var y;
ranks rank_y;
run;
 
data test_rank1; 
set test_rank1; rank_y+1;
run;
 
proc print data=test_rank1;
run;

Output

y rank_y
10 1
11 2
15 2
22 3
25 3

 

Conclusion

PROC RANK has many statistical applications, such as helping you understand how your data is distributed. Hopefully, this blog has provided you with a better understanding of how ranks are determined when tied values are present in the data and when you want to assign those ranks to groups.

How the RANK procedure calculates ranks with groups and ties was published on SAS Users.

6月 212019
 

For every project in SAS®, the first step is almost always making your data available. This blog shows you how to load three of the most common input data types—a data set, a text file, and a Microsoft Excel file—into SAS® Cloud Analytic Services (CAS) tables.

The three methods that I show here are the three easiest ways to load each data type into CAS. Multiple tools can load data into CAS, but I am showing the tools that I consider the easiest to use and that are probably the most familiar to SAS programmers.

You need to place your data in a location that can be accessed by the programming environment that is used to access CAS. The most common programming environment that accesses CAS is SAS® Studio. Input data files that are used in CAS are going to be very large. You will need to use an SFTP tool to move your data from your PC to a directory that can be accessed by your SAS Studio session. (Check with your system administrator to see what the preferred tool is at your site.)

After your data is in a location that can be accessed by the programming environment, you need to start a CAS session. This is done with the CAS statement; here is the syntax:

cas session-name <option(s)>;

The options that you specify depend on how your system administrator configured your environment. For example, I asked my system administrator to set it up so that the only thing I need to do is issue the following statement:

cas;

That statement then creates a CAS session with the default name of CASAUTO, with an active caslib of CASUSER.

After you establish your CAS session, you can start loading data.

Load a SAS data set

The easiest way to load SAS data into CAS is to use a DATA step. The basic syntax is the same as it is when you are creating a SAS data set. The key difference is that the libref that is listed in the DATA step must point to a caslib.

The following example accesses SASHELP.CARS and then creates the table CARS in the CASUSER caslib.

cas;                   /* log on to the CAS session */
   caslib _all_ assign;   /* create a libref for each active 
                             caslib */
 
   data casuser.cars;     /* create the table in the caslib */
      set sashelp.cars;
   run;

The one thing to note about this code is that this DATA step is running in SAS and not CAS. A DATA step runs in CAS only if the input and output librefs are both using the CAS engine and only if it uses language elements that CAS supports. In this example, the SASHELP libref was not created with the CAS engine, so the step must run in SAS.

There are no calculations in this step, so there is no effect on performance.

When you load a data set into a CAS table, one task that you might want to perform is to promote the table. The following DATA step shows how to promote a table as you create it:

data casuser.cars(promote=yes);
      set sashelp.cars;
   run;

Promoting a table gives it a global scope. You can then access the table in multiple sessions, and the table is also available in SAS® Visual Analytics. The PROMOTE= data set option can be used with all three of the examples in this blog.

Load a delimited text file

The easiest way to load a delimited text file and store it as a CAS table is to use another familiar SAS step, the IMPORT procedure. The syntax is going to be basically the same as it is in SAS. Again, the key difference is that you need to point to a caslib in the OUT= option of the PROC IMPORT statement.

If you are running SAS® Studio 5.1, a common location for a text file that needs to be loaded into CAS is the SAS Content folder. This folder is a predefined repository for text files that you need to access from SAS Studio.

In order to access the files from SAS Content, you need to use the FILESRVC access method with the FILENAME statement. This method enables you to store and retrieve content using the SAS® Viya® Files service. Here is the basic syntax:

filename fileref filesrvc folderpath='path'
            filename='name';

For more information about this access method, see SAS 9.4 Global Statements Reference.

In the following example, PROC IMPORT and the FILESRVC access method are used to load the class.csv file from the SAS Content folder. The resulting ALLCLASS table is written to the CASUSER caslib.

filename myfile filesrvc folderpath='/Users/saskir'
            filename='class.csv';
   proc import datafile=myfile out=casuser.allclass(promote=yes)
        dbms=csv;
   run;

Load an Excel file

This section shows you how to load an Excel file into a CAS table by using PROC IMPORT. Loading an Excel file using PROC IMPORT requires that you have SAS/ACCESS® Interface to PC Files. If you are unsure whether you have this product, you can write all of your licensed products to the log by using the SETINIT procedure:

proc setinit;
   run;

You should see the following in the log when the product is licensed:

---SAS/ACCESS Interface to PC Files

After you confirm that you have this product, you can adapt the following PROC IMPORT code. This example loads the ReportTest.xlsx file and stores it in a table named ReportTest in the CASUSER caslib.

cas;
   caslib _all_ assign;
 
   proc import datafile='/viyashare/ReportTest.xlsx'
        out=casuser.ReportTest dbms=xlsx;
   run;

There are other methods

The purpose of this blog was to show you the easiest ways to load a SAS data set, text file, and Excel file into CAS. Although there are multiple ways to accomplish these tasks, both programmatically and interactively, these methods are the easiest and most straightforward ways to get data into CAS.

Learn the three easiest ways to load data into CAS tables was published on SAS Users.

4月 242019
 
The ODS Excel destination, which became a production feature in SAS 9.4M3 (TS1M3), generates Microsoft Excel workbooks in native XLSX format. This destination generates multiple worksheets per workbook with each output object (e.g., a table or graphic) the destination encounters by default. The ODS Excel destination is also flexible, enabling you to vertically control the worksheet and place output objects wherever you want. This blog demonstrates the destination’s flexibility and how you can modify its default behavior by using the ODS EXCEL statement's SHEET_INTERVAL= option.

Adding tables and graphics on the same Microsoft Excel worksheet

By default, the ODS Excel destination adds a new worksheet for each table and graphic. However, at times, you might not want to use this default behavior. If you want more control over this, the SHEET_INTERVAL= ODS Excel option determines when a new worksheet is created. Valid values for the SHEET_INTERVAL= option include:

  • TABLE (the default value) - new sheet for each table in output
  • NONE - keep the output that follows on the current sheet
  • PAGE - new sheet for each page of SAS output
  • PROC - new sheet beginning at the PROC step boundary
  • BYGROUP - new sheet for each BY group of output
  • NOW - begin a new sheet immediately

The value NOW, new for SAS 9.4M5 (TS1M5), triggers the creation of a new worksheet after the destination encounters the next output object.
As an example of opting not to use the default behavior, consider a case where you have a CONTENTS procedure without any options. This procedure generates three separate worksheets with the data-set attributes, the engine host information, and the variable list.

The following table shows the default output that you receive with three individual worksheets:

However, if you want to place all three objects on a single worksheet, you can do that by setting the option SHEET_INTERVAL="NONE". The option setting SHEET_INTERVAL="PROC" could also be used in this example which would create a new worksheet only when a new procedure is encountered.
The following example illustrates how to use this option to include all your output on the same worksheet:

ods excel file="c:\test.xlsx" options sheet_interval="none");
proc contents data=sashelp.class;
run; 
ods excel close;

Output

Adding text and tables to a new worksheet

Two of the most popular ways to add text on worksheets are to use either the ODS TEXT= statement or the ODSTEXT procedure with the Excel destination. The following example adds text to a worksheet by using the ODS TEXT= statement. You include this statement before each PRINT procedure in this example:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males");
ods text="Data for Male Patients";
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Notice that the first text string appears at the top of the first worksheet as expected. However, the text from the second ODS TEXT= statement appears at the bottom of this same page rather than at the top of the next worksheet containing the related data table. This behavior illustrates that the ODS TEXT= option is not very flexible. There is no good way to solve this issue.

However, you can use the SAS 9.4 ODSTEXT procedure in combination with the SHEET_INTERVAL= option to move the text string to the appropriate worksheet.
The following example uses PROC ODSTEXT and the SHEET_INTERVAL= option to move the text string "Statistics for Male Patients" to the top of the second worksheet:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males" sheet_interval="now");
ods excel options(sheet_name="Males" sheet_interval="none");
 
proc odstext;
   p "Data for Male Patients";
run;
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Adding multiple tables or graphs on the same worksheet

This final example demonstrates how you can use the SHEET_INTERVAL= option to add multiple tables and graphics to the same Excel worksheet. First, we use the SHEET_INTERVAL="NONE" option in the first ODS EXCEL statement to place the first table and graph on the same worksheet. Then, the SHEET_INTERVAL="NOW" option is included in the second ODS EXCEL statement option to create a second worksheet and write the second table and graph to that worksheet:

ods graphics / height=2.5in width=3.5in;
ods excel file="c:\scratch\test.xlsx" options(sheet_interval="none");
 
proc print data=sashelp.class(where=(sex="F"));
run;
 
proc sgplot data=sashelp.class(where=(sex="F"));
scatter x=age y=height;
run;
 
ods excel options(sheet_interval="now");
 
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
proc sgplot data=sashelp.class(where=(sex="M"));
scatter x=age y=height;
run;
 
ods excel close;

Output

Learn more

Control the location of tables, graphs, and text with ODS Excel was published on SAS Users.

4月 242019
 
The ODS Excel destination, which became a production feature in SAS 9.4M3 (TS1M3), generates Microsoft Excel workbooks in native XLSX format. This destination generates multiple worksheets per workbook with each output object (e.g., a table or graphic) the destination encounters by default. The ODS Excel destination is also flexible, enabling you to vertically control the worksheet and place output objects wherever you want. This blog demonstrates the destination’s flexibility and how you can modify its default behavior by using the ODS EXCEL statement's SHEET_INTERVAL= option.

Adding tables and graphics on the same Microsoft Excel worksheet

By default, the ODS Excel destination adds a new worksheet for each table and graphic. However, at times, you might not want to use this default behavior. If you want more control over this, the SHEET_INTERVAL= ODS Excel option determines when a new worksheet is created. Valid values for the SHEET_INTERVAL= option include:

  • TABLE (the default value) - new sheet for each table in output
  • NONE - keep the output that follows on the current sheet
  • PAGE - new sheet for each page of SAS output
  • PROC - new sheet beginning at the PROC step boundary
  • BYGROUP - new sheet for each BY group of output
  • NOW - begin a new sheet immediately

The value NOW, new for SAS 9.4M5 (TS1M5), triggers the creation of a new worksheet after the destination encounters the next output object.
As an example of opting not to use the default behavior, consider a case where you have a CONTENTS procedure without any options. This procedure generates three separate worksheets with the data-set attributes, the engine host information, and the variable list.

The following table shows the default output that you receive with three individual worksheets:

However, if you want to place all three objects on a single worksheet, you can do that by setting the option SHEET_INTERVAL="NONE". The option setting SHEET_INTERVAL="PROC" could also be used in this example which would create a new worksheet only when a new procedure is encountered.
The following example illustrates how to use this option to include all your output on the same worksheet:

ods excel file="c:\test.xlsx" options sheet_interval="none");
proc contents data=sashelp.class;
run; 
ods excel close;

Output

Adding text and tables to a new worksheet

Two of the most popular ways to add text on worksheets are to use either the ODS TEXT= statement or the ODSTEXT procedure with the Excel destination. The following example adds text to a worksheet by using the ODS TEXT= statement. You include this statement before each PRINT procedure in this example:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males");
ods text="Data for Male Patients";
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Notice that the first text string appears at the top of the first worksheet as expected. However, the text from the second ODS TEXT= statement appears at the bottom of this same page rather than at the top of the next worksheet containing the related data table. This behavior illustrates that the ODS TEXT= option is not very flexible. There is no good way to solve this issue.

However, you can use the SAS 9.4 ODSTEXT procedure in combination with the SHEET_INTERVAL= option to move the text string to the appropriate worksheet.
The following example uses PROC ODSTEXT and the SHEET_INTERVAL= option to move the text string "Statistics for Male Patients" to the top of the second worksheet:

ods excel file="c:\temp\test.xlsx";
ods excel options(sheet_name="Females");
 
ods text="Data for Female Patients";
proc print data=sashelp.class(where=(sex="F"));
run;
 
ods excel options(sheet_name="Males" sheet_interval="now");
ods excel options(sheet_name="Males" sheet_interval="none");
 
proc odstext;
   p "Data for Male Patients";
run;
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
ods excel close;

Output

Adding multiple tables or graphs on the same worksheet

This final example demonstrates how you can use the SHEET_INTERVAL= option to add multiple tables and graphics to the same Excel worksheet. First, we use the SHEET_INTERVAL="NONE" option in the first ODS EXCEL statement to place the first table and graph on the same worksheet. Then, the SHEET_INTERVAL="NOW" option is included in the second ODS EXCEL statement option to create a second worksheet and write the second table and graph to that worksheet:

ods graphics / height=2.5in width=3.5in;
ods excel file="c:\scratch\test.xlsx" options(sheet_interval="none");
 
proc print data=sashelp.class(where=(sex="F"));
run;
 
proc sgplot data=sashelp.class(where=(sex="F"));
scatter x=age y=height;
run;
 
ods excel options(sheet_interval="now");
 
 
proc print data=sashelp.class(where=(sex="M"));
run;
 
proc sgplot data=sashelp.class(where=(sex="M"));
scatter x=age y=height;
run;
 
ods excel close;

Output

Learn more

Control the location of tables, graphs, and text with ODS Excel was published on SAS Users.