sas programming

12月 052017
 

Finding a pattern like a phone number or national ID number embedded in text can be difficult and time consuming. The traditional DATA step has a family of functions (collectively referred to as PRX functions) that allow using Perl regular expressions in your SAS programs to make pattern search easier. [...]

The post Jedi SAS Tricks: Pattern Search in DS2 appeared first on SAS Learning Post.

12月 052017
 

The internet is rich with data, and much of that data seems to exist only on web pages, which -- for some crazy reason -- are designed for humans to read. When students/researchers want to apply data science techniques to analyze collect and analyze that data, they often turn to "data scraping." What is "data scraping?" I define it as using a program to fetch the contents of a web page, sift through its contents with data parsing functions, and save its information into data fields with a structure that facilitates analysis.

Python and R users have their favorite packages that they use for scraping data from the web. New SAS users often ask whether there are similar packages available in the SAS language, perhaps not realizing that Base SAS is already well suited to this task -- no special bundles necessary.

The basic steps for data scraping are:

  1. Fetch the contents of the target web page
  2. Process the source content of the page -- usually HTML source code -- and parse/save the data fields you need.
  3. If necessary, repeat for subsequent pages. This applies to those web sites that serve up lots of information in paginated form, and you want to collect all available pages of data.

Let's map these steps to the SAS programming language:

For this step Use these features
Get the contents of the web page FILENAME URL
Process/parse the web page contents DATA step, with parsing functions such as FIND, regular expressions via PRXMATCH. Use SAS informats to convert text to native data types.
Repeat across subsequent pages SAS macro language (%DO %UNTIL processing) or DATA step with asked about scraping data from the Center for Disease Control (CDC) web site.

Step 0: Find the original data source and skip the scrape

I'm writing this article at the end of 2017, and at this point in our digital evolution, web scraping seems like a quaint pastime. Yes, there are still some cases for it, which is why I'm presenting this article. But if you find information that looks like data on a web page, then there probably is a real data source behind it. And with the movement toward open data (especially in government) and data-driven APIs (in social media and commercial sites) -- there is probably a way for you to get to that data source directly.

In my example page for this post, the source page is hosted by CDC.gov, a government agency whose mandate is to share information with other public and private entities. As one expert pointed out, the CDC shares a ton of data at its dedicated data.cdc.gov site. Does this include the specific table the user wanted? Maybe -- I didn't spend a lot of time looking for it. However, even if the answer is No -- it's a simple process to ask for it. Remember that web sites are run by people, and usually these people are keen to share their information. You can save effort and achieve a more reliable result if you can get the official data source.

If you find an official data source to use instead, you might still want to automate its collection and import into SAS. Here are two articles that cover different techniques:

Web scraping is lossy, fragile process. The information on the web page does not include data types, lengths, or constraints metadata. And one tweak to the presentation of the web page can break any automated scraping process. If you have no other alternative and you're willing to accept these limitations, let's proceed to Step 1.

Step 1: Fetch the web page

In this step, we want to achieve the equivalent of the "Save As..." function from your favorite web browser. Just like your web browser, SAS can act as an HTTP client. The two most popular methods are:

  • FILENAME URL, which assigns a SAS fileref to the web page content. You can then use INFILE and INPUT to read that file from a DATA step.
  • PROC HTTP, which requires a few more lines of code to get and store the web page content in a single SAS procedure step.

I prefer PROC HTTP, and here's why. First, as a separate explicit step it's easier to run just once and then work with the file result over the remainder of your program. You're guaranteed to fetch the file just once. Though a bit more code, it makes it more clear about what happens under the covers. Next reason: PROC HTTP has been refined considerably in SAS 9.4 to run fast and efficient. Its many options make it a versatile tool for all types of internet interactions, so it's a good technique to learn. You'll find many more uses for it.

Here's my code for getting the web page for this example:

/* Could use this, might be slower/less robust */
* filename src url "https://wwwn.cdc.gov/nndss/conditions/search/";
 
/* I prefer PROC HTTP for speed and flexibility */
filename src temp;
proc http
 method="GET"
 url="https://wwwn.cdc.gov/nndss/conditions/search/"
 out=src;
run;

Step 2: Parse the web page contents

Before you can write code that parses a web page, it helps to have some idea of how the page is put together. Most web pages are HTML code, and the data that is "locked up" within them are expressed in table tags: <table>, <tr>, <td> and so on. Before writing the first line of code, open the HTML source in your favorite text editor and see what patterns you can find.

For this example from the CDC, the data we're looking for is in a table that has 6 fields. Fortunately for us, the HTML layout is very regular, which means our parsing code won't need to worry about lots of variation and special cases.

 <tr >
	<td style="text-align:left;vertical-align:middle;">
    	<a href="/nndss/conditions/chancroid/">
		Chancroid
		</a>
	</td>
	<td class="tablet-hidden"></td>
	<td class="tablet-hidden"></td>
	<td class="tablet-hidden"><i>Haemophilus ducreyi</i></td>
	<td  >1944 </td>
	<td  > Current</td>
</tr>

The pattern is simple. We can find a unique "marker" for each entry by looking for the "/nndss/conditions" reference. The line following (call it offset-plus-1) has the disease name. And the lines that are offset-plus-7 and offset-plus-8 contain the date ranges that the original poster was asking for. (Does this seem sort of rough and "hacky?" Welcome to the world of web scraping.)

Knowing this, we can write code that does the following:

Here's the code that I came up with, starting with what the original poster shared:

/* Read the entire file and skip the blank lines */
/* the LEN indicator tells us the length of each line */
data rep;
infile src length=len lrecl=32767;
input line $varying32767. len;
 line = strip(line);
 if len>0;
run;
 
/* Parse the lines and keep just condition names */
/* When a condition code is found, grab the line following (full name of condition) */
/* and the 8th line following (Notification To date)                                */
/* Relies on this page's exact layout and line break scheme */
data parsed (keep=condition_code condition_full note_to);
 length condition_code $ 40 condition_full $ 60 note_to $ 20;
 set rep;
 if find(line,"/nndss/conditions/") then do;
   condition_code=scan(line,4,'/');
   pickup= _n_+1 ;
   pickup2 = _n_+8;
   /* "read ahead" one line */
   set rep (rename=(line=condition_full)) point=pickup;
   condition_full = strip(condition_full);
 
   /* "read ahead" 8 lines */
   set rep (rename=(line=note_to)) point=pickup2;
   /* use SCAN to get just the value we want */
   note_to = strip(scan(note_to,2,'<>'));
 
   /* write this record */
   output;
  end;
run;

The result still needs some cleanup -- there are a few errant data lines in the data set. However, it should be simple to filter out the records that don't make sense with some additional conditions. I left this as a to-do exercise to the original poster. Here's a sample of the result:

Step 3: Loop and repeat, if necessary

For this example with the CDC data, we're done. All of the data that we need appears on a single page. However, many web sites use a pagination scheme to break the data across multiple pages. This helps the page load faster in the browser, but it's less convenient for greedy scraping applications that want all of the data at once. For web sites that paginate, we need to repeat the process to fetch and parse each page that we need.

Web sites often use URL parameters such as "page=" to indicate which page of results to serve up. Once you know the URL parameter that controls this offset, you can create a SAS macro program that iterates through all of the pages that you want to collect. There are many ways to accomplish this, but I'll share just one here -- and I'll use a special page from the SAS Support Communities (communities.sas.com) as a test.

Aside from the macro looping logic, my code uses two tricks that might be new to some readers.

/* Create a temp folder to hold the HTML files */
options dlcreatedir;
%let htmldir = %sysfunc(getoption(WORK))/html;
libname html "&htmldir.";
libname html clear;
 
/* collect the first few pages of results of this site */
%macro getPages(num=);
 %do i = 1 %to &num.;
   %let url = https://communities.sas.com/t5/custom/page/page-id/activity-hub?page=&i.;
   filename dest "&htmldir./page&i..html";
   proc http 
      method="GET"
	  url= "&url."
	  out=dest;
   run;
 %end;
%mend;
 
/* How many pages to collect */
%getPages(num=3);
 
/* Use the wildcard feature of INFILE to read all */
/* matching HTML files in a single step */
data results;
 infile "&htmldir./*.html" length=len lrecl=32767;
 input line $varying32767. len ;
 line = strip(line);
 if len>0;
run;

The result of this program is one long data set that contains the results from all of the pages -- that is, all of lines of HTML code from all of the web pages. You can then use a single DATA step to post-process and parse out the data fields that you want to keep. Like magic, right?

Where to learn more

Each year there are one or two "web scraping" case studies presented at SAS Global Forum. You can find them easily by searching on the keyword "scrape" from the SAS Support site or from lexjansen.com. Here are some that I found interesting:

How JMP 13 sees the CDC page

If you have access to JMP software from SAS, you could try the File->Internet Open... feature. It's useful for a one-time, ad-hoc parsing step that you can later capture into a JSL script for future use. With Internet Open you can specify a URL and select to open it "as HTML," and then JMP will offer a selection of available tables to import as data. This is definitely worth a try if you have JMP on your desktop, as it can lend some insight about the structure of the page you're trying to scrape.

JMP 13 version of the data, imported

The post How to scrape data from a web page using SAS appeared first on The SAS Dummy.

11月 302017
 

There are so many reasons why SAS programmers love SAS -- as a matter of fact, I wrote a blog on it back in 2012. I now realize that I could've written a whole series, not just a single post. And with the recent publishing of my first book, Big Data [...]

Why SAS programmers love SAS, revisited was published on SAS Voices by David Pope

11月 292017
 

SAS Viya is an exciting addition to the SAS Platform, allowing you to conduct analysis faster than ever before and providing you the flexibility to utilize open source technologies and generate insights from data in any computing environment. The blog post “Top 12 Advantages of SAS Viya” does a great [...]

The post Learn more about SAS Viya with resources from SAS Education appeared first on SAS Learning Post.

11月 222017
 

Social media has brought anniversary dates to the forefront. Every day, my view of Google Photos or Facebook shows me a collection of photos from exactly some number of years ago to remind me of how good things were back then. These apps are performing the simplest of date-based math to trick me. They think, "We have a collection of photos from this month/day from some previous year -- let's collect those together and then prey on Chris' nostalgia instincts."

Sometimes it works and I share them with friends. Weren't my kids so cute back then? It's true, they were -- but I don't pine for those days. I live in the present and I look towards the future -- as all citizens of the world should.

Looking back at dates in SAS

But you know who likes to look at the past? Managers (and probably a few of my colleagues...). Many of my SAS jobs are in support of date-based reports. They answer questions such as, "what happened in the past 30 days?" or "how much activity in the past 6 months?" When I have SAS date values, going back 30 days is simple. The SAS date for "30 days ago" is simply today()-30. (Because, remember, a SAS date is simply an integer representing the count of days since Jan 1, 1960.)

"6 months ago" is a little more nuanced. I could fudge it by subtracting 183 or 184 from the value of today(), but that's not precise enough for the analytical wonks that I work with. Fortunately, SAS provides a function that can compute any date -- given a starting date -- using just about any criteria you can imagine. The function is called INTNX (

six_mo_ago = intnx(
 'month',  /* unit of time interval */
  today(), /* beginning date */
  -6,      /* number of intervals, negative goes to the past */
  'same'   /* alignment of interval date. "Same" is for same day of month */
 );

If I wanted to go 6 years into the past, I would simply change that first argument to 'year'. 6 days? Change it to 'day'. in %SYSFUNC -- the macro function that breaks out of macro processing to invoke built-in SAS functions. I also need to remove the quotes around the interval and alignment values -- the SAS macro processor will treat these as string literals and would not approve of the quotes.

%let six_mo_ago = 
 %sysfunc(
  intnx(
   month,             /* unit of time interval */
   %sysfunc(today()), /* function to get current date */
   -6,                /* number of intervals, negative goes to the past */
   same               /* alignment of interval date. "Same" is for same day of month */
   )
  );
 %put &=six_mo_ago;

Output (from today's run):

45          %put &=six_mo_ago;
SIX_MO_AGO=20960

Notice the result is a number -- which is exactly what I need for date value comparisons. But what if I wanted a formatted version of the date? The

 %let six_mo_ago_fmt = 
  %sysfunc(
   intnx(
   month,             /* unit of time interval */
   %sysfunc(today()), /* function to get current date */
   -6,                /* number of intervals, negative goes to the past */
   same               /* alignment of interval date. "Same" is for same day of month */
   ), date9.  /* Tell %SYSFUNC how to format the result */
  );
 %put &=six_mo_ago_fmt;

Output (from today's run):

56          %put &=six_mo_ago_fmt;
SIX_MO_AGO_FMT=21MAY2017

Adapting INTNX for SAS datetime values

This computed date works perfectly when my data sets contain SAS date values that I want to filter. For example, I can limit the records to those from the past 6 months with code similar to this:

proc freq data=comm.visits (where=(date > &six_mo_ago.)) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

But I realized that some of my data sets use datetime values, not date values. A SAS datetime value is simply the number of seconds since midnight on Jan 1, 1960. I've seen programs that adapt the code above by converting the computed cutoff date from a date value to a datetime value -- it's simple to do with math:

proc freq data=comm.visits (where=(event_time > %sysevalf( &six_mo_ago. * 60 * 60 * 24 ))) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

The

%let six_mo_ago_dt = 
 %sysfunc(
  intnx(
   dtmonth,              /* unit of time interval */
   %sysfunc(datetime()), /* function to get current datetime */
   -6,                   /* number of intervals, negative goes to the past */
   same                  /* alignment of interval date. "Same" is for same day of month/same time */
   )
  );
 %put &=six_mo_ago_dt;

Output (at this moment):

34          %put &=six_mo_ago_dt;
SIX_MO_AGO_DT=1810991917.93526

I can then use the much more readable version of my comparison code:

proc freq data=comm.visits (where=(event_time > &six_mo_ago_dt)) 
 noprint 
 ;
tables geo_country_code / out=country_visits_6mo;
run;

Learning more about date and datetime intervals

First, an acknowledgment. I recently had the privilege of presenting at the Quebec user groups with SAS-world superstar Marje Fecht. Marje is an excellent instructor, and she delivered a popular talk about working with SAS date values -- a topic that trips up many new SAS users. It's useful for me -- someone who has used SAS for a long time -- to see a basic topic presented to me as if I were brand new. Her talk reminded me of some good practices that I was able to bring home and apply in my own production SAS jobs. I know that Marje was invited to present this talk at SAS Global Forum 2018, and I hope that I did not steal too much of her thunder. She has a good origin story about the 01JAN1960 date decision. You should attend the conference and see her talk in person.

The INTNX (and its sister function for computing date differences, INTCK) are powerful tools for manipulating date and datetime values. Other programming languages offer complex code libraries to accomplish what these two functions can do as part of Base SAS. They are tricky to learn at first, but once you get the hang of them they can really simplify your SAS programs that deal with time-based data.

Rick Wicklin presented a useful introduction to both functions in INTCK and INTNX: Two essential functions for computing intervals between dates in SAS.

While these functions are available in Base SAS, they are maintained by the developers who look after SAS/ETS (econometrics and time series). You'll find see this thread on the SAS Support Communities.

The post Computing a date from the past (or future) with SAS appeared first on The SAS Dummy.

11月 112017
 

Here's a Proc Print trick for grouped data. Suppose your data is divided into groups, such as males and females. You could sort by the grouping variable before printing, like this: Suppose you want to better emphasize the groups. You could add a BY statement, like this: OK, but, personally, [...]

The post Simple Proc Print trick for grouped data appeared first on SAS Learning Post.

11月 082017
 

A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the number of nonmissing observations (N) should never use the format of the variable, whereas the minimum, maximum, and mean values should.

Format rows of a table

In SAS you can easily apply a format to a column, but not to a row. However, SAS provides two techniques that enable you to control the appearance of cells in tables:

Both PROC TEMPLATE and PROC REPORT have a learning curve. If you are in a hurry, an alternative solution is to use the DATA step to create a factoid. Recall that a factoid is an ODS table that contains character values and (formatted) numbers in the same column. A factoid can display mixed-type data by converting every value to a string. This conversion process gives you complete control over the format of each cell in the table. Consequently, you can use context to format some numbers differently from other numbers.

Creating a custom factoid

Suppose that you want to generate the table shown at the top of this article. You can obtain the statistics from PROC MEANS and then transpose the data. The following statements produce an output data set that contains descriptive statistics for three variables in the Sashelp.Cars data:

proc means data=sashelp.cars noprint;
var Cylinders EngineSize MSRP;
output out=stats(drop=_TYPE_ _FREQ_
                 rename=(_STAT_=Label1)
                 where=(Label1 ^= "STD") );
run;
 
proc print data=stats noobs;
run;
Descriptive statistics with formats

I have highlighted the first row of the output data to indicate why this output is not suitable for a report. The output variables retain the format of the original variables. However, the 'N' statistic is the sample size, and a sample of size '$428' does not make sense! Even '428.000' is a strange value to see for a sample size. It would be great to format the first row with an intrger format such as 8.0.

The next DATA step creates three character variables (CVALUE1-CVALUE3) that contain formatted values of the three numerical variables in the data. The SELECT-WHEN statement does the following:

  • For each observation and for each of the three numerical variables:
    • If the LABEL1 variable is 'N', then apply the 8.0 format to the value of the numerical variable.
    • Otherwise use the VVALUE function to get the formatted value for the numerical variable.
data Factoid;
set stats;
array nValue[3] Cylinders EngineSize MSRP;      /* numerical variables */
array cValue[3] $12.;                           /* cValue[i] is formatted version of nValue[i] */
/* with a little effort you could assign labels dynamically */
label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
 
do i = 1 to dim(nValue);
   select (Label1);
      when ('N')    cValue[i] = put(nvalue[i], 8.0);
      otherwise     cValue[i] = vvalue(nvalue[i]);
   end;
end;
run;
 
proc print data=Factoid label noobs;
   var Label1 cValue:;
run;

The output displays the character columns, which contain formatted numbers. With additional logic, you could display fewer decimal values for the MEAN row.

Transpose the factoid

The previous table contains the information that the programmer wants. However, the programmer asked to display the transpose of the previous table. For completeness, here are SAS statements that transpose the table:

proc transpose data=Factoid out=TFactoid;
   var cValue:;
   ID Label1;
   IDLABEL Label1;
run;
 
proc print data=TFactoid(drop=_NAME_) noobs label;
   label _LABEL_ = "Variable";
run;

The final table is displayed at the top of the article.

Summary

In summary, this article shows one way to display a column of data in which each cell potentially has a different format. The trick is to create a factoid, which means that you create a character copy of numeric data. As you create the copy, you can use the PUT function to apply a custom format, or you can use the VVALUE function to get the existing formatted value.

In general, you should try to avoid creating copies of data, so use PROC TEMPLATE and PROC REPORT if you know how. However, if you don't know how to use those tools, the factoid provides an alternative way to control the formatted values that appear in a table. Although not shown in this article, a factoid also enables you to display character and numeric values in the same column.

The post How to format rows of a table in SAS appeared first on The DO Loop.

11月 082017
 

A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the number of nonmissing observations (N) should never use the format of the variable, whereas the minimum, maximum, and mean values should.

Format rows of a table

In SAS you can easily apply a format to a column, but not to a row. However, SAS provides two techniques that enable you to control the appearance of cells in tables:

Both PROC TEMPLATE and PROC REPORT have a learning curve. If you are in a hurry, an alternative solution is to use the DATA step to create a factoid. Recall that a factoid is an ODS table that contains character values and (formatted) numbers in the same column. A factoid can display mixed-type data by converting every value to a string. This conversion process gives you complete control over the format of each cell in the table. Consequently, you can use context to format some numbers differently from other numbers.

Creating a custom factoid

Suppose that you want to generate the table shown at the top of this article. You can obtain the statistics from PROC MEANS and then transpose the data. The following statements produce an output data set that contains descriptive statistics for three variables in the Sashelp.Cars data:

proc means data=sashelp.cars noprint;
var Cylinders EngineSize MSRP;
output out=stats(drop=_TYPE_ _FREQ_
                 rename=(_STAT_=Label1)
                 where=(Label1 ^= "STD") );
run;
 
proc print data=stats noobs;
run;
Descriptive statistics with formats

I have highlighted the first row of the output data to indicate why this output is not suitable for a report. The output variables retain the format of the original variables. However, the 'N' statistic is the sample size, and a sample of size '$428' does not make sense! Even '428.000' is a strange value to see for a sample size. It would be great to format the first row with an intrger format such as 8.0.

The next DATA step creates three character variables (CVALUE1-CVALUE3) that contain formatted values of the three numerical variables in the data. The SELECT-WHEN statement does the following:

  • For each observation and for each of the three numerical variables:
    • If the LABEL1 variable is 'N', then apply the 8.0 format to the value of the numerical variable.
    • Otherwise use the VVALUE function to get the formatted value for the numerical variable.
data Factoid;
set stats;
array nValue[3] Cylinders EngineSize MSRP;      /* numerical variables */
array cValue[3] $12.;                           /* cValue[i] is formatted version of nValue[i] */
/* with a little effort you could assign labels dynamically */
label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
 
do i = 1 to dim(nValue);
   select (Label1);
      when ('N')    cValue[i] = put(nvalue[i], 8.0);
      otherwise     cValue[i] = vvalue(nvalue[i]);
   end;
end;
run;
 
proc print data=Factoid label noobs;
   var Label1 cValue:;
run;

The output displays the character columns, which contain formatted numbers. With additional logic, you could display fewer decimal values for the MEAN row.

Transpose the factoid

The previous table contains the information that the programmer wants. However, the programmer asked to display the transpose of the previous table. For completeness, here are SAS statements that transpose the table:

proc transpose data=Factoid out=TFactoid;
   var cValue:;
   ID Label1;
   IDLABEL Label1;
run;
 
proc print data=TFactoid(drop=_NAME_) noobs label;
   label _LABEL_ = "Variable";
run;

The final table is displayed at the top of the article.

Summary

In summary, this article shows one way to display a column of data in which each cell potentially has a different format. The trick is to create a factoid, which means that you create a character copy of numeric data. As you create the copy, you can use the PUT function to apply a custom format, or you can use the VVALUE function to get the existing formatted value.

In general, you should try to avoid creating copies of data, so use PROC TEMPLATE and PROC REPORT if you know how. However, if you don't know how to use those tools, the factoid provides an alternative way to control the formatted values that appear in a table. Although not shown in this article, a factoid also enables you to display character and numeric values in the same column.

The post How to format rows of a table in SAS appeared first on The DO Loop.

11月 072017
 

SAS Tips and TricksThere is certainly no shortage of terrific tips and tricks in various SAS blogs from some of our most distinguished SAS in-house experts. But, there's another group of equally qualified experts who don't often get to share their expertise on this channel: our customers. So, I went on a quest to get the inside scoop from various SAS users, polling Friends of SAS members to get their feedback on their favorite SAS tips.

We asked a few of these Friends of SAS members who are regular SAS users to share with us their top SAS tips and tricks for improving performance or something they wished they had known earlier in their SAS career. Based on that, we got a wide range of tips and tricks from a number of different SAS users – ranging from novice to expert and across various industries and product users. Check out some of them below:

FUNCTIONS

Functions are either built into SAS itself or you can write your own customized code that act in the same manner, all of which help in analyzing and processing data. There are a variety of function categories that include mathematical, date and time, character, truncation, and miscellaneous. Using functions makes us more efficient, and we don’t have to re-invent the wheel every time we want to figure something out. With this being said, some of our regular SAS users have a thing or two to say about dealing with functions that may help you out:

“Before you program any complex code, look for a SAS function that will do the task for you.”
     - John Ladds, Past President, OASUS

“Insert a line break in a concatenated string, such as: manylines = catx('0a'x,a,b,c);”
     - Aroop Ghosh, Principal Consultant, Webtalk Communications

“Use the lag function to create time related variables, for example, in time punch data”
     - Yolanda, Analyst, TD

“A good trick that I have recently learnt [sic]which can make the code less wordier is using the functions IFN and IFC as an alternative to IF THEN ELSE statements in conditional processing.”
     - Sunny Giroti, Master of Business Analytics Candidate, Schulich School of Business

“IFN can be used in place of IF THEN ELSE to shorten code”
     - Neil Menezes, Senior Business Anlyst, CTFS

“Ron Cody’s link from SAS.COM. It has many SAS function examples.”
     - John Lam, CIBC

SYNTAX/SHORTCUTS/EFFICIENCIES

You know what they say: time is money. So for a SAS programmer, finding shortcuts and ways to work more efficiently and faster are important to get a job done quicker. Here are a few ways SAS users think can make your life easy while working with SAS:

“Use missover to ensure no records are skipped when reading in a file”
     - Scott Bellefeuille, IT Solutions Developer (Merchant Services), TD Bank

“Pressing keys 'Ctrl'+'/' to comment out a line of code.”
     - Bunce Leung, Execution Manager, RBC

“Variable Lists - being able to refer to variables using double dashes to indicate all variables between first and last in a dataset is super useful for many procs. The later versions of being able to use the prefix and colon to indicate all datasets with a prefix is a great shortcut as well.”
     - Fareeza Khurshed, Manager (Statistical Services), Alberta Treasury Board and Finance

“I like to use PERL in SAS for finding stuff in character variables.”
     - Peter Timusk, Statistics Officer, Statistics Canada

“Title "SAS can give you an Inheritance". Have an ODBC driver on your local PC but not on a remote server? No problem. Use rsubmit with the inheritlib option. Your remote server will now inherit the ODBC driver and be able to access a database you thought you could only reach with your PC.”
     - Horst Wolter, Manager, TD Bank

“If you want to speed the processing of your program. Run your join statements on the "work" library. It is must faster.”
     - Estela Tavares, Economist, Statistics Canada

“When dealing with probability, can logistic be used in all cases? Trick Q - as A is N0. What about the times, probability is 0 and 1. What if the data is heavily distributed on 1s and 0s.”
     - Mukul Pandey, Student Business Analytics, Schulich School of Business

“Proc tabulate can perform descriptive statistics better than proc freq and proc means.”
     - Taha Azizi, Senior Business Insight Analyst, TD

Your turn

Were any of these tips and tricks useful? Do you use them already? What are some of your top SAS tips and tricks? Please be sure to share in the comments below!

Looking for more tips and tricks? Check out this video featuring six Canadian SAS programmers, including a few Friends of SAS members, who share some of their favourite SAS programming tips.

About Friends of SAS

If you’re not familiar with Friends of SAS, it is an exclusive online community available only to our Canadian SAS customers and partners to recognize and show our appreciation for their affinity to SAS. Members complete activities called 'challenges' and earn points that can be redeemed for rewards. There are opportunities to build powerful connections, gain privileged access to SAS resources and events, and boost your learning and development of SAS all in a fun environment.

Interested in learning more about Friends of SAS? Feel free to email myself at Natasha.Ulanowski@sas.com or Martha.Casanova@sas.com with any questions or more details.

SAS tips and tricks: Users-tell-all edition was published on SAS Users.

11月 062017
 
A factoid in SAS is a table that displays numeric and chanracter values in a single column

Have you ever seen the "Fit Summary" table from PROC LOESS, as shown to the right? Or maybe you've seen the "Model Information" table that is displayed by some SAS analytical procedures? These tables provide brief interesting facts about a statistical procedure, hence they are called factoids.

In SAS, a "factoid" has a technical meaning:

  • A factoid is an ODS table that can display numerical and character values in the same column.
  • A factoid displays row headers in a column. In most ODS styles, the row headers are displayed in the first column by using a special style, such as a light-blue background color in the HTMLBlue style.
  • A factoid does not display column headers because the columns display disparate and potentially unrelated information.

I want to emphasize the first item in the list. Since variables in a SAS data set must be either character or numeric, you might wonder how to access the data that underlies a factoid. You can use the ODS OUTPUT statement to look at the data object behind any SAS table, as shown below:

proc loess data=sashelp.cars plots=none;
   model mpg_city = weight;
   ods output FitSummary=Fit(drop=SmoothingParameter);
run;
 
proc print data=Fit noobs;
run;
The data object that underlies a factoid in SAS

The PROC PRINT output shows how the factoid manages to display characters and numbers in the same column. The underlying data object contains three columns. The LABEL1 column contains the row headers, which identify each row. The CVALUE1 column is the column that is displayed in the factoid. It is a character column that contains character strings and formatted copies of the numbers in the NVALUE1 column. The NVALUE1 column contains the raw numeric value of every number in the table. Missing values represent rows for which the table displays a character value.

All factoids use the same naming scheme and display the LABEL1 and CVALUE1 columns. The form of the data is important when you want to use the numbers from a factoid in a SAS program. Do not use the CVALUE1 character variable to get numbers! Those values are formatted and possibly truncated, as you can see by looking at the "Smoothing Parameter" row. Instead, read the numbers from the NVALUE1 variable, which stores the full double-precision number.

For example, if you want to use the AICC statistic (the last row) in a program, read it from the NVALUE1 column, as follows:

data _NULL_;
   set Fit(where=( Label1="AICC" ));  /* get row with AICC value */
   call symputx("aicc", NValue1);     /* read value of NValue1 variable into a macro */
run;
%put &=aicc;                          /* display value in log */
AICC=3.196483775

Some procedures produce factoids that display multiple columns. For example, PROC CONTENTS creates the "Attributes" table, which is a factoid that displays four columns. The "Attributes table displays two columns of labels and two columns of values. When you use the ODS OUTPUT statement to create a data set, the variables for the first two columns are LABEL1, CVALUE1, and NVALUE1. The variables for the second two columns are LABEL2, CVALUE2, and NVALUE2.

Be aware that the values in the LABEL1 (and LABEL2) columns depend on the LOCALE= option for your SAS session. This means that some values in the LABEL1 column might be translated into French, German, Korean, and so forth. When you use a WHERE clause to extract a value, be aware that the WHERE clause might be invalid in other locales. If you suspect that your program will be run under multiple locales, use the _N_ automatic variable, such as if _N_=14 then call symputx("aicc", NValue1);. Compared with the WHERE clause, using the _N_ variable is less readable but more portable.

Now that you know what a factoid is, you will undoubtedly notice them everywhere in your SAS output. Remember that if you need to obtain numerical values from a factoid, use the ODS OUTPUT statement to create a data set. The NVALUE1 variable contains the full double-precision numbers in the factoid. The CVALUE1 variable contains character values and formatted versions of the numbers.

The post What is a factoid in SAS? appeared first on The DO Loop.