sas programming

December 28, 2016
 

SAS temporary arrays are an underutilized jewel in the SAS toolbox. I find that many beginning to intermediate SAS programmers are not familiar with temporary arrays. The good news is that there is nothing complicated about them and they are very useful. First of all, what is a temporary array? […]

The post SAS Temporary Arrays, Not Just for Experts appeared first on SAS Learning Post.

December 22, 2016
 

TL;DR

Free training from SAS: "SAS Programming for R Users." The schedule of Live Web offerings is here. If you prefer self-study, the complete course materials are on the SAS Software GitHub space and you can practice with the free SAS University Edition software.


The details: how R programmers can learn SAS for free

As much as I would love for SAS customers to use SAS to the exclusion of everything else, that rarely happens. Every time I visit a SAS customer, I hear about the other non-SAS tools that they use alongside SAS and their integration points. The most popular of these include desktop tools such as Microsoft Excel, or enterprise databases from other vendors. But increasingly, I hear from users who dabble in open source tools such as Python and R, or who work with other teams that use those tools.

Programmers tend to favor the programming languages that they know. When you learn a new programming language, your experience is colored by inevitable comparisons with the languages you've already mastered. If you work with R coders who want to learn SAS, you should consider that they probably won't learn SAS the same way that you did.

A SAS programming course for experienced programmers

The traditional way to learn SAS begins with the DATA step, where you learn how to read files, how to write files, about the program data vector, and basically how the DATA step "thinks". Then you move on to the various procedures for descriptive stats, reporting, and maybe even some graphing. While this approach can make you productive with simple tasks quickly, to an R coder this might feel too much like "starting over." That's why R programmers (or even MATLAB or Stata users) need an approach that leverages what they already know to hit the ground running.

That's the thinking behind the new SAS Programming for R Users course. This course does not start with the basics about statistics or the importance of data prep -- the assumption is that you already know that. Instead, you'll get hands-on experience with SAS/IML -- a statistical matrix language that will certainly feel familiar to R users. You'll eventually get to the DATA step and other procedures, of course -- and these will open new worlds for you -- but you'll learn to be productive quickly using the skills you already have. (You can read more about the genesis of the course from its creator and main instructor, Jordan Bakerman.)

The course centers around classic and real statistical problems, from Bayesian logistic regression to the Monty Hall problem. If you don't know your statistics, you might feel that you're swimming in waters over your head. But if you're comfortable with the concepts, you should feel right at home. (If you're just beginning with statistics, SAS offers this different free e-learning course.)

The classic game show proof - click for code

"SAS Programming for R Users" also shows you how to use SAS and R together, submitting R code from within your SAS program. That's made possible by a special connection between SAS/IML and R -- something that SAS has supported for years.

This is a free instructor-led course that's offered in Live Web format. "Live Web" means that you connect from your desk at home or work, tune into the lecture and demos, and then practice your skills on a hosted classroom environment. And this course is free -- costing you only your time (5 half-day sessions). Check out the SAS Training site to see when the next offering might meet your schedule.

Find the course materials on GitHub, right now

What if you can't find a Live Web offering that meets your schedule? In the spirit of openness, the SAS Training team has published the complete course materials on GitHub. You'll find the course notes (over 600 pages), data sets, and over 80 SAS programs to support the course exercises. You can use the free SAS University Edition to try the course exercises yourself and practice with the software. (The only part that you can't practice is the "submit to R" lessons, because the SAS University Edition doesn't support the connection to R.)
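For readers who wonder what the "submit to R" lessons involve, here is a minimal sketch. It is not taken from the course materials. It assumes that R is installed on the machine that runs SAS and that SAS was started with the RLANG system option; the R data frame name df and the model are arbitrary choices for illustration.

```sas
proc iml;
  /* Copy a SAS data set into an R data frame named "df" (arbitrary name) */
  call ExportDataSetToR("Sashelp.Class", "df");

  /* Statements between SUBMIT / R and ENDSUBMIT are sent to R */
  submit / R;
    fit <- lm(Weight ~ Height, data = df)
    print(summary(fit))
  endsubmit;
quit;
```

The R output appears alongside the SAS output, which is what makes it practical to mix the two languages in one program.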

tags: open source, SAS programming, sas training

The post Learning SAS programming for R users appeared first on The SAS Dummy.

December 8, 2016
 

As technology expands, we have a similarly increasing need to create programs that can be handed off – to clients, to regulatory agencies, to parent companies, or to other projects – and handed off with little or no modification needed by the recipient. Minimizing modification by the recipient often requires […]

The post Using the SAS Macro Language to Create Portable Programs appeared first on SAS Learning Post.

December 3, 2016
 

JSON is the new XML. The number of SAS users who need to access JSON data has skyrocketed, thanks mainly to the proliferation of REST-based APIs and web services. Because JSON is structured data in text format, we've been able to offer simple parsing techniques that use the DATA step and, most recently, PROC DS2. But finally*, with SAS 9.4 Maintenance 4, we have a built-in LIBNAME engine for JSON.

Simple JSON example: Who is in space right now?

Speaking of skyrocketing, I discovered a cool web service that reports who is in space right now (at least on the International Space Station). It's actually a perfect example of a REST API, because it does just that one thing and it's easily integrated into any process, including SAS. It returns a simple stream of data that can be easily mapped into a tabular structure. Here's my example code and results, which I produced with SAS 9.4 Maintenance 4.

filename resp temp;
 
/* Neat service from Open Notify project */
proc http 
 url="http://api.open-notify.org/astros.json"
 method= "GET"
 out=resp;
run;
 
/* Assign a JSON library to the HTTP response */
libname space JSON fileref=resp;
 
/* Print result, dropping automatic ordinal metadata */
title "Who is in space right now? (as of &sysdate)";
proc print data=space.people (drop=ordinal:);
run;

[Figure: JSON who is in space]

But what if your JSON data isn't so simple? JSON can represent information in nested structures that can be many layers deep. These cases require some additional mapping to transform the JSON representation to a rectangular data table that we can use for reporting and analytics.

JSON map example: Most recent topics from SAS Support Communities

In a previous post I shared a PROC DS2 program that uses the DS2 JSON package to call and parse our SAS Support Communities API. The parsing process is robust, but it requires quite a bit of foreknowledge about the structure and fields within the JSON payload. It also requires many lines of code to extract each field that I want.

Here's a revised pass that uses the JSON engine:

/* split URL for readability */
%let url1=http://communities.sas.com/kntur85557/restapi/vc/categories/id/bi/topics/recent;
%let url2=?restapi.response_format=json%str(&)restapi.response_style=-types,-null,view;
%let url3=%str(&)page_size=100;
%let fullurl=&url1.&url2.&url3;
 
filename topics temp;
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
/* Let the JSON engine do its thing */
libname posts JSON fileref=topics;
title "Automap of JSON data";
 
/* examine resulting tables/structure */
proc datasets lib=posts; quit;
proc print data=posts.alldata(obs=20); run;

Because the JSON response contains many layers of data, SAS automatically creates several tables. Here they are.

[Figure: JSON auto tables]

There are 12 tables that contain various components of the message data that I want, plus the ALLDATA member that contains everything in one linear table. ALLDATA is good for examining structure, but not for analysis. You can see that it's basically name-value pairs with no data types/formats assigned.

[Figure: JSON ALLDATA]

I could use DATA steps or PROC SQL to merge the various tables into a single denormalized table for my reporting purposes, but there is a better way: define and apply a JSON map for the libname engine to use.

To get started, I need to rerun my JSON libname assignment with the AUTOMAP option. This creates an external file with the JSON-formatted mapping that SAS generates automatically. In my example here, the file lands in the WORK directory with the name "top.map".

filename jmap "%sysfunc(GETOPTION(WORK))/top.map";
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts JSON fileref=topics map=jmap automap=create;

This generated map is quite long -- over 400 lines of JSON metadata. Here's a snippet of the file that describes a few fields in just one of the generated tables.

"DSNAME": "messages_message",
"TABLEPATH": "/root/response/messages/message",
"VARIABLES": [
{
  "NAME": "ordinal_messages",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages"
},
{
  "NAME": "ordinal_message",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages/message"
},
{
  "NAME": "href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/href",
  "CURRENT_LENGTH": 19
},
{
  "NAME": "view_href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/view_href",
  "CURRENT_LENGTH": 134
},

By using this map as a starting point, I can create a new map file -- one that is simpler, much smaller, and defines just the fields that I want. I can reference each field by its "path" in the JSON nested structure, and I can also specify the types and formats that I want in the final data.

In my new map, I eliminated many of the tables and fields and ended up with a file that was just about 60 lines long. I also applied sensible variable names, and I even specified SAS formats and informats to transform some columns during the import process. For example, instead of reading the message "datetime" field as a character string, I coerced the value into a numeric variable with a DATETIME format:

{
  "NAME": "datetime",
  "TYPE": "NUMERIC",
  "INFORMAT": [ "IS8601DT", 19, 0 ],
  "FORMAT": ["DATETIME", 20],
  "PATH": "/root/response/messages/message/post_time/_",
  "CURRENT_LENGTH": 8
},

I called my new map file 'minmap.map' and then re-issued the libname without the AUTOMAP option:

filename minmap 'c:\temp\minmap.map';
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts json fileref=topics map=minmap;
proc datasets lib=posts; quit;
 
data messages;
 set posts.messages;
run;

Here's a snapshot of the single data set as a result.

[Figure: JSON final data]

I think you'll agree that this result is much more usable than what my first pass produced. And the amount of code is much smaller and easier to maintain than any previous SAS-based process for reading JSON.

Here's the complete program in a public GitHub gist, including my custom JSON map.


* By the way, tags: JSON, REST API, SAS programming

The post Reading data with the SAS JSON libname engine appeared first on The SAS Dummy.

December 1, 2016
 

In my earlier post about WHERE and IF statements, I announced that the DATA step debugger has finally arrived in SAS Enterprise Guide. (I admit that I might have buried the lead in that post.) Let's use this post to talk about the new debugger and how it works.

First, let's address some important limitations. This tool is for debugging DATA step code. It can't be used to debug PROC SQL or PROC IML or SAS macro programs. Next, it can't be used to debug DATA steps that read data from CARDS or DATALINES. That's an unfortunate limitation, but it's a side effect of the way the DATA step "debug" mode works with client applications like SAS Enterprise Guide. (Workaround: load your data in a separate step, then debug your more complex DATA step logic in a subsequent step.)
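The workaround can be sketched as follows. The data set names, variables, and values here are made up for illustration; the point is only that the DATALINES input lives in its own step, so the step that you debug reads from a data set.

```sas
/* Step 1: read the in-stream data on its own (nothing to debug here) */
data work.scores;
  input Name $ Score;
  datalines;
Ann 82
Raj 67
Lee 91
;
run;

/* Step 2: the logic reads from a data set, so this step can run in the debugger */
data work.graded;
  set work.scores;
  length Grade $4;
  if Score >= 70 then Grade = 'Pass';
  else Grade = 'Fail';
run;
```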

Ye olde DATA step debugger

1986 called; they want their debugger back.

If you've been around SAS programs for a while then you might remember the full-screen DATA step debugger in the SAS windowing environment. Introduced as production in SAS 6.09E (E="enhanced!"), it was basic but it did the job, relying on command-line processing to direct the debugger actions. It had only two windows: one for the source, and one for the "log", meaning the debugger console log. You could set breakpoints, variable watch conditions, examine variables and calculate values -- all with commands that you typed. (Even though I'm writing this in the past tense and it seems like I'm eulogizing, this debugger still lives on in Base SAS!)

The new DATA step debugger

The new debugging environment, introduced in SAS Enterprise Guide 7.13, has all of the features of its ancestor. And it's much more usable, with toolbars and windows that allow you to control its behavior. But keyboard junkies, don't worry -- that command line is still there too!

To activate the debugger, click the new "bug" toolbar icon in the program editor window. Once activated, you can click the bug in the left "gutter" of the program editor to begin a debug session. (You can also press F5 to debug the active DATA step.)

[Figure: Starting the Debugger]

Examine the screenshot below. You see the source window on top and the console window at the bottom, plus a convenient "watch" window that shows much of the content in the program data vector (PDV). That's all of the variables defined in the DATA step, plus automatic variables like _N_ and _ERROR_.

[Figure: SAS Enterprise Guide debugger]

As you step through the DATA step, the line pointer in the source window advances to show the next line that will execute. You can use a keyboard shortcut (F10), the toolbar, or a typed command ("step") to execute that line and advance. With every step, the watch window is updated with the latest values of the variables in your step. When a variable changes value, it's colored red. If you want the DATA step to break processing when a certain variable changes value, check the Watch box for that variable.

Diving deeper with advanced debugging

Here's another example of debugging a different DATA step program. This program uses a BY statement and FIRST.variable logic, and you can see the additional automatic variables (FIRST.Make and LAST.Make) that the debugger is tracking. I also used END=eof on the SET statement; that adds the eof "flag" variable into the mix during run time.
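The program itself appears only in the screenshot, so here is a guess at a step with the same ingredients: a BY statement, FIRST./LAST. logic, END=eof, and a running_price accumulator. The input data set (Sashelp.Cars), the output name, and the exact statements are assumptions, and the line numbers will not necessarily match the breakpoints discussed in this section.

```sas
/* BY-group processing needs sorted input */
proc sort data=sashelp.cars out=work.cars;
  by Make;
run;

data work.make_totals;
  set work.cars end=eof;
  by Make;
  if first.Make then running_price = 0;   /* reset at the start of each Make */
  running_price + MSRP;                   /* sum statement retains the total */
  if last.Make then output;               /* one row per Make */
  if eof then put 'Finished after ' _n_ 'observations.';
  keep Make running_price;
run;
```

Stepping through a program like this in the debugger shows FIRST.Make, LAST.Make, and eof flipping between 0 and 1 in the watch window.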

[Figure: egdebug_adv]

In the Debug Console window you can see that I've issued some pretty fancy commands. The DATA step debugger allows you to set breakpoints that trigger on specific conditions. For example, "b 8 when (running_price > 10000)" will break on Line 8 when the value of running_price exceeds 10,000. "b 8 after 5" will break on Line 8 after 5 passes through the DATA step. You can set and clear line-specific breakpoints by clicking in the "gutter" (that left-hand margin next to the line numbers).

The "list _all_" command reveals the details about your open data sets and files. Here's what I see during the run of my program.

[Figure: list command]

Other commands let you SET variable values, EXAMINE variables, CALCulate expressions, GO and JUMP to specific lines, and more. The SAS documentation contains a complete reference for DATA step debugger commands, and most of those work exactly as documented, even within SAS Enterprise Guide. Here's the list:

This old-but-still-relevant SAS Global Forum paper (written by a SAS user) also covers some useful debugging concepts in SAS that you can apply in this new environment.

A personal note: eating my words

I've presented "SAS Enterprise Guide for SAS programmers" as a topic in one form or another for the past 15 years. Every so often the topic of the DATA step debugger comes up, and I've said "don't look for it anytime soon." Knowing how the full-screen debugger is closely tied to the SAS windowing environment, I didn't hold out hope for a client application like SAS Enterprise Guide to get it working. Kudos to the R&D team! They creatively found a solution with the "/ldebug" option, an even more obscure debugging approach that works in SAS batch mode. I think this feature will be a tremendous productivity boost for experienced SAS programmers, and a useful learning and teaching tool for those just getting started with the DATA step.

tags: SAS Enterprise Guide, SAS programming

The post Using the DATA step debugger in SAS Enterprise Guide appeared first on The SAS Dummy.

November 30, 2016
 

Do you want to create customized SAS graphs by using PROC SGPLOT and the other ODS graphics procedures? An essential skill that you need to learn is how to merge, join, append, and concatenate SAS data sets that come from different sources. The SAS statistical graphics procedures (SG procedures) enable you to overlay all kinds of customized curves, markers, and bars. However, the SG procedures expect all the data for a graph to be in a single SAS data set. Therefore it is often necessary to append two or more data sets before you can create a complex graph.

This article discusses two ways to combine data sets in order to create ODS graphics. An alternative is to use the SG annotation facility to add extra curves or markers to the graph. Personally, I prefer to use the techniques in this article for simple features, and reserve annotation for adding highly complex and non-standard features.

Overlay curves

[Figure: sgplotoverlay]

In a previous article, I discussed how to structure a SAS data set so that you can overlay curves on a scatter plot.

The diagram at the right shows the main idea of that article. The X and Y variables contain the original data, which are the coordinates for a scatter plot. Secondary information was appended to the end of the data. The X1 and Y1 variables contain the coordinates of a custom scatter plot smoother. The X2 and Y2 variables contain the coordinates of a different scatter plot smoother.

This structure enables you to use the SGPLOT procedure to overlay two curves on the scatter plot. You use a SCATTER statement and two SERIES statements to create the graph. See the previous article for details.
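As a rough illustration of that structure (not the code from the previous article), the following sketch builds the X/Y, X1/Y1, and X2/Y2 layout by concatenation. The two curves are made-up formulas standing in for real smoothers, and all data set names are hypothetical.

```sas
/* Original data: scatter plot coordinates */
data Orig;
  set sashelp.class(keep=Height Weight rename=(Height=X Weight=Y));
run;

/* Two hypothetical curves, evaluated on a grid of X values */
data Curve1;
  do X1 = 50 to 72 by 2;
    Y1 = -140 + 3.9*X1;          /* made-up straight line */
    output;
  end;
run;

data Curve2;
  do X2 = 50 to 72 by 2;
    Y2 = 0.027*X2**2;            /* made-up quadratic */
    output;
  end;
run;

/* Concatenate: each data set adds its own variables, padded with missing values */
data Combined;
  set Orig Curve1 Curve2;
run;

proc sgplot data=Combined;
  scatter x=X  y=Y;
  series  x=X1 y=Y1 / legendlabel="Curve 1";
  series  x=X2 y=Y2 / legendlabel="Curve 2";
run;
```

Because each statement reads only its own pair of variables, the missing values in the other rows are simply ignored.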

Overlay markers: Wide form

In addition to overlaying curves, I sometimes want to add special markers to the scatter plot. This section shows how to add a marker at the location of the sample mean: use PROC MEANS to create an output data set that contains the coordinates of the sample mean, then append that data set to the original data.




The following statements use PROC MEANS to compute the sample mean for four variables in the Sashelp.Iris data set, which contains the measurements for 150 iris flowers. To emphasize the general syntax of this computation, I use macro variables, but that is not necessary:

%let DSName = Sashelp.Iris;
%let VarNames = PetalLength PetalWidth SepalLength SepalWidth;
 
proc means data=&DSName noprint;
var &VarNames;
output out=Means(drop=_TYPE_ _FREQ_) mean= / autoname;
run;

The AUTONAME option on the OUTPUT statement tells PROC MEANS to append the name of the statistic to the variable names. Thus the output data set contains variables with names like PetalLength_Mean and SepalWidth_Mean. As shown in the diagram in the previous section, this enables you to append the new data to the end of the old data in "wide form" as follows:

data Wide;
   set &DSName Means; /* add four new variables; pad with missing values */
run;
 
ods graphics / attrpriority=color subpixel;
proc sgplot data=Wide;
scatter x=SepalWidth y=PetalLength / legendlabel="Data";
ellipse x=SepalWidth y=PetalLength / type=mean;
scatter x=SepalWidth_Mean y=PetalLength_Mean / 
         legendlabel="Sample Mean" markerattrs=(symbol=X color=firebrick);
run;

[Figure: Scatter plot with markers for sample means]

The first SCATTER statement and the ELLIPSE statement use the original data. Recall that the ELLIPSE statement draws an approximate confidence ellipse for the mean of the population. The second SCATTER statement uses the sample means, which are appended to the end of the original data. The second SCATTER statement draws a red marker at the location of the sample mean.

You can use this same method to plot other sample statistics (such as the median) or to highlight special values such as the origin of a coordinate system.
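For instance, a sketch of the same wide-form approach for the median might look like the following. The data set names Medians and WideMedian are arbitrary; the AUTONAME option produces the _Median suffix in the same way that it produces _Mean.

```sas
proc means data=Sashelp.Iris noprint;
  var PetalLength SepalWidth;
  output out=Medians(drop=_TYPE_ _FREQ_) median= / autoname;
run;

data WideMedian;
  set Sashelp.Iris Medians;   /* adds PetalLength_Median and SepalWidth_Median */
run;

proc sgplot data=WideMedian;
  scatter x=SepalWidth y=PetalLength / legendlabel="Data";
  scatter x=SepalWidth_Median y=PetalLength_Median /
          legendlabel="Sample Median" markerattrs=(symbol=X color=firebrick);
run;
```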

Overlay markers: Long form

In some situations it is more convenient to append the secondary data in "long form." In the long form, the secondary data set contains the same variable names as in the original data. You can use the SAS data step to create a variable that identifies the original and supplementary observations. This technique can be useful when you want to show multiple markers (sample mean, median, mode, ...) by using the GROUP= option on one SCATTER statement.

The following call to PROC MEANS does not use the AUTONAME option. Therefore the output data set contains variables that have the same name as the input data. You can use the IN= data set option to create an ID variable that identifies the data from the computed statistics:

/* Long form. New data has same name but different group ID */
proc means data=&DSName noprint;
var &VarNames;
output out=Means(drop=_TYPE_ _FREQ_) mean=;
run;
 
data Long;
set &DSName Means(in=newdata);
if newdata then 
   GroupID = "Mean";
else GroupID = "Data";
run;

The DATA step created the GroupID variable, which has the value "Data" for the original observations and the value "Mean" for the appended observations. This data structure is useful for calling PROC SGSCATTER, which supports the GROUP= option, but does not support multiple PLOT statements, as follows:

ods graphics / attrpriority=none;
proc sgscatter data=Long 
   datacontrastcolors=(steelblue firebrick)
   datasymbols=(Circle X);
plot (PetalLength PetalWidth)*(SepalLength SepalWidth) / group=groupID;
run;

[Figure: Scatter plot matrix with markers for sample means]
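
The long form extends naturally to several statistics at once. The following sketch, which is an illustration rather than code from the original program, appends both the mean and the median and lets a single SCATTER statement distinguish them through GROUP=. The names MeanDS, MedianDS, and Long2 are hypothetical.

```sas
proc means data=Sashelp.Iris noprint;
  var PetalLength PetalWidth SepalLength SepalWidth;
  output out=MeanDS(drop=_TYPE_ _FREQ_)   mean=;
  output out=MedianDS(drop=_TYPE_ _FREQ_) median=;
run;

data Long2;
  length GroupID $6;                       /* long enough for "Median" */
  set Sashelp.Iris(in=isData) MeanDS(in=isMean) MedianDS;
  if isData then GroupID = "Data";
  else if isMean then GroupID = "Mean";
  else GroupID = "Median";
run;

proc sgplot data=Long2;
  scatter x=SepalWidth y=PetalLength / group=GroupID;
run;
```

Declaring the length of GroupID up front avoids truncating "Median" to the length of whichever value is assigned first.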

In conclusion, this article demonstrates a useful technique for adding markers to a graph. The technique requires that you concatenate the original data with supplementary data. Appending and merging data is a technique that is used often when creating ODS statistical graphics in SAS. It is a great technique to add to your programming toolbox.

tags: SAS Programming, Statistical Graphics, Tips and Techniques

The post Append data to add markers to SAS graphs appeared first on The DO Loop.

November 27, 2016
 

In the DATA step, the WHERE statement and the IF statement (a.k.a. the "subsetting IF") have similar functions. In many scenarios, they produce identical results. But new SAS programmers are taught early on that these two statements work very differently, and in important ways. To understand the differences, it helps to step through the program line-by-line to see how SAS "thinks." Fortunately, the new DATA step debugger in SAS Enterprise Guide 7.13 makes this really easy to do.

Difference between WHERE statement and IF statement

Here are the basics: the WHERE statement is applied when the DATA step is compiled. Incoming data (from a SET or MERGE statement) is filtered immediately to just those records that match the WHERE condition, so only those records are ever loaded into the program data vector (PDV). This results in fewer iterations through DATA step code, but provides no opportunity for "dynamic" decisions about which records to examine.

In contrast, the IF statement is evaluated at run time, and operates on the variables in the PDV. When the IF condition is met, the current observation is kept for eventual output. Unlike the WHERE statement, the IF statement can examine values of new variables that are defined within the step.
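A small sketch makes that last point concrete. The variable WeightKg and the cutoff are made up for illustration; what matters is that the variable does not exist in the input data set.

```sas
data heavy;
  set sashelp.class;
  WeightKg = Weight * 0.4536;    /* new variable, created inside the step */
  if WeightKg > 45;              /* the subsetting IF can test it */
  /* where WeightKg > 45;  would fail, because WeightKg is not in SASHELP.CLASS */
run;
```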

Consider these two DATA steps. They produce identical output of 10 records, but the first one processes only those 10 records whereas the second step processes all 19 records from the input.

data results1;
  set sashelp.class;
  /* WHERE applied at compile time  */
  /* Processes ONLY matching obs    */
  where sex='M';
run;
 
data results2;
  set sashelp.class;
  /* IF evaluated at run time  */
  /* Processes EVERY obs       */
  if sex='M';
run;

Using the DATA step debugger to understand the DATA step

The new DATA step debugger in SAS Enterprise Guide makes it very easy to illustrate how WHERE is processed differently from IF. I loaded each of the above programs into my session, then clicked the new "bug" toolbar icon to activate the debugger. Once activated, you can click the bug in the left "gutter" of the program editor to begin a debug session. (You can also press F5 to debug the active DATA step.)

[Figure: Starting the Debugger]

Watch this first animation of a debugger session and see what you notice about the WHERE statement.

[Figure: Debugger with WHERE]

Watching this little movie, I see a few things that reveal some insights.

  • The statement pointer never lands on Line 5 (the WHERE statement). That's because the WHERE statement isn't processed at run time.
  • Even though the CLASS data contains 19 records, the value of the _N_ automatic variable reaches only 11, indicating that only 10 records were processed.
  • The variable watch window uses red to indicate when a variable changes between iterations. The Sex variable never changes from 'M', and thus stays colored black through the entire session.

Let's compare that to the IF statement. Study this animation and see what stands out to you.

[Figure: Debugger with IF]

Here's what I see:

  • The statement pointer begins at Line 2, then 5, and moves to Line 6 (the RUN statement) only when the record has made it past the IF condition and into the output. For each observation where Sex='F', the DATA step stops processing the record and the RUN statement is skipped.
  • In this program, _N_ reaches 20 -- that's because all 19 records in SASHELP.CLASS are processed and the step exits at the end-of-file condition.
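
If you don't have the debugger at hand, one way to see the same difference is to write the iteration count to the log. This is a sketch that uses the END= option and the _N_ automatic variable; the message text is arbitrary. Given the counts described above, the first step should report 10 and the second 19.

```sas
data _null_;
  set sashelp.class end=eof;
  where sex='M';                 /* only matching records reach the PDV */
  if eof then put 'Iterations with WHERE: ' _n_;
run;

data _null_;
  set sashelp.class end=eof;
  /* report before the subsetting IF, so the last record is always counted */
  if eof then put 'Iterations with IF: ' _n_;
  if sex='M';
run;
```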

Learning more about subsetting IF, IF-THEN, WHERE, and debugging

There are several good articles about how the IF statement works, on its own and in combination with IF-THEN-ELSE constructs. Here's a recent article by SAS trainer Charu Shankar. And here's another reference that's included in a piece about the Top 10 SAS coding efficiencies.

The new DATA step debugger in SAS Enterprise Guide opens a new world of understanding for beginner and veteran SAS programmers. It has all of the functions of the "classic" debugger available in the Base SAS windowing environment, but with a much friendlier user interface, keyboard shortcuts, and useful watch windows. In a future post, I'll cover the debugging functions in more detail.

tags: SAS Enterprise Guide, SAS programming

The post Debugging the difference between WHERE and IF in SAS appeared first on The SAS Dummy.

November 11, 2016
 

If you obtain data from web sites, social media, or other unstandardized data sources, you might not know the form of dates in the data. For example, the US Independence Day might be represented as "04JUL1776", "07/04/1776", "Jul 4, 1776", or "July 4, 1776." Fortunately, the ANYDTDTE informat makes it easy to read dates like these into SAS.

The ANYDTDTEw. informat is a flexible alternative to older informats such as DATEw., MMDDYYw., and YYMMDDw. If your dates are in a specific form, the older informats work great and serve to document that all dates must be in that standard form. If the dates are not standardized or you need to read a string like "July 4, 1776", the ANYDTDTE informat is a godsend.

The ANYDTDTE informat for reading dates

The following SAS DATA step shows that the ANYDTDTEw. informat combines several older informats into a "super informat" that attempts to convert a character string into a date. The ANYDTDTE informat can not only replace many of the older informats, but it can also convert a string like "Jul 4, 1776" into a date, as follows:

data Dates;
input @1 Style $8.
      @9 Value anydtdte12.;
format Value DATE10.;
datalines;
DATE    04JUL1776
MMDDYY  07041776
MMDDYY  07/04/1776
YYMMDD  17760704 
N/A     Jul 4, 1776
N/A     July 4, 1776
;
 
proc print noobs; run;

[Figure: Result of using the ANYDTDTE informat to read strings that represent dates]

As you can see, the ANYDTDTE informat reads six different strings, but converts all of them to the SAS date value that corresponds to 04JUL1776.

MMDD or DDMM? How does ANYDTDTE interpret ambiguous dates?

The string 07/04/1776 can be interpreted as "April 7, 1776" or "July 4, 1776," depending upon the local convention. Europeans tend to interpret the string as DD/MM/YYYY whereas the US convention is to use MM/DD/YYYY. How does the ANYDTDTEw. informat guess which interpretation might be correct?

The answer is that the informat looks at the DATESTYLE SAS option. By default, the DATESTYLE option uses the LOCALE system option to guess which style to use. You can use PROC OPTIONS to see the value of these options, which are printed to the SAS log:

proc options option=(DATESTYLE LOCALE) value; run;
Option Value Information For SAS Option DATESTYLE
    Value: MDY
    Scope: Default
    How option value set: Locale
 
Option Value Information For SAS Option LOCALE
    Value: EN_US
...

For my system, the DATESTYLE option is set to MDY, which means that the string "07/04/1776" will be interpreted MM/DD/YYYY. If you need to read dates that obey a different convention, you can use the global OPTIONS statement to set the DATESTYLE option:

options DATESTYLE=DMY;    /* change default style convention */
/* Restore default convention: options DATESTYLE=Locale; */
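
To see the effect of the option, you can read the same ambiguous string under both settings. This is a small sketch, not part of the original program; based on the behavior described above, the first step should report July 4 and the second should report April 7.

```sas
options datestyle=MDY;
data _null_;
  d = input('07/04/1776', anydtdte10.);
  put 'MDY: ' d date9.;
run;

options datestyle=DMY;
data _null_;
  d = input('07/04/1776', anydtdte10.);
  put 'DMY: ' d date9.;
run;

options datestyle=locale;   /* restore the default convention */
```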

Other "ANY" informats in SAS

There are two other SAS informats that are similar to the ANYDTDTE informat: ANYDTDTM, which produces a datetime value, and ANYDTTME, which produces a time value.

Here's a tip to help you remember these seemingly cryptic names. The first part of the name is "ANYDT", which means that the input string can be ANY datetime (DT) value. The end of the name refers to the numerical value that is produced by the informat. The resulting value can be a date (DTE), a datetime (DTM), or a time (TME) value. Thus the three informats all have the mnemonic form ANYDTXXX where the XXX suffix refers to the value that is produced.
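
As a quick sketch of the DTM and TME variants in use, the following step reads one datetime string and one time string with the INPUT function. The input strings and widths are chosen for illustration only.

```sas
data _null_;
  dt = input('04JUL1776:09:30:00', anydtdtm18.);   /* datetime value */
  tm = input('09:30:00', anydttme8.);              /* time value */
  put dt= datetime20. tm= time8.;
run;
```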

tags: Reading and Writing Data, SAS Programming

The post One informat to rule them all: Read any date into SAS appeared first on The DO Loop.

November 4, 2016
 

My river walk last week turned into a spectacular fall show. But if it rains this week in San Antonio, like the weatherman predicts, what will I do? In the coming days, I’ll be presenting at two user groups,  one in eastern Canada in Halifax, and the other all the […]

The post The difference between the Subsetting IF and the IF—THEN—ELSE—IF statement appeared first on SAS Learning Post.

October 17, 2016
 

SAS programmers often resort to using the X command to list the contents of file directories and to process the contents of ZIP files (or gz files on UNIX). In centralized SAS environments, the X command is unavailable to most programmers. NOXCMD is the default setting for these environments (disallowing shell commands), and SAS admins are reluctant to change it.

In this article, I'll share a SAS program that can retrieve the contents of a file directory (all of the file names), and then also report on the contents of every ZIP file within that directory -- without using any shell commands. The program uses two lesser-known tricks to retrieve the information:

  1. The FILENAME statement can be applied to a directory, and then the DOPEN, DNUM, DREAD, and DCLOSE functions can be used to retrieve information about that directory. (Check SAS Note 45805 for a better example of just this - click the Full Code tab.)
  2. The FILENAME ZIP method (added in SAS 9.4) can retrieve the names of the files within a compressed archive (ZIP or gz files). For more information, see all of my previous articles about the FILENAME ZIP access method.

I wrote the program as a SAS macro so that it should be easy to reuse. And I tried to be liberal with the comments, providing a view into my thinking and maybe some opportunities for improvement.

%macro listzipcontents (targdir=, outlist=);
  filename targdir "&targdir";
 
  /* Gather all ZIP files in a given folder                */
  /* Searches just one folder, not subfolders              */
  /* for a fancier example see                             */
  /* http://support.sas.com/kb/45/805.html (Full Code tab) */
  data _zipfiles;
    length fid 8;
    fid=dopen('targdir');
 
    if fid=0 then
      stop;
    memcount=dnum(fid);
 
    /* Save just the names ending in ZIP*/
    do i=1 to memcount;
      memname=dread(fid,i);
      /* combo of reverse and =: to match ending string */
      /* Looking for *.zip and *.gz files */
      if (reverse(lowcase(trim(memname))) =: 'piz.') OR
         (reverse(lowcase(trim(memname))) =: 'zg.') then
        output;
    end;
 
    rc=dclose(fid);
  run;
 
  filename targdir clear;
 
  /* get the memnames into macro vars */ 
  proc sql noprint;
    select memname into: zname1- from _zipfiles;
    %let zipcount=&sqlobs;
  quit;
 
  /* for all ZIP files, gather the members */
  %do i = 1 %to &zipcount;
    %put &targdir/&&zname&i;
    filename targzip ZIP "&targdir/&&zname&i";
 
    data _contents&i.(keep=zip memname);
      length zip $200 memname $200;
      zip="&targdir/&&zname&i";
      fid=dopen("targzip");
 
      if fid=0 then
        stop;
      memcount=dnum(fid);
 
      do i=1 to memcount;
        memname=dread(fid,i);
 
        /* save only full file names, not directory names */
        if (first(reverse(trim(memname))) ^='/') then
          output;
      end;
 
      rc=dclose(fid);
    run;
 
    filename targzip clear;
  %end;
 
  /* Combine the member names into a single data set        */
  /* the colon notation matches all files with "_contents" prefix */
  data &outlist.;
    set _contents:;
  run;
 
  /* cleanup temp files */
  proc datasets lib=work nodetails nolist;
    delete _contents:;
    delete _zipfiles;
  quit;
 
%mend;

Use the macro like this:

%listzipcontents(targdir=c:\temp, 
 outlist=work.allfiles);

Here's an example of the output.

[Figure: ZIP file contents within the target directory]

Experience has taught me that savvy SAS programmers will scrutinize my example code and offer improvements. For example, they might notice my creative use of the REVERSE function and "=:" operator to simulate an "ends with" comparison function -- and then suggest something better. If I don't receive at least a few suggestions for improvements, I'll know that no one has read the post. I hope I'm not disappointed!
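
In that spirit, here is one possible alternative, offered as a sketch rather than as a drop-in replacement: a Perl regular expression with PRXMATCH can express "ends with .zip or .gz" directly. The test values below are made up, and the variable name is_archive is arbitrary.

```sas
data _null_;
  length memname $200;
  do memname = 'archive.ZIP', 'logs.tar.gz', 'notes.txt', 'zipper';
    /* \s*$ tolerates the trailing blanks that pad a character variable; */
    /* the i modifier makes the match case-insensitive                    */
    is_archive = prxmatch('/\.(zip|gz)\s*$/i', memname) > 0;
    put memname= is_archive=;
  end;
run;
```

The pattern replaces the two REVERSE comparisons with one test, at the cost of asking readers to be comfortable with regular expressions.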

tags: FILENAME ZIP, SAS programming, xcmd, ZIP files

The post List the contents of your ZIP and gz files using SAS appeared first on The SAS Dummy.