Suppose you want a list of car manufacturers from the CARS dataset. Easy! Call the %CHARLIST macro from a %PUT statement, like this: The CHARLIST macro generates a list of unique values of a selected variable from a selected dataset. So does PROC FREQ. But, if you don't need statistics, the CHARLIST [...]
Reading an external file that contains delimiters (commas, tabs, or other characters such as a pipe character or an exclamation point) is easy when you use the IMPORT procedure. It's easy as long as the variable names are on row 1, the data starts on row 2, and the first 20 rows are a representative sample of your data. Unfortunately, most delimited files are not created with those restrictions in mind. So how do you read files that do not follow those restrictions?
You can still use PROC IMPORT to read the comma-, tab-, or otherwise-delimited files. However, depending on the circumstances, you might have to add the GUESSINGROWS= statement to PROC IMPORT or you might need to pre-process the delimited file before you use PROC IMPORT.
Note: PROC IMPORT is available only for use in the Microsoft Windows, UNIX, or Linux operating environments.
The following sections explain four different scenarios for using PROC IMPORT to read files that contain the delimiters that are listed above.
In this scenario, I use PROC IMPORT to read a comma-delimited file that has variable names on row 1 and data starting on row 2, as shown below:
proc import datafile='c:\temp\classdata.csv'
            out=class
            dbms=csv
            replace;
run;
When I submit this code, the following message appears in my SAS® log:
NOTE: Invalid data for Age in line 28 9-10.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
28        Janet,F,NA,62.5,112.5 21
Name=Janet Sex=F Age=. Height=62.5 Weight=112.5 _ERROR_=1 _N_=27
NOTE: 38 records were read from the infile 'c:\temp\classdata.csv'.
      The minimum record length was 17.
      The maximum record length was 21.
NOTE: The data set WORK.CLASS has 38 observations and 5 variables.
In this situation, how do you prevent the Invalid Data message in the SAS log?
By default, SAS scans the first 20 rows to determine variable attributes (type and length) when it reads a comma-, tab-, or otherwise-delimited file. Beginning in SAS® 9.1, a new statement (GUESSINGROWS=) is available in PROC IMPORT that enables you to tell SAS how many rows you want it to scan in order to determine variable attributes. In SAS 9.1 and SAS® 9.2, the GUESSINGROWS= value can range from 1 to 32767. Beginning in SAS® 9.3, the GUESSINGROWS= value can range from 1 to 2147483647. Keep in mind that the more rows you scan, the longer it takes for PROC IMPORT to run.
The following program illustrates the use of the GUESSINGROWS= statement in PROC IMPORT:
proc import datafile='c:\temp\classdata.csv'
            out=class
            dbms=csv
            replace;
   guessingrows=100;
run;
The example above includes the statement GUESSINGROWS=100, which instructs SAS to scan the first 100 rows of the external file for variable attributes. You might need to increase the GUESSINGROWS= value to something greater than 100 to obtain the results that you want.
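To see why too small a GUESSINGROWS= value produces Invalid Data messages, here is a hypothetical Python sketch of the scanning logic. The `guess_types` function is my own, a simplified stand-in for what PROC IMPORT does, not actual SAS internals:

```python
import csv, io

def guess_types(csv_text, guessing_rows):
    """Guess each column's type by scanning only the first
    `guessing_rows` data rows, as PROC IMPORT does by default."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = []
    for col in range(len(header)):
        sample = [r[col] for r in data[:guessing_rows]]
        try:
            for v in sample:
                float(v)                 # every sampled value parses as a number
            types.append("numeric")
        except ValueError:
            types.append("character")
    return dict(zip(header, types))

csv_text = "Name,Age\nAlfred,14\nJanet,NA\n"
# Scanning only 1 row guesses Age is numeric, so "NA" later is invalid data.
print(guess_types(csv_text, 1))   # {'Name': 'character', 'Age': 'numeric'}
print(guess_types(csv_text, 2))   # {'Name': 'character', 'Age': 'character'}
```

Scanning more rows lets the later `NA` value demote Age to character, which is exactly what raising GUESSINGROWS= accomplishes in SAS.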
In this scenario, my delimited file has the variable names on row 4, and the data starts on row 5. When you use PROC IMPORT, you can specify the record number at which SAS should begin reading. However, although you can specify which record to start with, you cannot extract the variable names from any row other than the first row of a comma-, tab-, or otherwise-delimited external file.
Then how do you program PROC IMPORT so that it begins reading from a specified row?
To do that, you need to allow SAS to assign the variable names in the form VARx (where x is a sequential number). The following code illustrates how you can skip the first few rows and start reading data at row 4 by allowing SAS to assign the variable names:
proc import datafile='c:\temp\class.csv'
            out=class
            dbms=csv
            replace;
   getnames=no;
   datarow=4;
run;
In this scenario, I want to read only records 6–15 (inclusive) in the delimited file. So the question here is how can you set PROC IMPORT to read just a section of a delimited file?
To do that, you need to use the OBS= option before you execute PROC IMPORT and use the DATAROW= option within PROC IMPORT.
The following example reads the middle ten rows of a CSV file, starting at row 6:
options obs=15;

proc import out=work.test2
            datafile="c:\temp\class.csv"
            dbms=csv
            replace;
   getnames=yes;
   datarow=6;
run;

options obs=max;
Notice that I reset the OBS= option to MAX after the IMPORT procedure to ensure that any code that I run after the procedure processes all observations.
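The OBS=/DATAROW= combination is simply a row window over the file. Here is a rough Python sketch of the same windowing; the `read_window` helper and its argument names are hypothetical, chosen to mirror the two SAS options:

```python
import csv, io

def read_window(csv_text, datarow, obs):
    """Read data records `datarow` through `obs` (1-based, inclusive,
    counting the header line as record 1) -- the OBS=/DATAROW= window."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    window = rows[datarow - 1 : obs]   # file records 6..15 for datarow=6, obs=15
    return header, window

text = "id,x\n" + "".join(f"{i},{i*i}\n" for i in range(1, 20))
header, data = read_window(text, datarow=6, obs=15)
print(len(data))    # 10 rows: the middle of the file
```

As in SAS, the window is defined against physical records of the file, which is why the header line counts as record 1.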
In this scenario, I again use PROC IMPORT to read my external file. However, I receive more observations in my SAS data set than there are data rows in my delimited file. The external file looks fine when it is opened with Microsoft Excel. However, when I use Microsoft Windows Notepad or TextPad to view some records, my data spans multiple rows for values that are enclosed in quotation marks. Here is a snapshot of what the file looks like in both Microsoft Excel and TextPad, respectively:
The question for this scenario is how can I use PROC IMPORT to read this data so that the observations in my SAS data set match the number of rows in my delimited file?
In this case, the external file contains embedded carriage return (CR) and line feed (LF) characters in the middle of the data value within a quoted string. The CRLF is an end-of-record marker, so the remaining text in the string becomes the next record. Here are the results from reading the CSV file that is illustrated in the Excel and TextPad files that are shown earlier:
That behavior is why you receive more observations than you expect. Anytime SAS encounters a CRLF, SAS considers that a new record regardless of where it is found.
A sample program that removes a CRLF character (as long as it is part of a quoted text string) is available in SAS Note 26065, "Remove carriage return and line feed characters within quoted strings."
After you run the code (from the Full Code tab) in SAS Note 26065 to pre-process the external file and remove the erroneous CR/LF characters, you should be able to use PROC IMPORT to read the external file with no problems.
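I can't reproduce the SAS Note code here, but the underlying logic is easy to sketch: walk the file character by character, track whether you are inside a double-quoted string, and drop CR/LF characters only while inside. Here is that idea in Python (a simplified illustration of the technique, not the code from SAS Note 26065):

```python
def strip_quoted_newlines(text):
    """Remove CR/LF characters that fall inside double-quoted strings,
    leaving real end-of-record markers alone."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes   # entering or leaving a quoted string
            out.append(ch)
        elif ch in "\r\n" and in_quotes:
            continue                    # embedded line break: drop it
        else:
            out.append(ch)
    return "".join(out)

raw = 'id,comment\n1,"line one\r\nline two"\n2,"ok"\n'
print(strip_quoted_newlines(raw))
# id,comment
# 1,"line oneline two"
# 2,"ok"
```

After a pass like this, every remaining CRLF is a genuine end-of-record marker, so the row count matches what you see in Excel.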
For more information about PROC IMPORT, see "Chapter 35, The IMPORT Procedure" in the Base SAS® 9.4 Procedures Guide, Seventh Edition.
Would you like to format your macro variables in SAS? Good news. It's easy! Just use the %FORMAT function, like this: %let x=1111; Log %put %format(&x,dollar11.); $1,111 %put %format(&x,roman.); MCXI %put %format(&x,worddate.); January 16, 1963 %let today=%sysfunc(today()); %put %format(&today,worddate.); October 13, 2017 %put %format(Macro,$3.); Mac What?! You never [...]
Sometimes life is just too busy, and I bet many of you feel the same way. If you’re like me, you’re playing a number of roles: employee, spouse, parent, community leader, coach and so on. With all the craziness, it’s likely we’re shortchanging another role that’s critically important to our [...]
In a large simulation study, it can be convenient to have a "control file" that contains the parameters for the study. My recent article about how to simulate multivariate normal clusters demonstrates a simple example of this technique. The simulation in that article uses an input data set that contains the parameters (mean, standard deviations, and correlations) for the simulation. A SAS procedure (PROC SIMNORMAL) simulates data based on the parameters in the input data set.
This is a powerful paradigm. Instead of hard-coding the parameters in the program (or as macro variables), the parameters are stored in a data set that is processed by the program. This is sometimes called data-driven programming. (Some people call it dynamic programming, but there is an optimization technique of the same name so I will use the term "data-driven.") In a data-driven program, when you want to run the program with new parameters, you merely modify the data set that contains the control parameters.
I have previously written about a different way to control a batch program by passing in parameters on the command line when you invoke the SAS program.
Static programming and hard-coded parameters
Before looking at data-driven programming, let's review the static approach. I will simulate clusters of univariate normal data as an example.
Suppose that you want to simulate normal data for three different groups. Each group has its own sample size (N), mean, and standard deviation. In my book Simulating Data with SAS (p. 206), I show how to simulate this sort of ANOVA design by using arrays, as follows.
/* Static simulation: Parameters embedded in the simulation program */
data AnovaStatic;
   /* define parameters for three simulated groups */
   array N      _temporary_ (50,   50,   50);   /* sample sizes          */
   array Mean   _temporary_ (14.6, 42.6, 55.5); /* center for each group */
   array StdDev _temporary_ ( 1.7,  4.7,  5.5); /* spread for each group */
   call streaminit(12345);
   do k = 1 to dim(N);                          /* for each group        */
      do i = 1 to N[k];                         /* simulate N[k] observations */
         x = rand("Normal", Mean[k], StdDev[k]); /* from k_th normal distribution */
         output;
      end;
   end;
run;
The DATA step contains two loops, one for the groups and the other for the observations within each group. The parameters for each group are stored in arrays. Notice that if you want to change the parameters (including the number of groups), you need to edit the program. I call this method "static programming" because the behavior of the program is determined at the time that the program is written. This is a perfectly acceptable method for most applications. It has the advantage that you know exactly what the program will do by looking at the program.
Data-driven programming: Put parameters in a file
An alternative is to put the parameters for each group into a file or data set. If the k_th row in the data set contains the parameters for the k_th group, then the implicit loop in the DATA step will iterate over all groups, regardless of the number of groups. The following DATA step creates the parameters for three groups, which are read and processed by the second DATA step. The parameter values are the same as for the static example, but are transposed and processed row-by-row instead of via arrays:
/* Data-driven simulation: Parameters in a data set, processed by the simulation program */
data params;            /* define parameters for each simulated group */
   input N Mean StdDev;
   datalines;
50 14.6 1.7
50 42.6 4.7
50 55.5 5.5
;

data AnovaDynamic;
   call streaminit(12345);
   set params;          /* implicit loop over groups k=1,2,... */
   do i = 1 to N;       /* simulate N[k] observations          */
      x = rand("Normal", Mean, StdDev); /* from k_th normal distribution */
      output;
   end;
run;
Notice the difference between the static and dynamic techniques. The static technique simulates data from three groups whose parameters are specified in temporary arrays. The dynamic technique simulates data from an arbitrary number of groups. Currently, the PARAMS data specifies three groups, but if I change the PARAMS data set to represent 10 or 1000 groups, the AnovaDynamic DATA step will simulate data from the new design without any modification.
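The same data-driven pattern translates directly to other languages. Here is a minimal Python sketch of the idea; the `simulate` function and its use of Python's random module are my own, not part of the SAS example. The point is the same: the simulation code never changes, only the parameter table does.

```python
import random

def simulate(params, seed=12345):
    """Data-driven simulation: one (N, mean, stddev) row per group,
    for an arbitrary number of groups."""
    rng = random.Random(seed)
    sample = []
    for k, (n, mean, sd) in enumerate(params, start=1):
        for _ in range(n):
            sample.append((k, rng.gauss(mean, sd)))   # draw from group k
    return sample

# The parameter table plays the role of the PARAMS data set.
params = [(50, 14.6, 1.7), (50, 42.6, 4.7), (50, 55.5, 5.5)]
data = simulate(params)
print(len(data))    # 150 observations
# Appending a fourth row to `params` simulates a fourth group;
# simulate() itself is untouched.
```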
Generate the parameters from real data
The data-driven technique is useful when the parameters are themselves the results of an analysis. For example, a common simulation technique is to compute the moments of real data (mean, variance, skewness, and so forth) and to use those statistics in place of the population parameters that they estimate. (See Chapter 16, "Moment Matching," in Simulating Data with SAS.)
The following call to PROC MEANS generates the sample mean and standard deviation for real data and writes those values to a data set:
proc means data=sashelp.iris N Mean StdDev stackods;
   class Species;
   var PetalLength;
   ods output Summary=params;
run;
The output data set from PROC MEANS creates a PARAMS data set that contains the variables (N, MEAN, and STDDEV) that are read by the simulation program. Therefore, you can immediately run the AnovaDynamic DATA step to simulate normal data from the sample statistics. A visualization of the resulting simulated data is shown below.
You can run PROC MEANS on other data and other variables and the AnovaDynamic step will continue to work without any modification. The simulation is controlled entirely by the values in the "control file," which is the PARAMS data set.
You can generalize this technique by wrapping the program in a SAS macro in which the name of the parameter file and the name of the simulated data set are provided at run time. With a macro implementation, you can read from multiple input files and write to multiple output data sets. You could use such a macro, for example, to break up a large simulation study into smaller independent sub-simulations, each controlled by its own file of input parameters. In a gridded environment, each sub-simulation can be processed independently and in parallel, thus reducing the total time required to complete the study.
Although this article discusses control files in the context of statistical simulation, other applications are possible. Have you used a similar technique to control a program by using an input file that contains the parameters for the program? Leave a comment.
The Western Users of SAS Software 2017 conference is coming to Long Beach, CA, September 20-22. I have been to a lot of SAS conferences, but WUSS is always my favorite because it is big enough for me to learn a lot, but small enough to be really friendly.
If you come I hope you will catch my presentations. If you want a preview or if you can’t come, click the links below to download the papers.
On Wednesday, I will once again present SAS Essentials, a whirlwind introduction to SAS programming in just three hours specially designed for people who are new to SAS.
Then on Friday Lora Delwiche will present a Hands-On Workshop about SAS Studio, a new SAS interface that runs in a web browser.
I hope to see you there!
A while back, I wrote about the proliferation of interfaces for writing SAS programs. I am reposting that blog here (with a few changes) because a lot of SAS users still don’t understand that they have a choice.
These days SAS programmers have more choices than ever before about how to run SAS. They can use the old SAS windowing environment (often called Display Manager because, face it, SAS windowing environment is way too vague), or SAS Enterprise Guide, or the new kid on the block: SAS Studio. All of these are included with Base SAS.
I recently asked a SAS user, “Which interface do you use for SAS programming?”
She replied, “Interface? I just install SAS and use it.”
“You’re using Display Manager,” I explained, but she had no idea what I was talking about.
Trust me. This person is an extremely sophisticated SAS user who does a lot of leading-edge mathematical programming, but she didn’t realize that Display Manager is not SAS. It is just an interface to SAS.
This is where old timers like me have an advantage. If you can remember running SAS in batch, then you know that Display Manager, SAS Enterprise Guide, and SAS Studio are just interfaces to SAS–wonderful, manna from heaven–but still just interfaces. They are optional. It is possible to write SAS programs in an editor such as Word or Notepad++, and copy-and-paste into one of the interfaces or submit them in batch. In fact, here is a great blog by Leonid Batkhan describing how to use your web browser as a SAS code editor.
Each of these interfaces has advantages and disadvantages. I’m not going to list them all here, because this is a blog not an encyclopedia, but the tweet would be
“DM is the simplest, EG has projects, SS runs in browsers.”
I have heard rumors that SAS Institute is trying to develop an interface that combines the best features of all three. So someday maybe one of these will displace the others, but at least for the near future, all three of these interfaces will continue to be used.
So what’s your SAS interface?
It's time to share another tip about working with ZIP files in SAS. Since I first wrote about FILENAME ZIP to list and extract files from a ZIP archive, readers have been asking for more. Specifically, they want additional details about the files that are contained in a ZIP, including the original file datetime stamps, file size, and compressed size. Thanks to a feature that was quietly added into SAS 9.4 Maintenance 3, you can use the FINFO function to retrieve these details. In this article, I share a SAS macro program that does the job.
Here's an abridged example of the output. If you need to create something like this without the use of external ZIP tools like 7-Zip or WinZip (which are often unavailable in controlled environments), read on.
You can download the full program from my public gist on GitHub: zipfiles_list_details.sas
ZIPpy details: a solution in three macros
Here's my basic approach to this problem:
- First, create a list of all of the ZIP files in a directory and all of the file "members" that are compressed within. I've already shared this technique in a previous article. Like an efficient (or lazy) programmer, I'm just reusing that work. That's macro routine #1 (%listZipContents).
- With this list in hand, iterate through each ZIP file member, "open" the file with FOPEN, and gather all of the available file attributes with FINFO. I've divided this into two macros for readability. %getZipMemberInfo (macro routine #2) retrieves all of the file details for a single member and stores them in a data set. %getZipDetails (macro routine #3) iterates through the list of ZIP file members, calls %getZipMemberInfo on each, and concatenates the results into a single output data set.
Here's a sample usage:
%listzipcontents(targdir=C:\Projects\ZIPPED_Examples, outlist=work.zipfiles);
%getZipDetails(inlist=work.zipfiles, outlist=work.zipdetails);
I tried to add decent comments to my program so that interested coders can study and adapt as needed. Here's a snippet of code that uses the FINFO function, which is really the important part for retrieving these file details.
/* Assumes an assignment like:
   FILENAME F ZIP "C:\ZIPPED_Examples\SudokuSolver_src.zip"
      member="src/AboutThisProject.txt";
*/
fId = fopen("&f","S");
if fId then do;
   infonum = foptnum(fId);
   do i = 1 to infonum;
      infoname = foptname(fId, i);
      select (infoname);
         when ('Filename')        filename = finfo(fId, infoname);
         when ('Member Name')     membername = finfo(fId, infoname);
         when ('Size')            filesize = input(finfo(fId, infoname), 15.);
         when ('Compressed Size') compressedsize = input(finfo(fId, infoname), 15.);
         when ('CRC-32')          crc32 = finfo(fId, infoname);
         when ('Date/Time')       filetime = input(finfo(fId, infoname), anydtdtm.);
         otherwise;
      end;
   end;
   compressedratio = compressedsize / filesize;
   output;
   fId = fclose(fId);
end;
The FINFO function in SAS provides access to file attributes and their values for a given file that you've accessed using the FOPEN function. The available file attributes can differ according to the type of file (FILENAME access method) that is used. ZIP files, as you can guess, have some attributes that are specific to them: "Compressed Size", "CRC-32", and others. This code checks for all of the available attributes and keeps those that we need for our detailed output. (And see the use of the SELECT/WHEN statement? So much more readable than a bunch of IF/THEN/ELSEs.)
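As a point of comparison, if you happen to have Python available alongside SAS, its standard zipfile module exposes the same member details (name, original size, compressed size, CRC-32, datetime stamp). A short sketch; the `zip_member_details` helper name is my own:

```python
import zipfile

def zip_member_details(zip_path):
    """Return name, original size, compressed size, CRC-32, and
    datetime stamp for each member of a ZIP archive."""
    details = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            details.append({
                "member": info.filename,
                "size": info.file_size,
                "compressed": info.compress_size,
                "crc32": format(info.CRC, "08X"),
                "datetime": info.date_time,   # (year, month, day, hour, min, sec)
            })
    return details

# Usage (path hypothetical):
# details = zip_member_details(r"C:\Projects\ZIPPED_Examples\example.zip")
```

These are the same attributes that FINFO surfaces through FILENAME ZIP, so the two approaches should agree on any given archive.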
Look, I'm not going to claim that my approach to this problem is the most elegant or most efficient -- but it works. If it can be improved, then I'm sure I'll hear from a few of you experts out there. Bring it on!
For more about ZIP files in SAS
- How to create ZIP files with ODS PACKAGE ZIP (available since SAS 9.2)
- How to "unzip" and read ZIP files using FILENAME ZIP (SAS 9.4 and later)
- How to create and update ZIP files with FILENAME ZIP (SAS 9.4 and later)
- List the contents of ZIP files with FILENAME ZIP (SAS 9.4 and later)
The post Using FILENAME ZIP and FINFO to list the details in your ZIP files appeared first on The SAS Dummy.
News flash: My favorite SAS code editor is SAS Enterprise Guide. However, my favorite general purpose text editor is Notepad++, and I often find myself using that tool for viewing SAS log files and for making small modifications to SAS programs. Judging from the popularity of this SAS Support Communities discussion, I'm not alone. In this post, I'll share the steps for turning Notepad++ into a more useful home for SAS programs.
You can download Notepad++ for Windows from here -- it's free to use. That's one reason that it's one of the first tools that I install on any new PC I get my hands on!
1. Associate SAS files with Notepad++
You accomplish this in the usual way with Windows. In Windows Explorer, right-click (or SHIFT+right-click depending on your setup) on a .SAS file (SAS program), and select Open with...
► You might see Notepad++ in the menu, but don't select it. Instead, select Choose another app.
► In this menu, select Notepad++ and check the "Always use this app" checkbox.
Repeat this step for SAS program logs (.LOG files) and listing output (.LST files) if you want.
2. Run a SAS program from Notepad++
You can add new program actions to the Run menu in Notepad++. Here's how to add a command to run a SAS program, if you have SAS for Windows installed. Note: These steps assume that the SAS program is open in Notepad++ and is saved in a file.
► Select Run... from the Run menu. In the program to run field, enter this command:
-sysin "$(FULL_CURRENT_PATH)" -log "$(CURRENT_DIRECTORY)\$(NAME_PART).log"
all on one line. You might need to adjust the SAS.EXE path for your install. The command options use some Notepad++ environment variables to direct the SAS log and listing output to the same path as the SAS program file.
Click Save (not Run).
Optionally, assign a shortcut key to the action, and name it "Run program file in SAS" (or whatever you want). This adds the command to your Run menu.
When you select it, Notepad++ will launch SAS, run your program in batch, and direct the output to the same folder where the program is stored.
3. Adding SAS syntax color coding to Notepad++
It's simple to "teach" Notepad++ to recognize the keywords from SAS and other languages. You can download new language definitions files from here -- follow the instructions on the page to have your Notepad++ recognize them. I've created an expanded definition file that includes more SAS keywords (many, many more!) -- you can grab that from my GitHub repo here.
Here's what SAS code looks like in my Notepad++:
I'm sure that some of you have spent more time than I have in creating a souped-up Notepad++ environment, or perhaps you've taken it to another level with other popular editors like Sublime or Vim or UltraEdit. If you have other tips to share, I'd love to hear from you in the comments.
SAS programmers have high expectations for their coding environment, and why shouldn't they? Companies have a huge investment in their SAS code base, and it's important to have tools that help you understand that code and track changes over time. Few things are more satisfying than a SAS program that works as designed and delivers perfect results. (Oh, hyperbole you say? I don't think so.) But when your program isn't working the way it should, there are two features that can help you get back on track: a code debugger, and program revision history. Both of these capabilities are built into SAS Enterprise Guide. Program history was added in v7.1, and the debugger was added in v7.13.
I've written about the DATA step debugger before -- both as a teaching tool and as a productivity tool. In this article, I'm sharing a demo of the debugger's features, led by SAS developer Joe Flynn. Before joining the SAS Enterprise Guide development team, Joe worked in SAS Technical Support. He's very familiar with "bugs," and reported his share of them to SAS R&D. Now -- like every programmer -- Joe makes the bugs. But of course, he fixes most of them before they ever see the light of day. How does he do that? Debugging.
This video is only about 8 minutes long, but it's packed with good information. In the debugger demo, you'll learn how you can use standard debugging methods, such as breakpoints, step over and step through, watch variables, jump to, evaluate expression, and more. There is no better way to understand exactly what is causing your DATA step to misbehave.
In the program history demo (the second part of the video), you'll learn how team members can collaborate using standard source management tools (such as Git). If you establish a good practice of storing code in a central place with solid source management techniques, SAS Enterprise Guide can help you see who changed what, and when. SAS Enterprise Guide also offers a built-in code version comparison tool, which enhances your ability to find the breaking changes. You can also use the code comparison technique on its own, outside of the program history feature.
Take a few minutes to watch the video, and then try out the features yourself. You don't need a Git installation to play with program history at the project level, though it helps when you want to extend that feature to support team collaboration.
- Using the DATA step debugger in SAS Enterprise Guide
- Debugging the difference between WHERE and IF in SAS
- How to compare SAS programs in SAS Enterprise Guide
- VIDEO: Casey Smith's main stage demo of the DATA step debugger at SAS Global Forum
- TUTORIAL: New features for SAS programmers in SAS Enterprise Guide 7.1
The post Code debugging and program history in SAS Enterprise Guide appeared first on The SAS Dummy.