A ghoulish Halloween Boo to all my readers! Hope my costume freaks you out, but even if it doesn't, I’m positive PROC FREQ will in a few amazing ways! Today’s Programming 2: Data Manipulation Techniques class asked about the power of PROC FREQ. Since I stopped to explain some of it's benefits to [...]
This article demonstrates a SAS programming technique that I call Kuhfeld's template modification technique. The technique enables you to dynamically modify an ODS template and immediately call the modified template to produce a new graph or table. By following the five steps in this article, you can implement the technique in less than 20 lines of SAS code.
Kuhfeld's Template Modification Technique (TMT)
The template modification technique (TMT) is one of several advanced techniques that Warren Kuhfeld created and promoted in a series of articles that culminated with Kuhfeld (2016). (Slides are also available.) The technique that I call the TMT is a DATA _NULL_ program that modifies a SAS-supplied template, uses CALL EXECUTE to compiles it "on the fly," and immediately calls the modified template to render a new graph (pp. 12ff in the paper; p. 14 of his slides).
When I attended Warren's 2016 presentation, I felt a bit overwhelmed by the volume of information. It was difficult for me to see the forest because of all the interconnected "trees." Graph customization is easier to understand if you restrict your attention only to GTL templates. You can make simple modifications to a template (for example, changing the value of an option) by using the following five steps:
- Prepend a private item store to the ODS search path. The private item store will store the modified template.
- Write the existing template to a temporary text file.
- Run a DATA step that reads the existing template one line at a time. Check each line to see if it contains the option that you want to change. If so, modify the line to change the template's behavior.
- Compile the modified template into the private item store.
- Call PROC SGRENDER to render the graph by using the new template.
This article is intentionally brief. For more information see the chapter "ODS Graphics Template Modification" in the SAS/STAT documentation.
Most of Kuhfeld's examples use the TMT to modify ODS templates that are distributed with SAS. These templates can be complex and often contain conditional logic. In an effort to produce a simpler example, the next section creates a simple template that uses a heat map to display a response variable on a uniform grid.
Example: A template that specifies a color ramp
Programmers who use the GTL to create graphs have probably encountered dynamic variables and run-time macro variables. You can use these to specify certain options at run time, rather than hard-coding information into a template. However, the documentation states that "dynamic variables ... cannot resolve to statement or option keywords." In particular, the GTL does not support changing the color ramp at run time: whatever color ramp is hard-coded in the template is the color ramp that is used. (Note: If you have SAS 9.4M3, the HEATMAPPARM statement in PROC SGPLOT supports specifying a color ramp at run time.)
However, you can use Kuhfeld's TMT to modify the color ramp at run time. The following GTL defines a template named "HEATMAP." The template uses the HEATMAPPARM statement and hard-codes the COLORMODEL= option to be the "ThreeColorRamp" in the current ODS style. The template is adapted from the article "Creating a basic heat map in SAS."
/* Construct example template: a basic heat map with continuous color ramp */ proc template; define statgraph HEATMAP; dynamic _X _Y _Z; /* dynamic variables */ begingraph; layout overlay; heatmapparm x=_X y=_Y colorresponse=_Z / /* specify variables at run time */ name="heatmap" primary=true xbinaxis=false ybinaxis=false colormodel = THREECOLORRAMP; /* <== hard-coded color ramp */ continuouslegend "heatmap"; endlayout; endgraph; end; run;
When the PROC TEMPLATE step runs, the HEATMAP template is stored in a location that is determined by your ODS search path. You can run ods path show; to display the search path. You can discover where the HEATMAP template is stored by running proc template; list HEATMAP; run;.
The following DATA step defines (x,y) data on a uniform grid and defines z to be a cubic function of x and y. The heat map shows an approximate contour plot of the cubic function:
/* Construct example data: A cubic function z = f(x,y) on regular grid */ do x = -1 to 1 by 0.05; do y = -1 to 1 by 0.05; z = x**3 - y**2 - x + 0.5; output; end; end; run; /* visualize the data using the built-in color ramp */ proc sgrender data=Cubic template=Heatmap; dynamic _X='X' _Y='Y' _Z='Z'; run;
The remainder of this article shows how to use Kuhfeld's TMT to create a new template which is identical to the HEATMAP template except that it uses a different color ramp. Kuhfeld's technique copies the existing template line by line but replaces the text "THREECOLORRAMP" with text that defines a new custom color ramp.
In the following program, capital letters are used for the name of the existing template, the name of the new template, and the value of the new color ramp. Most of the uncapitalized terms are "boilerplate" for the TMT, although each application will require unique logic in the DATA _NULL_ step.
Step 1: Prepend a private item store to the ODS search path
PROC TEMPLATE writes templates to the first (writable) item store on the ODS search path. PROC SGRENDER looks through the ODS search path until it locates a template with a given name. The first step is to prepend a private item store to the search path, as shown by the following statement:
/* 1. Modify the search path for ODS. Prepend an item store in WORK */ ods path (prepend) work.modtemp(update);
The ODS PATH statement ensures that the modified template will be stored in WORK.MODTEMP. You can provide any name you want. The name WORK.TEMPLAT is a convention that is used in the SAS documentation. Because you might already have WORK.TEMPLAT on your ODS path, I used a different name. (Note: Any item stores in WORK are deleted when SAS exits.) The SAS/STAT documentation provides more information about the template search path.
Step 2: Write the existing template to a temporary text file
You need to write the source code for the HEATMAP template to a plain text file so that it can be read by the SAS DATA step. You can use the FILENAME statement to point to a hard-coded location (such as filename t "C:/temp/tmplt.txt";), but I prefer to use the SAS keyword TEMP, which is more portable. The TEMP keyword tells SAS to create a temporary file name in a location that is independent of the operating system and the file system:
/* 2. Write body of existing template to text file */ filename _tmplt TEMP; /* create temporary file and name */ proc template; source HEATMAP / file=_tmplt; /* write template source to text file */ run;
If you omit the FILE= option, the template source is written to the SAS log. The text file contains only the body of the HEATMAP template.
Steps 3 and 4: Create a new template from the old template
The next step reads and modifies the existing template line by line. There are many SAS character functions that enable you to discover whether some keyword exists in a line of text. Examples include the INDEX and FIND family of functions, the PRXCHANGE function (which uses regular expressions), and the TRANWRD function.
The goal of the current example is to replace the word "THREECOLORRAMP" by another string. The TRANWRD function replaces the string if it is found; the line is unchanged if the word is not found. Therefore it is safe to call the TRANWRD function on every line in the existing template.
You can use the CALL EXECUTE function to place each (possibly modified) line of the template onto a queue, which will be executed when the DATA step completes. However, the text file does not start with a PROC TEMPLATE statement nor end with a RUN statement. You must add those statements to the execution queue in order to compile the template, as follows:
/* one possible way to specify a new color ramp */ %let NEWCOLORRAMP = (blue cyan yellow red); /* 3. Read old template; make modifications. 4. Use CALL EXECUTE to compile (modified) template with a new name. */ data _null_; /* Modify graph template to replace default color ramp */ infile _tmplt end=eof; /* read from text file; last line sets EOF flag to true */ input; /* read one line at a time */ if _n_ = 1 then call execute('proc template;'); /* add 'PROC TEMPLATE;' before */ if findw(_infile_, 'define statgraph') then _infile_ = "define statgraph NEWHEATMAP;"; /* optional: assign new template name */ /* use TRANWRD to make substitutions here */ _infile_ = tranwrd(_infile_, 'THREECOLORRAMP', "&NEWCOLORRAMP"); /* replace COLORMODEL= option */ call execute(_infile_); /* push line onto queue; compile after DATA step */ if eof then call execute('run;'); /* add 'RUN;' at end */ run;
The DATA step copies the existing template source, but modifies the template name and the COLORMODEL= option. The CALL EXECUTE statements push each (potentially modified) statement onto a queue that is executed when the DATA step exits. The result is that PROC TEMPLATE compiles a new template and stores it in the WORK.MODTEMP item store. Recall that the TRANWRD function and other string-manipulation functions are case sensitive, so be sure to use the exact capitalization that exists in the template. In other words, this technique requires that you know details of the template that you wish to modify!
If you are using this technique to modify one of the built-in SAS templates that are used by SAS procedure, do not modify the template name.
Inspect the modified template
Before you attempt to use the new template, you might want to confirm that it is correct. The following call to PROC TEMPLATE shows the location and creation date for the new template. The source for the template is displayed in the log so that you can verify that the substitutions were made correctly.
proc template; list NEWHEATMAP / stats=created; /* location of the modified template */ source NEWHEATMAP; /* confirm that substitutions were performed */ run;
Step 5: Render the graph by using the new template
The final step is to call PROC SGRENDER to use the newly created template to create the new graph:
/* 5. Render graph by using new template */ proc sgrender data=Cubic template=NEWHEATMAP; dynamic _X='X' _Y='Y' _Z='Z'; run;
The output demonstrates that the original color ramp ("ThreeColorRamp") has been replaced by a four-color ramp that uses the colors blue, cyan, yellow, and red.
When is this necessary?
You might wonder if the result is worth all this complexity and abstraction. Couldn't I have merely copied the template into a program editor and changed the color ramp? Yes, you can use a program editor to modify a template for your personal use. However, this programming technique enables you to write macros, tasks, and SAS/IML functions that other people can use. For example, you could encapsulate the program in this article into a %CreateHeatmap macro that accepts a color ramp and data set as an argument. Or in SAS/IML you can write modules such as the HEATMAPCONT and HEATMAPDISC subroutines, which enable SAS/IML programmers to specify any color ramp to visualize a matrix.
If you work in a regulated industry and are modifying templates that SAS distributes, this technique provides reproducibility. A reviewer or auditor can run a program to reproduce a graph.
Kuhfeld's template modification technique (TMT) enables you to write a SAS program that modifies an ODS template. The TMT combines many SAS programming techniques and can be difficult to comprehend when you first see it. This article breaks down the technique into five simpler steps. You can use the TMT to edit ODS templates that are distributed by SAS and used in SAS procedures, or (as shown in this article) to change options at run time for a template that you wrote yourself. You can download the SAS program for this article.
Happy Halloween to all my readers. I hope you'll agree that Kuhfeld's TRICKS are a real TREAT!
The post A SAS programming technique to modify ODS templates appeared first on The DO Loop.
Suppose you want a list of car manufacturers from the CARS dataset. Easy! Call the %CHARLIST macro from a %PUT statement, like this: The CHARLIST macro generates a list of unique values of a selected variable from a selected dataset. So does PROC FREQ. But, if you don't need statistics, the CHARLIST [...]
Reading an external file that contains delimiters (commas, tabs, or other characters such as a pipe character or an exclamation point) is easy when you use the IMPORT procedure. It's easy in that variable names are on row 1, the data starts on row 2, and the first 20 rows are a good sample of your data. Unfortunately, most delimited files are not created with those restrictions in mind. So how do you read files that do not follow those restrictions?
You can still use PROC IMPORT to read the comma-, tab-, or otherwise-delimited files. However, depending on the circumstances, you might have to add the GUESSINGROWS= statement to PROC IMPORT or you might need to pre-process the delimited file before you use PROC IMPORT.
Note: PROC IMPORT is available only for use in the Microsoft Windows, UNIX, or Linux operating environments.
The following sections explain four different scenarios for using PROC IMPORT to read files that contain the delimiters that are listed above.
In this scenario, I use PROC IMPORT to read a comma-delimited file that has variable names on row 1 and data starting on row 2, as shown below:
proc import datafile='c:\temp\classdata.csv' out=class dbms=csv replace; run;
When I submit this code, the following message appears in my SAS® log:
NOTE: Invalid data for Age in line 28 9-10. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--- 28 Janet,F,NA,62.5,112.5 21 Name=Janet Sex=F Age=. Height=62.5 Weight=112.5 _ERROR_=1 _N_=27 NOTE: 38 records were read from the infile 'c:\temp\classdata.csv'. The minimum record length was 17. The maximum record length was 21. NOTE: The data set WORK.CLASS has 38 observations and 5 variables.
In this situation, how do you prevent the Invalid Data message in the SAS log?
By default, SAS scans the first 20 rows to determine variable attributes (type and length) when it reads a comma-, tab-, or otherwise-delimited file. Beginning in SAS® 9.1, a new statement (GUESSINGROWS=) is available in PROC IMPORT that enables you to tell SAS how many rows you want it to scan in order to determine variable attributes. In SAS 9.1 and SAS® 9.2, the GUESSINGROWS= value can range from 1 to 32767. Beginning in SAS® 9.3, the GUESSINGROWS= value can range from 1 to 2147483647. Keep in mind that the more rows you scan, the longer it takes for the PROC IMPORT to run.
The following program illustrates the use of the GUESSINGROWS= statement in PROC IMPORT:
proc import datafile='c:\temp\classdata.csv' out=class dbms=csv replace; guessingrows=100; run;
The example above includes the statement GUESSINGROWS=100, which instructs SAS to scan the first 100 rows of the external file for variable attributes. You might need to increase the GUESSINGROWS= value to something greater than 100 to obtain the results that you want.
In this scenario, my delimited file has the variable names on row 4 and the data starts on row 5. When you use PROC IMPORT, you can specify the record number at which SAS should begin reading. Although you can specify which record to start with in PROC IMPORT, you cannot extract the variable names from any other row except the first row of an external file that is comma-, tab-, or an otherwise-delimited.
Then how do you program PROC IMPORT so that it begins reading from a specified row?
To do that, you need to allow SAS to assign the variable names in the form VARx (where x is a sequential number). The following code illustrates how you can skip the first rows of data and start reading from row 4 by allowing SAS to assign the variable names:
proc import datafile='c:\temp\class.csv' out=class dbms=csv replace; getnames=no; datarow=4; run;
In this scenario, I want to read only records 6–15 (inclusive) in the delimited file. So the question here is how can you set PROC IMPORT to read just a section of a delimited file?
To do that, you need to use the OBS= option before you execute PROC IMPORT and use the DATAROW= option within PROC IMPORT.
The following example reads the middle ten rows of a CSV file, starting at row 6:
options obs=15; proc import out=work.test2 datafile= "c:\temp\class.csv" dbms=csv replace; getnames=yes; datarow=6; run; options obs=max; run;
Notice that I reset the OBS= option to MAX after the IMPORT procedure to ensure that any code that I run after the procedure processes all observations.
In this scenario, I again use PROC IMPORT to read my external file. However, I receive more observations in my SAS data set than there are data rows in my delimited file. The external file looks fine when it is opened with Microsoft Excel. However, when I use Microsoft Windows Notepad or TextPad to view some records, my data spans multiple rows for values that are enclosed in quotation marks. Here is a snapshot of what the file looks like in both Microsoft Excel and TextPad, respectively:
The question for this scenario is how can I use PROC IMPORT to read this data so that the observations in my SAS data set match the number of rows in my delimited file?
In this case, the external file contains embedded carriage return (CR) and line feed (LF) characters in the middle of the data value within a quoted string. The CRLF is an end-of-record marker, so the remaining text in the string becomes the next record. Here are the results from reading the CSV file that is illustrated in the Excel and TextPad files that are shown earlier:
That behavior is why you receive more observations than you expect. Anytime SAS encounters a CRLF, SAS considers that a new record regardless of where it is found.
A sample program that removes a CRLF character (as long as it is part of a quoted text string) is available in SAS Note 26065, "Remove carriage return and line feed characters within quoted strings."
After you run the code (from the Full Code tab) in SAS Note 26065 to pre-process the external file and remove the erroneous CR/LF characters, you should be able to use PROC IMPORT to read the external file with no problems.
For more information about PROC IMPORT, see "Chapter 35, The IMPORT Procedure" in the Base SAS® 9.4 Procedures Guide, Seventh Edition.
Would you like to format your macro variables in SAS? Good news. It's easy! Just use the %FORMAT function, like this: %let x=1111; Log %put %format(&x,dollar11.); $1,111 %put %format(&x,roman.); MCXI %put %format(&x,worddate.); January 16, 1963 %let today=%sysfunc(today()); %put %format(&today,worddate.); October 13, 2017 %put %format(Macro,$3.); Mac What?! You never [...]
Sometimes life is just too busy, and I bet many of you feel the same way. If you’re like me, you’re playing a number of roles: employee, spouse, parent, community leader, coach and so on. With all the craziness, it’s likely we’re shortchanging another role that’s critically important to our [...]
In a large simulation study, it can be convenient to have a "control file" that contains the parameters for the study. My recent article about how to simulate multivariate normal clusters demonstrates a simple example of this technique. The simulation in that article uses an input data set that contains the parameters (mean, standard deviations, and correlations) for the simulation. A SAS procedure (PROC SIMNORMAL) simulates data based on the parameters in the input data set.
This is a powerful paradigm. Instead of hard-coding the parameters in the program (or as macro variables), the parameters are stored in a data set that is processed by the program. This is sometimes called data-driven programming. (Some people call it dynamic programming, but there is an optimization technique of the same name so I will use the term "data-driven.") In a data-driven program, when you want to run the program with new parameters, you merely modify the data set that contains the control parameters.
I have previously written about a different way to control a batch program by passing in parameters on the command line when you invoke the SAS program.
Static programming and hard-coded parameters
Before looking at data-driven programming, let's review the static approach. I will simulate clusters of univariate normal data as an example.
Suppose that you want to simulate normal data for three different groups. Each group has its own sample size (N), mean, and standard deviation. In my book Simulating Data with SAS (p. 206), I show how to simulate this sort of ANOVA design by using arrays, as follows.
/* Static simulation: Parameters embedded in the simulation program */ data AnovaStatic; /* define parameters for three simulated group */ array N _temporary_ (50, 50, 50); /* sample sizes */ array Mean _temporary_ (14.6, 42.6, 55.5); /* center for each group */ array StdDev _temporary_ ( 1.7, 4.7, 5.5); /* spread for each group */ call streaminit(12345); do k = 1 to dim(N); /* for each group */ do i = 1 to N[k]; /* simulate N[k] observations */ x = rand("Normal", Mean[k], StdDev[k]); /* from k_th normal distribution */ output; end; end; run;
The DATA step contains two loops, one for the groups and the other for the observations within each group. The parameters for each group are stored in arrays. Notice that if you want to change the parameters (including the number of groups), you need to edit the program. I call this method "static programming" because the behavior of the program is determined at the time that the program is written. This is a perfectly acceptable method for most applications. It has the advantage that you know exactly what the program will do by looking at the program.
Data-driven programming: Put parameters in a file
An alternative is to put the parameters for each group into a file or data set. If the k_th row in the data set contains the parameters for the k_th group, then the implicit loop in the DATA step will iterate over all groups, regardless of the number of groups. The following DATA step creates the parameters for three groups, which are read and processed by the second DATA step. The parameter values are the same as for the static example, but are transposed and processed row-by-row instead of via arrays:
/* Data-driven simulation: Parameters in a data set, processed by the simulation program */ data params; /* define parameters for each simulated group */ input N Mean StdDev; datalines; 50 14.6 1.7 50 42.6 4.7 50 55.5 5.5 ; data AnovaDynamic; call streaminit(12345); set params; /* implicit loop over groups k=1,2,... */ do i = 1 to N; /* simulate N[k] observations */ x = rand("Normal", Mean, StdDev); /* from k_th normal distribution */ output; end; run;
Notice the difference between the static and dynamic techniques. The static technique simulates data from three groups whose parameters are specified in temporary arrays. The dynamic technique simulates data from an arbitrary number of groups. Currently, the PARAMS data specifies three groups, but if I change the PARAMS data set to represent 10 or 1000 groups, the AnovaDynamic DATA step will simulate data from the new design without any modification.
Generate the parameters from real data
The data-driven technique is useful when the parameters are themselves the results of an analysis. For example, a common simulation technique is to generate the moments of real data (mean, variance, skewness, and so forth) and to use those statistics in place of the population parameters that they estimate. (See Chapter 16, "Moment Matching," in Simulating Statistics with SAS.)
The following call to PROC MEANS generates the sample mean and standard deviation for real data and writes those values to a data set:
proc means data=sashelp.iris N Mean StdDev stackods; class Species; var PetalLength; ods output Summary=params; run;
The output data set from PROC MEANS creates a PARAMS data set that contains the variables (N, MEAN, and STDDEV) that are read by the simulation program. Therefore, you can immediately run the AnovaDynamic DATA step to simulate normal data from the sample statistics. A visualization of the resulting simulated data is shown below.
You can run PROC MEANS on other data and other variables and the AnovaDynamic step will continue to work without any modification. The simulation is controlled entirely by the values in the "control file," which is the PARAMS data set.
You can generalize this technique by wrapping the program in a SAS macro in which the name of the parameter file and the name of the simulated data set are provided at run time. With a macro implementation, you can read from multiple input files and write to multiple output data sets. You could use such a macro, for example, to break up a large simulation study into smaller independent sub-simulations, each controlled by its own file of input parameters. In a gridded environment, each sub-simulation can be processed independently and in parallel, thus reducing the total time required to complete the study.
Although this article discusses control files in the context of statistical simulation, other applications are possible. Have you used a similar technique to control a program by using an input file that contains the parameters for the program? Leave a comment.
The Western Users of SAS Software 2017 conference is coming to Long Beach, CA, September 20-22. I have been to a lot of SAS conferences, but WUSS is always my favorite because it is big enough for me to learn a lot, but small enough to be really friendly.
If you come I hope you will catch my presentations. If you want a preview or if you can’t come, click the links below to download the papers.
On Wednesday, I will once again present SAS Essentials, a whirlwind introduction to SAS programming in just three hours specially designed for people who are new to SAS.
Then on Friday Lora Delwiche will present a Hands-On Workshop about SAS Studio, a new SAS interface that runs in a web browser.
I hope to see you there!
A while back, I wrote about the proliferation of interfaces for writing SAS programs. I am reposting that blog here (with a few changes) because a lot of SAS users still don’t understand that they have a choice.
These days SAS programmers have more choices than ever before about how to run SAS. They can use the old SAS windowing enviroment (often called Display Manager because, face it, SAS windowing environment is way too vague), or SAS Enterprise Guide, or the new kid on the block: SAS Studio. All of these are included with Base SAS.
I recently asked a SAS user, “Which interface do you use for SAS programming?”
She replied, “Interface? I just install SAS and use it.”
“You’re using Display Manager,” I explained, but she had no idea what I was talking about.
Trust me. This person is an extremely sophisticated SAS user who does a lot of leading-edge mathematical programming, but she didn’t realize that Display Manager is not SAS. It is just an interface to SAS.
This is where old timers like me have an advantage. If you can remember running SAS in batch, then you know that Display Manager, SAS Enterprise Guide, and SAS Studio are just interfaces to SAS–wonderful, manna from heaven–but still just interfaces. They are optional. It is possible to write SAS programs in an editor such as Word or Notepad++, and copy-and-paste into one of the interfaces or submit them in batch. In fact, here is a great blog by Leonid Batkhan describing how to use your web browser as a SAS code editor.
Each of these interfaces has advantages and disadvantages. I’m not going to list them all here, because this is a blog not an encyclopedia, but the tweet would be
“DM is the simplest, EG has projects, SS runs in browsers.”
I have heard rumors that SAS Institute is trying to develop an interface that combines the best features of all three. So someday maybe one of these will displace the others, but at least for the near future, all three of these interfaces will continue to be used.
So what’s your SAS interface?
It's time to share another tip about working with ZIP files in SAS. Since I first wrote about FILENAME ZIP to list and extract files from a ZIP archive, readers have been asking for more. Specifically, they want additional details about the files that are contained in a ZIP, including the original file datetime stamps, file size, and compressed size. Thanks to a feature that was quietly added into SAS 9.4 Maintenance 3, you can use the FINFO function to retrieve these details. In this article, I share a SAS macro program that does the job.
Here's an abridged example of the output. If you need to create something like this without the use of external ZIP tools like 7-Zip or WinZip (which are often unavailable in controlled environments), read on.
You can download the full program from my public gist on GitHub: zipfiles_list_details.sas
ZIPpy details: a solution in three macros
Here's my basic approach to this problem:
- First, create a list of all of the ZIP files in a directory and all of the file "members" that are compressed within. I've already shared this technique in a previous article. Like an efficient (or lazy) programmer, I'm just reusing that work. That's macro routine #1 (%listZipContents).
- With this list in hand, iterate through each ZIP file member, "open" the file with FOPEN, and gather all of the available file attributes with FINFO. I've divided this into two macros for readability. %getZipMemberInfo (macro routine #2) retrieves all of the file details for a single member and stores them in a data set. %getZipDetails (macro routine #3) iterates through the list of ZIP file members, calls %getZipMemberInfo on each, and concatenates the results into a single output data set.
Here's a sample usage:
%listzipcontents (targdir=C:\Projects\ZIPPED_Examples, outlist=work.zipfiles); %getZipDetails (inlist=work.zipfiles, outlist=work.zipdetails);
I tried to add decent comments to my program so that interested coders can study and adapt as needed. Here's a snippet of code that uses the FINFO function, which is really the important part for retrieving these file details.
/* Assumes an assignment like: FILENAME F ZIP "C:\ZIPPED_Examples\SudokuSolver_src.zip" member="src/AboutThisProject.txt"; */ fId = fopen("&f","S"); if fID then do; infonum=foptnum(fid); do i=1 to infonum; infoname=foptname(fid,i); select (infoname); when ('Filename') filename=finfo(fid,infoname); when ('Member Name') membername=finfo(fid,infoname); when ('Size') filesize=input(finfo(fid,infoname),15.); when ('Compressed Size') compressedsize=input(finfo(fid,infoname),15.); when ('CRC-32') crc32=finfo(fid,infoname); when ('Date/Time') filetime=input(finfo(fid,infoname),anydtdtm.); end; end; compressedratio = compressedsize / filesize; output; fId = fClose( fId );
The FINFO function in SAS provides access to file attributes and their values for a given file that you've accessed using the FOPEN function. The available file attributes can differ according to the type of file (FILENAME access method) that is used. ZIP files, as you can guess, have some attributes that are specific to them: "Compressed Size", "CRC-32", and others. This code checks for all of the available attributes and keeps those that we need for our detailed output. (And see the use of the SELECT/WHEN statement? So much more readable than a bunch of IF/THEN/ELSEs.)
Look, I'm not going to claim that my approach to this problem is the most elegant or most efficient -- but it works. If it can be improved, then I'm sure I'll hear from a few of you experts out there. Bring it on!
For more about ZIP files in SAS
- How to create ZIP files with ODS PACKAGE ZIP (available since SAS 9.2)
- How to "unzip" and read ZIP files using FILENAME ZIP (SAS 9.4 and later)
- How to create and update ZIP files with FILENAME ZIP (SAS 9.4 and later)
- List the contents of ZIP files with FILENAME ZIP (SAS 9.4 and later)
The post Using FILENAME ZIP and FINFO to list the details in your ZIP files appeared first on The SAS Dummy.