10月 172016
 

SAS programmers often resort to using the X command to list the contents of file directories and to process the contents of ZIP files (or gz files on UNIX). In centralized SAS environments, the X command is unavailable to most programmers. NOXCMD is the default setting for these environments (disallowing shell commands), and SAS admins are reluctant to change it.

In this article, I'll share a SAS program that can retrieve the contents of a file directory (all of the file names), and then also report on the contents of every ZIP file within that directory -- without using any shell commands. The program uses two lesser-known tricks to retrieve the information:

  1. The FILENAME statement can be applied to a directory, and then the DOPEN, DNUM, DREAD, and DCLOSE functions can be used to retrieve information about that directory. (Check SAS Note 45805 for a better example of just this - click the Full Code tab.)
  2. The FILENAME ZIP method (added in SAS 9.4) can retrieve the names of the files within a compressed archive (ZIP or gz files). For more information, see all of my previous articles about the FILENAME ZIP access method.

I wrote the program as a SAS macro so that it should be easy to reuse. And I tried to be liberal with the comments, providing a view into my thinking and maybe some opportunities for improvement.

%macro listzipcontents (targdir=, outlist=);
  filename targdir "&targdir";
 
  /* Gather all ZIP files in a given folder                */
  /* Searches just one folder, not subfolders              */
  /* for a fancier example see                             */
  /* http://support.sas.com/kb/45/805.html (Full Code tab) */
  data _zipfiles;
    length fid 8;
    fid=dopen('targdir');
 
    if fid=0 then
      stop;
    memcount=dnum(fid);
 
    /* Save just the names ending in ZIP*/
    do i=1 to memcount;
      memname=dread(fid,i);
      /* combo of reverse and =: to match ending string */
      /* Looking for *.zip and *.gz files */
      if (reverse(lowcase(trim(memname))) =: 'piz.') OR
         (reverse(lowcase(trim(memname))) =: 'zg.') then
        output;
    end;
 
    rc=dclose(fid);
  run;
 
  filename targdir clear;
 
  /* get the memnames into macro vars */ 
  proc sql noprint;
    select memname into: zname1- from _zipfiles;
    %let zipcount=&sqlobs;
  quit;
 
  /* for all ZIP files, gather the members */
  %do i = 1 %to &zipcount;
    %put &targdir/&&zname&i;
    filename targzip ZIP "&targdir/&&zname&i";
 
    data _contents&i.(keep=zip memname);
      length zip $200 memname $200;
      zip="&targdir/&&zname&i";
      fid=dopen("targzip");
 
      if fid=0 then
        stop;
      memcount=dnum(fid);
 
      do i=1 to memcount;
        memname=dread(fid,i);
 
        /* save only full file names, not directory names */
        if (first(reverse(trim(memname))) ^='/') then
          output;
      end;
 
      rc=dclose(fid);
    run;
 
    filename targzip clear;
  %end;
 
  /* Combine the member names into a single data set        */
  /* the colon notation matches all files with "_contents" prefix */
  data &outlist.;
    set _contents:;
  run;
 
  /* cleanup temp files */
  proc datasets lib=work nodetails nolist;
    delete _contents:;
    delete _zipfiles;
  run;
 
%mend;

Use the macro like this:

%listzipcontents(targdir=c:temp, 
 outlist=work.allfiles);

Here's an example of the output.
zip file contents within the target directory

Experience has taught me that savvy SAS programmers will scrutinize my example code and offer improvements. For example, they might notice my creative use of the REVERSE function and "=:" operator to simulate and "ends with" comparison function -- and then suggest something better. If I don't receive at least a few suggestions for improvements, I'll know that no one has read the post. I hope I'm not disappointed!

tags: FILENAME ZIP, SAS programming, xcmd, ZIP files

The post List the contents of your ZIP and gz files using SAS appeared first on The SAS Dummy.

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

(required)

(required)