A few months ago I released the Copy Files task for use with SAS Enterprise Guide. The task allows you to transfer any files between your PC and a SAS Workspace session, much like an FTP process. It doesn't rely on FTP though; it uses a combination of SAS code, Windows APIs, and SAS Integration Technologies to get the job done.
It's proven to be a very popular task, because it can be useful in so many situations. It even earned a mention in a SAS Global Forum paper this year (and no, it wasn't a paper that I wrote).
Today I'm going to point out the things that the task doesn't do so well. Or at least, that it didn't do well until I made some updates. My changes were based on two "complaints" from several SAS users.
Read on for the details. But if you don't care and you just want the latest version of the task, you can download it from here.
Complaint #1: Wildcards that are a little too "wild"
The task allows you to use wildcard characters in your file specifications so that you can match multiple files to transfer. A problem occurs though, when your file specification looks like this:
Can you guess the problem? What if I told you that the task stores your file specification in a SAS macro variable? Yep, it's that "/*" sequence in the value that trips things up, because SAS interprets it as the start of a comment. Left unchecked, this sabotages the remainder of the SAS code that is included in the process.
The SAS macro experts are already shouting out the answer to fix this: use %STR to wrap the slash and "hide" the token from the SAS parser. That's a great idea! Except that the task relies on the SAS "internal" value of the macro variable -- and not the displayed value -- when it comes time to process the file specification. These two values are different when %STR wraps a special character like the forward slash. The macro facility replaces the masked character with a special hexadecimal character called a delta character.
To illustrate, I used another popular custom task -- the SAS Macro Variable Viewer -- to show the inner value of a SAS macro variable:
Notice the funky arrow characters. Is that what you were expecting?
Now the task detects the presence of a forward slash (and some other special characters) and will automatically add the %STR so you don't have to. (But you can still use %STR if you want to.) And it correctly detects the delta characters, if present, to convert them back to their correct form before trying to use the value.
Complaint #2: Fixing line-ending characters but breaking other stuff
Users of FTP might be familiar with binary versus ASCII mode for file transfers. Because UNIX line endings (a single line feed) are different from Windows line endings (a carriage return followed by a line feed) in text files, transferring a file in ASCII mode helps to ensure proper line-ending behavior for the target host.
The Copy Files task transfers ALL files using a binary mode. Why? Because in today's global workplace even text-based files often don't adhere to the limited English-centric ASCII standard. Attempting a text-based file transfer could result in encoding mismatches, so it's much safer to transfer content as "binary blobs".
But you still want your text files to have the proper line endings for the target host. To answer that, the Copy Files task offers a "Fix line-ending characters" option that does the following:
- Scans the file to determine whether it's a text file. (This relies on the file content and not on special file extensions such as .TXT or .CSV.)
- Rewrites the file and replaces the line-ending characters as needed for the target file system (Windows or UNIX).
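The task's actual detection logic isn't spelled out here, so treat the following as a rough Python sketch of one common content-based heuristic, not as what the task really does. The function name `looks_like_text`, the 8 KB sample size, and the 5% threshold are all assumptions made up for illustration: sample the start of the file, reject anything with a NUL byte, and reject anything with too many other control characters.

```python
def looks_like_text(data: bytes, sample_size: int = 8192) -> bool:
    """Guess whether a blob is text by inspecting its content, not its name.

    Hypothetical heuristic: the sample size and threshold are illustrative.
    """
    sample = data[:sample_size]
    if not sample:
        return True  # an empty file has no line endings to fix either way
    if b"\x00" in sample:
        return False  # NUL bytes almost never appear in single-byte or UTF-8 text
    # Control bytes that are normal in text: tab, line feed, form feed, carriage return
    allowed = {0x09, 0x0A, 0x0C, 0x0D}
    control = sum(1 for b in sample if b < 0x20 and b not in allowed)
    return control / len(sample) < 0.05
```

Note that a check like this would misclassify UTF-16 text as binary, because UTF-16 encodes ASCII characters with NUL bytes. That's one more reason a content-based guess should be conservative.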
The problem was that in rewriting the file (using the .NET StreamReader and StreamWriter classes), the Copy Files task was changing the file encoding to UTF-8. That encoding works fine on Windows, and most users didn't even notice. But some users sent me output from file dump tools and comparisons that showed the byte-order mark (BOM) bytes that had been added to the start of the file. (SAS users: I knew I could count on you!)
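To see how a "harmless" text rewrite can change a file's bytes, here is a small Python analogy. It is not the task's .NET code. The sample content and the helper name `has_utf8_bom` are made up for illustration. It decodes a Latin-1 file, fixes the line ending, and writes it back as UTF-8 with a signature, which is roughly the kind of side effect those users spotted in their file dumps.

```python
import codecs

# A 6-byte "file" in Latin-1: c, a, f, 0xE9 (e-acute), CR, LF
original = "caf\u00e9\r\n".encode("latin-1")

# Decode as text, fix the line ending, then re-encode as UTF-8 with a BOM
rewritten = original.decode("latin-1").replace("\r\n", "\n").encode("utf-8-sig")

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

print(original.hex(" "))   # 63 61 66 e9 0d 0a
print(rewritten.hex(" "))  # ef bb bf 63 61 66 c3 a9 0a
```

The line ending is fixed, but three BOM bytes have appeared at the front and the single byte for the accented character has become two.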
To address this, I changed the "fix line endings" process to use lower-level I/O functions that simply scan through the text files as a binary stream, byte by byte, and change the line endings as needed. Trying to decide on proper encoding is risky business, so I decided to leave the character encoding untouched.
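The idea can be sketched in a few lines of Python. Again, this is an illustration under assumptions and not the task's own implementation: the function name `fix_line_endings` and its `target` argument are hypothetical, and the sketch assumes an ASCII-compatible encoding (single-byte code pages or UTF-8) where the bytes 0x0D and 0x0A only ever mean carriage return and line feed. It never decodes the content, so every other byte passes through unchanged.

```python
def fix_line_endings(data: bytes, target: str) -> bytes:
    """Convert line endings on raw bytes without touching the character encoding.

    target is "unix" (LF) or "windows" (CRLF). Hypothetical helper for illustration.
    Assumes an ASCII-compatible encoding; UTF-16 content would need different handling.
    """
    if target not in ("unix", "windows"):
        raise ValueError("target must be 'unix' or 'windows'")
    # Collapse any CRLF pairs to LF first so mixed files convert consistently
    unix_style = data.replace(b"\r\n", b"\n")
    if target == "unix":
        return unix_style
    # Expand every LF to CRLF for Windows
    return unix_style.replace(b"\n", b"\r\n")
```

For example, `fix_line_endings(b"caf\xe9\r\n", "unix")` returns `b"caf\xe9\n"`: the Latin-1 byte for the accented character is left alone and no byte-order mark is added.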
In addition to my own testing, a couple of users out there have confirmed that my changes fix the issues -- at least for now. Thanks for that! If you want to try the latest, get it now from here: