SAS tips

Peek at your data using VBScript, OLE DB, and the SAS local data provider

 ole db, sas programming, SAS tips  Peek at your data using VBScript, OLE DB, and the SAS local data provider已关闭评论
11月 042010
 
Let's pretend for a moment that you don't have SAS installed on all of your office computers. But you've got some great content locked away inside SAS data sets. Is there a way to get to the content of that data, without having to push the data through a SAS tier?

There is a way! All that you need is the SAS OLE DB Local Data Provider and some programming know-how. You can download the provider from SAS. And the know-how? The basic recipes are in the SAS 9.2 Providers for OLE DB Cookbook. (There's a version for SAS 9.1 as well.)

Note: there are limitations when using the Local Data Provider. Because there is no SAS session in this mix, you don't get the benefit of SAS data services. For example, it can't handle your user-defined formats. You might sacrifice precision for some numeric values. You don't have the full cross-platform support that SAS provides.

Here is an example script that verifies that the provider is installed, opens a data set file, and reports on the row count and column names/types. It's a very simple example that doesn't get very fancy and doesn't include proper error checking, but I hope that it shows you the possibilities.

To run the example on your Windows machine:

  1. Copy the program into a local file (for example, c:\datatest\ShowColumns.vbs).
  2. Change the filename and path variables in the script to point to data files on your machine.
  3. Open a command prompt window.
  4. Run the example program with cscript.exe. For example:

    cscript.exe c:\datatest\ShowColumns.vbs
     
    Note that this can work with the 32-bit or 64-bit versions of the SAS OLE DB providers. If you are on a 64-bit machine and you want to use the 32-bit provider, be sure to run the 32-bit version of cscript.exe:

    %windir%\SysWow64\cscript.exe c:\datatest\ShowColumns.vbs
     
Here's the program:

' Change these to report on your data set
' Change path to the directory where your data set resides
' Change filename to the root name of the sas7bdat file (no extension)
path = "C:\Program Files\SAS\EnterpriseGuide\4.3\Sample\Data"
filename = "Candy_Sales_Summary"

' Check registry for SAS Local Provider
Set WSHShell = CreateObject("WScript.Shell")
clsID = WSHShell.RegRead("HKCR\SAS.LocalProvider\CLSID\")
WScript.Echo "DIAGNOSTICS: SAS.LocalProvider CLSID is " & clsID
inProcServer = WSHShell.RegRead("HKCR\CLSID\"  & clsID & "\InprocServer32\")
WScript.Echo "DIAGNOSTICS: Registered InprocServer32 DLL is " & inProcServer

' Constants for ADO calls
Const adOpenDynamic = 2
Const adLockOptimistic = 3
Const adCmdTableDirect = 512

' Instantiate the provider object
Set obConnection = CreateObject("ADODB.Connection")
Set obRecordset = CreateObject("ADODB.Recordset")

obConnection.Provider = "SAS.LocalProvider"
obConnection.Properties("Data Source") = path
obConnection.Open
obRecordset.Open filename, obConnection, adOpenDynamic, adLockOptimistic, adCmdTableDirect

 'Report on Fields in this data set
WScript.Echo ""
 WScript.Echo "Opened data " & filename & ", Record count: " & obRecordset.RecordCount
 For Each Field In obRecordset.Fields
   If Field.Type = 5 Then pType = "Numeric"
   If Field.Type = 200 Then pType = "Character"
   WScript.Echo Field.Name & " " & pType
 Next

obRecordset.Close
obConnection.Close
 
Output from my example:

Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.

DIAGNOSTICS: SAS.LocalProvider CLSID is {0221264D-F909-4872-B7E6-B108D3FC8E8B}
DIAGNOSTICS: Registered InprocServer32 DLL is C:\Program Files\SAS\SharedFiles\SASProvidersForOLEDB\sasafloc0902.dll

Opened data Candy_Sales_Summary, Record count: 15000
Name Character
Region Character
OrderID Numeric
ProdID Numeric
Customer Numeric
Type Character
Product Character
Category Character
Subcategory Character
Retail_Price Numeric
Units Numeric
Discount Character
Date Numeric
Fiscal_Year Character
Fiscal_Quarter Character
Fiscal_Month_Num Numeric
Sale_Amount Numeric
 
10月 232010
 
The question came up on the SAS Enterprise Guide discussion forum: which do you prefer, List Report Wizard (PROC REPORT) or Summary Tables (PROC TABULATE)? And as with most SAS-related questions, the proper response is: "it depends." If you put these two PROCs in the ring with a Google Fight, PROC REPORT is a clear winner. Cynthia@SAS delivered a very thoughtful answer. She first carefully separated out the "point-and-click" versus "code-it-yourself" concerns, highlighting the capabilities and limitations of the tasks within SAS Enterprise Guide. Then, for the people who want follow a programming approach, Cynthia offers these insights:
But for the beginner, I think the biggest issue that I've heard about (and it surprised me) was whether they wanted the box area in the upper left hand corner of the table. For some folks, it doesn't matter that the BOX is there -- but I've had some students switch to PROC REPORT just because they didn't like the big empty box -- seriously -- the BOX area was a deal breaker for them -- they wanted the look that they got with PROC REPORT. The second issue that I've heard about is changing the code -- some people find the PROC REPORT code easier to understand than the TABULATE code -- if and only if they have to change the code. The TABULATE code is more abbreviated and "algebra-like" with all the * and = in the TABLE statement. Even though the REPORT code is more verbose -- the existence of a COLUMN statement makes it clear which variable will come first on the report row, which second, which third, etc, etc. And then the explicit DEFINE statement makes it easy to see what statistic will get used for a numeric variable, the usage for a category variable (ORDER, GROUP or ACROSS).
So, as you can see, sometimes it actually does come down to a "BOXing" match. <groan...sorry>

Tuning your Windows 2008 environment for SAS

 file cache, SAS tips, SGIO, technical support paper, tuning, Windows 2008  Tuning your Windows 2008 environment for SAS已关闭评论
10月 082010
 
If you're an administrator of systems that run SAS, you have probably already read up on How to Maintain Happy SAS 9 Users. If not, go read that first. Now, are you looking to squeeze the best performance out of SAS on Microsoft Windows 2008 Server? Be sure to read this latest paper with additional configuration and tuning guidelines. The paper describes how the Windows operating-system file cache can have a big impact on performance, especially under large I/O workloads. There are some tuning steps that you can apply; they are a combination of Microsoft Windows updates, configuration settings, and SAS system options. The paper is the result of collaboration among SAS Technical Support, SAS R&D, and SAS customers, and Microsoft. It contains lots of good advice, plus links to related papers for further details.
10月 012010
 
Since its 4.2 release, SAS Enterprise Guide has been able to import Microsoft Excel 2007 and 2010 spreadsheet files (usually encountered as .XLSX files). But while SAS Enterprise Guide can export XLS files (which are compatible with all versions of Microsoft Excel), it does not have the ability to export to a native .XLSX file. Customers see this as a limitation. The Microsoft Excel 2007 format allows for more rows and columns than traditional XLS files can support, and SAS Enterprise Guide does not offer a point-and-click method to create output results in this new format. In SAS 9.2, you can use PROC EXPORT or the EXCEL LIBNAME engine to write data in this new format, but using PROC EXPORT requires the customer to write a custom SAS program, and both PROC EXPORT and the EXCEL LIBNAME engine require you to license and install SAS/ACCESS to PC Files. New Custom Task: Export Data to Microsoft Excel 2007/2010 To help customers to work around this limitation, SAS R&D has created a custom task that focuses on one thing: export your SAS data to a Microsoft Excel 2007 (.XLSX) file. Here is an example screen shot of the task:
The task supports the following features/options:
  • Works only with SAS Enterprise Guide 4.3
  • Export to XLSX or XLS file
  • Specify the name of the worksheet in the exported file
  • Specify which columns to include in the exported file
  • Specify the output format (as a SAS format) for each column. This is especially useful for applying special treatment to date and date-time values to preserve the correct format in the spreadsheet.
  • Optionally place field headings on the first row of the exported sheet. You can also use labels instead of variable names.
  • Optionally create multiple worksheets if the row count exceeds the maximum supported by Microsoft Excel
  • Optionally create multiple worksheets if the column count exceeds the maximum supported by Microsoft Excel
Note: This task uses "native" technology to do its work: SAS/ACCESS to PC Files is not required. However, this also means that the task does not generate a SAS program that can be run independently outside of SAS Enterprise Guide or within a stored process. Download the custom task from this location (ZIP file, 473KB). Read the README PDF file in the ZIP archive to learn how to install and use it with SAS Enterprise Guide 4.3.

What to do IN BETWEEN dates

 data step, in operator, sas programming, SAS tips  What to do IN BETWEEN dates已关闭评论
9月 302010
 
I saw a suggestion arrive from a SAS customer who would like to see the IN operator extended to allow ranges of date values. For example, you can currently write a program that checks for values IN a collection or range of numbers:

data check;
  if x in (1:10) then result="match";
run;
 
This matches on the set of numbers 1 through 10, inclusive. The customer would like to see something like this supported, to match on the dates that fall within a given range.

data check;
   if x in ('01JAN2010'd : '01FEB2010'd) then result="match";
run;
 
It's a great suggestion. But in the meantime, you can satisfy this simple example by using the shorthand for the AND operator:

data check;
  if '01JAN2010'd <= x <='01FEB2010'd then result="match";
run;
 
There are many other ways to "skin this cat", including using PROC SQL and the BETWEEN condition. What's your favorite?