3月 302016

EnvironmentManagerThe SAS Environment Manager Report Center is a set of SAS stored processes and SAS macros which leverage the SAS Data Mart for the purpose of monitoring and auditing a SAS installation. Full documentation on the structure and functioning of the Data Mart can be found in the SAS Environment Manager 2.5 User’s Guide. This blog discusses functionality available as of the SAS 9.4 M3 release.

The intent of the Report Center is to provide most of the reporting that a SAS administrator might need “out of the box.” However, there’s lots of customization possible without doing any coding at all. You do, however, need to know something about your data, and, it helps if you’ve also got some knowledge of SAS reporting and graphing procedures. Those procedures would include PROC TABULATE, PROC TEMPLATE (for plots), PROC GPLOT, and PROC REPORT. This blog will explore a few things I’ve discovered about using the parameters to produce customized reports. In later blogs I’ll discuss using some of the reporting templates provided.

The first step is get a feel for the organization of the report center. Here’s a table to summarize what’s in it:

Opening Up the SAS Environment Manager Report Center

Often the first portions of the Report Center to be exploited are these three:

  • ARM Performance Reports
  • Audit Reports
  • ACM Reports

So, we’ll use these three as examples.

If you open ARM Performance Reports, you’ll find all the data about SAS artifacts, how much they are being used, and who is using them. SAS artifacts include such items as Stored Process Servers, Workspace Servers, tables, cubes, stored processes, OS directory usage, SAS procedure usage, and stored process times elapsed. They also report which users are using which types of SAS artifact. You can get Workspace Server load distribution by hour, and number of workspace server sessions broken down by day. This is a great first step for a SAS administrator to know what’s going on in his/her production deployment.

As for customizing, you’ll see this menu for all the ARM Performance Reports (and nearly all other reports as well).  You get to choose:

1.     Number of observations to print in report

2.     Subsetting, by begin/end date of data, WHERE clause

3.     Look and feel:  output format, ODS style

4.     Debugging options: show SAS log, macro calls/parameters, and macro generation (using the debug level selector)

Opening Up the SAS Environment Manager Report Center2

Careful: The WHERE clause is free-form, so you must know your variables names, types, and reasonable or valid values, or you will get errors.

The next interesting collection of reports in the Report Center is the Audit Reports, where you’ll find reports on all the events that could be scraped from SAS logs, such as:

  • Authentication errors
  • Group membership changes
  • Changes in user accounts
  • Other metadata changes
  • Client connections by userID
  • Most recent login for each userID

The most complex set of data is reported on by the ACM Reports (Agent Collected Metrics). This encompasses all the data that the SAS Environment Manager collects in its normal operations. As you would expect, a great deal of this data is the same as what you’d see when examining the metrics in SAS Environment Manager using the Resources->Browse menus.

ACM reports typically provide far more parameter options beyond the usual Titles, Footnotes, Date/Time ranges, and WHERE clause. This makes them far more customizable once you’re familiar with the additional parameters. For example, you typically have a prompt screen like the following:

Opening Up the SAS Environment Manager Report Center3

You can choose your data set, variables to display, many plotting options, X-axis and Y-axis variables and labels, BY groupings, layout and axis options, and many more. Plus, of course, the standard selectors for time/date, WHERE clauses, Titles and Footnotes.

So the lesson here is:  Know your data, know something about plotting/graphing in SAS and what the terms mean, be willing to experiment and have a little patience. With these assets you can get literally hundreds of great tables and graphs.

tags: SAS Administrators, SAS Environment Manager, SAS Professional Services

Opening Up the SAS Environment Manager Report Center was published on SAS Users.

3月 292016

Visual Analytics audit data collectionIn a couple of my previous blogs I discussed how to audit who made changes to data in a SAS environment. In the last couple of weeks I have been asked how to do the same thing for SAS Visual Analytics reports and explorations.  The Visual Analytics administrator overview report doesn’t answer the question, it deals mainly with usage of LASR servers and data. The same can be said for the VALASR kit reports that are available in SAS Environment Manager.

These two assets can help answer usage questions like what in-memory tables are being used and by whom, but not who is making changes to reports and exploration.

Looking at SAS Environment Manager Service Architecture report center there is not a report that will directly answer the question either. However, the good news is, the data to answer the question is being collected and stored in the Environment manager data mart.

EV Service Architecture enables the Audit.Meta.Updates logger on the metadata server, the result is metadata server Audit logs record, for all public objects:

  • Who modified the object
  • Date and time of the modification
  • Was the object added, deleted or updated

The audit logs are located in <config>Lev1SASMetaMetadataServerAuditLogs and start with the string Audit.

Here is an example of a line from the log:

2016-02-05T13:58:56,892 INFO [00016303] 573:henri@meta01 – Audit Public Object Type=SAS report (2G) Name=AnnualSalarybyDepartment ObjId=A5FMYBKR.AX00006Q has been updated.

NOTE: public objects are loosely defined as objects that are modeled in metadata and participate in the Information Service and/or Logical Object frameworks. Many, but not all, public objects live in the SAS Folders tree structure.

The SAS Environment Manager log parsing, collection and ETL process reads the information from the metadata audit logs and stores it in the environment manager data mart. The data is stored in the ARTIFACT.AUDIT_TRANSACTIONS table.

From there, if the feed to Visual Analytics is enabled, the data will be fed to the EVDMLA.AUDIT_TRANSACTIONS in-memory table, where you can use Visual Analytics Explorer or Report designer to view the audit trail of a report, exploration or any public object.

The columns of interest are:

  • Active Userid: the user who made the change.
  • Audit Record Event: the event that occurred, for changes to public objects the relevant events are updated, added and deleted.
  • Audit Record Type: there are multiple types of audit record in the table e.g. Login, access control change etc. The one we are interested in is AuditPublicObject which records changes made to public objects.
  • ObjectID: the metadata object id (can be used to get more information about the object from metadata)
  • Object Type: the type of the public object, for visual analytics reports that is “SAS report (2G” and “Visual exploration”
  • Startdt: the date and time that the event happened.
  • Identity Name:  the name of the object in metadata.

Visual Analytics Reports and Explorations

NOTE: often you will see multiple records for an update with a slightly different timestamp. This is because one save from the interface can generate multiple updates on objects in the metadata server.

The exploration below shows a graph and table displaying what audit events have occurred for Visual Analytics reports and explorations. The table shows, for each report and exploration, an audit trail of who updated the report and when the update occurred. If an object is deleted that will also be recorded. Unfortunately the detail of what changed e.g. a column was added, or a calculation was changed, is not recorded.

Visual Analytics Reports and Explorations2

In another example, this basic Visual Analytics report allows an administrator to select a report or exploration and lists the update history for the selected object.

Visual Analytics Reports and Explorations3

The bottom line, with SAS Environment Manager service architecture setup, the data mart contains information about who is adding, updating and deleting public objects in SAS metadata. If the feed to Visual Analytics is enabled, VA can be used to report an audit trail for reports and explorations.


Who changed my SAS Visual Analytics reports and explorations? was published on SAS Users.

3月 232016

VisualAnalyticsHTML5You may have noticed that when using date information in SAS Visual Analytics, that the date data values can be displayed in a variety of ways. You may see your dates displayed like Jan1916, 03Jun1915, or 03/12/16, for example. In this blog I’ll help you understand SAS date and time values, and review some of the features of SAS Visual Analytics relating to date and time values.

To help illustrate some of the date features in Visual Analytics Designer, I’ve created a SAS table containing multiple date columns, each containing the same data values, but having different associated formats, along with a datetime column and a time column with associated formats. For analysis purposes, there is also an amount column.

The following figure shows how the data is stored and how the data values would be represented in a simple printed report, for example.

Date Formats in Visual Analytics Designer

How date and time formats are used in designer

When the table is loaded to memory and added to a designer report as a data source, the columns with date, datetime, and time formats are interpreted as date, datetime and time category data items, as shown in the next figure. Notice that the date column with no format, however, is interpreted as a numeric measure data item.  The formats in designer are as shown below.
Date Formats in Visual Analytics Designer2

This is a list table showing how the values display with the associated formats.
Date Formats in Visual Analytics Designer3

So what are those numbers in the date column all about?

The numbers in the date column represent a SAS data value with no format associated.  This means it is simply a numeric value representing ‘number of days’.  Date values are stored internally as numbers in order to enable calculations on dates, such as the number of days between two dates.
A SAS date value is a numeric value representing the number of days between January 1, 1960 and a specified date.

A SAS datetime value represents the number of seconds between January 1, 1960 and a specified date and time.

A SAS time value represents the number of seconds since midnight of the current day. These values are between 0 and 86400.

For SAS tables, however, date values can be displayed using a variety of different formats (almost 100!)  There are numerous formats available for displaying datetime and time values also.  You are able to take advantage of many of these formats in SAS Visual Analytics—because, after all, who really wants to sit around counting the number of days in order to interpret date values in a report?

Changing the date and time formats in designer

When changing the format for any of the date data items, you can choose from this list:
Date Formats in Visual Analytics Designer4

Changing the format for the datetime data item displays this list:
Date Formats in Visual Analytics Designer5

A data item with time information stored can use these formats:

Date Formats in Visual Analytics Designer6

Another problem to consider:  Suppose your date information was loaded to memory without an associated date format at all, in the case of the date column in the data above.  This is column is interpreted in designer as a numeric data value.  You can solve this problem easily by creating a new calculated data item using the TreatAs Numeric(advanced) operator.  The operator enables you to specify a calculation that treats a value as a different data type.  In this case, the expression below specifies that the date data item should be treated as a date value.  The values of the new date(converted) data item are shown below.  You can also then change the format of the new date(converted) data item if you wish.
Date Formats in Visual Analytics Designer7

Date hierarchies in designer

In designer you can create a default date hierarchy from any date or datetime data item whose format displays a year by simply right-clicking on the data item and selecting the Create Date Hierarchy option.  A date hierarchy consists of year, quarter, month, day.  To create a time hierarchy right-click on a time data item and select Create Time Hierarchy. The time data item results in a hierarchy of hour, minute, second.

These hierarchies are the default hierarchies created for the date1, date3, and time, and datetime data items from the example data above.
Date Formats in Visual Analytics Designer8

Note:  You may have noticed that, in explorer, for a datetime item, you have the option of creating a Date and Time Hierarchy for that data item, consisting of Year, Quarter, Month, Day, Hour, Minute, Second.  The Date and Time Hierarchy is not available in the designer. So the default hierarchy for the datetime data item above consists of only Year, Quarter, Month, Day—the same as for the date1 and date3 hierarchies.

If the format associated with a date only displays Month, (the MONTH. Format associated with date2, for example), then in designer, you won’t be able to create a hierarchy from the data item until you change the format to one that displays year also.   If a hierarchy is not supported for a data item, then the Create Hierarchy option is not displayed when you right-click on the data item.

Derived data items for measure based on data or time intervals

Having data items whose format displays a year value also enables you to create derived items for reports that are based on time intervals. When you click on one of the intervals below, a selection list of data items display.  The only data items that are supported for the intervals are those that contain a year—so that’s data1, data3, and datetime in this data.
Date Formats in Visual Analytics Designer9

The formulas for these derived items use periodic operators that aggregate values over time: RelativePeriod, ParallelPeriod, and CumulativePeriod.

The special derived data items feature creates the formulas for these aggregations for you, but the same Periodic operators are also available for creating your own New Aggregated Measures:
Date Formats in Visual Analytics Designer10

Some of these derived columns are illustrated here using data that contains a Date by Month data item and an Expenses data item.  The data spans 2010-2011.
Date Formats in Visual Analytics Designer11

Date Formats in Visual Analytics Designer12

You can edit each of the derived data items to view the calculation and to make changes, if necessary.

The calculation for Expenses(Difference from Previous Parallel Period) is shown below, as an example.

Date Formats in Visual Analytics Designer13

_Sum_ is the aggregation that is applied to the measure, Expenses.

The data item for the periodic calculation is Date by Month.

The inner interval (smaller time period) for the aggregation is _Inferred_ , which means it has been determined from the report object. You could change to specific values of _ByMonth_ , _ByQuarter_ , or _ByYear_ .

The outer interval (larger time period) is also _Inferred_ , with other possible values being _ByMonth_ , _ByQuarter_ , or _ByYear_ .

The number of outer intervals to offset from the current period is 0. This means that the period from the current outer interval (Year) is used.

The scope for the period is _Full_ , meaning that the values are aggregated for the entire period. Alternate values are _ToDate_ (up to a specific day in the current period), or _ToToday_ (aggregates only up to the equivalent of today’s position in the current interval).  If you change the scope to _ToDate_ , you would need to specify a specific date in the last field.

So considering the above descriptions, for the list table, the expression below would produce the same resulting values for Expenses(Difference from Previous Parallel Period).
Date Formats in Visual Analytics Designer14

Periodic operators can return missing values under certain scenarios.  These scenarios, along with complete details on all of the date operators are documented Appendix 5 of the SAS Visual Analyics 7.3: User’s Guide.

Hopefully, this blog provides you with a better understanding of the basics of dates and times, and enable you to use some of the great features for dates in SAS Visual Analytics.








tags: SAS Professional Services, SAS Programmers, SAS Visual Analytics

Making Best Use of Date Formats in Visual Analytics Designer was published on SAS Users.

3月 172016

SAS Global ForumIn my last post, Expert tips for first-time attendees of SAS Global Forum 2016, I asked members of the SAS Global Users Group Executive Board what advice they had for users attending SAS Global Forum for the first time. In this post, they share tips for returning attendees.

Each year, approximately 50% of SAS Global Forum attendees are first-timers. With attendance expected to exceed 5,000 for this year’s conference in Las Vegas, that means the other half – literally thousands of users – will be returning for at least their second conference experience. Some have made it a yearly ritual. Dr. Herbert Kirk, for example, will be attending his 37th SAS Global Forum in Las Vegas in April. Kirk is the former Vice President of SAS Education and currently an Executive in Residence at the NCSU Institute of Advance Analytics. He’s just one of several dozen users who have attended at least 30 conferences. Hundreds more will be attending SAS Global Forum for at least the tenth time.


Herbert Kirk

“Even though I’ve used SAS for more than 40 years, I always learn something new at SAS Global Forum,” said Kirk. “It’s amazing to see the innovative ways SAS is being used around the world. Plus, I get to meet a new generation of users and learn about the impressive things they’re doing with SAS. The conference really is a must for SAS professionals.”

With so many SAS users making SAS Global Forum an annual part of their career development, I asked the SAS Users Group Executive Board why so many users return year after year.

Here are some of their responses:

Stay fresh and innovative

“In our daily jobs it is hard to keep from getting stale,” said SAS Global Forum 2005 Chair Greg Nelson. “The conference provides an opportunity to stay fresh, network with others and spur innovation in our own ways of thinking. You just can’t get that from a book, blog or article.”

SAS Global Forum 2015 Chair Tyler Smith said the conference gives longtime users a current view of the ever-evolving world of analytics. “These days are not your parents ANOVAs or Logistic Regressions! Staying in tune with changing skills, capabilities, enablers, and key figures are critical for the development of all levels of expertise.”

Debbie Buck, chair of the 2011 conference, may have summed it up best. “I think that keeping current is the highest priority for users in attending SAS Global Forum. Learning how to utilize applications for non-obvious solutions are definitely something that users may not realize.”

Get motivated

Art Carpenter, chair of the 2003 conference, said SAS Global Forum is a great place for longtime SAS users to get revitalized. “The conference gives you a chance to rejuvenate and to explore the wealth of SAS knowledge that is shared by SAS users. Many of the most experienced programmers and users of SAS attend SAS Global Forum, and you can meet and learn from them all.”

SAS Global Forum 2014 chair Marje Fecht said each year brings together a unique set of users and learning opportunities. “SAS Global Forum has always provided an excellent opportunity to learn, to share, and to network. The collective business and technical knowledge of thousands of SAS users means that you will come home with new ideas, experiences, and skills each and every time you attend.

Learn what SAS has planned for the future

Vice President of SAS Publications Kathy Council said SAS Global Forum is a great place for returning attendees to learn about new SAS products, services and program. “SAS is a dynamic player in analytics and we’re constantly responding to the needs of our customers. It is very important to keep up with what SAS plans for the future.”

Frank Fry, 2001 chair, agreed with Council. He said The Quad, a space that invites attendees to gather, mingle, share and learn from their SAS peers, is the perfect place to learn about what SAS and its partners have planned for the future. “You can get valuable information from SAS staff and other attendees on the Quad.”

Continue to build your professional network

“Networking!” said the 2000 Chair Nancy Wilson when asked why attendees would want to attend SAS Global Forum each year. “When I am facing a new coding challenge the SAS users I met at conferences become resources. The ingenuity of user contributions are brilliant examples of real business uses of the SAS tools – and their success are published for us to leverage!”

“Networking and content are two things that keep me coming back year after year,” said Robert Fecht, the 2007 SAS Global Forum chair. “Select key presentations that you want to attend, and try to participate in all the networking activities that the location has to offer.”

Rick Mitchell, chair of SAS Global Forum 2013, encourages returning attendees, who often come for the content, to try to attend as many non-presentation activities as they can. “Network with people! There are so many people with different backgrounds who can give you new perspectives and learning experiences that may change your life.”

See old friends

Fry offered from parting advice for returning attendees: “Don’t forget to reconnect with old friends. I have made some real friends where we can share tips or other information. Some that I met in the 80's still connect by email, text or phone. What a great experience this has been.”

Lori Griffin, 2009 chair said SAS Global Forum can feel somewhat like one big family reunion: “This is one venue that has a family feel which fosters learning and networking.”

If you’re planning to attend your 2nd (or 40th) SAS Global Forum we’d love to hear what advice you’d give other returning attendees. Feel free to share your thoughts in the comments below or go to our SAS Global Forum 2016 community site, a great place for attendees to learn the latest conference news and connect with others planning to attend.


tags: SAS Global Forum

Expert tips for returning attendees to SAS Global Forum 2016 was published on SAS Users.

3月 122016

ProblemSolversBeing able to access information about your operating system from within SAS can be incredibly useful. For example, if you have a directory that contains an unknown number of CSV files that you would like to read into SAS, you certainly would not want to have to write an IMPORT procedure for each of the files. It would be great if SAS just knew the names of all the files as well how many there were. That would make this task much less tedious.

Fortunately, there is a straightforward way to accomplish this task within SAS. One of the easiest ways to gather information from your operating system from within SAS is to use a pipe (|). Pipes enable your SAS application to do the following:

  • Receive input from any operating-system command that writes to standard output.
  • Route output to any operating-system command that reads from standard input.

In other words, a pipe can be used to capture the output of an operating-system command and put that output into SAS, which gives you the ability to easily run PROC IMPORT for each CSV file in your directory.

Before we take a look at an example, let’s look at the syntax needed to use a pipe. To use a pipe on a directory-based operating system, you just need to issue a FILENAME statement with the following syntax:

FILENAME fileref PIPE 'operating-system-commandoption-list;

Here is additional information about the syntax above:

fileref: Can be any valid fileref, as described in Referencing External Files.

PIPE: Is the device-type keyword that tells SAS that you want to use an unnamed pipe.

‘operating-system-command’: Is the name of an operating-system command, executable program, or shell script that you want to route output to or that you want to read input from. You must enclose the command in quotation marks.

option-list: Can be any of the options that are valid in the FILENAME statement, such as the LRECL= or RECFM= options.  Options that affect the operating-system command can also be listed here.

So let’s take a look at a pipe in action.

Using a pipe to return the output of an operating system command to SAS® software

In the following example, I have an unknown number of CSV files in the MYFILES subdirectory. I want to read in each file and create an output data set that has the same name as the CSV file. Also, I am only interested in the files that were created within the last year.

To do this import, I am going to use a pipe, which enables me to use the DIR operating-system command to list all the files in the directory. The output of the DIR command shows the filename and its create date. Based on the create date, I can pass in each file =name to a macro that contains PROC IMPORT. Then, PROC IMPORT reads in each file and outputs it to a data set with the desired name.

Here is the code:

Note: The code below contains numbers that are enclosed in parentheses; these numbers correspond to further explanation below the code block.

%let mydir=c:myfiles;       (1)                                                                                                             
%macro mymac(fname,dsname);   (2)                                                                                                          
proc import datafile="&amp;mydir&amp;fname" out=&amp;dsname                                                                                        
dbms=csv replace;                                                                                                                       
filename myfiles pipe "dir &amp;mydir /T:C";   (3)                                                                                                  
data _null_;                                                                                                                            
  infile myfiles truncover;                                                                                                             
  input;      (4)                                                                                                                          
  if index(lowcase(_infile_),'.csv') then do;    (5)                                                                                       
    date=input(scan(_infile_,1,' '),mmddyy10.);  (6)                                                                                       
    fname=substr(_infile_,38);                   (7)                                                                                                        
    dsname=compress(tranwrd(strip(scan(fname,1,'.')),'_',' '),,'p');  (8)                                                                 
    dsname=tranwrd(strip(dsname),' ','_');                                                                                              
    if intck('year',date,today())&lt;=1 then do;    (9)                                                                                         
      call execute(call);                        (10)                                                                                       

  1. The MYDIR macro variable lists the directory that contains the CSV files.
  2. The MYMAC macro contains the PROC IMPORT code needed to read in the CSV files. The macro contains the parameters FNAME and DSNAME. FNAME is the name of the CSV file being read in, and DSNAME is the name of the output data set being created.
  3. The FILENAME statement uses a pipe, which enables me to use the DIR operating-system command. This command shows the filename and the create date. Note that the ‘/T:C’ option is needed to show the create date. The default date returned by the DIR operating-system command is the last-modified date.
  4. The INPUT statement does not list any variables, so it loads the entire record into the input record buffer.
  5. _INFILE_ is an automatic variable in the DATA step that contains the contents of the current input record buffer. This buffer is created by the INPUT statement. In this example, I use the INDEX function to determine if the file has a .csv extension.
  6. When you run the DIR command, dates are always the first value listed. I use the SCAN function to extract the date and the INPUT function to convert the extracted date to a SAS date format.
  7. Before you can extract the filename, you have to search the output from the DIR command to find the column number that contains the filename value. In this example, the filename starts in the 38th column of the output file. The SUBSTR function specifies this column number and enables me to extract the filename.
  8. DATA-step logic is used to create the output data-set name. To make sure that there are no invalid characters in the filename like spaces and punctuation other than an underscore, I use the TRANWRD and STRIP functions and specify what characters should not be included.
  9. I use the INTCK function to select values that contain a create date that are a year or less from today’s date before I execute the macro.
  10. CALL EXECUTE invokes the MYMAC macro. CALL EXECUTE enables me to pass in the values of FNAME and DSNAME as the values of the macro’s parameters and conditionally execute the macro based on DATA-step logic.

As this example shows, using a pipe to capture the output of the DIR operating-system command can be extremely useful. In this example, it enables me to dynamically run code based on an unknown number of files that meet a certain condition. Using a pipe to get the file information for you can drastically reduce the amount of time it would take to perform this task manually.



tags: Problem Solvers, SAS Programmers

Using a pipe to return the output of an operating system command to SAS® software was published on SAS Users.

3月 082016

DataFlux Data Quality profiling metrics in SAS Visual AnalyticsIn this post, I continue the journey of getting data profiling results into SAS Visual Analytics. In my first blog I described the process of collecting DataFlux Data Quality profiling metrics to load a datamart. Now we load this datamart into memory (LASR) and then plug a VA report on top to share data quality results widely and wisely.

Overall Process

The process is a 2-step process.

1.     Extract data from DataFlux repository and build a DQ profiling datamart (discussed in blog 1)

2.     Load the DQ profiling datamart into memory and plug VA reports on it

Here is the master DataFlux Process Job (called _loadProfilingInVA) that orchestrates the 2 steps.

Data Quality profiling11

The DQ profiling datamart is made of five tables which are SAS data sets. The goal is to load them into LASR. As we did the first step using a DataFlux process job, we will use another process job to do this step and to have a consistent process that we could orchestrate in a master DataFlux process job, seen above.

Loading data into LASR is very simple, especially if we assume that those tables have been loaded the first time manually in SAS Visual Analytics Administrator, in order to create all the necessary metadata (tables, jobs, etc.). If this is not the case, it’s not a big deal to do it automatically using proc METALIB. Once this initial step is done, we just want to refresh these tables in memory each time we have new data profiling results to share. That means, we have to unload old data from memory and then load new data in memory.

Even if we want to do that in DataFlux Data Management Studio, we require a SAS engine somewhere to execute SAS statements. DataFlux DMS does not provide nodes to load data directly in LASR. But it provides nodes (available in process jobs, not in data jobs) to execute SAS code against an existing SAS Workspace Server.

Data Quality profiling12

My technical environment is quite simple: everything on the same box: DataFlux Data Management Studio, LASR, VA, SAS Workspace Server, DQ profiling datamart SAS data sets, etc. No risk! But, in many SAS environments, it may not be as simple. The following questions might arise:

  • Where are my DQ profiling datamart SAS data sets? They are on the same box as DataFlux Data Management Studio/Server, which is probably not the same box as the SAS Workspace Server.
  • Is my SAS Workspace Server able to read those remote tables? Probably a data transfer is needed.
  • Is my SAS Workspace Server able to load data into LASR? I know there are limitations when the SAS Workspace Server is not located in the same box as the LASR master node but I won’t go into details, some resources are already available to deal with this.

Back to my basic mono-box configuration. Here is an example of how to setup the SAS Workspace Server to be called from DataFlux Data Management Studio/Server:
Data Quality profiling13

Here is the SAS code to unload data from memory:

Data Quality profiling14

And here is the code to load data into memory:

Data Quality profiling15

Now, I’m done with the data preparation phase. I’ve executed my master DataFlux Process Job (called _loadProfilingInVA) which:

  • Extracts data profiling results from the DataFlux repository
  • Creates a set of 5 output tables
  • Unloads the 5 old tables from LASR
  • Reloads the 5 new tables into LASR

I’m ready for the reporting phase, and I have prepared a VA report with multiples sections that leverage this DQ profiling datamart. Let’s have a look to what it looks like:

An amazing welcome screen

Data Quality profiling16

An overview of all the profiling metrics from all the profile jobs runs

Data Quality profiling17

Details of profiling metrics by table for a specific run

Data Quality profiling18

An historical view of profiling metrics

Data Quality profiling19

A view of value and pattern frequency distribution

Data Quality profiling20

A view of outliers and percentiles

Data Quality profiling21

tags: data management, SAS Professional Services, SAS Visual Analytics

Data Governance Series: Share DataFlux Data Quality profiling metrics in SAS Visual Analytics (Part 2) was published on SAS Users.

3月 052016

In my previous post, Introducing data-driven loops, I suggested a way of implementing programming loops with a list of index variables pulled from an external data table. These ordinary programming loops iterate during code execution while processing some data elements of an input data table.

SAS macro loops, on the other hand, are completely different creatures as they do not iterate during execution time, but rather during code compilation time. That makes SAS macro loops a powerful code generator tool allowing to produce multiple variations of SAS code snippets with actually writing them just once.

Syntactically speaking, while SAS programming loops or do-loops always reside within SAS data step, SAS macro loops or %do-loops are located within SAS macros. They can be either within a data step (or proc step) generating multiple data/proc step statements, or outside data/proc step generating multiple data/proc steps or global statements.

Implementing SAS macro loops

To make macro loop driven by data we can use two index macro variables: the first one (primary index) iterates from 1 to n incrementing by 1 effectively going through the observations of a driver table, the other macro variable (secondary index) gets its values from the driver variable and is being a true data-driven index for our macro loop.  The following figure illustrates this concept.


SAS macro loops containing data or proc steps

For example, we have data table sashelp.prdsale that looks like this:


Suppose, we need to produce in separate HTML files for each country - charts showing actual product sales by years.

Here is how this can be done the data-driven way without any hard-coding:

/* output files location */
filename odsout "C:PROJECTS_BLOG_SASdata-driven-macro-loopshtml";

/* get unique countries */
proc sort data=sashelp.prdsale(keep=COUNTRY) out=work.countries nodupkey;

%macro loop;

  %local num i cntry;

  /* get number of countries */
  %let dsid = %sysfunc(open(work.countries));
  %let num  = %sysfunc(attrn(&amp;dsid,nlobs));
  %let rc   = %sysfunc(close(&amp;dsid));

  %do i=1 %to #

    data _null_;
      p = &amp;i;
      set work.countries point=p;
      call symputx('cntry',COUNTRY);

    ods html path=odsout file="report_&amp;cntry..html"  style=styles.seaside;
    goptions reset=all device=actximg colors=() htext=9pt hsize=5in vsize=3in;
    title1 "Product sales by year for &amp;cntry";
    axis1 minor=none label=('Actual Sales');

    proc gchart data=sashelp.prdsale(where=(COUNTRY eq "&amp;cntry"));
      vbar YEAR /
      sumvar = ACTUAL
      width = 10
      outside = sum
      raxis = axis1
      cframe = white nozero discrete
      format ACTUAL dollar12.0;

    ods html close;


%mend loop;

The highlights of this code are:

  1. Using proc sort with nodupkey option we create a table work.countries of unique COUNTRY values. This can be done using proc sql as well.
  2. We determine the number of unique COUNTRY values, &num.
  3. Within macro called %loop, we use primary index – macro variable &i – to iterate from 1 to &num with increment 1.
  4. We use data _null_ step within that loop to sequentially read values of COUNTRY using direct access to observations of work.countries table by means of point= option. For each iteration &i of %do-loop, we create a secondary index – macro variable &cntry, which is used as a true index for our loop.
  5. During the code compilation, SAS macro processor loops through the %do-loop &i times repeatedly generating SAS code within it, each time with a new value &cntry, thus accomplishing our task.

This implementation of the macro %do-loop works perfectly fine, except in the situations when we need to use it within a data/proc step. The problem is the data _null_ statement that converts primary index &i to the secondary index &cntry, since we can’t use a data step within another data step.

SAS macro loop within data or proc step

Let’s solve the following coding problem. Suppose we have to create SALE_CATEGORY variable on our sashelp.prdsale table, something that you would routinely code like this:

data work.prdsale;
  set sashelp.prdsale;
  if ACTUAL &lt; 50 then SALE_CATEGORY = 'A'; else
  if ACTUAL &lt; 200 then SALE_CATEGORY = 'B'; else
  if ACTUAL &lt; 500 then SALE_CATEGORY = 'C'; else
  if ACTUAL &lt; 700 then SALE_CATEGORY = 'D'; else
  if ACTUAL &lt; 900 then SALE_CATEGORY = 'E'; else
  if ACTUAL &lt; 2000 then SALE_CATEGORY = 'F';

What is wrong with this code? Nothing. Except when category definition changes you would have to find every place in your code where to apply that change. Besides, if a number of categories is large, the code becomes large too.

Let’s implement this the data-driven way, without any hard-coded values. Notice, that in the code above we have multiple if-then-else statements of a certain pattern that are repeated multiple times and thus they can be generated via %do-loop.

Let’s create the following driver table that contains boundary and sale category definitions that match the above hard-coded data step:


The data-driven macro loop can be implemented using the following code:

%macro mloop;

  /* get observations number - num, variable numbers - vnum1, vnum2, */
  /* variable type - vtype2, getfunc = getvarC or getvarN            */
  %let dsid = %sysfunc(open(work.salecategory));
  %let num  = %sysfunc(attrn(&amp;dsid,nlobs));
  %let vnum1 = %sysfunc(varnum(&amp;dsid,upboundary));
  %let vnum2 = %sysfunc(varnum(&amp;dsid,salecat));
  %let vtype2 = %sysfunc(vartype(&amp;dsid,&amp;vnum2));
  %let getfunc = getvar&amp;vtype2;

data work.prdsale;
  set sashelp.prdsale;

  %do i=1 %to #

    /* get upboundaty and salecat values from driver table work.salecategory */
    /* and assign them to upper and categ macro variables */
    %let rc = %sysfunc(fetchobs(&amp;dsid,&amp;i));
       %let upper = %sysfunc(getvarn(&amp;dsid,&amp;vnum1));
        %let categ = %sysfunc(&amp;getfunc(&amp;dsid,&amp;vnum2));
        %if &amp;vtype2 eq C %then %let categ = "&amp;categ";

    /* generate if ... then ...; else statements */  
    if ACTUAL &lt; &amp;upper then SALE_CATEGORY = &amp;categ;
    %if (&amp;i ne &amp;num) %then %str(else);

  %let rc = %sysfunc(close(&amp;dsid));


%mend mloop;

With a little overhead of several %sysfunc() functions and SAS Component Language (SCL) functions we effectively generate a set of if-then-else statements based on the values in the driver table. Notably, even if the number of categories increases to hundreds the code does not have to be changed a bit.

Of course, this approach can be used for any SAS code generating efforts where there is a repetition of SAS code pattern.


As a bonus to those who was patient enough to bear with me to the end, here is a macro equivalent of the data _null_ statement used in the first section - SAS macro loops containing data or proc step:

Data step:

data _null_;
      p = &amp;i;
      set work.countries point=p;
      call symputx('cntry',COUNTRY);

Macro equivalent:
%let dsid = %sysfunc(open(work.countries));
    %let vnum = %sysfunc(varnum(&amp;dsid,COUNTRY));
    %let rc = %sysfunc(fetchobs(&amp;dsid,&amp;i));
    %let cntry = %sysfunc(getvarc(&amp;dsid,&amp;vnum));
    %let rc = %sysfunc(close(&amp;dsid));


Please share your thoughts and comments.

tags: SAS Macro, SAS Professional Services, SAS Programmers

Data-driven SAS macro loops was published on SAS Users.