Tech

10月 052016
 

streamviewerSAS Event Stream Processing (ESP) cannot only process structured streaming events (a collection of fields) in real time, but has also very advanced features regarding the collection and the analysis of unstructured events. Twitter is one of the most well-known social network application and probably the first that comes to mind when thinking about streaming data source. On the other hand, SAS has powerful solutions to analyze unstructured data with SAS Text Analytics. This post is about merging 2 needs: collecting unstructured data coming from Twitter and doing some text analytics processing on tweets (contextual extraction, content categorization and sentiment analysis).

Before moving forward, SAS ESP is based on a publish and subscribe model. Events are injected into an ESP model using an “adapter” or a “connector.” or using Python and the publisher API Target applications consume enriched events output by ESP using the same technology, “adapters” and “connectors.” SAS ESP provides lots of them, in order to integrate with static and dynamic applications.

Then, an ESP model flow is composed of “windows” which are basically the type of transformation we want to perform on streaming events. It can be basic data management (join, compute, filter, aggregate, etc.) as well as advanced processing (data quality, pattern detection, streaming analytics, etc.).

SAS ESP Twitter Adapters background

SAS ESP 4.2 provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.

Both two adapters are publisher only and include:

  • Twitter Publisher Adapter
  • Twitter Gnip Publisher Adapter

The second one is more advanced, using a different API (GNIP, bought by Twitter) and providing additional capabilities (access to history of tweets) and performance. The adapter builds event blocks from a Twitter Gnip firehose stream and publishes them to a source window. Access to this Twitter stream is restricted to Twitter-approved parties. Access requires a signed agreement.

In this article, we will focus on the first adapter. It consumes Twitter streams and injects event blocks into source windows of an ESP engine. This adapter has free capabilities. The default access level of a Twitter account allows us to use the following methods:

  • Sample: Starts listening on random sample of all public statuses.
  • Filter: Starts consuming public statuses that match one or more filter predicates.

SAS ESP Text Analytics background

SAS ESP 4.1/4.2 provides three window types (event transformation nodes) to perform Text Analytics in real time on incoming events.

The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.

text-analytics-on-twitter

Here are the SAS ESP Text Analytics features:

  • Text Category” window:
    • Content categorization or document classification into topics
    • Automatically identify or extract content that matches predefined criteria to more easily search by, report on, and model/segment by important themes
    • Relies on “.mco” binary files coming from SAS Contextual Analysis solution
  • Text Context” window:
    • Contextual extraction of named entities (people, titles, locations, dates, companies, etc.) or facts of interest
    • Relies on “.li” binary files coming from SAS Contextual Analysis solution
  • Text Sentiment” window:
    • Sentiment analysis of text coming from documents, social networks, emails, etc.
    • Classify documents and specific attributes/features as having positive, negative, or neutral/mixed tone
    • Relies on “.sam” binary files coming from SAS Sentiment Analysis solution

Binary files (“.mco”, “.li”, “.sam”) cannot be reverse engineered. The original projects in their corresponding solutions (SAS Contextual Analysis or SAS Sentiment Analysis) should be used to perform modifications on those binaries.

The ESP project

The following ESP project is aimed to:

  • Wait for events coming from Twitter in the source Twitter window (this is a source window, the only entry point for streaming events)
  • Perform basic event processing and counting
  • Perform text analytics on tweets (in the input streaming, the tweet text is injected as a single field)

text-analytics-on-twitter02

Let’s have a look at potential text analytics results.

Here is a sample of the Twitter stream that SAS ESP is able to catch (the tweet text is collected in a field called tw_Text):

text-analytics-on-twitter03

The “Text Category” window, with an associated “.mco” file, is able to classify tweets into topics/categories with a related score:

text-analytics-on-twitter04

The “Text Context” window, with an associated “.li” file, is able to extract terms and their corresponding entity (person, location, currency, etc.) from a tweet:

text-analytics-on-twitter05

The “Text Sentiment” window, with an associated “.sam” file, is able to determine a sentiment with a probability from a tweet:

text-analytics-on-twitter06

Run the Twitter adapter

In order to inject events into a running ESP model, the Twitter adapter should be started and is going to publish live tweets into the sourceTwitter window of our model.

text-analytics-on-twitter07

Here we search for tweets containing “iphone”, but you can change to any keyword you want to track (assuming people are tweeting on that keyword…).

There are many additional options: -f allows to follow specific user ids, -p allows to specify locations of interest, etc.

Consume enriched events with SAS ESP Streamviewer

SAS ESP provides a way to render events in real-time graphically. Here is an example of how to consume real-time events in a powerful dashboard.

streamviewer

Conclusion

With SAS ESP, you can bring the power of SAS Analytics into the real-time world. Performing Text Analytics (content categorization, sentiment analysis, reputation management, etc.) on the fly on text coming from tweets, documents, emails, etc. and triggering consequently some relevant actions have never been so simple and so fast.

tags: SAS Event Stream Processing, SAS Professional Services, SAS Text Analytics, twitter

How to perform real time Text Analytics on Twitter streaming data in SAS ESP was published on SAS Users.

9月 302016
 

SecurityIn a number of my previous blogs I have discussed auditing within a SAS environment and how to identity who has accessed data or changed reports. For many companies keeping an audit trail is very important. If you’re an administrator in your environment and auditing is important at your organization, here are a few steps to take to secure the auditing setup and possibly audit any changes made to it, all the while ensuring there are no gaps in collecting this information.

In a SAS deployment the logging configuration is stored in XML files for each server configuration.  The xml files can be secured with OS permissions to prevent unauthorized changes. However, there are a number of ways in which a user can temporarily adjust logging settings which may allow them to prevent audit messages from reaching a log.

Logging can be adjusted dynamically in:

SAS Code using logging functions and macros
SAS Management Console
Using PROC IOMOPERATE

As an example of how a user could circumvent the audit trail let’s look at an environment where logging has been configured to audit access to SAS datasets using the settings described in this blog. When auditing is enabled, messages are recorded in the log when users access a table. In the test case for the blog we have a stored process that prints a SAS dataset. When the stored process runs, the log will record the table that was opened and the user that opened it.

protect-your-audit-trail-in-a-sas-environment

A user could turn auditing off by adding a call to the log4sas_logger function. The code below included at the start of the Stored Process (or any SAS program) would turn dataset audit logging off for the duration of the execution. Any access of datasets in the Stored Process would no longer be audited.

data _null_;
rc=log4sas_logger(“Audit.Data.Dataset.Open”, “level=off”);
run;

There is an option we can add to the logger to prevent this from happening. The IMMUTABILITY option specifies whether a logger’s additivity and level settings are permanent or whether they can be changed by using the SAS language. In the code above the level of the logger was changed using the log4sas logger function.

If we add the immutability=”true” setting to the logger, this will prevent a user from adjusting the logging level dynamically using the DATA step functions. Attempts to adjust the logger's level will fail with an error. More about the Logging Facility.

< appender-ref ref=”TimeBasedRollingFileAudit”/>
< level value=”Trace”/>
< /logger>

protect-your-audit-trail-in-a-sas-environment6

However, this setting does not fully prevent dynamic changes to logging configuration. IMMUTABILITY is ignored for configuration changes made by administrators using SAS Management Console or the IOMOPERATE procedure.

Users who have access to the SAS Management Console Server Manager plug-in can change logging on the fly. In SAS Management Console select Server Manager > SASApp > SAS App Logical Stored Process Server > <machine> right-click and select Connect.  This will display the process ID of an active Stored Process Server. Select the PID and select the loggers tab. This tab displays the current setting for the active loggers on the SAS server. The Audit.Data.Dataset.Open logger has a level of TRACE, this is the setting that is writing audit messages about activity on the dataset to the log.

protect-your-audit-trail-in-a-sas-environment2

An administrator can change the level of the logger by right-clicking on the logger name, and selecting Properties. In this example we will turn logging off for the Audit.Data.Dataset.Open logger.

protect-your-audit-trail-in-a-sas-environment3

Now any run of a stored process will no longer record audit messages about data access to the log.

Users could also use PROC IOMOPERATE to change the logging configuration on a server. The code below dynamically finds the serverid of the stored process server and then turns audit logging off. The key piece of code is the set attribute line near the end:

%let spawnerURI = %str(iom://server:8581;Bridge;user=sasadm@saspw,pass=???);
proc iomoperate ;
connect uri=”&amp;spawnerURI”;
list spawned;
list spawned out=obj_spawned;
quit;
data _null_;
set work.obj_spawned;
where upcase( serverComponent ) like ‘%STORED%’;
call symput(“stpid”, strip(serverid) );
run;
proc iomoperate;
connect uri=”&amp;spawnerURI”
spawned=”&amp;stpid”   ;
set attribute category=”Loggers” name=”Audit.Data.Dataset.Open” value=”OFF”;
quit;

You must be a SAS Administrator to change logging configuration dynamically with SAS Management Console and PROC IOMOPERATE. While access to this functionality can be restricted, it does leave a potential gap in the auditing setup.  As a further failsafe you can actually configure auditing so that any changes made to logging configuration results in audit messages about the change in a log file. Effective with SAS 9.4, the following loggers capture these changes:

  • Logging.Configuration.Logger
  • Logging.Configuration.Appender
  • Logging.Configuration.Config

Let’s see how we can audit changes that are made to the Stored Process server’s logging configuration file.  In the existing configuration file, add a new appender. The new appender (below) will write messages to a separate log in the audit directory of the Stored Process server (make sure this directory exists). The log will have a name similar to  AuditLogsSASApp_LoggingChange_2016-09-15_sasbi_16024.log

&lt;!– Rolling log file with default rollover of midnight for Logging Auditing –&gt;
&lt; appender class=”RollingFileAppender” name=”TimeBasedRollingFileLogging“&gt;
&lt; param name=”Append” value=”false”/&gt;
&lt; param name=”Unique” value=”true”/&gt;
&lt; param name=”ImmediateFlush” value=”true”/&gt;
&lt; rollingPolicy class=”TimeBasedRollingPolicy”&gt;
&lt; param name=”FileNamePattern” value=”D:SASvaconfigLev1SASAppStoredProcessServerAuditLogsSASApp_LoggingChange_%d_%S{hostname}_%S{pid}.log”/&gt;
&lt; /rollingPolicy&gt;
&lt; layout&gt;
&lt; param name=”HeaderPattern” value=”Host: ‘%S{hostname}’, OS: ‘%S{os_family}’, Release: ‘%S{os_release}’, SAS Version: ‘%S{sup_ver_long2}’, Command: ‘%S{startup_cmd}'”/&gt;
&lt; param name=”ConversionPattern” value=”%d %-5p [%t] %X{Client.ID}:%u – %m”/&gt;
&lt; /layout&gt;
&lt; /appender&gt;

Now we will add a logger that references the new appender. The Audit.Logging logger will write all changes to loggers and appenders to the referenced appender.  After you make changes to logging configuration make sure you validate the server in SAS Management Console; syntax errors in logging configuration files can render SAS servers inoperable.

&lt; level value=”Trace”/&gt;
&lt; appender-ref ref=”TimeBasedRollingFileLogging”/&gt;
&lt; /logger&gt;

Once you have this set up, any dynamic changes made to logging configuration, either through the SAS language, PROC IOMOPERATE or SAS Management Console, will write a record to the new logging configuration auditing log.

protect-your-audit-trail-in-a-sas-environment5

When auditing is a critical requirement for a SAS deployment it is important for administrators to close any gaps which would allow actions to fail to write audit records.

These gaps can be closed by:

  • Securing the logging configuration file using operating system permissions
  • Ensuring that key loggers are configured as immutable
  • Restricting access to the SAS Management Console Server Manager plug-in
  • Ensuring that only key administrators have access to credentials that can change auditing configuration using SAS Management Console and PROC IOMOPERATE
  • Configuring auditing for changes to the logging configuration
tags: SAS Administrators, SAS Professional Services

How to protect your audit trail in a SAS environment was published on SAS Users.

9月 282016
 

open_source_models_using_sasWith my first open source software (OSS) experience over a decade ago, I was ecstatic. It was amazing to learn how easy it was to download the latest version on my personal computer, with no initial license fee. I was quickly able to analyse datasets using various statistical methods.

Organisations might feel similar excitement when they first employ people with predominantly open source programming skills. . However, it becomes tricky to organize an enterprise-wide approach based solely on open source software. . However, it becomes tricky to organize an enterprise-wide approach based solely on open source software. Decision makers within many organisations are now coming to realize the value of investing in both OSS and vendor provided, proprietary software. Very often, open source has been utilized widely to prototype models, whilst proprietary software, such as SAS, provides a stable platform to deploy models in real time or for batch processing, monitor  changes and update - directly in any database or on a Hadoop platform.

Industries such as pharma and finance have realised the advantages of complementing open source software usage with enterprise solutions such as SAS.

A classic example is when pharmaceutical companies conduct clinical trials, which must follow international good clinical practice (GCP) guidelines. Some pharma organisations use SAS for operational analytics, taking advantage of standardized macros and automated statistical reporting, whilst R is used for the  planning phase (i.e. simulations), for the peer-validation of the results (i.e. double programming) and for certain specific analyses.

In finance, transparency is required by ever demanding regulators, intensified after the recent financial crisis. Changing regulations, security and compliance are mitigating factors to using open source technology exclusively. Basel’s metrics such as PD, LGD and EADs computation must be properly performed. A very well-known bank in the Nordics, for example, uses open source technology to build all type of models including ensemble models, but relies on SAS’ ability to co-exist and extend open source on its platform to deploy and operationalise open source models.

Open source software and SAS working together – An example

The appetite of deriving actionable insight from data is very crucial. It is often believed that when data is thoroughly tortured, the required insight will become obvious to drive business growth. SAS and open source technology is used by various organisations to achieve maximum business opportunities and ROI on all analytics investment made.

Using the flexibility of prototyping predictive model in R and the power and stable platform of SAS to handle massive dataset, parallelize analytic workload processing, a well-known financial institution is combining both to deliver instant results from analytics and take quick actions.

How does this work?

SAS embraces and extends open source in different ways, following the complete analytics lifecycle of Data, Discovery and Deployment.

open-source-models-using-sas

An ensemble model, built in R is used within SAS for objective comparison within SAS Enterprise Miner (Enterprise Miner is a drag and drop, workflow modelling application which is easy to use without the need to code) – including an R model within the ‘open source integration node.’

open-source-models-using-sas1

Once this model has been compared and the best model identified from automatically generated fit statistics, the model can be registered into the metadata repository making it available for usage on all SAS platform.

We used SAS Model Manager to monitor Probability of Default(PD) and Loss Given Default(LGD) model. All models are also visible to everyone within the organization depending on system rights and privileges and can be used to score and retrain new dataset when necessary. Alerts can also be set to monitor model degradation and automated message sent for real time intervention.

open-source-models-using-sas2

Once champion model was set and published, it was used in Real Time Decision Manager(RTDM) flow to score new customers coming in for loan. RTDM is a web application which allows instant assessment of new applications without the need to score the entire database.

As a result of this flexibility the bank was able to manage their workload and modernize their platform in order to make better hedging decisions and cost saving investments. Complex algorithms can now be integrated into SAS to make better predictions and manage exploding data volumes.

open-source-models-using-sas3

tags: open source, SAS Enterprise Miner

Operationalising Open Source Models Using SAS was published on SAS Users.

9月 262016
 

mwsug-2016-logoOver the past 37 years I've had the good fortune to be able to attend and present at hundreds of in-house, local, regional, special-interest and international SAS events. I am a conference junkie. I've not only attended thousands of presentations, Hands-On Workshops, tutorials, breakout sessions, quick tips, posters, breakfasts, luncheons, mixers and more, but have had the privilege of hearing, seeing and networking with thousands of like-minded SAS users and presenters as they share valuable tips, techniques, advice, and suggestions on how to best use the SAS software.

For me, attending, volunteering and participating at SAS conferences and events has not only brought personal satisfaction like nothing else, it has allowed me to grow myself professionally and make many life-long friends. One of my objectives while attending a conference is to identify and learn at least three new things I didn't already know about SAS software. These three new things could consist of "cool" programming tips, unique coding techniques, "best" practice conventions, or countless other SAS-related nuggets.

At the upcoming 2016 MidWest SAS Users Group (MWSUG) Educational Forum and Conference, I'll be presenting several topics near and dear to my heart including "Top Ten SAS Performance Tuning Techniques." This 50-minutes presentation highlights my personal top ten list of performance tuning techniques for SAS users to apply in their programs and applications. If you are unable to attend, here are a couple programming tips and techniques from each performance area to consider.

CPU Techniques

1. Use IF-THEN / ELSE or SELECT-WHEN / OTHERWISE in the DATA step, or a Case expression in PROC SQL to conditionally process data.

2. CPU time and elapsed time can be reduced by using the SASFILE statement to process the same data set multiple times.

I/O Techniques

1. Consider using data compression for large data sets.

2. Build and use indexed data sets to improve access to data subsets.

Data Storage Techniques

1. Use data compression strategies to reduce the amount of storage used to store data sets.

2. Use KEEP= or DROP= data set options to retain desired variables.

Memory Techniques

1. Use memory-resident DATA step constructs like Hash objects to take advantage of available memory and memory speeds.

2. Use the MEMSIZE= system option to control memory usage with the SUMMARY procedure.

Want to learn more SAS tips, techniques and shortcuts like these? Please join me at the MidWest SAS Users Group Conference October 9 – 11 at the Hyatt Regency in downtown Cincinnati, Ohio. Register now for three days of great educational opportunities, 100+ presentations, training, workshops, networking and more.

I look forward to meeting and seeing you there!

tags: MWSUG, SAS Programmers, US Regional Conferences

SAS performance tuning - A little bit goes a long way was published on SAS Users.

9月 242016
 

Every day, more than one hundred thousand SAS users visit our website looking for SAS information and resources. Given its importance to our user base, we’re constantly looking for ways to evolve the site.  Over the next few months, you’ll notice changes to the support website, changes we believe will provide you with a better user experience.

Today, we launch a beta version of six top-level support pages – accessible via your computer, smartphone or tablet by clicking on the banner on support.sas.com. The beta pages may look different from what you’re used to, but they are fully functional, and you can use them to fulfill your support needs. During the beta period, all current support.sas.com pages will still be available for your use, but we hope you’ll give the new beta pages a try.

As a company with more than 40 years of experience delivering business analytics software, we focus on making sure that you – our customers – are the beginning point for all of our work – whether we’re developing a new product or designing a new website.

Jim Goodnight, our CEO, put it best when he said, “For the past four decades, we’ve used a simple approach with our customers. We ask them what they want, and then we develop it for them.”

And that’s what’s at the heart of our support site evolution.

We’ve listened to your comments and suggestions. We’ve heard what’s most important to you.  And we’ve designed the new support pages with you in mind, focusing on your key tasks so you can access information, find answers and get help quickly and easily. Important information is now front and center and new, simplified navigation will enable you to get what you need with fewer clicks.  We believe the new pages deliver the experience you’ve asked us for, but we’ll let you be the judge.

During the beta period, we’d really love to hear what you think of the new pages. You’ll find a “give feedback” button at the top of each new page.  Please let us know what’s working well, and what isn’t; what you like, and what you don’t. And remember to give us feedback each time you use the beta pages.

Once the beta period is over and the new pages move into production, we’ll continue to evolve the site, working on lower level pages in an ongoing effort to give you a better support experience. We hope you like the changes we’re making and that you find the new support pages useful.

tags: SAS support site

The evolution of support.sas.com was published on SAS Users.

9月 212016
 

In a previous blog post I explained how end users should code and use shared locations for SAS artifacts, to avoid issues in a SAS Grid Manager environment. Still, they could still fall in some sharing issues, which could have very obscure manifestations. For example, users opening SAS studio might notice that it automatically opens to the last program that they were working on in a previous session… sometimes. Other times, they may logon and find that SAS Studio opens to a blank screen. What causes SAS Studio to “sometimes” remember a previous program and other times not? And why should this matter, when all I am looking for, are my preferences?

Where are my preferences?

SAS Studio has a Preferences window that enables end users to customize several options that change the behavior of different features of the software. By default, these preferences are stored under the end-user home directory on the server where the workspace server session is running (%AppData%/SAS/SASStudio/preferences in Windows or ~/.sasstudio/preferences in UNIX). Does this sentence ring any alarm bells? With SAS Studio Enterprise Edition running in a grid environment, there is no such thing as “the server where the workspace server session is running!” One invocation of SAS Studio could run on one grid node and the next invocation of SAS Studio could run on a different grid node.  For this reason, it might happen that a preference that we just set to a custom value reverts to its default value on the next sign-in. This issue can become worse because SAS Studio follows the same approach to store code snippets, tasks, autosave files, the WEBWORK library, and more.

Until SAS Studio 3.4, the only solution to this uncertainty was to have end users’ home directories shared across all the grid nodes. SAS Studio 3.5 removes this requirement by providing administrators with a new configuration option: webdms.studioDataParentDirectory. This option specifies the location of SAS Studio preferences, snippets, my tasks, and more. The default value is blank, which means that the behavior is the same as in previous releases. An administrator can point it to any shared location to access all of this common data from any workspace server session.

Tip #1

This option sounds cool, how can I change its value?

SAS Studio 3.5: Administrator’s Guide provides information on this topic, but the specific page contains a couple of errors – not to worry, they have been flagged and are in the process of being amended. The property is specified in the config.properties file in the configuration directory for SAS Studio. Remember that when you deploy using multiple Web Application Servers (which is common with many SAS solutions, and mandatory in modern clustered environments), SAS Studio is deployed in SASServer2_1, not in SASServer1_1. It is also worth noting that, in case of clustered middle tiers, this change should be applied to every deployed instance of SAS Studio web application.

The documentation page also incorrectly states how to enforce this change. The correct procedure is to restart the Web Application Server hosting SAS Studio, i.e. SASSerrver2_1.

Tip #2

I do not want that all my users to share a common directory, they would override each other’s settings!

This is a fair request, but, unfortunately, the official documentation can be confusing.

The correct syntax, to have a per-user directory, is to append <userid> at the end of the path. SAS Studio will substitute this token with the actual userid of the user currently logged on. For example, given the following configuration:

sas-studio-tips-for-sas-grid-manager-administrators

will lead to this directory structure:

sas-studio-tips-for-sas-grid-manager-administrators02

Tip #3

You – or your SAS Administrator – changed the value of the webdms.studioDataParentDirectory option. How can you know if the change has been correctly applied? This is the same as asking, how can I know where this option is currently pointing to? Here is a quick tip. This property influences the location of the WEBWORK directory. Simply open the Libraries pane, right click on WEBWORK to select “Properties”, and here you are. The highlighted path in the following screenshot shows the current value of the option:
sas-studio-tips-for-sas-grid-manager-administrators03

Conclusion

As you may have understood, the daily duties of SAS Administrators include ghost hunting as well as debugging weird issues. I hope that the tips contained in this post will make your lives a little easier, leaving more time for all the other paranormal activities. And I promise, more tips will follow!

tags: SAS Administrators, SAS Grid Manager, SAS Professional Services, sas studio

SAS Studio tips for SAS Grid Manager Administrators: Where are my preferences? was published on SAS Users.

9月 212016
 

mwsug-2016-logoI started out as a Psychology major. During my third year as an undergraduate, I was hired on as a research assistant for my advisor in her cognitive psychology lab. Through this and progressively more complicated psychological research experience, I quickly grew to love statistics. By the end of that year, I decided to declare it as a second major. My first introduction to SAS was as a fourth-year undergraduate psychology student - still new to my statistics degree curriculum and working on a large-scale meta-analysis project spanning years of data. I had never programmed before seeing my first SAS procedure. I broke down in tears, terrified at what I had gotten myself into. I toughed it out (with help from my statistics professor), finished my psychology honors thesis with top grades and went on later to use SAS in my statistics thesis for good measure.

About a year later, in 2011, that same statistics professor encouraged us to submit our work for presentation at MWSUG, sweetening the deal with a promise of extra credit if we did. I hopped on that opportunity and submitted both my psychology thesis as well as my statistics thesis that night. A couple of months later, I received an email…they accepted both of my papers and awarded me a FULL student scholarship to attend!

I have come a long way from presenting my first thesis projects (I just arrived home from my 27th conference last weekend). I have learned to love not only SAS, but the statistics behind each procedure. This year, at MWSUG 2016 in Cincinnati, OH. I will be presenting 3 projects. One project will be in ePoster format. As the chair of this section (yes, this is correct. I’ve gone from terrified student to a section chair!), I felt the need to support it with my own research as well. This project is dedicated to the common and very pesky concept of Multicollinearity.

What is Multicollinearity? Why, it is precisely the statistical phenomenon wherin there exists a perfect or exact relationship between the identified predictor variables and the interested outcome variable. Simply put, it is the existence of predictor co-dependence. Coincidently, it is quite easy to detect. You can do so with three very simple to utilize options and one procedure, such as those given in the below example:

/* Examination of the Correlation Matrix */
Proc corr data=temp;
Var hypertension aspirin hicholesterol anginachd smokingstatus obese_BMI exercise _AGE_G sex alcoholbinge; Run;
 
/* Multicollinearity Investigation: VIF TOL COLLIN */
Proc reg data=temp;
Model stroke = hypertension aspirin hicholesterol anginachd smokingstatus obese_BMI exercise _AGE_G sex alcoholbinge / vif tol collin;    
Run;     Quit;

Through the CORR procedure, we can examine the correlation matrix and manually check for any predictor variables that show high correlation with other variables. In our REG procedure, we can indicate the VIF, TOL, and COLLIN options in the MODEL statement to pull information measuring the variance inflation factor, tolerance, and collinearity.

Would you like to learn more about how to interpret the results produced by these procedures? Would you like to learn ways to control for multicollinearity after it has been detected? Come check out my poster at MWSUG 2016, October 9-11 in Cincinnati! I would love to chat about your multicollinearity issues, interest, or just other curious questions.

tags: MWSUG, papers & presentations, US Regional Conferences

Multicollinearity and MWSUG – a beautiful match was published on SAS Users.

9月 162016
 

ProblemSolversIf you use SAS® software to create a report that contains multiple graphs, you know that each graph appears on a separate page by default. But now you want to really impress your audience by putting multiple graphs on a page. Keep reading because this blog post describes how to achieve that goal.

With newer versions of SAS, there are many different options for putting multiple graphs on a single page. This blog post details these different options based on the following ODS destinations: ODS PDF, ODS RTF, and ODS HTML destinations.

To put multiple graphs on a page (whether you are using ODS or not), the SAS/GRAPH® procedure PROC GREPLAY is typically a good option and is mentioned several times in this blog post. For detailed information about using PROC GREPLAY, see SAS Note 48183: “Using PROC GREPLAY to put multiple graphs on the same page.”

The ODS PDF Destination

The ODS PDF destination is the most commonly used ODS destination for putting multiple graphs on a single page and it also offers the most options, which are described below.

STARTPAGE=NO Option

One way to put multiple graphs on a single PDF page is to use the STARTPAGE=NO option in the ODS PDF statement. Here is sample SAS code that demonstrates how to stack two SGPLOT graphs vertically on the same PDF page using the STARTPAGE=NO option in the ODS PDF statement:


title1; 
ods _all_ close; 
ods pdf file='c:tempsastest.pdf' startpage=no notoc dpi=300;

ods graphics / reset=all height=4in width=7in;  
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="Audi";
run;
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="BMW";
run;

ods pdf close;
ods preferences;  

Note that in the code above, the vertical height of the graphics output is reduced to 4 inches using the HEIGHT option in the ODS GRAPHICS statement. When using a traditional SAS/GRAPH procedure (such as GPLOT), you must specify the vertical height of your graph output using the VSIZE option in a GOPTIONS statement, as shown here:

goptions vsize=4in;

For sample code that demonstrates how to use the STARTPAGE=NO option in the ODS PDF statement to put four graphs on the same PDF page (two across and two down), see SAS Note 48569: “Use the STARTPAGE option in the ODS PDF statement to put multiple graphs on a single page in a PDF document.”

ODS LAYOUT

Another option for putting multiple graphs (and tables) on the same PDF page is ODS LAYOUT. With ODS PDF, absolute layout allows you to define specific regions on the page for your graph and table output. For sample code that demonstrates how to put multiple graphs and tables on the same PDF page using ODS LAYOUT, see SAS Note 55808: “ODS LAYOUT: Placing text, graphs, and images on the same PDF page.”

PROC GREPLAY

You can also put multiple graphs on the same page in a PDF document using the SAS/GRAPH procedure PROC GREPLAY. While the GREPLAY procedure has been around a long time, it is still a very useful option in many situations.

However, note that PROC GREPLAY can only replay graphics output that has previously been written to a SAS/GRAPH catalog, so it has the following limitations:

  • It cannot directly replay graphics output created with the SG procedures and ODS Graphics.
  • It cannot directly replay text-based output such as that created with Base® SAS procedures like PROC PRINT and PROC REPORT.

For sample SAS code that demonstrates how to put four GCHART graphs on the same PDF page using PROC GREPLAY, see SAS Note 44955: “Use PROC GREPLAY with the ODS PDF statement to place four graphs on the same page.”

For sample SAS code that demonstrates how to put six GCHART graphs on the same PDF page using PROC GREPLAY, see SAS Note 44973: “Use PROC GREPLAY to place size graphs on a single page in a PDF document.”

Although you can use PROC GREPLAY with graphics output created with the SG procedures and ODS Graphics, you must use the three-step process demonstrated in SAS Note 41461: “Put multiple PROC SGPLOT outputs on the same PDF page using PROC GREPLAY.”

The ODS RTF Destination

The following sample code demonstrates how to stack multiple graphs vertically on the same page using the SGPLOT procedure in combination with the STARTPAGE=NO option in the ODS RTF statement:

options nodate nonumber; 
 
ods _all_ close; 
ods rtf file='sastest.rtf' startpage=no image_dpi=300; 
 
ods graphics / reset=all outputfmt=png height=3in width=7in; 
 
title1 'Graph for Audi'; 
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="Audi";
run;
 
title1 'Graph for BMW'; 
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="BMW";
run;
 
title1 'Graph for Volvo'; 
proc sgplot data=sashelp.cars; 
  bubble x=horsepower y=mpg_city size=cylinders;
  where make="Volvo";
run;
 
ods _all_ close; 
ods preferences;

Because ODS LAYOUT is not fully supported with the ODS RTF destination, I recommend using PROC GREPLAY if you want to arrange multiple graphs on the same RTF page in a grid. For sample SAS code that demonstrates how to put four graphs on a single page in an RTF document, see SAS Note 45147: “Use PROC GREPLAY to place four graphs on a single page in an RTF document.”

For sample SAS code that demonstrates how to put six graphs on a single page in a RTF document, see SAS Note 45150: “Use PROC GREPLAY to place six graphs on a single page in an RTF document.”

The ODS HTML Destination

When you create multiple graphs with the ODS HTML destination, they are stacked vertically on the same web page by default. You can scroll up and down through the graphics output using your web browser’s scroll bar. In most situations, using PROC GREPLAY is a good option for displaying multiple graphs on a single web page.

Another option is to use the HTMLPANEL ODS tagset to display a panel of graphs via the web. For documentation about using the HTMLPANEL tagset, see “The htmlpanel Tagset Controls Paneling.” For sample code that demonstrates how to use the HTMLPANEL ODS tagset to display multiple graphs and tables via the web in a two-by-two grid, see SAS Note 38066: “Use the HTMLPANEL ODS tagset to put multiple graphs and tables on the same web page.”

In conclusion, this blog post covers just a few of the methods you can use to put multiple graphs on a page. There are more options available than those discussed above. For example, for sample code that puts multiple graphs on the same page using PROC SGPANEL, see SAS Note 35049: “Risk panel graph.” For sample code that puts two graphs side-by-side using Graph Template Language and PROC SGRENDER, see SAS Note 49696: “Generate side-by-side graphs with Y and Y2 axes with the Graph Template Language (GTL).”

tags: ods, Problem Solvers, SAS ODS, SAS Programmers

Do you need multiple graphs on a page? We have got you covered! was published on SAS Users.

9月 162016
 

The SAS Quality Knowledge Base (QKB) is a collection of files which store data and logic that define data cleansing operations such as parsing, standardization, and generating match codes to facilitate fuzzy matching. Various SAS software products reference the QKB when performing data quality operations on your data. One of these products is SAS Event Stream Processing (ESP). SAS ESP enables programmers to build applications that quickly process and analyze streaming events. In this blog, we will look at combining the power of these two products – SAS ESP and the SAS QKB.

SAS Event Stream Processing (ESP) Studio can call definitions from the SAS Quality Knowledge Base (QKB) in its Compute window. The Compute window enables the transformation of input events into output events through computed manipulations of the input event stream fields. One of the computed manipulations that can be used is calling the QKB definitions by using the BlueFusion Expression Engine Language function.

Before QKB definitions can be used in ESP projects the QKB must be installed on the SAS ESP server. Also, two environment variables must be set: DFESP_QKB and DFESP_QKB_LIC. The environment variable DFESP_QKB should be set to the path where the QKB data was installed. The environment variable DFESP_QKB_LIC should be set to the path and filename that contains the license(s) for the QKB locale(s).

In this post, I will explore the example of calling the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale in the ESP Compute window. The Source window is reading in events that contain US State data that may or may not be standardized in the 2-character US State abbreviation.

SASEventStreamProcessing

As part of the event stream analysis I want to perform, I need the US_State values to be in a standard format. To do this I will utilize the State/Province (Abbreviation) Standardization QKB definition from the English – United States locale.

First, I need to initialize the call to the BlueFusion Expression Engine Language function and load the ENUSA (English – United States) locale. Note: The license file that the DFESP_QKB_LIC environment variable points to must contain a license for this locale.

SASEventStreamProcessing02

Next, I need to call the QKB definition and return its result. In this case, I am calling the BlueFusion standardize function. This function expects the following inputs: Definition Name, Input Field to Standardize, Output Field for Standardized Value. In this case the Definition Name is State/Province (Abbreviation), the Input Field to Standardize is US_State, and the Output Field for the Standardized Value is result. Note: The field result was declared in the Initialize expression pictured above. This result value is returned in the output field named US_State_STND.

SASEventStreamProcessing03

I can review the output of the Compute window by testing the ESP Studio project and subscribing to the Compute window.

SASEventStreamProcessing04

Here is the XML code for the SAS ESP project reviewed in this blog:

SASEventStreamProcessing05

Now that the US_State values have been standardized for the event stream, I can add analyses to my ESP Studio project based on those standardized values.

For more information, please refer to the product documentation:
SAS Event Stream Processing
DataFlux Expression Engine Language
SAS Quality Knowledge Base

tags: DataFlux Data Management Studio, SAS Event Stream Processing, SAS Professional Services

Using SAS Quality Knowledge Base Definitions in a SAS Event Stream Processing Compute Window was published on SAS Users.

9月 132016
 

SAS releases regular updates to software products in the form of hot fixes and maintenance releases. Hot fixes are SAS' timely response to customer-reported problems, as well as a way to deliver occasional security-related updates that can affect any software product.

At SAS we call them "hot fixes." Other companies might call these "updates," "patches" or (the old-timey term) "zaps." I like the term "hot fix," because it connotes 1) a timely release of code, and 2) something that you apply to an established production system (while it's running "hot").

An important part of the hot fix process is learning when there is a new fix available. Beginning this month, we now have a new SAS Hot Fix Announcements board -- on SAS Support Communities -- where you can learn about newly available fixes. If you're a SAS administrator, these notices provide more complete and detailed information about available hot fixes. And if you are a business user, it behooves you to stay informed about fixes so you can pass information on to your IT department.

Find your fixes: by e-mail, RSS feeds or search

Subscribe optionBy using the community board to host hot fix notices, we've now provided you with many options to subscribe to this content. Like any community board, you can click Subscribe and have the notifications e-mailed to you as soon as they are posted. Or you can use your favorite RSS reader to peruse the latest entries when the timing is right for you. Finally, you can use the Search field on the hot fix board to find just the fixes you need. Simply type a SAS product name, a version number, or text from a SAS Note to find the matching hot fixes. Visit this topic on the SAS Support Communities to learn more about how to pull the information you need, or to have it pushed to you automatically.

Here's a picture of the Hot Fix RSS feed in my instance of Microsoft Outlook:

Hot fix RSS feed in Microsoft Outlook

Hot fix RSS feed in Microsoft Outlook

TSNEWS-L: it has served us well

For decades, SAS Technical Support has used the TSNEWS-L mailing list to send brief, weekly summaries of the recent hot fixes. While this is good for raising awareness that "a fix is available" for a product you use, the information in the e-mail contains no details about what issues each hot fix addresses. Nor does it include a direct link to download the fix. For now, this information will still be delivered via TSNEWS-L -- but we hope that you find the community board to be more flexible.

A peek behind the hot fix curtain

It shouldn't surprise you that we use SAS software to help bring you the details about SAS fixes.

To build these improved notfications, the SAS Communities team and SAS Technical Support worked together to automate the publishing process for hot fix bulletins. All of the hot fix details were already tracked within internal-only operational databases and displayed on the Technical Support Hot Fix download site. Now, a SAS-based process assembles this data, creates a hot fix bulletin entry for each fix and publishes the entry automatically to SAS Support Communities. It's a wonderful mix of SAS data management, REST APIs, DATA step and more. It's a 300-line SAS program (including plenty of comments) that runs every weekday morning -- and it completes in about 15 seconds. It's time well-spent to keep SAS users in the know!

tags: SAS Administrators, sas communities, SAS Technical Support

How to stay informed about SAS hot fixes was published on SAS Users.