
February 10, 2017
 

Small matters matter. Imagine saving (or spending wisely) just 1 second of your time every hour. One measly second! Over a 100-year lifespan you would save or spend wisely (1 second per hour * 24 hours per day * 365 days per year * 100 years) / (3600 seconds per hour * 24 hours per day) ≈ 10 days, a whole two-week vacation!
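The arithmetic above can be checked with a short DATA step. This is only a sketch of the calculation; the variable names are illustrative and the 100-year lifespan is the same assumption the text makes.

```sas
data _null_;
    /* 1 second saved per hour, over an assumed 100-year lifespan */
    seconds_saved = 1 * 24 * 365 * 100;
    /* convert seconds to days: 3600 seconds per hour * 24 hours per day */
    days_saved = seconds_saved / (3600 * 24);
    put seconds_saved= comma12. days_saved= 8.2;
run;
```

The total is 876,000 seconds, which is a little over 10 days.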

While truncation vs rounding may seem to be insignificant in a given instance, the cumulative effect of either could be truly enormous, whether it’s truncation vs rounding of decimal numbers or of the SAS time values presented below.

From my prior post Truncating decimal numbers in SAS without rounding, we know that SAS formats such as w.d, DOLLARw.d, and COMMAw.d do not truncate decimal numbers, but rather round them.

However, SAS time value formats are somewhat different. Let’s take a look.

Suppose we have a SAS time value of '09:35:57't. As a reminder, a SAS time value is a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.
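To make the "seconds since midnight" definition concrete, here is a minimal sketch that builds the same value three ways. The variable names t_calc and t_hms are illustrative only.

```sas
data _null_;
    t      = '09:35:57't;              /* time literal */
    t_calc = 9*3600 + 35*60 + 57;      /* hours, minutes, seconds converted by hand */
    t_hms  = hms(9, 35, 57);           /* HMS function builds a time value from its parts */
    put t= t_calc= t_hms=;             /* no format: shows the raw number of seconds */
    put t= time8.;                     /* formatted as a clock time */
run;
```

All three variables hold 34557, the number of seconds between midnight and 09:35:57.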

TIMEw.d Format

Let’s apply the TIMEw.d format to our time value and see what it does.

If you run the following SAS code:

data _null_;
	t = '09:35:57't;
	put t= time5.;
	put t= time2.;
run;

you will get in the SAS log:

t=9:35
t=9

which means that this format does truncate both seconds and minutes. Conversely, if rounding were taking place we would have gotten:

t=9:36
t=10

HHMMw.d Format

Let’s run the same SAS code with HHMMw.d format:

data _null_;
	t = '09:35:57't;
	put t= hhmm5.;
	put t= hhmm2.;
run;

SAS log will show:

t=9:36
t=9

What does that mean? It means that the HHMMw.d format rounds seconds (with truncation I would expect to get t=9:35), but truncates minutes (with rounding I would expect to get t=10, since 9:35 is closer to 10:00 than to 9:00). A bit inconsistent, at least for our purposes.

Truncating SAS time values

The brief investigation above shows that, of the two formats TIMEw.d and HHMMw.d, TIMEw.d is the one to use for truncating SAS time values, for both minutes and seconds.

Regardless of the format used, you can also truncate your time value computationally, before applying a format, by subtracting from that value the remainder of its division by 60 (to truncate seconds) or by 3600 (to truncate minutes). For example, the following code:

data _null_;
	t = '09:35:57't;
	t_m = t - mod(t,60);
	t_h = t - mod(t,3600);
	put t= hhmm5.;
	put t_m= hhmm5.;
	put t_h= hhmm5.;
run;

produces the following SAS log:

t=9:36
t_m=9:35
t_h=9:00
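The same subtraction works for any interval length, not only 60 and 3600. As a sketch, assuming you want to truncate to 15-minute boundaries (the interval and the variable names are illustrative choices, not from the example above):

```sas
data _null_;
    t = '09:35:57't;
    interval = 15 * 60;              /* 900 seconds in a quarter of an hour */
    t_q = t - mod(t, interval);      /* 34557 - 357 = 34200 seconds */
    put t_q= time8.;
run;
```

Here 34200 seconds corresponds to 9:30:00, the start of the quarter hour that contains 09:35:57.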

Rounding SAS time values

Now that we’ve learned both the computational method and the TIMEw.d format method of truncation, how do we go about rounding? As long as the format truncates consistently, we can turn its truncating behavior into rounding. To do that, we just need to increase the original time value by half of the rounding unit: by 30 (seconds) to round to the nearest minute, and by 1800 (seconds) to round to the nearest hour. Truncating that new value is equivalent to rounding the original value.

Let’s run the following SAS code:

data _null_;
	t = '09:35:57't;
	t_m = t + 30;
	t_h = t + 1800;
	put t_m= time5.;
	put t_h= time2.;
run;

SAS log will show:

t_m=9:36
t_h=10

which means that our original time value '09:35:57't was rounded in both cases – seconds rounding and minutes rounding.
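As an alternative to shifting the value and relying on the format to truncate, rounding can also be done computationally with the ROUND function and a rounding unit given in seconds. This is a sketch of that swapped-in technique rather than the method used above:

```sas
data _null_;
    t = '09:35:57't;
    t_m = round(t, 60);      /* nearest minute: 34560 seconds */
    t_h = round(t, 3600);    /* nearest hour:   36000 seconds */
    put t_m= time5.;
    put t_h= time5.;
run;
```

Because the rounded values fall exactly on a minute or hour boundary, it no longer matters whether the format applied afterwards truncates or rounds.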

Now you know how to truncate and how to round SAS time values. And don’t forget about your lifetime 2-week vacation opportunity by saving a second every hour; or make it 2 seconds per hour and enjoy the full month off.

tags: SAS Professional Services, SAS Programmers, tips & techniques

Truncating vs rounding SAS time values was published on SAS Users.

February 10, 2017
 

Since the SAS 9.4 M2 release in December 2014, there have been several refinements and updates to the middle tier that are of interest to installers and administrators. In this blog, I’m going to summarize them for you. What I’m describing here is available in the newest SAS release (9.4 M4). I’ll describe them at a high level, and refer you to the documentation for details and how to implement some of these changes.

Security enhancements

Preserve your TLS Customizations:
For security purposes, many of you will manually add TLS configurations, either to the SAS Web Server, the SAS Web Application Server, or both. In addition, you may prefer to use your own reverse proxy server (such as IIS), either instead of, or in addition to, the SAS Web Server. Before the 9.4 M4 release, when upgrading or applying maintenance, you had to undo these custom configurations, perform the upgrade, and then apply the custom configurations again. Now, the upgrade will preserve them, making the process much easier. See Middle-Tier Security in the Middle Tier Administration Guide, Fourth Edition for full details.

Newer versions of OpenSSL are now provided (see doc for specific version numbers):
A Java upgrade enables enforcement of TLSv1.2. TLS is now considered the security standard for HTTPS connections (SSL is obsolete), and this can be enforced through configuration of the SAS Web Server and the SAS Web Application Server. The newer version of Java that SAS uses (version 1.7 or later) allows for this. One important thing to be aware of is that certificates are independent of the protocol you are using, so any certificates you used with SSL should work equally well with the newer TLS protocols.

Management of the trusted CA (Certificate Authority) bundle:
SAS now has a trusted CA bundle that can be managed by the SAS Deployment Manager, in a new location: SASHome/SASSecurityCertificateFramework/1.1/cacerts/. The CA certificates can be root certificates, intermediate certificates, or both. Here’s what the menu item looks like:

[Screenshot: managing the trusted CA bundle in the SAS Deployment Manager]

Previously it was necessary to manually add your root/intermediate certificates to the Java truststore “cacerts,” located inside the JRE; now it’s done through the new interface. If you are on Windows, you must also add trusted CAs to the Windows store (as before), which will make them available to any browsers running there. This is documented at http://www.sqlservermart.com/HowTo/Windows_Import_Certificate.aspx and elsewhere online.

Security Support for SAS Web Applications – whitelist external sites and HTTP request methods:
For added security, web sites hosting SAS web applications can now maintain a whitelist of external URLs that are allowed to connect in. This provides protection against cross-site request forgery (CSRF) and other vulnerabilities. This is what the prompt looks like in the SAS Deployment Wizard (SDW):

[Screenshot: whitelist prompt in the SAS Deployment Wizard]

HTTP request methods can also be specified as allowed or not allowed. The list of URLs can be specified during installation in the SDW (shown above), or using the SAS Management Console. You can disable whitelist checking entirely, and you can add a “blacklist” of specific sites to always block. You can also block based on request method, for example GET, POST, or PUT. See the Middle Tier Administration Guide for details.

Forward Proxy Configuration:
You can now set up SAS web applications to forward external URL requests through a proxy–here it’s called a forward proxy server. Many organizations do this behind their firewalls. See details for how to set this up in the administration guide.

Other miscellaneous changes:
As an administrator, you can now force users to log off using the SAS Web Administration Console. You can also send emails to one or more users from the same window. This is what the menu looks like:

[Screenshot: user menu in the SAS Web Administration Console]

Faster start-up time for the SAS Web Application Server

JMS Broker (ActiveMQ) now uses Version 5.12.2 (fixed bugs).

SAS Web Server now uses version 5.5.2 and includes an updated mod_proxy_connect module for TLS tunneling.

References

SAS 9.4 Intelligence Platform: Middle Tier Administration Guide, Fourth Edition

Encryption in SAS 9.4, Sixth Edition

 

tags: SAS 9.4, SAS Administrators, SAS Professional Services, security

Middle Tier Changes and Upgrades in SAS 9.4 M4 was published on SAS Users.

February 3, 2017
 

I will begin with a short story.

Like many employers, McDougall Scientific, my employer, requires its employees to review, with their co-workers and managers, what they learned at a conference or course. They are also asked to suggest applications of their learnings so that McDougall might realize value from the expense, both in time and money, of sending them to continuing education events.

Fei Wang, my co-worker, and I attended SAS Global Forum last year in Vegas. During her presentation to co-workers upon our return, Fei not only provided a comprehensive overview of the conference format, sessions, and learning opportunities, but she also chose one presentation to highlight that will fundamentally improve one of our business processes.

Although Fei attended many sessions and learned much, session 8480-2016, with thanks to Steven Black, will save McDougall enough time and money to dwarf the expenditure of sending Fei to SAS Global Forum.

“But John,” you might ask, “why not simply search the proceedings after the conference?” Well, because we would never think to search for CRF annotation automation. Innovation of this sort is more easily found by attending the conference. Discovering valuable nuggets like Steven’s idea is a common occurrence at SAS Global Forum.

The value that employers realize from SAS Global Forum is the reason “content is king,” a cliché first introduced by the magazine publishing industry in the mid-1970s.

Our speakers represent every region of the world!

Though there are a number of really great benefits from attending the conference, great content continues to reign supreme at SAS Global Forum. This year’s conference is no different. The 2017 Content Advisory Team has assembled a stellar lineup of well over 600 sessions: invited speakers, contributed papers, hands-on workshops, tutorials and posters. And, I am very proud to report that 25 countries are contributing speakers this year, with every region of the world represented: North, Central, and South Africa, Europe, Australia, the Middle East, Asia and the Americas. This sort of global diversity brings new ideas and new ways of looking at and solving problems that really grow your knowledge and help move your organization forward.

In addition to all of this great technical content, we have made special effort to organize sessions that help SAS Users better present their work. As Melissa Marshall famously claims, “Science not communicated is science not done.” Therefore, in keeping with the SAS Global Users Group’s mission to champion the needs of SAS users around the globe, here is a sampling of sessions that will help you better communicate.

The list starts with Melissa herself!

Present Your Science: Transforming Technical Talks
Session T108, Melissa Marshall, Principal, Melissa Marshall Consulting LLC

This versatile half-day workshop covers the full gamut: content strategy, slide design, and presentation delivery. With a dynamic combination of lecture, discussion, video analysis, and exercises, this workshop will truly transform how technical professionals present their work and will help foster a culture of improved communications throughout the SAS community.

How the British Broadcasting Corporation Uses Data to Tell Stories in a Visually Compelling Way
Session 0824, Amanda J Farnsworth, Head of Visual Journalism, BBC News

… data is often seen as a dry, detached, unemotional thing that's hard to understand and for many, easy to ignore. At the BBC, employees have been thinking hard about how to use data to tell stories in a visually compelling way that connects with audiences and makes them more curious about the world that we live in. And, there is an ever-increasing amount of data with which to tell those stories. Governments are publishing more big data sets about health, education, crime, and social makeup. Academics are generating huge amounts of data as a consequence of research. Businesses and other organizations conduct their own research and polling. The BBC’s aim is to take that data and make it relevant at a personal level, answering the audiences' number one question: what does this mean for me?

Convince Me: Constructing Persuasive Presentations
Session 0862, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data outputs do not a persuasive argument make. Effective persuasion requires a combination of logic and emotion supported by facts. Statisticians dedicate their lives to analyzing data such that it is appropriate supporting evidence. While the appropriate evidence is essential to convince your listeners, you first have to be able to gain and maintain their attention and trust. Persuasive presentations fight for hearts and minds, and are not a dry, unbiased recitation of facts or analyses. This session is designed to provide suggestions for how to utilize successful structures and create emotional connections.

Data Visualization Best Practices: Practical Storytelling Using SAS®
Session T117, Greg S Nelson, CEO, Thotwave Technologies LLC.

Data means little without our ability to visually convey it. Whether building a business case to open a new office, acquiring customers, presenting research findings, forecasting or comparing the relative effectiveness of a program, we are crafting a story that is defined by the graphics that we use to tell it. Using practical, real-world examples, students will learn how to critically think about visualizations.

Presentations as Listeners Like Them: How to Tailor for an Audience
Session 0408, Frank Carillo, CEO and Anne Coffey, Senior Director, E.C.G. Inc.

Data doesn't speak for itself. We speak for it, and how we do that influences how people view and interpret that data. One of the most overlooked aspects of presenting data is analyzing the audience. At no point in history have speakers had to face such heterogeneous audiences as they do today: there might be as many as five different generations in the room, cross-functional teams have broad areas of expertise, and international companies integrate different cultures and customs. This session is designed to teach attendees how to analyze not the data, but the listeners. Who is your audience? What is important to them? What is your message …?

tags: papers & presentations, SAS Global Forum

At SAS Global Forum, Content is King was published on SAS Users.

February 1, 2017
 

SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations, and one of the features of SAS Federation Server is that it allows you to control access to your data effectively and efficiently, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, names, etc. In this three-part series, I will explore the topic of controlling data access using SAS Federation Server.

The series covers the following topics:

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. Note: Permissions applied at a particular data source cannot be bypassed by SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this blog post, I build on the example in my previous post and demonstrate how you can use SAS Federation Server Manager to control access to columns and rows in tables and views.

Previously, I gave the Finance Users group access to the SALARY table. Robert is a member of the Finance Users group, so he has access to the SALARY table; however, I want to restrict his access to the IDNUM column on the table. To do this, first I view the SALARY table Authorizations in Federation Server Manager, then I select the arrow to the right of the table name to view its columns.

Next, I select the IDNUM column. I then add the user Robert and set his SELECT permission to Deny for the column.

Note: There are 5 columns on the SALARY table.
Since he was denied access to the IDNUM column, Robert is only able to view 4 out of 5 columns.
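The effect of the column-level Deny can be pictured with an ordinary SAS data set. The sketch below only mimics what Robert ends up seeing; it is not how SAS Federation Server enforces the permission. The table contents and every column name other than IDNUM and JOBCODE are hypothetical.

```sas
/* Hypothetical stand-in for the SALARY table (5 columns) */
data work.salary;
    input idnum $ jobcode $ dept $ startyear salary;
    datalines;
1001 QA1 FIN 2010 52000
1002 ME2 ENG 2012 61000
1003 QC3 FIN 2015 48000
;
run;

/* With SELECT denied on IDNUM, the visible result resembles this: 4 of 5 columns */
proc print data=work.salary(drop=idnum);
run;
```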

Susan is also a member of the Finance Users group, so she has access to the SALARY table; however, I want to restrict her access to only rows where the JOBCODE starts with a “Q.” To do this, first I view the SALARY table Authorizations in Federation Server Manager.

Next, I select the Row Authorizations tab and select New Filter. I use the SQL Clause Builder to build my condition of JOBCODE LIKE 'Q%'.

Next, I select the Users and Groups tab and add Susan to restrict her access to the filter I just created.

Finally, I select OK to save the changes I made to Row Authorizations.

Susan is now only able to view the rows of the SALARY table where the JOBCODE begins with “Q.”
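The row filter behaves like a WHERE clause that is always applied for Susan. As a rough illustration in plain PROC SQL (the data set and its values are hypothetical, and this only imitates the visible result, not the Federation Server mechanism):

```sas
/* Hypothetical stand-in for the SALARY table */
data work.salary;
    input idnum $ jobcode $ dept $ startyear salary;
    datalines;
1001 QA1 FIN 2010 52000
1002 ME2 ENG 2012 61000
1003 QC3 FIN 2015 48000
;
run;

proc sql;
    /* Only rows whose JOBCODE starts with Q are returned */
    select *
        from work.salary
        where jobcode like 'Q%';
quit;
```

With this sample data, the rows for 1001 and 1003 are returned and the row for 1002 is filtered out.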

In this blog entry, I covered the second part of this series on Securing sensitive data using SAS Federation Server at the row and column level:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

More information on SAS Federation Server:

tags: SAS Administrators, SAS Federation Server, SAS Professional Services

Securing sensitive data using SAS Federation Server at the row and column level was published on SAS Users.

January 30, 2017
 

Recently, SAS shipped the fourth maintenance of SAS 9.4. Building on this foundation, SAS Studio reached a new milestone, its 3.6 release. All editions have been upgraded, including Personal, Basic and Enterprise. In this blog post, I want to highlight the new features that have been introduced. In subsequent posts I’ll discuss some of these features in more detail.

SAS Studio 3.6 includes many new features and enhancements, including:

1  -  new preferences to personalize even more of the SAS Studio user experience. In detail, it is now possible to:

  • control whether items in the navigation pane, such as libraries, files and folders, are automatically refreshed after running a program, task or query.

  • determine whether, at start up, SAS Studio attempts to restore the tabs that were open during the prior session, when it was last closed.

2  -  enhancements to the background submit feature (previously known as batch submit), with more control over the output and log files. SAS Studio 3.6 also enforces a new behavior: if the background SAS program is a file on the server and not an FTP reference, then the current working directory is automatically set to the directory where the code resides. This enables the use of relative paths in code to reference artifacts such as additional SAS code to include with %include statements (e.g. %include "./macros.sas";), references to data files (e.g. libname data ".";), or images to be included in ODS output.

3  -  ability to generate HTML graphs in the SVG format instead of the PNG format.

4  -  many new analytical tasks for power and sample size analysis, cluster analysis and network optimization.
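To illustrate the relative-path behavior and the SVG option, here is a minimal sketch of a program that could be run with background submit. It assumes the program is saved as a file on the server, that its folder is writable, and that the helper file name macros.sas and the macro name hello are made up for the example. The ODS GRAPHICS statement is one way to request SVG from code; the SAS Studio preference itself is set in the user interface.

```sas
/* Working directory is assumed to be the folder that holds this program */

/* Write a small helper file next to the program (hypothetical name) */
filename helper "./macros.sas";
data _null_;
    file helper;
    put '%macro hello;';
    put '    %put NOTE: hello from macros.sas;';
    put '%mend hello;';
run;

/* Relative references resolve against the program folder */
%include "./macros.sas";
%hello

libname here ".";

/* Request SVG instead of PNG for ODS Graphics output */
ods graphics / outputfmt=svg;
proc sgplot data=sashelp.class;
    scatter x=height y=weight;
run;
```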

Impressive new features to be sure, but that’s not all. Here’s a bonus feature that I personally find really interesting.

  • The navigation pane includes new categories, both in the code snippets section and in the task section, to streamline the integration between SAS 9.4 and SAS Viya. A new category of Viya Cloud Analytic Services code snippets helps you connect to SAS Viya and work with CAS tables. New Viya Machine Learning tasks enable you to run SAS code in a SAS Viya environment. You can do all this while working from your 9.4 environment.

tags: SAS Professional Services, sas studio

SAS Studio 3.6 new features was published on SAS Users.

January 26, 2017
 

SAS® Viya™ 3.1 represents the third generation of high performance computing from SAS. Our journey started a long time ago and, along the way, we have introduced a number of high performance technologies into the SAS software platform:

Introducing Cloud Analytic Services (CAS)

SAS Viya introduces Cloud Analytic Services (CAS) and continues this story of high performance computing. CAS is the runtime engine and microservices environment for data management and analytics in SAS Viya and introduces some new and interesting innovations for customers. CAS is an in-memory technology and is designed for scale and speed. Whilst it can be set up on a single machine, it is more commonly deployed across a number of nodes in a cluster of computers for massively parallel processing (MPP). The parallelism is further increased when we consider using all the cores within each node of the cluster for multi-threaded, analytic workload execution. In an MPP environment, having a number of nodes doesn’t mean that using all of them is always the most efficient choice for analytic processing. CAS maintains node-to-node communication in the cluster and uses an internal algorithm to determine the optimal distribution and number of nodes to run a given process.

However, processing in-memory can be expensive, so what happens if your data doesn’t fit into memory? Well, CAS has that covered. CAS will automatically spill data to disk in such a way that only the data that are required for processing are loaded into the memory of the system. The rest of the data are memory-mapped to the filesystem in an efficient way for loading into memory when required. This way of working means that CAS can handle data that are larger than the available memory that has been assigned.

The CAS in-memory engine is made up of a number of components - namely the CAS controller and, in an MPP distributed environment, CAS worker nodes. Depending on your deployment architecture and data sources, data can be read into CAS either in serial or parallel.

What about resilience to data loss if a node in an MPP cluster becomes unavailable? Well, CAS has that covered too. CAS maintains a replicate of the data within the environment. The number of replicates can be configured, but the default is to maintain one extra copy of the data within the environment. This is done efficiently by having the replicate data blocks cached to disk as opposed to consuming resident memory.

One of the most interesting developments with the introduction of CAS is the way that an end user can interact with SAS Viya. CAS actions are a new programming construct and, with CAS, if you are a Python, Java, SAS or Lua developer, you can communicate with CAS using an interactive computing environment such as a Jupyter Notebook. One of the benefits of this is that a Python developer, for example, can utilize SAS analytics on a high performance, in-memory distributed architecture, all from their Python programming interface. In addition, we have introduced open REST APIs, which means you can call native CAS actions and submit code to the CAS server directly from a Web application or other programs written in any language that supports REST.
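For a SAS developer, calling a CAS action might look roughly like the sketch below. It assumes a SAS client that supports the CAS statement and PROC CAS, a reachable CAS server, and valid credentials; the host name, port, and session name are placeholders, not values from this article.

```sas
/* Start a CAS session (host and port are hypothetical placeholders) */
cas mysess host="cas.example.com" port=5570;

proc cas;
    session mysess;
    /* Run a built-in action that reports the status of the server */
    builtins.serverStatus;
quit;

/* End the session when finished */
cas mysess terminate;
```

The same action could be invoked from the other supported clients; only the calling syntax differs.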

Whilst CAS represents the most recent step in our high performance journey, SAS Viya does not replace SAS 9. These two platforms can co-exist, even on the same hardware, and indeed can communicate with one another to leverage the full range of technology and innovations from SAS. To find out more about CAS, take a look at the early preview trial. Or, if you would like to explore the capabilities of SAS Viya with respect to your current environment and business objectives speak to your local SAS representative about arranging a ‘Path to SAS Viya workshop’ with SAS.

Many thanks to Fiona McNeill, Mark Schneider and Larry LaRusso for their input and review of this article.

 

tags: global te, Global Technology Practice, high-performance analytics, SAS Grid Manager, SAS Visual Analytics, SAS Visual Statistics, SAS Viya

A journey of SAS high performance was published on SAS Users.

January 26, 2017
 

Recently a colleague told me Google had published new, interesting data sets on BigQuery. I found a lot of Reddit data there as well, so I quickly tried running BigQuery on these text data to see what I could produce. After getting some pretty interesting results, I wanted to see whether I could implement the same analysis with SAS, and whether SAS Text Mining would give deeper insights than simple queries. So, I tried SAS with Reddit comments data, and I’d like to share my analyses and findings with you.

Analysis 1: Significant Words

To get started with BigQuery, I googled what others were sharing regarding BigQuery and Reddit, and I found USING BIGQUERY WITH REDDIT DATA. In this article the author posted a query for extracting significant words from the Politics subreddit. I then wrote a SAS program to mimic this query and got the following data from the July 2016 Reddit comments. The result is not exactly the same as the one from BigQuery, since I downloaded the Reddit data from another web site and used the SAS Text Parsing action to parse the comments into tokens rather than just splitting tokens on white space.

Analysis 2: Daily Submissions

The words Trump and Hillary in the list raised my interest and begged for further analysis. So, I did a daily analysis to understand how hot Trump and Hillary were during this month. I filtered all comments mentioning Trump or Hillary under Politics subreddit and counted total submissions per day. The resulting time series plot is shown below.
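The filtering and daily counting described above can be sketched in PROC SQL. The data set name, its columns, and the sample comments are all hypothetical; the actual program and Reddit data are not shown in this post.

```sas
/* Hypothetical sample of comments: one row per comment */
data work.comments;
    infile datalines dsd truncover;
    input created :yymmdd10. body :$200.;
    format created yymmdd10.;
    datalines;
2016-07-04,"Trump rally coverage"
2016-07-04,"Local election turnout thread"
2016-07-05,"FBI statement on Hillary emails"
2016-07-05,"Trump reacts to the FBI"
2016-07-05,"hillary and the server"
;
run;

proc sql;
    /* Count comments per day that mention either name, ignoring case */
    select created, count(*) as submissions
        from work.comments
        where find(body, 'trump', 'i') > 0
           or find(body, 'hillary', 'i') > 0
        group by created
        order by created;
quit;
```

With this sample, July 4 has one matching comment and July 5 has three; plotting the counts by day gives the kind of time series discussed here.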

I found several spikes in the plot, which happened on 2016/7/5, 2016/7/12, 2016/7/21, and 2016/7/26.

Analysis 3: Topics Time Line

I wondered what Reddit users were concerned about on these specific days, so I extracted the top 10 topics from all comments submitted in July 2016 within the Politics subreddit and got the following data. These topics clearly focused on several aspects, such as voting, presidential candidates, parties, and hot news such as Hillary’s email probe.

The topics showed what people were concerned about over the whole month, but I needed further investigation to explain which topic contributed most to the four spikes. The topics’ time series plot helped me find the answer.

Some topics’ time series trends are very close, and it is hard to determine which topic contributed most, so I identified the top contributing topic based on daily percentage growth. The top growth topic on July 05 is “emails, dnc, +server, hillary, +classify”, which shows 256.23-fold growth.
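One way to compute a day-over-day growth figure per topic is with the LAG function in a DATA step. This is a sketch under assumptions: growth is defined here as (today - yesterday) / yesterday, which may differ from the definition used for the numbers above, and the data set, topic labels, and counts are invented.

```sas
/* Hypothetical daily comment counts per topic */
data work.topic_daily;
    input topic $ day :yymmdd10. n_comments;
    format day yymmdd10.;
    datalines;
emails 2016-07-04 10
emails 2016-07-05 250
vote 2016-07-04 100
vote 2016-07-05 120
;
run;

proc sort data=work.topic_daily;
    by topic day;
run;

data work.topic_growth;
    set work.topic_daily;
    by topic;
    prev = lag(n_comments);            /* previous row's count */
    if first.topic then prev = .;      /* do not carry a value across topics */
    if prev > 0 then growth = (n_comments - prev) / prev;
run;

/* Within each day, list topics from highest to lowest growth */
proc sort data=work.topic_growth;
    by day descending growth;
run;

proc print data=work.topic_growth;
run;
```

LAG is called on every row, before the FIRST.topic check, so that its internal queue stays aligned with the sorted data.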

Its time series plot also shows a high spike on July 05. Then I googled “July 5, 2016 emails dnc server hillary classify” and got the following news.

There is no doubt the spike on July 05 is related to the FBI’s decision about Clinton’s email probe. To confirm this, I extracted the top 20 Reddit comments submitted on July 05, ranked by Reddit score. I quote part of the top comment below; the link in that comment also appeared in Google’s search results.

"...under normal circumstances, security clearances would be revoked. " This is your FBI. EDIT: I took paraphrased quote, this is the actual quote as per https://www.fbi.gov/news/pressrel/press-releases/statement-by-fbi-director-james-b.-comey-on-the-investigation-of-secretary-hillary-clintons-use-of-a-personal-e-mail-system - "

Similar analysis was done on the other three days, and the hot topics are as follows.

Interestingly, one person did a sentiment analysis with Twitter data, and the tweet submission trend for July looks the same as Reddit’s.

And in this blog, he listed several important events that happened in July.

  • July 5th: the FBI says it’s going to end Clinton’s email probe and will not recommend prosecution.
  • July 12th: Bernie Sanders endorses Hillary Clinton for president.
  • July 21st: Donald Trump accepts the Republican nomination.
  • July 25-28: Clinton accepts the nomination at the DNC.

This shows that different social media data have similar response trends to the same events.

Now I know why these spikes happened. However, more questions came to my mind.

  • Who started posting this news?
  • Were there cyber armies?
  • Who were opinion leaders in the politics community?

I believe all these questions can be answered by analyzing the data with SAS.

tags: SAS R&D, SAS Text Analytics

Analyzing Trump v. Clinton text data at Reddit was published on SAS Users.

January 21, 2017
 

Editor's note: The following post is from Xiaoyuan Zhang, presenter at an upcoming Insurance and Finance Users Group (IFSUG) webinar.

Learn more about Xiaoyuan Zhang.


As a business user with limited statistical skills, I don’t think I could build a credit scorecard without the help of SAS Enterprise Miner. As you can see from the flow chart, SAS Enterprise Miner, a descriptive and predictive modeling software, does an amazing job of developing and streamlining models.

The flow chart presents my whole credit score modeling process, which is divided into three parts: creating the preliminary scorecard, performing reject inference, and building the final scorecard. I will cover the whole process in the Insurance and Finance Users Group (IFSUG) virtual session on Feb 3, 2017. In this blog I wanted to emphasize the second part, which is sometimes easy to ignore.

The data for the preliminary scorecard comes only from accepted loan applications. However, the scorecard modeler needs to apply the scorecard to all applicants, both accepted and rejected. To solve this sample bias problem, reject inference is performed.

Before inferring the behavior (good or bad) of the rejected applicants, data examination is needed. I used the StatExplore node to explore the data and found that there were a significant number of missing values. This is problematic because the SAS Enterprise Miner regression model, the model used here for scorecard creation and reject inference, ignores observations that contain missing values, which reduces the size of the training data set. Less training data can substantially weaken the predictive power of the model.

To help with this problem, Impute Node is used to impute the missing values. In the Properties Panel of the node, there are a variety of choices from which the modeler could choose for the imputation. In this model, Tree surrogate is selected for class variables and Median is selected for interval variables.
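Outside Enterprise Miner, the same two steps (counting missing values, then replacing them with the median) can be sketched in code for interval variables. This is a swapped-in illustration using PROC MEANS and PROC STDIZE (the latter requires SAS/STAT), not what the StatExplore and Impute nodes run internally. The data set and variable names are hypothetical.

```sas
/* Hypothetical applicant data with some missing values */
data work.applicants;
    input income debt_ratio;
    datalines;
52000 0.31
. 0.45
61000 .
48000 0.28
;
run;

/* Count non-missing and missing values per variable */
proc means data=work.applicants n nmiss median;
    var income debt_ratio;
run;

/* REPONLY replaces missing values with the median and leaves other values unchanged */
proc stdize data=work.applicants out=work.applicants_imputed
            method=median reponly;
    var income debt_ratio;
run;
```

Tree surrogate imputation for class variables, as selected in the Impute node, has no equivalent in this sketch.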

However, in the Impute node the data role is set to Train. In order to use the data in the Reject Inference node, the data role needs to be changed to Score. A SAS Code node is placed in between for this purpose, and it reads:

data &em_export_score;
   set &em_import_data;
run;

Last but not least, Reject Inference Node is used to infer the performance of the rejected loan applicant data. SAS Enterprise Miner offers three standard, industry-accepted methods for inferring the performance of the rejected applicant data by the use of a model that is built on the accepted applicants. We won’t explore the three methods in detail here, as the emphasis of the blog is on the process.

To hear more on this topic, please register for the IFSUG virtual session, Credit Score Modeling in SAS Enterprise Miner on February 3rd from 11am-12pm ET.


About Xiaoyuan Zhang

Xiaoyuan Zhang grew up in Zhaoyuan, China, on the coast of the Bohai Sea. Her town is famous for its ancient gold mine, hot springs and its unusual and tasty seafood. Her undergraduate degree is from China Agricultural University in Beijing, where she majored in Marketing Intelligence and graduated with honors. She graduated, with honors, from Drexel University with a Master's degree in Finance. She has passed two CFA exams and learned Enterprise Miner in one of her courses. She specializes in efficient credit score modeling with SAS Enterprise Miner. She is using some of her post-graduation free time to study "regular SAS", to tutor and to volunteer.

 

tags: IFSUG, SAS Enterprise Miner

Credit score modeling in SAS Enterprise Miner: Reject inference to solve sample bias problem was published on SAS Users.

January 20, 2017
 

I remember my grandparents talking about how hard things were for them growing up. They would say, “Things were so bad that we had to walk uphill, both ways, in the freezing snow to get to school.” It was always hard for me to relate to these statements because the school bus picked me up at the end of my driveway. Fast forward to today and people are riding on hoverboards. Through the years, advancement in transportation has made it easier for us to get where we need to be.

SAS/GRAPH® Version 6

The evolution of SAS/GRAPH® is similar. In the earlier days of SAS® software, during Versions 5 and 6 of SAS/GRAPH, I understood how difficult it was to create some of the graphs that customers wanted. A customer recently asked whether I could send him the code that produced the graph below, which he found in the SAS/GRAPH® Software: Reference, Volume 1 Version 6 Edition:

[Image: butterfly chart from the Version 6 reference]

In Version 6, the only way to create this graph, referred to as a butterfly chart, was by using SAS/GRAPH and the Annotate facility. The annotation statements added over 60 lines of code to the program.

Below is a snippet of the Version 6 program that created the bars on the left side of the graph. The entire Version 6 program appears at the end of this post.

     /* female bars on left */
    %bar(39.8, 10.5, 25.0, 20.0, blue, 0, solid);
    %bar(39.8, 20.71, 15.0, 30.7, green, 0, solid);
    %bar(39.8, 31.42, 10.0, 41.42, red, 0, solid);
    %bar(39.8, 42.14, 32.0, 52.14, blue, 0, solid);
    %bar(39.8, 52.85, 33.0, 62.85, green, 0, solid);
    %bar(39.8, 63.57, 36.0, 73.57, red, 0, solid);
    %bar(39.8, 74.28, 35.0, 84.28, blue, 0, solid);
    %bar(39.8, 85.0, 33.0, 95.0, green, 0, solid);

SAS/GRAPH® 9.4

Fast forward to SAS® 9.4. ODS Graphics and SG procedures have been part of Base SAS® since SAS® 9.3, making it much easier to create high-quality graphs without using additional software. The graph that was previously created using the Annotate facility can now be created using the Graph Template Language (GTL) and the SGRENDER procedure. Here is the program to create the entire graph:

Note: The program below contains numbered annotations that correspond to a discussion below the program. So, if you copy and paste this program into SAS, be sure to delete the annotations before running this code.

data ratio;
  input Age $1-8 type $ Female Male;
datalines;
over 79  A 19 18
70-79    B 15 14
60-69    C 13 12
50-59    A 24 46
40-49    B 26 18
30-39    C 92 61
20-29    A 77 88
under 20 B 42 100
;
run;
 
proc template; 
   define statgraph population; ❶
   begingraph /border=false❷ datacolors=(green blue red); ❸
      entrytitle 'Population Tree' / textattrs=(size=15); ❹
      entrytitle 'Distribution of Population by Sex';
   layout lattice ❺/ columns=2 ❻ columnweights=(.55 .45); ❼
      layout overlay ❽/ walldisplay=none y2axisopts=(reverse=true
                         tickvaluehalign=center 
                         display=(tickvalues))  
                         xaxisopts=(displaysecondary=(label) 
                         display=(tickvalues line) reverse=true 
                         labelattrs=(weight=bold));
         barchart category=age response=Female / group=type orient=horizontal 
                                                 yaxis=y2 barlabel=true;
      endlayout;
      layout overlay ❽/ walldisplay=none 
                         yaxisopts=(reverse=true display=none)
                         xaxisopts=(displaysecondary=(label)   
                         display=(tickvalues line)
                         labelattrs=(weight=bold)); 
         barchart category=age response=Male / group=type orient=horizontal 
                                               barlabel=true ;
      endlayout;
   endlayout;
   endgraph;
   end;
 
proc sgrender data=ratio template=population;
run;

As you can see, this program is much simpler than the one created using Version 6 above. Let’s take a closer look at the code.

❶ The TEMPLATE procedure creates a GTL definition called POPULATION with the DEFINE statement.

❷ In the BEGINGRAPH statement, the BORDER=FALSE option turns off the outside border.

❸ The DATACOLORS option defines the colors for the groups.

❹ The ENTRYTITLE statements define the titles for the graph.

❺ A LAYOUT LATTICE block serves as a wrapper for the LAYOUT OVERLAY statements.

❻ The COLUMNS option in the LAYOUT LATTICE block defines the layout of the cells.

❼ Because the left cell also displays the Age tick values (on its Y2 axis), the COLUMNWEIGHTS option allocates more room for this graph. The values for COLUMNWEIGHTS need to add up to 1.

❽ Two LAYOUT OVERLAY statements define the two cells in this graph.

LAYOUT OVERLAY Code

The two LAYOUT OVERLAY blocks are similar, so we will look closer at only the first one:

layout overlay / walldisplay=none ❶ 
                 y2axisopts=(reverse=true 
                 tickvaluehalign=center display=(tickvalues)) ❷ 
                 xaxisopts=(displaysecondary=(label) ❸ display=(tickvalues line) ❹ reverse=true ❺ 
                 labelattrs=(weight=bold));
   barchart category=age response=Female ❻/ group=type ❼
                           orient=horizontal ❽ yaxis=y2 ❾ barlabel=true; ❿
endlayout;

❶ The WALLDISPLAY=NONE option turns off the border around the graph.

❷ The REVERSE=TRUE option reverses the order of the Y2 axis, TICKVALUEHALIGN=CENTER centers the tick mark values, and the DISPLAY=(TICKVALUES) option displays only the tick mark values. These options are specified within Y2AXISOPTS.

❸ DISPLAYSECONDARY=(LABEL) displays the X axis label on the X2 axis, at the top of the graph.

❹ On the X axis, at the bottom of the graph, the DISPLAY=(TICKVALUES LINE) option displays the tick mark values and the axis line.

❺ The REVERSE=TRUE option also reverses the X axis.

❻ The CATEGORY=AGE option in the BARCHART statement creates a bar for each value of Age, and the RESPONSE=FEMALE option bases the length of each bar on the value of the variable Female.

❼ The group variable, TYPE, determines the color of the bars.

❽ The ORIENT=HORIZONTAL option in the BARCHART statement specifies that the bars are horizontal.

❾ YAXIS=Y2 specifies that the values are plotted against the right Y axis (the Y2 axis).

❿ The BARLABEL=TRUE option provides labels for the bars.

Here is the graph that is created when you submit this program:

[Image: butterfly chart produced by the GTL program]

As this example shows, SAS graphing capabilities have improved over the years just as transportation options have progressed. Be sure to take advantage of these improvements to create helpful visualizations of your data! If you would like to create a similar graph with PROC SGPLOT, refer to Sanjay Matange’s blog post “Butterfly plots.”
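
For readers curious about the PROC SGPLOT route, here is one possible sketch. It is not taken from Sanjay Matange's post. It assumes the RATIO data set created above and a common approach: negate the Female values so that those bars extend to the left, and attach a picture format so that negative values are shown without a sign. The names BUTTERFLY and ABSVAL are made up for this illustration.

```sas
data ratio;
  input Age $1-8 type $ Female Male;
datalines;
over 79  A 19 18
70-79    B 15 14
60-69    C 13 12
50-59    A 24 46
40-49    B 26 18
30-39    C 92 61
20-29    A 77 88
under 20 B 42 100
;
run;

/* Hypothetical helper data set: negative Female values point left */
data butterfly;
  set ratio;
  Female = -Female;
run;

/* A picture format without a PREFIX= option drops the minus sign */
proc format;
  picture absval low-<0 = '009'
                 0-high = '009';
run;

proc sgplot data=butterfly;
  format Female Male absval.;
  hbarparm category=Age response=Female / datalabel;
  hbarparm category=Age response=Male / datalabel;
  xaxis display=(nolabel);
run;
```

Unlike the GTL version, this sketch draws both sets of bars in a single cell around a shared zero line, so no LAYOUT LATTICE or column weights are involved.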


Version 6 program

    /* set the graphics environment */
 goptions reset=global gunit=pct border
          ftext=swissb htitle=6 htext=3 dev=png;
 %annomac;
 
    /* create the Annotate data set, POPTREE */
 data poptree;
       /* length and type specification */
    %dclanno;
 
       /* set length of text variable   */
    length text $ 16;
 
 
       /* window percentage for x and y */
    %system(5, 5, 3);
 
       /* draw female axis lines */
    %move(5, 10);
    %draw(40, 10, red, 1, .5);
    %draw(40, 95, red, 1, .5);
 
       /* draw male axis lines */
    %move(56.1, 95);
    %draw(56.1, 10, red, 1, .5);
    %draw(95, 10, red, 1, .5);
 
       /* label categories */
    %label(75.0, 97.0, 'Male', green, 0, 0, 4, swissb, 5);
 
       /* at top */
    %label(25.0, 97.0, 'Female', green, 0, 0, 4, swissb, 5);
    %label(5.0, 5, '100', blue, 0, 0, 4, swissb, 5);
    %label(22.5, 5, ' 50', blue, 0, 0, 4, swissb, 5);
    %label(40.0, 5, ' 00', blue, 0, 0, 4, swissb, 5);
    %label(95.0, 5, '100', blue, 0, 0, 4, swissb, 5);
    %label(75.0, 5, ' 50', blue, 0, 0, 4, swissb, 5);
    %label(56.0, 5, ' 00', blue, 0, 0, 4, swissb, 5);
 
       /* label age */
    %label(48.0, 15.25, 'under 20', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 25.0, '20 - 29', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 36.7, '30 - 39', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 47.4, '40 - 49', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 57.8, '50 - 59', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 68.6, '60 - 69', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 79.3, '70 - 79', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 90.0, 'over 79', blue, 0, 0, 4, swissb, 5);
 
       /* male bars on right */
    %bar(56.2, 10.5, 95.0, 20.0, blue, 0, solid);
    %bar(56.2, 20.71, 90.0, 30.71, green, 0, solid);
    %bar(56.2, 31.42, 80.0, 41.52, red, 0, solid);
    %bar(56.2, 42.14, 62.0, 52.14, blue, 0, solid);
    %bar(56.2, 52.85, 72.0, 62.85, green, 0, solid);
    %bar(56.2, 63.57, 60.0, 73.57, red, 0, solid);
    %bar(56.2, 74.28, 61.0, 84.28, blue, 0, solid);
    %bar(56.2, 85.0, 63.0, 95.0, green, 0, solid);
 
       /* label male bars on right */
    %label(95.0, 20.0, '100', black, 0, 0, 4, swissb, 7);
    %label(90.0, 30.71, '88', black, 0, 0, 4, swissb, 7);
    %label(80.0, 41.52, '61', black, 0, 0, 4, swissb, 7);
    %label(62.0, 52.14, '18', black, 0, 0, 4, swissb, 7);
    %label(72.0, 62.85, '46', black, 0, 0, 4, swissb, 7);
    %label(60.0, 73.57, '12', black, 0, 0, 4, swissb, 7);
    %label(61.0, 84.28, '14', black, 0, 0, 4, swissb, 7);
    %label(62.0, 95.0, '18', black, 0, 0, 4, swissb, 7);
 
       /* female bars on left */
    %bar(39.8, 10.5, 25.0, 20.0, blue, 0, solid);
    %bar(39.8, 20.71, 15.0, 30.7, green, 0, solid);
    %bar(39.8, 31.42, 10.0, 41.42, red, 0, solid);
    %bar(39.8, 42.14, 32.0, 52.14, blue, 0, solid);
    %bar(39.8, 52.85, 33.0, 62.85, green, 0, solid);
    %bar(39.8, 63.57, 36.0, 73.57, red, 0, solid);
    %bar(39.8, 74.28, 35.0, 84.28, blue, 0, solid);
    %bar(39.8, 85.0, 33.0, 95.0, green, 0, solid);
 
       /* label female bars on left */
    %label(25.0, 20.0, '42', black, 0, 0, 4, swissb, 9);
    %label(15.0, 30.7, '77', black, 0, 0, 4, swissb, 9);
    %label(10.0, 41.42, '92', black, 0, 0, 4, swissb, 9);
    %label(32.0, 52.14, '26', black, 0, 0, 4, swissb, 9);
    %label(33.0, 62.85, '24', black, 0, 0, 4, swissb, 9);
    %label(36.0, 73.57, '13', black, 0, 0, 4, swissb, 9);
    %label(35.0, 84.28, '15', black, 0, 0, 4, swissb, 9);
    %label(33.0, 95.0, '19', black, 0, 0, 4, swissb, 9);
 run;
 
    /* define the titles */
 title1 'Population Tree';
 title2 h=4 'Distribution of Population by Sex';
 
    /* generate annotated slide */
 proc gslide annotate=poptree;
 run;
 quit;
tags: Problem Solvers, SAS 9.4, SAS Programmers

Comparing SAS/GRAPH® 9.4 capabilities with SAS/GRAPH® Version 6 was published on SAS Users.

January 17, 2017
 

Editor's note: Charyn Faenza co-authored this blog. Learn more about Charyn.

As the fun of the festive season ends, the buzz of the new year and the enchantment of SAS Global Forum 2017 begin. SAS Global Forum is a conference designed by SAS users, for SAS users, bringing together SAS professionals from all over the world to learn, collaborate and network in person. Sure, online communication is great, but it’s hard to beat the thrill of meeting fellow SAS users face-to-face for the first time. It feels like magic! To help you prepare for the event, Charyn and I wanted to share a few things, including information on metadata security. Read on for more.

Start your SAS Global Forum journey now!

Want to stay up to date with SAS Global Forum activities, and get a head start on your conference networking? Join the SAS Global Forum 2017 online community. Here you can post questions, share ideas, and connect with others before the event. While you are at it, the SAS User Group for Administrators (SUGA) community also feels magical to me. As part of the committee, I regularly get together (virtually!) with the other members to discuss and plan exciting events on behalf of SAS administrators around the world. Join the SUGA community and watch for upcoming events, including a live meet-up at SAS Global Forum! That event is scheduled for Monday, April 3, from 6:30-8:00 p.m.

Security auditing

During his workshop at SAS Global Forum 2014, Gregory Nelson pointed out that the SAS administrator role has evolved over the years, and so has one of its key responsibilities: security auditing. Once you’ve set up an initial security plan, how do you ensure that the environment remains secure? Can you just “set it and forget it?” Probably not, especially if you want to ensure regulatory compliance, maintain business confidence and keep your SAS platform in line with its design specifications as your business grows and your SAS environment evolves.

Thinking about your own SAS platform:

  • What would happen in your organization if someone accessed data they shouldn’t?
  • When was your last SAS platform security project?
  • When was it last tested? How extensive was it? How long did it take?
  • Have there been any changes since it was last tested, whether deliberate, accidental, expected or unexpected?
  • How do you know if it’s still secure today?

Presenting at SAS Global Forum

If security is important to you and your organization, please join us at this year’s magical SAS Global Forum, as I co-present with Charyn Faenza on SAS® Metadata Security 301: Auditing Your SAS Environment. Hold your horses… “301?” Did I hear that right? “What about 101 and 201?” Glad your curious mind asked... At the last two SAS Global Forum events, Charyn has presented SAS Metadata Security 101 and 201 papers that step through the fundamentals of authentication and authorization.

Our upcoming 301 paper will focus on auditing to complete the three ‘A’s (Authentication, Authorization and Auditing), including how you can use Metacoda software to regularly review your environment, so you can protect your resources, comply with security auditing requirements, and quickly and easily answer the question "Who has access to what?"

Here are the details for our paper:
Session Title: 786 -  SAS Metadata Security 301: Auditing your SAS Environment
Type: Breakout
Date: Tuesday, April 4
Time: 4:00 PM - 5:00 PM
Location: Dolphin, Dolphin Level III - Asia 4

Our security journey

Whether you’re a new SAS administrator or an experienced one, you’ll know that security is a journey rather than a destination.

To help make sure you’re on the right path, check out the SUGA virtual events, SAS administrator tagged blog posts, Twitter #sasadmin and platformadmin.com.

If you’d like to chat more about SAS security auditing, please comment below, join our chat in the SAS Global Forum community, or connect with us on Twitter at @HomesAtMetacoda and @CharynFaenza.

Looking forward to seeing you in April at SAS Global Forum 2017 in the enchanting and magical Walt Disney World Swan and Dolphin Resort, Orlando, Florida!


About Charyn Faenza

Ms. Faenza is Vice President and Manager of Corporate Business Intelligence Systems for First National Bank, the largest subsidiary of F.N.B. Corporation (NYSE: FNB). An accountant by training, she is passionate about understanding not only the technology but also the underlying business utility of the systems her team supports. In her role she is responsible for the architecture and development of F.N.B.’s corporate profitability, stress testing, and analytics platforms and oversees the data collection and governance functions to ensure high data quality, proper data storage and transfer, risk management and data compliance.

Throughout her tenure at F.N.B. her experience in data integration and governance has been leveraged in several cross functional projects where she has been engaged as a strategic consultant regarding the design of systems and processes in the Finance, Treasury and Credit areas of the Bank.

Ms. Faenza earned her bachelor’s degree in Accounting from Youngstown State University, where she is currently serving on the Business Advisory Board of the Youngstown State University Lariccia School of Accounting and Finance.

tags: papers & presentations, SAS Administrators, SAS Global Forum, SAS User Group for Administrators

Take a SAS security journey at SAS Global Forum 2017 was published on SAS Users.