
February 1, 2017
 

SAS® Federation Server provides a central, virtual environment for administering and securing access to your data. It also allows you to combine data from multiple sources without moving or copying the data. SAS Federation Server Manager, a web-based application, is used to administer SAS Federation Server(s).

Data privacy is a major concern for organizations, and one of the key features of SAS Federation Server is that it allows you to control access to your data effectively and efficiently, so you can limit who is able to view sensitive data such as credit card numbers, personal identification numbers, and names. In this three-part series, I will explore the topic of controlling data access using SAS Federation Server.

The series covers the following topics:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

SAS Metadata Server is used to perform authentication for users and groups in SAS Federation Server, and SAS Federation Server Manager is used to help control access to the data. Note: Permissions applied at a particular data source cannot be bypassed by SAS Federation Server security. If permissions are denied at the source data, for example on a table, then users will always be denied access to that table, no matter what permissions are set in SAS Federation Server.

In this blog post, I build on the example in my previous post and demonstrate how you can use SAS Federation Server Manager to control access to columns and rows in tables and views.

Previously, I gave the Finance Users group access to the SALARY table. Robert is a member of the Finance Users group, so he has access to the SALARY table; however, I want to restrict his access to the IDNUM column on the table. To do this, first I view the SALARY table Authorizations in Federation Server Manager, then I select the arrow to the right of the table name to view its columns.

Next, I select the IDNUM column. I then add the user Robert and set his SELECT permission to Deny for the column.

Note: There are 5 columns on the SALARY table. Since he was denied access to the IDNUM column, Robert is able to view only 4 of the 5 columns.
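Conceptually, a column-level Deny is equivalent to projecting the denied column out of every result set the user receives. Here is a minimal Python sketch of that idea (the row data and column names are hypothetical; the real enforcement happens inside SAS Federation Server):

```python
# Hypothetical SALARY row; Federation Server enforces this server-side.
salary_row = {"IDNUM": 1001, "NAME": "Smith", "JOBCODE": "QA100",
              "SALARY": 52000, "DEPT": "Finance"}

def visible_columns(row, denied):
    """Simulate a column-level Deny: denied columns are removed from the result."""
    return {col: val for col, val in row.items() if col not in denied}

robert_view = visible_columns(salary_row, denied={"IDNUM"})
print(sorted(robert_view))  # ['DEPT', 'JOBCODE', 'NAME', 'SALARY']
```

Robert still queries the same table; he simply never receives the denied column back.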

Susan is also a member of the Finance Users group, so she has access to the SALARY table; however, I want to restrict her access to only rows where the JOBCODE starts with a “Q.” To do this, first I view the SALARY table Authorizations in Federation Server Manager.

Next, I select the Row Authorizations tab and select New Filter. I use the SQL Clause Builder to build my condition of JOBCODE LIKE 'Q%'.

Next, I select the Users and Groups tab and add Susan to restrict her access to the filter I just created.

Finally, I select OK to save the changes I made to Row Authorizations.

Susan is now only able to view the rows of the SALARY table where the JOBCODE begins with “Q.”
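The row authorization behaves like an extra WHERE clause silently appended to every query Susan runs. A small Python sketch of the filtering logic (the rows are made up; Federation Server applies the predicate server-side as SQL):

```python
# Hypothetical SALARY rows; the real filter is a SQL predicate on the server.
salary = [
    {"IDNUM": 1001, "JOBCODE": "QA100", "SALARY": 52000},
    {"IDNUM": 1002, "JOBCODE": "FA200", "SALARY": 61000},
    {"IDNUM": 1003, "JOBCODE": "QC300", "SALARY": 48000},
]

def apply_row_filter(rows, predicate):
    """Simulate a row authorization: only rows matching the predicate are returned."""
    return [row for row in rows if predicate(row)]

# JOBCODE LIKE 'Q%'  ->  JOBCODE starts with "Q"
susan_view = apply_row_filter(salary, lambda r: r["JOBCODE"].startswith("Q"))
print([r["IDNUM"] for r in susan_view])  # [1001, 1003]
```

Because the predicate is stored with the filter and evaluated in SQL, the client never sees the hidden rows at all.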

In this blog entry, I covered the second part of this series on Securing sensitive data using SAS Federation Server at the row and column level:

Part 1: Securing sensitive data using SAS Federation Server at the data source level
Part 2: Securing sensitive data using SAS Federation Server at the row and column level
Part 3: Securing sensitive data using SAS Federation Server data masking

More information on SAS Federation Server:

tags: SAS Administrators, SAS Federation Server, SAS Professional Services

Securing sensitive data using SAS Federation Server at the row and column level was published on SAS Users.

January 30, 2017
 

Recently, SAS shipped the fourth maintenance release of SAS 9.4. Building on this foundation, SAS Studio reached a new milestone: its 3.6 release. All editions have been upgraded, including Personal, Basic and Enterprise. In this blog post, I want to highlight the new features that have been introduced. In subsequent posts I’ll discuss some of these features in more detail.

SAS Studio 3.6 includes many new features and enhancements, including:

1  -  new preferences to personalize even more of the SAS Studio user experience. In detail, it is now possible to:

  • control whether items in the navigation pane, such as libraries, files and folders, are automatically refreshed after running a program, task or query.

  • determine whether, at start-up, SAS Studio attempts to restore the tabs that were open when the prior session was closed.

2  -  enhancements to the background submit feature (previously known as batch submit), with more control over the output and log files. SAS Studio 3.6 also enforces a new behavior: if the background SAS program is a file on the server and not an FTP reference, then the current working directory is automatically set to the directory where the code resides. This enables the use of relative paths in code to reference artifacts such as additional SAS code pulled in with %include statements (e.g., %include ./macros.sas), references to data files (e.g., libname data ".";), or images to be included in ODS output.

3  -  ability to generate HTML graphs in the SVG format instead of the PNG format.

4  -  many new analytical tasks for power and sample size analysis, cluster analysis and network optimization.
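The working-directory behavior described for background submit can be illustrated outside of SAS: this Python sketch mimics how a relative reference such as ./macros.sas resolves once the current directory is set to the program's directory (the project layout and file names are hypothetical):

```python
import os
import tempfile
from pathlib import Path

# Hypothetical project layout: a program and an included macro file side by side.
project = Path(tempfile.mkdtemp())
(project / "macros.sas").write_text("%macro hello; %put hi; %mend;")
program = project / "report.sas"
program.write_text('%include "./macros.sas";')

# Mimic SAS Studio 3.6: set the working directory to the program's directory,
# so relative references resolve next to the program itself.
os.chdir(program.parent)
include_target = Path("./macros.sas").resolve()
print(include_target.name, include_target.exists())
```

The same relative reference would fail (or resolve somewhere unexpected) if the working directory were left wherever the server process happened to start.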

Impressive new features to be sure, but that’s not all. Here’s a bonus feature that I personally find really interesting.

  • The navigation pane includes new categories, both in the code snippets section and in the task section, to streamline the integration between SAS 9.4 and SAS Viya. A new category of Viya Cloud Analytic Services code snippets helps you connect to SAS Viya and work with CAS tables. New Viya Machine Learning tasks enable you to run SAS code in a SAS Viya environment. You can do all this while working from your 9.4 environment.

tags: SAS Professional Services, sas studio

SAS Studio 3.6 new features was published on SAS Users.

January 26, 2017
 

SAS® Viya™ 3.1 represents the third generation of high performance computing from SAS. Our journey started a long time ago and, along the way, we have introduced a number of high performance technologies into the SAS software platform.

Introducing Cloud Analytic Services (CAS)

SAS Viya introduces Cloud Analytic Services (CAS) and continues this story of high performance computing. CAS is the runtime engine and microservices environment for data management and analytics in SAS Viya, and it introduces some new and interesting innovations for customers. CAS is an in-memory technology designed for scale and speed. Whilst it can be set up on a single machine, it is more commonly deployed across a number of nodes in a cluster of computers for massively parallel processing (MPP). The parallelism is further increased by using all the cores within each node of the cluster for multi-threaded, analytic workload execution. In an MPP environment, having many nodes does not mean that using all of them is always the most efficient choice for analytic processing. CAS maintains node-to-node communication in the cluster and uses an internal algorithm to determine the optimal distribution and number of nodes to run a given process.

However, processing in-memory can be expensive, so what happens if your data doesn’t fit into memory? Well, CAS has that covered. CAS will automatically spill data to disk in such a way that only the data required for processing are loaded into the memory of the system. The rest of the data are memory-mapped to the filesystem in an efficient way for loading into memory when required. This way of working means that CAS can handle data larger than the available memory that has been assigned.

The CAS in-memory engine is made up of a number of components - namely the CAS controller and, in an MPP distributed environment, CAS worker nodes. Depending on your deployment architecture and data sources, data can be read into CAS either in serial or parallel.

What about resilience to data loss if a node in an MPP cluster becomes unavailable? Well, CAS has that covered too. CAS maintains replicates of the data within the environment. The number of replicates can be configured, but the default is to maintain one extra copy of the data within the environment. This is done efficiently by caching the replicate data blocks to disk rather than consuming resident memory.

One of the most interesting developments with the introduction of CAS is the way that an end user can interact with SAS Viya. CAS actions are a new programming construct, and if you are a Python, Java, SAS or Lua developer, you can communicate with CAS using an interactive computing environment such as a Jupyter Notebook. One of the benefits of this is that a Python developer, for example, can utilize SAS analytics on a high performance, in-memory distributed architecture, all from their Python programming interface. In addition, we have introduced open REST APIs, which means you can call native CAS actions and submit code to the CAS server directly from a web application or other programs written in any language that supports REST.
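As an illustration, the Python route typically goes through the SWAT package. The sketch below shows a hypothetical connection (the host name, port, and CSV file are placeholders, and the code only attempts to connect if a CAS_HOST environment variable is set, since it needs a live CAS server):

```python
import os

cas_host = os.environ.get("CAS_HOST")  # e.g. "cas.example.com"; leave unset to skip

if cas_host:
    import swat  # SAS's open-source Python client for CAS

    # Connect to the CAS controller (5570 is a common binary port).
    conn = swat.CAS(cas_host, 5570)

    # Load data into CAS memory and run a CAS action on the resulting table.
    tbl = conn.read_csv("salary.csv", casout="salary")
    print(tbl.summary())  # runs the 'simple.summary' CAS action in the cluster

    conn.close()
else:
    print("CAS_HOST not set; skipping connection")
```

The point is that the heavy lifting (loading, summarizing) runs as CAS actions on the distributed in-memory engine, while the developer stays in a familiar Python workflow.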

Whilst CAS represents the most recent step in our high performance journey, SAS Viya does not replace SAS 9. These two platforms can co-exist, even on the same hardware, and indeed can communicate with one another to leverage the full range of technology and innovations from SAS. To find out more about CAS, take a look at the early preview trial. Or, if you would like to explore the capabilities of SAS Viya with respect to your current environment and business objectives speak to your local SAS representative about arranging a ‘Path to SAS Viya workshop’ with SAS.

Many thanks to Fiona McNeill, Mark Schneider and Larry LaRusso for their input and review of this article.

 

tags: Global Technology Practice, high-performance analytics, SAS Grid Manager, SAS Visual Analytics, SAS Visual Statistics, SAS Viya

A journey of SAS high performance was published on SAS Users.

January 26, 2017
 

Recently a colleague told me Google had published new, interesting data sets on BigQuery. I found a lot of Reddit data as well, so I quickly tried running BigQuery against this text data to see what I could produce. After getting some pretty interesting results, I wanted to see whether I could implement the same analysis with SAS, and whether SAS Text Mining would yield deeper insights than simple queries. So, I tried SAS with the Reddit comments data, and I’d like to share my analyses and findings with you.

Analysis 1: Significant Words

To get started with BigQuery, I googled what others were sharing regarding BigQuery and Reddit, and I found USING BIGQUERY WITH REDDIT DATA. In this article the author posted a query statement for extracting significant words from the Politics subreddit. I then wrote a SAS program to mimic this query, and I got the following data from the July Reddit comments. The result is not exactly the same as the one from BigQuery, since I downloaded the Reddit data from another website and used the SAS Text Parsing action to parse the comments into tokens rather than just splitting tokens on white space.
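The exact query isn't reproduced here, but the idea behind "significant words" can be sketched in a few lines of Python: compare each word's frequency in the target subreddit against a reference corpus and rank by a (smoothed) frequency ratio. The corpora below are toy data, not the real Reddit comments:

```python
from collections import Counter

# Toy comment samples (hypothetical); the real analysis parsed a month of comments.
politics = "trump hillary vote email trump convention vote trump".split()
baseline = "game movie vote music trump weather movie game".split()

def significant_words(target, reference, top=3):
    """Rank words by how much more frequent they are in the target corpus
    than in the reference corpus (simple smoothed frequency ratio)."""
    t, r = Counter(target), Counter(reference)
    t_total, r_total = sum(t.values()), sum(r.values())
    score = {w: (t[w] / t_total) / ((r[w] + 1) / (r_total + 1)) for w in t}
    return sorted(score, key=score.get, reverse=True)[:top]

print(significant_words(politics, baseline))  # 'trump' ranks first
```

The BigQuery post and the SAS Text Parsing approach both refine this basic idea with proper tokenization and larger reference corpora.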

Analysis 2: Daily Submissions

The words Trump and Hillary in the list raised my interest and begged for further analysis. So, I did a daily analysis to understand how hot Trump and Hillary were during that month. I filtered all comments mentioning Trump or Hillary in the Politics subreddit and counted total submissions per day. The resulting time series plot is shown below.

I found several spikes in the plot, which happened on 2016/7/5, 2016/7/12, 2016/7/21, and 2016/7/26.

Analysis 3: Topics Time Line

I wondered what Reddit users were concerned about on these specific days, so I extracted the top 10 topics from all comments submitted in July 2016 within the Politics subreddit and got the following data. These topics obviously focused on several themes, such as the vote, the presidential candidates, the parties, and hot news such as Hillary’s email probe.

The topics showed what people were concerned about across the whole month, but further investigation was needed to explain which topic contributed most to each of the four spikes. The topics’ time series plots helped me find the answer.

Some topics’ time series trends are very close, and it is hard to determine which topic contributed most, so I ranked the topics by their daily percentage growth. The top growth topic on July 5 was “emails, dnc, +server, hillary, +classify,” which showed a growth factor of 256.23.
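The daily growth used to rank the topics can be computed directly from consecutive daily counts. A toy sketch (the counts are invented to produce a spike of roughly the magnitude reported above):

```python
def growth_factors(counts):
    """Day-over-day growth factor for a topic's daily submission counts.
    max(prev, 1) guards against division by zero on quiet days."""
    return [today / max(prev, 1) for prev, today in zip(counts, counts[1:])]

# Hypothetical daily counts for one topic around a spike day.
daily = [12, 10, 9, 2304, 800]
print(growth_factors(daily))  # the spike day shows a 256x day-over-day jump
```

Ranking topics by their largest day-over-day factor surfaces the one that actually drove each spike, even when the raw trend lines look similar.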

Its time series plot also shows a high spike on July 5. I then googled “July 5, 2016 emails dnc server hillary classify” and got the following news.

There is no doubt the spike on July 5 is related to the FBI’s decision about Clinton’s email probe. To confirm this, I extracted the top 20 Reddit comments submitted on July 5 according to their Reddit scores. I quote part of the top comment below; the link it contains was also included in the Google search results.

"...under normal circumstances, security clearances would be revoked." This is your FBI. EDIT: I took a paraphrased quote; this is the actual quote as per https://www.fbi.gov/news/pressrel/press-releases/statement-by-fbi-director-james-b.-comey-on-the-investigation-of-secretary-hillary-clintons-use-of-a-personal-e-mail-system

A similar analysis was done for the other three days, and the hot topics are as follows.

Interestingly, someone did a sentiment analysis with Twitter data, and the tweet submission trend for July looks the same as Reddit’s.

In that blog, he listed several important events that happened in July.

  • July 5th: the FBI says it’s not going to end Clinton’s email probe and will not recommend prosecution.
  • July 12th: Bernie Sanders endorses Hillary Clinton for president.
  • July 21st: Donald Trump accepts the Republican nomination.
  • July 25-28: Clinton accepts the nomination at the DNC.

This showcases that different social media data have similar response trends to the same events.

Now I know why these spikes happened. However, more questions came to my mind.

  • Who started posting these news?
  • Were there cyber armies?
  • Who were opinion leaders in the politics community?

I believe all these questions can be answered by analyzing the data with SAS.

tags: SAS R&D, SAS Text Analytics

Analyzing Trump v. Clinton text data at Reddit was published on SAS Users.

January 21, 2017
 

Editor's note: The following post is from Xiaoyuan Zhang, presenter at an upcoming Insurance and Finance User Group (IFSUG) webinar.

Learn more about Xiaoyuan Zhang.


As a business user with limited statistical skills, I don’t think I could build a credit scorecard without the help of SAS Enterprise Miner. As you can see from the flow chart, SAS Enterprise Miner, a descriptive and predictive modeling software package, does an amazing job of developing and streamlining models.

The flow chart presents my whole credit score modeling process, which is divided into three parts: creating the preliminary scorecard, performing reject inference, and building the final scorecard. I will cover the whole process in the Insurance and Finance Users Group (IFSUG) virtual session on Feb 3, 2017. In this blog I want to emphasize the second part, which is sometimes easy to overlook.

The data for the preliminary scorecard comes only from accepted loan applications. However, the scorecard modeler needs to apply the scorecard to all applicants, both accepted and rejected. To solve this sample bias problem, reject inference is performed.

Before inferring the behavior (good or bad) of the rejected applicants, the data needs to be examined. I used the StatExplore node to explore the data and found a significant number of missing values, which is problematic: the SAS Enterprise Miner regression model, the model used here for scorecard creation and reject inference, ignores observations that contain missing values, which reduces the size of the training data set. Less training data can substantially weaken the predictive power of the model.

To help with this problem, the Impute node is used to impute the missing values. In the Properties Panel of the node, the modeler can choose from a variety of imputation methods. In this model, Tree Surrogate is selected for class variables and Median for interval variables.

However, the Impute node sets the data role to Train. In order to use the data in the Reject Inference node, the data role needs to be changed to Score. A SAS Code node is placed in between for this purpose:

data &em_export_score;   /* export table with the Score data role */
   set &em_import_data;  /* imputed data from the Impute node     */
run;

Last but not least, the Reject Inference node is used to infer the performance of the rejected loan applicant data. SAS Enterprise Miner offers three standard, industry-accepted methods for inferring the performance of the rejected applicant data using a model built on the accepted applicants. We won’t explore the three methods in detail here, as the emphasis of this blog is on the process.

To hear more on this topic, please register for the IFSUG virtual session, Credit Score Modeling in SAS Enterprise Miner on February 3rd from 11am-12pm ET.


About Xiaoyuan Zhang

Xiaoyuan Zhang grew up in Zhaoyuan, China, on the coast of the Bohai Sea. Her town is famous for its ancient gold mine, hot springs and its unusual and tasty seafood. Her undergraduate degree is from China Agricultural University in Beijing, where she majored in Marketing Intelligence and graduated with honors. She also graduated, with honors, from Drexel University with a master's degree in Finance. She has passed two CFA exams and learned Enterprise Miner in one of her courses. She specializes in efficient credit score modeling with SAS Enterprise Miner. She is using some of her post-graduation free time to study "regular SAS", to tutor and to volunteer.

 

tags: IFSUG, SAS Enterprise Miner

Credit score modeling in SAS Enterprise Miner: Reject inference to solve sample bias problem was published on SAS Users.

January 20, 2017
 

I remember my grandparents talking about how hard things were for them growing up. They would say, “Things were so bad that we had to walk uphill, both ways, in the freezing snow to get to school.” It was always hard for me to relate to these statements because the school bus picked me up at the end of my driveway. Fast forward to today, and people are riding on hoverboards. Through the years, advancement in transportation has made it easier for us to get where we need to be.

SAS/GRAPH® Version 6

The evolution of SAS/GRAPH® is similar. In the earlier days of SAS® software, during Versions 5 and 6 of SAS/GRAPH, I understood how difficult it was to create some of the graphs that customers wanted. A customer recently asked whether I could send him the code that produced the graph below, which he found in the SAS/GRAPH® Software: Reference, Volume 1, Version 6 Edition:


In Version 6, the only way to create this graph, referred to as a butterfly chart, was by using SAS/GRAPH and the Annotate facility. The annotation statements added over 60 lines of code to the program.

Below is a snippet of the Version 6 program that created the bars on the left side of the graph.

The entire Version 6 program is included at the end of this post.

     /* female bars on left */
    %bar(39.8, 10.5, 25.0, 20.0, blue, 0, solid);
    %bar(39.8, 20.71, 15.0, 30.7, green, 0, solid);
    %bar(39.8, 31.42, 10.0, 41.42, red, 0, solid);
    %bar(39.8, 42.14, 32.0, 52.14, blue, 0, solid);
    %bar(39.8, 52.85, 33.0, 62.85, green, 0, solid);
    %bar(39.8, 63.57, 36.0, 73.57, red, 0, solid);
    %bar(39.8, 74.28, 35.0, 84.28, blue, 0, solid);
    %bar(39.8, 85.0, 33.0, 95.0, green, 0, solid);

SAS/GRAPH® 9.4

Fast forward to SAS® 9.4. ODS Graphics and SG procedures have been part of Base SAS® since SAS® 9.3, making it much easier to create high-quality graphs without using additional software. The graph that was previously created using the Annotate facility can now be created using the Graph Template Language (GTL) and the SGRENDER procedure. Here is the program to create the entire graph:

Note: The program below contains numbered annotations that correspond to a discussion below the program. So, if you copy and paste this program into SAS, be sure to delete the annotations before running this code.

data ratio;
  input Age $1-8 type $ Female Male;
datalines;
over 79  A 19 18
70-79    B 15 14
60-69    C 13 12
50-59    A 24 46
40-49    B 26 18
30-39    C 92 61
20-29    A 77 88
under 20 B 42 100
;
run;
 
proc template; 
   define statgraph population; ❶
   begingraph /border=false❷ datacolors=(green blue red); ❸
        entrytitle 'Population Tree' / textattrs=(size=15);❹
   entrytitle 'Distribution of Population by Sex';
   layout lattice ❺/ columns=2 ❻ columnweights=(.55 .45); ❼
      layout overlay ❽/ walldisplay=none y2axisopts=(reverse=true
                         tickvaluehalign=center 
                         display=(tickvalues))  
                         xaxisopts=(displaysecondary=(label) 
                         display=(tickvalues line) reverse=true 
                         labelattrs=(weight=bold));
         barchart category=age response=Female / group=type orient=horizontal 
                                                 yaxis=y2 barlabel=true;
      endlayout;
      layout overlay ❽/ walldisplay=none 
                         yaxisopts=(reverse=true display=none)
                         xaxisopts=(displaysecondary=(label)   
                         display=(tickvalues line)
                         labelattrs=(weight=bold)); 
         barchart category=age response=Male / group=type orient=horizontal 
                                               barlabel=true ;
      endlayout;
   endlayout;
   endgraph;
   end;
 
proc sgrender data=ratio template=population;
run;

As you can see, this program is much simpler than the one created using Version 6 above. Let’s take a closer look at the code.

❶ The TEMPLATE procedure creates a GTL definition called POPULATION with the DEFINE statement.

❷ In the BEGINGRAPH statement, the BORDER=FALSE option turns off the outside border.

❸ The DATACOLORS option defines the colors for the groups.

❹ The ENTRYTITLE statements define the titles for the graph.

❺ A LAYOUT LATTICE block serves as a wrapper for the LAYOUT OVERLAY statements.

❻ The COLUMNS option in the LAYOUT LATTICE block defines the layout of the cells.

❼ Because the graph in the left cell contains the bar values, the COLUMNWEIGHTS option allocates more room for this graph. The values for COLUMNWEIGHTS need to add up to 1.

❽ Two LAYOUT OVERLAY statements define the two cells in this graph.

LAYOUT OVERLAY Code

The two LAYOUT OVERLAY blocks are similar, so we will look closer at only the first one:

layout overlay / walldisplay=none ❶ 
                 y2axisopts=(reverse=true 
                 tickvaluehalign=center display=(tickvalues)) ❷ 
                 xaxisopts=(displaysecondary=(label) ❸ display=(tickvalues line) ❹ reverse=true ❺ 
                 labelattrs=(weight=bold));
   barchart category=age response=Female ❻/ group=type ❼
                           orient=horizontal ❽ yaxis=y2 ❾ barlabel=true; ❿
endlayout;

❶ The WALLDISPLAY=NONE option turns off the border around the graph.

❷ The REVERSE=TRUE option reverses the Y axis order, TICKVALUEHALIGN=CENTER centers the tick mark values, and the DISPLAY=TICKVALUES option displays only the tick mark values. These options are specified within Y2AXISOPTS.

❸ DISPLAYSECONDARY=(LABEL) displays the X axis label on the X2axis, at the top of the graph.

❹ On the X axis, at the bottom of the graph, the DISPLAY=(TICKVALUES LINE) option displays the tick mark values and the axis line.

❺ The REVERSE=TRUE option also reverses the X axis.

❻ The bar chart contains a bar for each value of Age. The lengths of the bars are based on the values of the variable Female, as specified by the CATEGORY=AGE and RESPONSE=FEMALE options, respectively, in the BARCHART statement.

❼ The group variable, TYPE, determines the color of the bars.

❽ The ORIENT=HORIZONTAL option in the BARCHART statement specifies that the bars are horizontal.

❾ YAXIS=Y2 specifies that the values are plotted against the right Y axis, Y2 axis.

❿ The BARLABEL=TRUE option provides labels for the bars.

Here is the graph that is created when you submit this program:


As this example shows, SAS graphing capabilities have improved over the years just as transportation options have progressed. Be sure to take advantage of these improvements to create helpful visualizations of your data! If you would like to create a similar graph with PROC SGPLOT, refer to Sanjay Matange’s blog post “Butterfly plots.”


Version 6 program

    /* set the graphics environment */
 goptions reset=global gunit=pct border
          ftext=swissb htitle=6 htext=3 dev=png;
 %annomac;
 
    /* create the Annotate data set, POPTREE */
 data poptree;
       /* length and type specification */
    %dclanno;
 
       /* set length of text variable   */
    length text $ 16;
 
 
       /* window percentage for x and y */
    %system(5, 5, 3);
 
       /* draw female axis lines */
    %move(5, 10);
    %draw(40, 10, red, 1, .5);
    %draw(40, 95, red, 1, .5);
 
       /* draw male axis lines */
    %move(56.1, 95);
    %draw(56.1, 10, red, 1, .5);
    %draw(95, 10, red, 1, .5);
 
       /* label categories */
    %label(75.0, 97.0, 'Male', green, 0, 0, 4, swissb, 5);
 
       /* at top */
    %label(25.0, 97.0, 'Female', green, 0, 0, 4, swissb, 5);
    %label(5.0, 5, '100', blue, 0, 0, 4, swissb, 5);
    %label(22.5, 5, ' 50', blue, 0, 0, 4, swissb, 5);
    %label(40.0, 5, ' 00', blue, 0, 0, 4, swissb, 5);
    %label(95.0, 5, '100', blue, 0, 0, 4, swissb, 5);
    %label(75.0, 5, ' 50', blue, 0, 0, 4, swissb, 5);
    %label(56.0, 5, ' 00', blue, 0, 0, 4, swissb, 5);
 
       /* label age */
    %label(48.0, 15.25, 'under 20', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 25.0, '20 - 29', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 36.7, '30 - 39', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 47.4, '40 - 49', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 57.8, '50 - 59', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 68.6, '60 - 69', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 79.3, '70 - 79', blue, 0, 0, 4, swissb, 5);
    %label(48.0, 90.0, 'over 79', blue, 0, 0, 4, swissb, 5);
 
       /* male bars on right */
    %bar(56.2, 10.5, 95.0, 20.0, blue, 0, solid);
    %bar(56.2, 20.71, 90.0, 30.71, green, 0, solid);
    %bar(56.2, 31.42, 80.0, 41.52, red, 0, solid);
    %bar(56.2, 42.14, 62.0, 52.14, blue, 0, solid);
    %bar(56.2, 52.85, 72.0, 62.85, green, 0, solid);
    %bar(56.2, 63.57, 60.0, 73.57, red, 0, solid);
    %bar(56.2, 74.28, 61.0, 84.28, blue, 0, solid);
    %bar(56.2, 85.0, 63.0, 95.0, green, 0, solid);
 
       /* label male bars on right */
    %label(95.0, 20.0, '100', black, 0, 0, 4, swissb, 7);
    %label(90.0, 30.71, '88', black, 0, 0, 4, swissb, 7);
    %label(80.0, 41.52, '61', black, 0, 0, 4, swissb, 7);
    %label(62.0, 52.14, '18', black, 0, 0, 4, swissb, 7);
    %label(72.0, 62.85, '46', black, 0, 0, 4, swissb, 7);
    %label(60.0, 73.57, '12', black, 0, 0, 4, swissb, 7);
    %label(61.0, 84.28, '14', black, 0, 0, 4, swissb, 7);
    %label(62.0, 95.0, '18', black, 0, 0, 4, swissb, 7);
 
       /* female bars on left */
    %bar(39.8, 10.5, 25.0, 20.0, blue, 0, solid);
    %bar(39.8, 20.71, 15.0, 30.7, green, 0, solid);
    %bar(39.8, 31.42, 10.0, 41.42, red, 0, solid);
    %bar(39.8, 42.14, 32.0, 52.14, blue, 0, solid);
    %bar(39.8, 52.85, 33.0, 62.85, green, 0, solid);
    %bar(39.8, 63.57, 36.0, 73.57, red, 0, solid);
    %bar(39.8, 74.28, 35.0, 84.28, blue, 0, solid);
    %bar(39.8, 85.0, 33.0, 95.0, green, 0, solid);
 
       /* label female bars on left */
    %label(25.0, 20.0, '42', black, 0, 0, 4, swissb, 9);
    %label(15.0, 30.7, '77', black, 0, 0, 4, swissb, 9);
    %label(10.0, 41.42, '92', black, 0, 0, 4, swissb, 9);
    %label(32.0, 52.14, '26', black, 0, 0, 4, swissb, 9);
    %label(33.0, 62.85, '24', black, 0, 0, 4, swissb, 9);
    %label(36.0, 73.57, '13', black, 0, 0, 4, swissb, 9);
    %label(35.0, 84.28, '15', black, 0, 0, 4, swissb, 9);
    %label(33.0, 95.0, '19', black, 0, 0, 4, swissb, 9);
 run;
 
    /* define the titles */
 title1 'Population Tree';
 title2 h=4 'Distribution of Population by Sex';
 
    /* generate annotated slide */
 proc gslide annotate=poptree;
 run;
 quit;

tags: Problem Solvers, SAS 9.4, SAS Programmers

Comparing SAS/GRAPH® 9.4 capabilities with SAS/GRAPH® Version 6 was published on SAS Users.

January 17, 2017
 

Editor's note: Charyn Faenza co-authored this blog. Learn more about Charyn.

As the fun of the festive season ends, the buzz of the new year and the enchantment of SAS Global Forum 2017 begin. SAS Global Forum is a conference designed by SAS users, for SAS users, bringing together SAS professionals from all over the world to learn, collaborate and network in person. Sure, online communication is great, but it’s hard to beat the thrill of meeting fellow SAS users face-to-face for the first time. It feels like magic! To help you prepare for the event, Charyn and I wanted to share a few things, including information on metadata security. Read on for more.

Start your SAS Global Forum journey now!

Want to stay up to date with SAS Global Forum activities and get a head start on your conference networking? Join the SAS Global Forum 2017 online community. Here you can post questions, share ideas, and connect with others before the event. While you are at it, the SAS User Group for Administrators (SUGA) community also feels magical to me. As part of the committee, we regularly get together (virtually!) to discuss and plan exciting events on behalf of SAS administrators around the world. Join the SUGA community and watch for upcoming events, including a live meet-up at SAS Global Forum! That event is scheduled for Monday, April 3, from 6:30-8:00 p.m.

Security auditing

During his workshop at SAS Global Forum 2014, Gregory Nelson pointed out that the SAS administrator role has evolved over the years, and so has one of its key responsibilities: security auditing. Once you’ve set up an initial security plan, how do you ensure that the environment remains secure? Can you just “set it and forget it”? Probably not, especially if you want to ensure regulatory compliance, maintain business confidence, and keep your SAS platform in line with its design specifications as your business grows and your SAS environment evolves.

Thinking about your own SAS platform:

  • What would happen in your organization if someone accessed data they shouldn’t?
  • When was your last SAS platform security project?
  • When was it last tested? How extensive was it? How long did it take?
  • Have there been any changes since it was last tested, whether deliberate, accidental, expected or unexpected?
  • How do you know if it’s still secure today?

Presenting at SAS Global Forum

If security is important to you and your organization, please join us at this year’s magical SAS Global Forum, as I co-present with Charyn Faenza on SAS® Metadata Security 301: Auditing Your SAS Environment. Hold your horses… “301? Did I hear that right? What about 101 and 201?” Glad you asked. At the last two SAS Global Forum events, Charyn presented SAS Metadata Security 101 and 201 papers that step through the fundamentals of authentication and authorization. Check them out at:

Our upcoming 301 paper will focus on auditing to complete the three ‘A’s (Authentication, Authorization and Auditing), including how you can use Metacoda software to regularly review your environment, so you can protect your resources, comply with security auditing requirements, and quickly and easily answer the question "Who has access to what?"

Here are the details for our paper:
Session Title: 786 - SAS Metadata Security 301: Auditing Your SAS Environment
Type: Breakout
Date: Tuesday, April 4
Time: 4:00 PM - 5:00 PM
Location: Dolphin, Dolphin Level III - Asia 4

Our security journey


Whether you’re a new SAS administrator or an experienced one, you’ll know that security is a journey rather than a destination.

To help make sure you’re on the right path, check out the SUGA virtual events, SAS administrator tagged blog posts, Twitter #sasadmin and platformadmin.com.

If you’d like to chat more about SAS security auditing, please comment below, join our chat in the SAS Global Forum community, or connect with us on Twitter at @HomesAtMetacoda and @CharynFaenza.

Looking forward to seeing you in April at SAS Global Forum 2017 in the enchanting and magical Walt Disney World Swan and Dolphin Resort, Orlando, Florida!


About Charyn Faenza

Ms. Faenza is Vice President and Manager of Corporate Business Intelligence Systems for First National Bank, the largest subsidiary of F.N.B. Corporation (NYSE: FNB). An accountant by training, she is passionate about not only understanding the technology, but the underlying business utility of the systems her team supports. In her role she is responsible for the architecture and development of F.N.B.’s corporate profitability, stress testing, and analytics platforms and oversees the data collection and governance functions to ensure high data quality, proper data storage and transfer, risk management and data compliance.

Throughout her tenure at F.N.B. her experience in data integration and governance has been leveraged in several cross functional projects where she has been engaged as a strategic consultant regarding the design of systems and processes in the Finance, Treasury and Credit areas of the Bank.

Ms. Faenza earned her bachelor’s degree in Accounting from Youngstown State University, where she is currently serving on the Business Advisory Board of the Youngstown State University Lariccia School of Accounting and Finance.

tags: papers & presentations, SAS Administrators, SAS Global Forum, SAS User Group for Administrators

Take a SAS security journey at SAS Global Forum 2017 was published on SAS Users.

1月 122017
 

The holiday season is over, and you survived. You’ve made a lot of personal resolutions for 2017: go to the gym, eat less sugar, save more money, visit Grandma more often. Those are all great personal resolutions, but what about your analytics resolutions? If you’re having trouble coming up with any, let us help. The recent release of SAS 9.4 M4 can help make 2017 your best analytics year yet.

Resolution 1: Build more accurate models faster!

Now you can leverage the power of the two most advanced analytics platforms on the market, SAS 9 and SAS Viya, from one interface. Using SAS/CONNECT, users can call powerful SAS Viya analytics from within a process flow in SAS Enterprise Miner. Would you prefer the super-fast, autotuned gradient boosting in SAS Viya? No problem! Call SAS Viya analytics directly from Enterprise Miner using the SAS Viya Code node. From the same process flow you can also call open source models, all from one interface, SAS Enterprise Miner. Do you prefer SAS Studio on SAS 9? You can call SAS Viya analytics from SAS Studio as well. With SAS 9.4 M4, SAS gives you the ability to use both of its powerful platforms from one interface.

Resolution 2: Score your unstructured models in Hadoop without moving your data!

Got Hadoop? Got a lot of unstructured data? SAS Contextual Analysis now allows you to score models in Hadoop using the SAS Code Accelerator add-on. Identify new insights in your unstructured text without ever moving your data: score it all in Hadoop. Uncover new trends and topics buried in documents, emails, social media, and other unstructured text stored in Hadoop, and do it faster because the data never has to leave Hadoop. SAS just keeps getting better in 2017.

Resolution 3: Make better forecasts using the weather!

Through SAS/ETS, econometricians and others who want to incorporate weather data into their models can now do so directly through two new interface engines. SASERAIN enables SAS users to retrieve weather data from the World Weather Online website, and SASENOAA provides access to severe weather data from the National Oceanic and Atmospheric Administration (NOAA) Severe Weather Data Inventory (SWDI) web service. So now you’ll know why there was that big sales spike for rock salt and snow shovels in July!

Resolution 4: Estimate causal effects more efficiently!

The new CAUSALTRT procedure in SAS/STAT estimates the average causal effect of a binary treatment variable T on a continuous or discrete outcome Y. Depending on the application, the variable T can represent an intervention (such as smoking cessation – which is a great 2017 resolution - versus control), an exposure to a condition (such as attending private versus public schools), or an existing characteristic of subjects (such as high versus low socioeconomic status). The CAUSALTRT procedure estimates two types of causal effects: the average treatment effect and the average treatment effect for the treated. And best of all, the causal inference methods that the CAUSALTRT procedure implements are designed primarily for use with data from nonrandomized trials or observational studies, where you observe T and Y without assigning subjects randomly to the treatment conditions.
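To make the setup concrete, here is a minimal, hedged sketch of a PROC CAUSALTRT call: the data set and variable names below are invented for illustration, and the full syntax and options should be checked against the SAS/STAT documentation. The PSMODEL statement specifies the propensity-score model for the treatment, and the MODEL statement names the outcome:

```sas
/* Hypothetical data set STUDY: outcome Y, binary treatment T          */
/* (e.g., 1 = private school, 0 = public school), and baseline         */
/* covariates AGE and INCOME. A sketch only; consult the SAS/STAT      */
/* documentation for the complete PROC CAUSALTRT syntax.               */
proc causaltrt data=study;
   class T;
   psmodel T(ref='0') = age income;  /* propensity-score model for treatment */
   model Y;                          /* outcome whose causal effect is estimated */
run;
```

By default the procedure reports the average treatment effect; additional options control the estimation method and request the effect for the treated.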

Resolution 5: Design better factory floors!

A factory floor can be a complicated place, with raw materials coming in one side, and finished products going out the other. Options are virtually unlimited for the placement of materials and equipment – and a poorly designed layout can dramatically reduce production capability. Yet experimenting with different layouts would be extremely costly and time consuming. Thankfully, SAS Simulation Studio (a component of SAS/OR) provides a rich – and animated – environment for testing alternatives and coming up with the most appropriate design. And it can handle any kind of discrete-event simulation, integrating with JMP for experimental design and input analysis, and with JMP and SAS for source data and analysis of simulation results. How will your factory floor simulation impact your productivity in 2017?

tags: analytics, SAS 9.4, SAS Viya

Five great analytics resolutions for 2017 was published on SAS Users.

1月 062017
 

Regardless of how long they’ve used the software, there’s no better event for SAS professionals than SAS Global Forum. The event attracts thousands of users from across the globe and is an excellent place to network with and learn from users of all skill levels. To help relatively new SAS users experience the conference for the first time, the conference offers the Junior Professional Award program.

The program is designed exclusively for full-time SAS professionals who have used SAS on the job for three years or less, have never attended SAS Global Forum, and whose circumstances would otherwise keep them from attending. But, don’t let the word “junior” confuse you. All “new” SAS professionals regardless of age are eligible.

The Junior Professional award provides winners with a waived conference registration fee (including conference meals), a free pre-conference tutorial, and great opportunities to learn from and network within a large community of SAS users. The program does not cover other costs associated with attending the event (for example, travel and lodging are not included).

To apply, users need to fill out and submit the online application form. Award applications must be received by January 16, 2017. Questions can be directed to the Junior Professional Program Coordinator, whose contact information can be found on the website.

To learn more about the award and its benefits, I recently sat down with one of the 2015 winners, Shavonne Standifer.


junior-professional-program

Shavonne Standifer, 2015 SAS Global Forum Junior Professional Award winner

Larry LaRusso: Hello Shavonne. First of all, let me congratulate you on winning a past award. That’s a great accomplishment, for sure. So tell me, how did you first learn about the program?
Shavonne Standifer: Interestingly, I wasn’t looking specifically for the award and didn’t even know it existed. I was searching for a SAS proceedings paper and somehow stumbled across the application. I applied, and got it!

LL:  That’s awesome. What made you want to attend SAS Global Forum?
SS: I knew a little bit about the event and really wanted to attend so that I could take advantage of the hands-on learning opportunities. I also thought it would be super cool if I could attend the lectures of my favorite SAS authors, and I knew many of them planned to present.

LL: What were your first impressions of the event?
SS: I was amazed by how many people were there. I was also amazed by how nice and helpful everyone was. I met so many new friends.

LL: What was the best part of your Global Forum experience?
SS: The best part of my experience by far was when I met John Amrhein. We met during a networking event in the Quad. After subjecting him to a 2-minute rant about how much I loved SAS software, and all of the reasons why, he finally had a minute to introduce himself and mentioned that he was the SAS Global Forum 2017 conference chair. I was completely shocked! To my surprise, he encouraged me to be a part of his team, which I later applied for and was accepted.

LL: What are doing now? Are you using SAS?
SS: I currently use SAS software to provide data and statistical analysis that support the strategic business objectives of my organization. I am also a member of the conference planning team where I assist with the selection and delivery of Global Forum papers and volunteer coordination. Having the opportunity to be a part of this team has helped to increase my knowledge of SAS technologies and business trends. It’s been an incredible experience.

LL: How were you able to apply the knowledge you gained from the experience to what you’re doing now?
SS: Most definitely. I’ve applied what I learned from a tutorial Art Carpenter presented on innovative SAS techniques to use SAS more efficiently for cleaning, scrubbing, and reshaping big datasets. The knowledge I gained has really helped improve project turnaround and provide more meaningful insights.

LL: Are you planning to attend SAS Global Forum again?
SS: Absolutely! In fact, I have returned every year since winning that award and plan to for many years to come. It’s just a great place to learn from and network with fellow SAS users.

LL: Any other comments you’d like to share about the award?
SS: I would encourage anyone who is eligible to consider applying for the award. I remember sitting in front of my laptop, hopeful, but thinking that I had a one-in-a-million chance of being selected. I decided to give it a try and it has changed my life! So much awesomeness has occurred in both my professional and personal life as a direct result of receiving the award. Professionally, the advice and mentorship from expert SAS users has helped me mature my SAS programming talents. Personally, the fellow JPP awardees I’ve met along the way have provided an extended community of users whom I can call or email for advice. We keep in contact and support one another as needed; these relationships are invaluable. If you are eligible, apply! It’s a great opportunity!

LL: Thanks Shavonne. Sounds like it was an awesome experience and I really enjoyed our time together.

tags: Junior Professional Program, SAS Global Forum

Junior Professional Program helps new users attend SAS Global Forum 2017 was published on SAS Users.

1月 042017
 

In my last blog, I showed you how to generate a word cloud of pdf collections. Word clouds show you which terms are mentioned by your documents and the frequency with which they occur in the documents. However, word clouds cannot lay out words from a semantic or linguistic perspective. In this blog I’d like to show you how we can overcome this constraint with new methods.

Word embedding has been widely used in natural language processing, where words or phrases from the vocabulary are mapped to vectors of real numbers. Several open source tools can be used to build word embedding models; two of the most popular are word2vec and GloVe, and in my experiment I used GloVe. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

Suppose you have obtained the term frequencies from documents with SAS Text Miner and downloaded the word embedding model from http://nlp.stanford.edu/projects/glove/. Next you can extract vectors of terms using PROC SQL.

libname outlib 'D:\temp';
* Rank terms according to frequencies;
proc sort data=outlib.abstract_stem_freq;
   by descending freq;
run;quit;

data ranking;
   set outlib.abstract_stem_freq;
   ranking=_n_;
run;

data glove;
   infile "d:\temp\glove_100d_tab.txt" dlm="09"x firstobs=2;
   input term :$100. vector1-vector100;
run;

proc sql;
   create table outlib.abstract_stem_vector as
   select glove.*, ranking, freq  
   from glove, ranking
   where glove.term = ranking.word;
quit;

Now you have term vectors, and there are two ways to project the 100-dimensional vector data into two dimensions. One is SAS PROC MDS (multidimensional scaling) and the other is t-SNE (t-distributed Stochastic Neighbor Embedding). t-SNE is a machine learning algorithm for dimensionality reduction developed by Laurens van der Maaten and Geoffrey Hinton. It is a nonlinear dimensionality reduction technique that is particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot.

Let’s try SAS PROC MDS first. Before running PROC MDS, you need to run PROC DISTANCE to calculate the distance or similarity of each pair of words. According to the GloVe website, the Euclidean distance (or cosine similarity) between two word vectors provides an effective method for measuring the linguistic or semantic similarity of the corresponding words. Sometimes the nearest neighbors according to this metric reveal rare but relevant words that lie outside an average human's vocabulary. I used Euclidean distance in my experiment and show the top 10 words on the scatter plot in Figure 1.

word scatter plot with SAS

proc distance data=outlib.abstract_stem_vector method=EUCLID out=distances;
   var interval(vector1-vector100);
   id term;
run;quit;
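For intuition about the metric PROC DISTANCE computes above, here is a toy PROC IML sketch that evaluates both Euclidean distance and cosine similarity for two made-up three-dimensional "word vectors" (the real GloVe vectors have 100 dimensions; the numbers here are invented for illustration):

```sas
/* Toy illustration of the two similarity metrics on made-up        */
/* 3-dimensional "word vectors"; real GloVe vectors have 100 dims.  */
proc iml;
   u = {0.8 0.1 0.3};                      /* vector for word A */
   v = {0.7 0.2 0.2};                      /* vector for word B */
   euclid = sqrt(ssq(u - v));              /* Euclidean distance */
   cosine = (u * v`) / (sqrt(ssq(u)) * sqrt(ssq(v)));  /* cosine similarity */
   print euclid cosine;
quit;
```

A small distance (or a cosine near 1) marks the pair as semantic neighbors, which is exactly what the full PROC DISTANCE step computes for every pair of terms.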

ods graphics off;
proc mds data=distances level=absolute out=outdim;
   id term;
run;quit;

data top10;
   set outlib.abstract_stem_vector;
   if ranking le 10 then label=term;
   else label='';
   drop ranking freq;
run;

proc sql;
   create table mds_plot as
   select outdim.*, label
   from outdim
   left join top10
   on outdim.term = top10.term;
quit;

ods graphics on;
proc sgplot data = mds_plot;
   scatter x=dim1 y= dim2 
       / datalabel = label
         markerattrs=(symbol=plus size=5)
         datalabelattrs = (family="Arial" size=10pt color=red);
run;quit;

SAS does not have a t-SNE implementation, so I used PROC IML to call the Rtsne library in R. There are two R packages for t-SNE plots, tsne and Rtsne, and Rtsne is reported to be faster. Just as with the SAS MDS plot, I show only the top 10 words, but their font sizes vary according to their frequencies in the documents.
t-SNE word scatter plot with SAS
data top10;
   length label $ 20;
   length color 3;
   length font_size 8;
   set outlib.abstract_stem_vector;
   if ranking le 10 then label=term;
   else label='+';
   if ranking le 10 then color=1;
   else color=2;
   font_size = max(freq/25, 0.5);
   drop ranking freq;
run;

proc iml;
   call ExportDataSetToR("top10", "vectors");

   submit /R;
      library(Rtsne)

      set.seed(1) # for reproducibility
      # NOTE: the rest of this block was truncated in the original post; the
      # lines below are a minimal reconstruction of the likely intent: run
      # t-SNE on the 100 vector columns and plot the labeled words.
      vector_matrix <- as.matrix(vectors[, grep("^vector", names(vectors))])
      tsne_out <- Rtsne(vector_matrix, check_duplicates = FALSE)
      plot(tsne_out$Y, type = "n", xlab = "dim1", ylab = "dim2")
      text(tsne_out$Y, labels = vectors$label,
           col = vectors$color, cex = vectors$font_size)
   endsubmit;
quit;

If you compare the two figures, you may feel that the MDS plot is more symmetric than the t-SNE plot, but the t-SNE plot seems more reasonable from a linguistic/semantic perspective. In Colah’s blog, he explores and compares various dimensionality reduction techniques on the well-known computer vision dataset MNIST. I totally agree with his opinions below.

It’s easy to slip into a mindset of thinking one of these techniques is better than others, but I think they’re all complementary. There’s no way to map high-dimensional data into low dimensions and preserve all the structure. So, an approach must make trade-offs, sacrificing one property to preserve another. PCA tries to preserve linear structure, MDS tries to preserve global geometry, and t-SNE tries to preserve topology (neighborhood structure).

To learn more I would encourage you to read the following articles.

http://colah.github.io/posts/2014-10-Visualizing-MNIST/
http://colah.github.io/posts/2015-01-Visualizing-Representations
https://www.codeproject.com/Tips/788739/Visualization-of-High-Dimensional-Data-using-t-SNE

Word scatter plot with SAS was published on SAS Users.