
July 16, 2016
 

When you work out, you probably have a routine set of exercises that you do. But if you read health-and-fitness websites or talk to a personal trainer, you know that for optimal fitness, you should vary your workout routine. Not only does adding variety to your fitness regime help you prevent injuries, it also helps build muscles by working different parts of your body. For example, if you’re a runner, you could add weights or another resistance workout once or twice a week. If you are a weight lifter, you might add yoga or Pilates to your repertoire to enhance your personal fitness. In a similar way, it can be beneficial for you as a SAS programmer to vary your coding methods by trying new things.

SAS programmers very often use the TEMPLATE procedure to create style templates that use the CLASS and STYLE statements. I am going to help you bulk up your PROC TEMPLATE muscles by showing you how to use the COMPUTE AS and TRANSLATE INTO statements to alter the structure of tables via table definitions.

The COMPUTE AS statement helps you create or modify columns of data. You can use this statement to create a column from raw data (similar to what you can do with the REPORT procedure), or you can use it with an existing table template. The following example uses a PROC TEMPLATE step that illustrates the former. This example uses an existing data set (the CARS data set that is shipped in the SASHELP library) and computes an average-miles-per-gallon variable called MPG_AVG:

proc template;
   define table mytable;
      column make model mpg_city mpg_highway mpg_avg;
      define mpg_avg;
         header="Avg MPG";
         compute as mean(mpg_city,mpg_highway);
         format=3.;
         style={font_weight=bold};
      end;
      define make;
         blank_dups=on;
      end;
   end;
run;
 
data _null_;
   set sashelp.cars(where=(cylinders=4 and drivetrain="All"));
   /* Route the output rows through the custom table template */
   file print ods=(template='mytable');
   put _ods_;
run;

Here are partial results from the example code:

[Image: partial results from the example code, showing the computed Avg MPG column]

You’ll notice that I could not resist sticking with part of my normal PROC TEMPLATE routine: I added a style change to the new MPG_AVG column so that its values stand out in a bold font. In addition, because I like to prevent duplicate values from appearing, I used the BLANK_DUPS=ON column attribute for the MAKE column.

If you want to try a cool new trick that is available in SAS® 9.4, try the following ODSTABLE procedure. This code does exactly what the previous example does, only with fewer keystrokes. It's similar to combining a strength and cardio move in your exercise routine!

proc odstable
   data=sashelp.cars(where=(cylinders=4 and drivetrain="All"));
   column make model mpg_city mpg_highway mpg_avg;
   define mpg_avg;
      header="Avg MPG";
      compute as mean(mpg_city,mpg_highway);
      format=3.;
      style={font_weight=bold};
   end;
   define make;
      blank_dups=on;
   end;
run;

Most procedures (except for PRINT, REPORT, and TABULATE) use an existing table template. You can determine the table template that a procedure uses by reviewing the log information that is generated when the ODS TRACE ON statement is in effect. For example, based on the log information from an ODS TRACE ON statement, we know that the following MEANS procedure code uses a table template called BASE.SUMMARY.

proc means data=sashelp.cars;
     class drivetrain;
     var mpg_city;
run;

If you submit this PROC MEANS code, you obtain the results shown below:

[Image: default PROC MEANS results]
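To confirm the template name for yourself, wrap the step in ODS TRACE statements and check the SAS log (a minimal sketch; the trace lists each output object along with its table template, such as Base.Summary):

ods trace on;    /* write each output object's name and table template to the log */

proc means data=sashelp.cars;
   class drivetrain;
   var mpg_city;
run;

ods trace off;   /* stop writing trace information */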

In the following example, I use the COMPUTE AS statement to edit the MEAN column by multiplying the mean statistic by 10:

proc template;
   edit base.summary;
      edit mean;
         compute as (mean * 10);
         format=6.1;
      end;
   end;
run;

Here are the results that are generated when you run the PROC MEANS step a second time (with an explanatory footnote added):

[Image: PROC MEANS results with the Mean column multiplied by 10 and an explanatory footnote]

It is important to note that the default location where PROC TEMPLATE stores compiled templates is the SASUSER library. Therefore, subsequent PROC MEANS steps also multiply the mean by 10 unless you delete this table template. Just as a personal trainer recommends stretching after each workout in order to avoid sore muscles, I recommend the following PROC TEMPLATE step to delete the edited table template after the PROC MEANS step:

proc template;
   delete base.summary;
run;
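If you are ever unsure which item stores SAS searches for templates, and in what order, you can list the current template search path; by default, SASUSER.TEMPLAT is searched ahead of SASHELP.TMPLMST:

/* Display the current ODS template search path in the SAS log */
ods path show;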

The COMPUTE AS statement is useful when you need to create or edit an entire column. However, in many of the statistical procedures, the reported statistics are stacked. Technical Support has received requests from SAS programmers to make changes to just one of the statistics in a column. Therefore, when you want to change a single cell in a table, I recommend that you exercise the TRANSLATE INTO statement.

Consider the following code in which the Parameter Estimates object is requested from the RELIABILITY procedure:

ods select ParmEst;
 
proc reliability data=fan;
   distribution Weibull;
   pplot lifetime*censor( 1 ) / covb;
run;

Note: The fan data set referenced by the DATA=FAN option comes from the PROC RELIABILITY online documentation, which provides the DATA step that builds it.

The code above displays the results shown in this table:

[Image: Parameter Estimates table from PROC RELIABILITY]

In this table, the Estimate column displays the parameter estimates. If you need to modify just one set of parameters (for example, just the Extreme Value [EV] parameter estimates), you can use the TRANSLATE INTO statement to do the job.

The TRANSLATE INTO statement applies conditional logic to change the formatting of the Estimate column only for the EV Location and EV Scale parameters. In the following example, the Estimate value for the EV Location parameter is displayed with the 10.5 format, the EV Scale parameter is displayed with the 10.6 format, and the remaining cells use the default 10.4 format that is specified with the FORMAT= attribute.

proc template;
   edit qc.reliability.ppest;
      edit estimate;
         header="Estimate";
         format=10.4;
         translate (Parameter = 'EV Location') into put(_val_, 10.5),
                   (Parameter = 'EV Scale') into put(_val_, 10.6);
      end;
   end;
run;

When you rerun the PROC RELIABILITY step that is shown above, you get these results:

[Image: Parameter Estimates table with the 10.5 and 10.6 formats applied to the EV Location and EV Scale rows]

The changes that you make to your statistics can be much more dramatic. Just like with your workout, the more effort you put into it, the more defined your results!

The samples and SAS Notes found on support.sas.com focus on the DEFINE, STYLE, and CLASS statements in PROC TEMPLATE. These statements are fabulous tools in the PROC TEMPLATE routine because they help change the style of your Excel, HTML, RTF, and PDF results. By adding the COMPUTE AS and TRANSLATE INTO statements to your toolbox, you can round out your ODS routine and enhance those results even further. But if you find that you need the help of a personal trainer (that is, a Technical Support consultant), call Technical Support or send email to support@sas.com!

Are you ready to enhance your PROC TEMPLATE programming routine even more? If so, here are some additional resources:

tags: Problem Solvers, SAS Programmers

Build your PROC TEMPLATE muscles: Using the COMPUTE AS and TRANSLATE INTO statements was published on SAS Users.

July 7, 2016
 

The Distributed SAS 9.4 LASR Analytic Server provides a massively parallel processing solution for working very quickly with huge volumes of data. LASR was built from its earliest versions for incredible scalability and growth. One of the primary considerations that drives scalability is load management, and LASR was built with a very specific principle for managing load: its architecture assumes that maximum efficiency is achieved when each node of the cluster is identical in terms of CPU, RAM, data to process, and other ancillary processes. That approach to load management means external factors can affect performance, so let’s look at some of the considerations we should all know about.

Load management in LASR is based on the assumption that all host machines are equal: each host has the same number and kind of CPUs, the same RAM, the same OS, and the same ongoing workload as all the other hosts participating in the cluster. With all of those things equal, there is only one item left to manage LASR’s workload: distribution of the data. Because everything else is equal on our hosts, we want to distribute the data equally to each of the LASR Workers so that when they are given an analytic job to perform, each of them does the same tasks on the same amount of data and therefore finishes in the same elapsed time.

Load Balancing == Even Data Distribution

When data is loaded into a Distributed LASR Analytic Server from SAS, the LASR Root Node accepts the incoming data and then distributes it evenly across all of the LASR Workers. Similarly for parallel data loading from a remote data provider, the SAS Embedded Process will evenly distribute blocks of data directly over to the LASR Workers as well.

[Figure 1: data distributed evenly from the LASR Root Node to the LASR Workers]
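Here is a hedged sketch of that first path (the host, port, and tag mirror the balancing example later in this post; the table names are placeholders): loading a SAS table through the SASIOLA engine sends the rows to the Root Node, which spreads them across the Workers.

/* Placeholder table names; host, port, and tag mirror the balancing example below */
libname mylasr sasiola host="root.lasr.com" port=10010 tag='hps';

data mylasr.sales;          /* rows flow through the Root Node and are distributed evenly */
   set work.sales;
run;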

Growing the LASR Cluster

With Massively Parallel Processing software, like the Distributed SAS LASR Analytic Server, when you need more performance or more capacity, then the expected approach is to add more machines to the cluster. Unlike traditional SMP systems where you might add in more RAM or more disk to an existing host so that it can do more on its own, with MPP we really prefer to distribute the load across as many cheap commodity-grade machines as possible. So rack up some new server hardware, install the necessary software, and then load in the data and get to work.

The MPP approach sounds simple, and the SAS High-Performance Analytics Environment software is easy to deploy. But the real world doesn’t sit still, and technology progresses quickly. If you initially set up your LASR cluster a year or so ago and you’re now shopping for additional machines to add to it, there’s a good chance that the vendor is selling new models that feature faster CPU and RAM than the old ones in your cluster.

[Figure 2: the cluster expanded from four to six LASR Workers]

Is This a Problem?

No, it’s definitely not a problem. LASR will function just fine when hosted on machines with slightly different resources. Furthermore, you will certainly see an improvement in performance. In the illustrations above, we’ve taken our data from being distributed across only four LASR Workers and upgraded the cluster so that the data is now processed across six machine hosts. Therefore you can expect almost 50% improvement in LASR processing time (not total response time – we still have network i/o, middle-tier communications, browser rendering, and more to deal with).

You’re probably wondering why the improvement is “only” 50%. After all, the two new machines (W5 and W6) have faster CPUs than the other older four machines (W1 – W4). They should be able to knock out their specific tasks faster, right? And indeed, they do. But it’s not the new machines’ performance we need to look at – the old ones are setting the pace.

This comes back to LASR’s assumption that all machines hosting the LASR Workers are identical in terms of resources and performance. When dealing with future cluster expansion, try to keep the new machines as close as possible in terms of performance and capacity to the old machines simply because you don’t want to spend money unnecessarily for capacity which might be underutilized.

The Crux of the Matter

When LASR is distributing data to its Workers, there is no accommodation made for machines which have more CPU or RAM (or any other resources) than their cluster comrades. Each gets an equal portion of data simply based on dividing the data up among the number of machines in the cluster. Since each machine in the LASR cluster is operating on the same amount of data, machines which can crunch the numbers quicker (e.g. due to faster CPU or RAM) will finish their tasks sooner. However, the final answer is only available once the LASR Root Node has the responses from all of the Workers, so it must wait for the older/slower ones to finish as well.

Unbalanced Data Does Happen

Machine resources aren’t the only thing which can impact overall performance of LASR. On the flip side of the same coin is the amount of data being worked on. It is possible for data to get loaded in such a way that some LASR Workers will actually have more data to work on than others. That means that those with more data will need more time to complete their jobs.

For example, the SAS Embedded Process, which provides SAS In-Database functionality, can load data directly into the LASR Workers, and it attempts to distribute that data evenly. But at what level? In a Hadoop environment, the EP distributes blocks of data, not rows, evenly across the LASR Workers. Depending on the size and number of blocks, it’s possible that some Workers will receive one more block than others. This is unlikely to cause any notable performance lag for end users; however, a sysadmin monitoring low-level system performance might pick up on it. It’s not a big deal, but it is helpful to be aware of.

Another Hadoop-based example where unbalanced data might occur is with SASHDAT tables. SASHDAT is stored as blocks in HDFS which the LASR Workers read and write themselves directly to disk. Hadoop is responsible for managing those blocks once they’re committed to disk. And if an HDFS Data Node fails or if new HDFS Data Nodes (along with associated new LASR Workers) are added to the cluster after the SASHDAT file has been written out, then Hadoop might move blocks around in line with its block replication and disk utilization schemes. Later on, when LASR attempts to read those SASHDAT blocks back into memory, some Workers might end up with more blocks of data than others.

[Figure 3: with one host offline, one LASR Worker must load two HDFS blocks while the others load one each]

In Figure 3, we can see that one host machine has gone offline. The HDFS data is still available due to block replication. However, one LASR Worker must now load two blocks whereas the rest each load one. The result is that this LASR Worker needs twice as long to perform its assigned tasks, and the LASR Root Node must therefore wait twice as long to compile its final answer. Keep in mind that this example is contrived to make a point. In the real world, data volume, block sizes and distribution, and other factors determine the impact on actual performance.

For more information about the HDFS block replication and SASHDAT, see the whitepaper The Impact of Hadoop Resiliency on SAS® LASR™ Analytic Server presented at SAS Global Forum 2015.

Balancing the Data

LASR offers built-in functionality whereby you can rebalance the data that resides in memory.
[Figure 4: rebalancing the in-memory data evenly across the LASR Workers]

Here’s some sample code which shows how to promote, balance, and save your LASR in-memory data:

/* Promote, Balance, and Save */
/* Setup a SASIOLA libref that specifies the LASR server */
/* where the unbalanced table resides */
libname example sasiola host="root.lasr.com" port=10010 tag='hps';

/* Use the data management capabilities of the IMSTAT */
/* procedure */
proc imstat immediate;

/* Specify the unbalanced table */
table example.the_table;
/* Print out the distribution stats */
distributioninfo;

/* Perform the balance – each node will be +/- 1 row */
/* A new temporary table (balanced!) is created */
balance;

/* Drop the original unbalanced table */
droptable;

/* Now reference the newly balanced temporary table */
table example.&_templast_;
/* Promote the temporary table to active status with the */
/* original table’s name */
promote the_table;

/* Now reference the new table by its permanent name */
table example.the_table;
/* Print out the distribution stats for confirmation */
/* of balance */
distributioninfo;

/* Save the LASR table back down to SASHDAT on HDFS */
/* and replace the old, unbalanced table there */
save path="/path/in/hdfs" replace;
quit;

Extending What We Know

The concepts of resource utilization and data balancing don’t apply only to LASR. Many similar challenges apply to the SAS High-Performance Analytics software (such as HP STAT) as well, because it runs on the same foundation as LASR: the SAS High-Performance Analytics Environment software.

 

tags: LASR, load balancing, SAS Professional Services

About LASR Load Balancing was published on SAS Users.

July 6, 2016
 

Disclaimer: before you get overly excited, PROC EXPAT is not really an actual SAS procedure. Sadly, it will not transfer or translate your code based on location. But it does represent SAS’ expansion of the Customer Contact Center, and that’s good news for our users. Here’s the story behind my made-up proc.

My mission

“Buon giorno!” “Guten Tag!” “Bonjour!” Excitement is in the air, the team buzzes. I’m not at an international airport, I’m at the new SAS office in Dublin, Ireland. I’d been given a one-month assignment to help expand operations, providing training in the Customer Contact Center across channels to deliver exceptional customer support and create an enhanced customer experience around the globe. It was such a rewarding experience!

SAS is a global company with customers in 148 countries, at more than 80,000 sites. The EXPAT Procedure is the name I’ve coined for my month-long adventure in Dublin, training and supporting our newly expanded Customer Contact Center team. So, what does this mean for you? It means additional customer care and expanded hours for all your inquiries and requests. Win!

Bringing expanded customer service to Europe, Middle East and Africa

The expansion was announced last fall, when SAS revealed plans to open a new Inside Sales and Customer Contact Center in Dublin—an investment of around €40 million with a projected 150 new jobs to be created—to provide support across Europe, Middle East and Africa (EMEA).

The new office models the US Customer Contact Center (and this is where I come in), providing support for customers in their channel of choice—be it social media, Live Chat, phone, email and/or web inquiries. We field general questions about SAS software, training, certifications or resources, as well as specific issues, like errors in your SAS log. The Customer Contact Center is here to assist, and now our customers in EMEA can benefit from the added support as well.

And we’re not just answering inquiries, we’re listening to our customers. We’re always looking at ways to make things easier to navigate, simpler to find, and faster to share. And we love customer feedback, whether direct or indirect, to enhance your experience with SAS.

The new team in Dublin is made up of multilingual individuals with loads of experience in the tech industry. They have begun covering the United Kingdom, Ireland, and Italy, and it’s been amazing working with such a knowledgeable, patient, and fun team with a great sense of humor. I think you’ll like them, too.

While I’ve been assisting with training the team on everything SAS, I’ve gotten a little training myself, working in a new office in a different country, surrounded by colleagues from more than 15 countries across the pond. A reminder of the wide reach of SAS, impact of Big Data analytics, and importance of our worldwide SAS users.

It’s an exciting time for the Customer Contact Center, SAS and our customers. If you’re located in EMEA, don’t hesitate to reach out to us!

tags: facebook, linkedin, sas customer contact center, social media, twitter

PROC EXPAT – Expanding SAS’ global customer service was published on SAS Users.

June 30, 2016
 

I look forward to SAS Global Forum each year, and this past conference ranks right up there as one of the best I’ve ever attended. This year there were so many wonderful presentations, Super Demos, and workshops on the topic of administering SAS and the underlying hardware infrastructure needed for SAS applications. Below you'll find a list of the top 15 presentations based on attendance. We hope you find these papers useful.

10360 - Nine Frequently Asked Questions about Getting Started with SAS® Visual Analytics
2440 - Change Management: The Secret to a Successful SAS® Implementation
8220 - Optimizing SAS® on Red Hat Enterprise Linux (RHEL) 6 and 7
SAS6840 - Reeling Them Back In Keeping SAS® Visual Analytics Users Happy, Behind the Scenes
SAS2820 - Helpful Hints for Transitioning to SAS® 9.4
SAS6761 - Best Practices for Configuring Your I/O Subsystem for SAS®9 Applications
10861 - Best Practices in Connecting External Databases to SAS®
SAS6280 - Hands-On Workshop: SAS® Environment Manager
SAS5680 - Advanced Topics in SAS® Environment Manager
9680 - SAS Big Brother
SAS4240 - Creating a Strong Business Case for SAS® Grid Manager: Translating Grid Computing Benefits to Business Benefits
10962 - SAS® Metadata Security 201: Security Basics for a New SAS Administrator
8860 - SAS® Metadata Security 101: A Primer for SAS Administrators and Users Not Familiar with SAS
9920 - UCF SAS® Visual Analytics: Implementation, Usage, and Performance
11202 - Let the Schedule Manager Take Care of Scheduling Jobs in SAS® Management Console 9.4

Note: It's hard to believe, but we're already thinking about topics to present at SAS Global Forum 2017. If you have suggestions for SAS Administration paper topics you'd like to see at next year's conference, please share your thoughts in the comments below or contact me directly. 

tags: papers & presentations, SAS Administrators, SAS Global Forum

15 top Global Forum 2016 papers for SAS administrators was published on SAS Users.

June 29, 2016
 

I recently read an article in which the winner of a Kaggle Competition was not shy about sharing his technique for winning not one, but several of the analytical competitions.

“I always use Gradient Boosting,” he said. And then added, “but the key is Feature Engineering.”

A couple days later, a friend who read the same article called and asked, “What is this Feature Engineering that he’s talking about?”

It was a timely question, as I was in the process of developing a risk model for a client, and specifically, I was working through the stage of Feature Engineering.

The act, or I should say the “art,” of Feature Engineering takes time. In fact, it takes much more time than building the actual model, and because many clients expect a quick return on analytics, I was giving some thought to how I would explain the value I saw in the effort being put into engineering features from their data.

But first, I needed to respond to my friend’s question, and I began by describing the environments in which we typically work: “The companies we work with are not devoid of analytical talent; in fact, some institutions have dozens of highly qualified statisticians with many years of experience in their respective fields. Still, they may not be getting the results from their models that they expect.”

I was quickly asked, “So, how do you get better results? Neural Nets? Random Forests?”

“It’s not the method that sets my results apart from theirs,” I answered. “In fact, the success of my modeling projects is generally solidified before I begin considering ‘which’ method.” I paused and pondered how to continue, “Before I consider methods, I simply derive more information from their data than they do, and that is Feature Engineering.”

Feature Engineering

Feature Engineering is a vital component in the process of attaining extraordinary Machine Learning results, but it doesn’t merit as many intriguing conversations, papers, books, or talks at conferences. Regardless, I know that the success of my projects, typically attributed to the application of Machine Learning techniques, is actually due to effective Feature Engineering that not only improves the model, but also creates information that the clients can more readily understand.

How did I justify the time spent “Feature Engineering” to the client mentioned above? First, I developed a model with PROC LOGISTIC using only the primary data given to me and called it “Model 1.”

Model 1 was then compared to “Model 2,” a model built using the primary data plus the newly developed features. Examples of those new features that proved significant were those constructed as follows:

  • Ratios of various events relative to time.
  • Continuous variables dimensionally reduced into categories.
  • Simple counts of events.

And then I proactively promoted the findings!

I often use the OUTROC option to output the necessary components such that I can create a chart of ROC curves using PROC SGPLOT. An example of this process can be viewed in Graph 1 where the value of Feature Engineering is clearly evident in the chart detailing the curves from both models: the c-statistic improved from 0.70 to 0.90.

[Graph 1: ROC curves for Model 1 and Model 2]
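Here is a minimal sketch of how such a chart can be produced (the data set, response, and predictor names are hypothetical; OUTROC= writes the _SENSIT_ and _1MSPEC_ coordinates that PROC SGPLOT overlays):

/* Fit each model and save its ROC coordinates with OUTROC= */
proc logistic data=work.accounts descending;
   model default = x1 x2 / outroc=work.roc1;                              /* Model 1: primary data only        */
run;

proc logistic data=work.accounts descending;
   model default = x1 x2 ratio_var cat_var count_var / outroc=work.roc2;  /* Model 2: adds engineered features */
run;

/* Stack the two ROC data sets and overlay the curves */
data work.roc_both;
   length model $8;
   set work.roc1(in=in1) work.roc2;
   model = ifc(in1, 'Model 1', 'Model 2');
run;

proc sgplot data=work.roc_both;
   series x=_1mspec_ y=_sensit_ / group=model;
   lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);   /* reference diagonal */
   xaxis label="1 - Specificity";
   yaxis label="Sensitivity";
run;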

However, there is still a story to be shared, because the improvement is not only represented by a stronger c-statistic but also appears in the actual classification of default. For example:

  • Model 1 correctly predicts default in 20% of accounts categorized as “most likely to default,” but it misses on 80%!
  • Model 2 correctly predicts default in 86% of those accounts it categorizes as most likely to default, but misses on only 14%.

How did this happen?

While we could drill down into a discussion of significant factors to seek an in-depth explanation of the results, a more general observation is simply this: Model 2 accomplishes its mission by correctly reclassifying a substantial portion of the population as “low risk” and leaves those with default concerns to be assessed with greater consideration.

Essentially, the default model is vastly improved by better discerning those customers whom we need not worry about, leaving an uncluttered pool of customers that require attention (see Graph 2).

[Graph 2]

Instead of assessing the default potential for 900,000 accounts, the model instead deals with 100,000 “risky” accounts. It essentially improves its chances of success by creating a more “easy to hit” target, and the result (see Graph 3) is that when it categorizes an account as “high risk” it means it!

[Graph 3]

Thanks to Feature Engineering, the business now possesses very actionable information.

How Do I Engineer the Features Necessary to Attain Such Results?

Working from the SAS DATA step and select SAS procedures, I work through a process in which I attempt to create useful features with the following techniques and methods (a brief sketch follows the list):

  • Calculate statistics like the minimums, maximums, averages, medians and ranges thinking that extremes (or the lack) might help define interesting behaviors.
  • Count occurrences of events considering that I might highlight statistically interesting habitual behaviors.
  • Develop ratios seeking to add predictive value to already analytically powerful variables as well as to variables that might have previously lacked statistical vigor.
  • Develop quintiles across variables of interest seeking to create expressive segments of the population while also dealing with extreme values.
  • Apply dimensionality reduction techniques, ranks, clustering etc. expecting that grouping those with similar behaviors will be statistically beneficial.
  • Consider the element of time as an important interaction with any feature that has been developed.
  • Use regression to identify trends in continuous variables thinking that moves up or down (whether fast or slow) will be interesting.
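As a hedged illustration of a few of these techniques (the data set and variable names below are hypothetical):

/* Counts, extremes, and averages of transaction amounts per account */
proc means data=work.transactions noprint nway;
   class account_id;
   var amount;
   output out=work.acct_stats(drop=_type_ _freq_)
          n=txn_count min=amount_min max=amount_max mean=amount_mean;
run;

/* Ratios and ranges built in the DATA step                          */
/* (assumes both data sets are sorted by account_id)                 */
data work.features;
   merge work.accounts work.acct_stats;
   by account_id;
   if months_on_book > 0 then txn_per_month = txn_count / months_on_book;
   amount_range = amount_max - amount_min;
run;

/* Quintiles (groups 0-4) of a continuous variable */
proc rank data=work.features out=work.features groups=5;
   var amount_mean;
   ranks amount_mean_q5;
run;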

As with the guy who won the Kaggle competitions, I’ve now revealed some of my secret weapons regarding the building of successful models.

What’s your secret weapon?

 

 

tags: machine learning

Great machine learning starts with resourceful feature engineering was published on SAS Users.

June 28, 2016
 

It’s no secret: here at SAS, we care about our customers. In fact, we care so much that we developed a program with that exact name: our SAS Cares program is here to help you with all things SAS!

We know you may be so awesome that you never need us, but even excellent people need help sometimes (I should know). If you find yourself in a tight SAS spot, have a burning SAS question, or even just want to write some SAS poetry, we’re here to help. Our SAS Cares program is extensive and can help you with any problem, question, or comment you’d like to share with us. We make it easy for you to connect with us, no matter what you are looking for, or what channel you are on. You can find information about all the services available from the SAS Cares Program in this blog from Maggie Miller.

A little bird (tweet, tweet!) told me you’d like to know more about one of the main ways SAS connects with our customers (which also happens to be my personal favorite!), through our social customer service channels.

SAS’ social properties are excellent places for you to ask questions, make suggestions, interact with other users, and give us feedback.  We are currently on Twitter, Facebook, LinkedIn, and YouTube, constantly monitoring these channels, looking for your questions and opportunities to support you. And, because we’re only a click away, we’d like you to consider us your first stop for pretty much anything about SAS. Most of the time, we’ll be able to give you exactly what you need, and in the rare instances we don’t have the answer, we’ll point you to resources that can help you find it.

For example, if you have questions about using SAS or SAS programming, we might send you to our SAS Support Communities. Or if you are getting program errors, or have installation questions, we can help you find the right Product Documentation and SAS Notes, or, if needed, open a Technical Support track for you. If you are not sure where to start, start with us. Our social channels are perfect for real-time interaction. We are here for you, and thrilled to help!

Here’s how you can connect with us via social media:

  • Twitter: @SAS_Cares
    Our primary social customer service channel, ready to handle anything SAS related.
    Reply, mention, and direct message us. Follow us for helpful resources and links for #SASusers!
  • Facebook: SAS Software and SAS Analytics U
    We post the latest SAS news and helpful tips. Like, Follow, Comment, Message, and post to our SAS pages.
  • LinkedIn: SAS
    Primary professional landing spot for SAS users, followed by nearly 300,000 SAS professionals. Follow us and comment on our posts.
  • YouTube: SAS Software
    Subscribe, view, and comment on our videos. Many of them teach specific SAS concepts and techniques, and will help you become a more effective and efficient SAS user.

So give us a try: reach out to us when you need SAS help, have some SAS feedback, or just want to share… because who cares? SAS Cares!

tags: sas cares, sas customer contact center

SAS Cares: help just a tweet, post, or click away was published on SAS Users.

June 24, 2016
 

XML has become one of the major standards for moving data across the Internet. Some of XML’s strengths are its ability to describe data more fully and to be more extensible than any of its predecessors, such as CSV. Due to the increased popularity of XML for moving data, I provide a few tips in this article that will help when you need to read XML files into SAS software.

Reading XML Files

You can read XML files into SAS using either the XML engine or the XMLV2 engine. The XML engine was the first engine that SAS created to read XML files. The XMLV2 engine includes new functionality and enhancements and is aliased to the XML92 engine in SAS® 9.2.

It is easy to read XML files using the XMLV2 engine when the XML file that you read uses the GENERIC markup type and conforms to a very rectangular definition. Here is an example of the XML file layout that is required to be read natively using the XMLV2 engine:

[Image: example of the rectangular XML layout that the XMLV2 engine reads natively]

If the file is not in this format, the following informative error message is generated in the SAS log:

[Image: SAS log error message that references an XMLMap file]

The XMLMap file referenced in the error message is a SAS-specific file that instructs the engine about how to create rows, columns, and other information from the XML markup. You can use either of the following methods to generate the XMLMap file:

  • the AUTOMAP= option within the SAS session beginning with SAS® 9.3 TS1M2
  • the SAS XML Mapper, which is a stand-alone Java application

Generating Dynamic Maps within the SAS Session

Using the XMLV2 engine, you can create a dynamic XMLMap file and use it to generate SAS data sets by specifying the AUTOMAP= and XMLMAP= options in the LIBNAME statement. The AUTOMAP= option has two possible values: REPLACE and REUSE. REPLACE updates an existing XMLMap file, whereas REUSE uses an existing XMLMap file. Both values create a new XMLMap file if one does not currently exist. When the file is automapped, one or more data sets are created. This method creates a representation that is as relational as possible based on the XML markup. In addition, it generates surrogate keys that enable you to combine the data using the SQL procedure or the DATA step. The following example demonstrates this method.

Here is the XML data that is read:

[Image: the students2.xml sample data]

Here is the SAS session code:

/* The fileref and libref share the same name so that the XMLV2 */
/* engine associates the XML file with the library               */
filename datafile 'c:\test\students2.xml';
filename mapfile "c:\test\students2.map";
libname datafile xmlv2 xmlmap=mapfile automap=replace;

proc copy in=datafile out=work;
run;

Here is the output:

[Image: the WORK data sets created from students2.xml]
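When the automap produces more than one data set, the generated ordinal key columns let you join the pieces back together. Here is a hedged sketch (the table and key names are hypothetical; check the generated XMLMap or the output data sets for the actual names):

/* Join a parent and child data set on the generated ordinal key */
proc sql;
   create table work.combined as
   select *
   from work.student as s
        inner join work.address as a
          on s.student_ORDINAL = a.student_ORDINAL;
quit;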

For more information about using the XMLV2 engine, refer to the “SAS® 9.4 XMLV2 LIBNAME Engine Tip Sheet.”

Using SAS XML Mapper

If you answer “yes” to the following questions, you should use SAS XML Mapper to generate the XMLMap file rather than generating this file dynamically within the SAS session:

  • Do you need to generate an XMLMap from an XML schema file?
  • Do you need to generate a custom XMLMap file or map only a portion of the file?
  • Do you need to view the data to be imported before reading the XML file?
  • Do you need to add specific metadata such as data types, informats/formats, column widths, or names?
  • Do you want to generate sample code for the import of the XML file?

The following image shows SAS XML Mapper once you click the icon that runs the automapping feature, which enables you to quickly generate tables from the XML markup:

[Image: SAS XML Mapper after running the automapping feature]

Using SAS XML Mapper, you can also generate a custom XMLMap file.

For more information, refer to the following video tutorials about creating XMLMap files:

I hope this article helps you easily read XML files into SAS regardless of the file structure. This information should also help you determine the appropriate method to use when generating an XMLMap file.

tags: SAS Problem Solvers, SAS Programmers

Tips for reading XML files into SAS® software was published on SAS Users.

June 22, 2016
 

Report designers often discover after aggregating data by groups in the Visual Analytics Designer that it would also be nice to see additional aggregations of the data, for example, a maximum or minimum of that sum across groups. This means creating an ‘aggregation of an aggregation.’ If you plan your report objectives in advance of loading your data, you can do this by creating an aggregation initially in the Visual Data Builder, rather than the Designer.  And in many cases, it’s possible to get the desired report results by simply loading a small additional table to memory.

Here’s an example of a report with a sum aggregation on the profit measure and Facility Region as the category data item. The report also shows the maximum, minimum, and average of the regional profit values, along with the difference between each region’s profit sum and the average. It’s possible to assign a SUM aggregation to the Profit measure, but the challenge appears when you then try to create a MAX aggregation across the regions.

[Image: report showing the sum of profit by region, with the maximum, minimum, average, and difference columns]

If you have awareness of the need for this type of report prior to loading the data, you can easily load a much smaller version of the data with the Profit data item already pre-aggregated. The data builder query, shown below, creates and loads a file containing only the FacilityRegion as a GROUP BY column and the Profit column as a SUM aggregation.

[Image: the Visual Data Builder query (1 of 2)]

[Image: the Visual Data Builder query (2 of 2)]
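Conceptually, the query performs the same summarization as the following PROC SQL step (a hedged sketch; the source table name is hypothetical, and the Visual Data Builder generates and runs its own code when the query executes):

/* Pre-aggregate Profit by region before loading the small table to memory */
proc sql;
   create table work.profit_by_region as
   select facilityregion,
          sum(profit) as profit_sum
   from work.source_detail_table
   group by facilityregion;
quit;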

In the designer, for reporting purposes, the Profit_SUM data item is renamed to Sum of Profit for Region.

[Image: the Profit_SUM data item renamed to Sum of Profit for Region]

Now the Sum of Profit for Region data item can be used to calculate these aggregated measures:  Max Regional Profit Sum, Min Regional Profit Sum, and Avg Regional Profit Sum, using the formulas shown below:

[Image: formulas for the Max, Min, and Avg Regional Profit Sum aggregated measures]

Once you have the Avg Regional Profit Sum calculation, you can create the additional aggregated measure, Regional Profit – Average:

[Image: formula for the Regional Profit – Average aggregated measure]

So, this is a list table displaying all of the information that was derived from the two columns and four rows of data.

[Image: list table displaying the aggregations derived from the pre-aggregated data]

To emphasize the region with the maximum profit and minimum profit, you may want to add the following display rules.

[Image: display rules highlighting the regions with the maximum and minimum profit]

And since you have the original larger table loaded as well, you can add an additional data source and some interactions, so that the user can click on a row to view more detail data corresponding to each region of interest.

[Image: interactions linking the list table to a detail data source]

Now, for example, the user can see what facilities contributed to the max value for the Western region, and the corresponding Expenses as well!

[Image: detail facilities and expenses for the Western region]

So, although it may not be practical to pre-aggregate the values of every measure in your data, sometimes pre-aggregating and storing the much smaller table can give you the benefit of being able to get the additional aggregations that you need for a specialized report, while relying on a larger stored table for the details.

tags: SAS Professional Services, SAS Programmers, SAS Visual Analytics

Need an aggregation of an aggregation? Use Visual Data Builder to pre-aggregate your data was published on SAS Users.

June 14, 2016
 

As an addendum to my previous two blogs on using the SAS Environment Manager Report Center, this blog illustrates further tips and tricks you can use to help make the creation of your custom reports easier.

The Ad-Hoc Reporting section of the Report Center is specifically designed to provide a “testing ground” for reports you may want to try. It can be your most useful tool in designing and creating the reports you want. Notice that the first four “reports” are most useful if you have a good idea of what content you want your report to have; they present  specific report content, and it’s easy to see what type of data these reports would  contain, just based on the titles or names shown.

[Image: the Ad-Hoc Reporting section of the Report Center]

Selecting any of these four reports will generate a list of all possible reports in that category. From here, you can select the one(s) you want. Here are a few screenshots to illustrate:

APM Metadata Audit: If you make this selection in the Ad Hoc Reporting section, you will be presented with a choice of all reports in that category, under the Report Name field dropdown list in the right-hand panel. Notice that most of these report selections correspond to finished reports already created in the main report section, marked on the left below. If you just run this Ad Hoc report with all the defaults, you’ll get the same report as the one listed in the Nightly Reports >> Audit Reports section. However, you can test some additional parameters using the Ad Hoc version’s menu shown.

[Image: the APM Metadata Audit Ad Hoc report and the corresponding standard Audit Reports]

You will find similar choices for all four report groups in that section of the Ad Hoc Reporting. Thus, you have a way to tinker with many of the other standard reports and try various modifications to them. For example, if you are interested in any of the “ARM Performance Reports,” you can open the ARM Performance section of the Ad Hoc Reporting area to get a set of reports that (mostly) mirror those reports:

[Image: the ARM Performance section of the Ad Hoc Reporting area]

This allows you to explore different parameter options beyond what’s provided by the standard reports, such as chart type, number of time bins, and subgrouping time periods.

A different approach is to explore your data using any of the remaining Ad Hoc reports that are organized around the type of output plot you want. As discussed in the previous blog, the name of the Ad Hoc report shows the type of plot generated based on the SAS PROC name. These reports allow you to select a dataset to explore from any of the following three built-in DataMart libraries:  ACM, ARTIFACT, or KITS.  The Ad Hoc reports that can be used in this way are circled below:

[Image: the Ad Hoc reports that are organized by output plot type]

For example, if I want a basic chart, and I want to use the dataset ARTIFACT.AUDITACCESSC, then I can select the Ad Hoc report called “evChart TimeSeq Explorer,” choose my data set from the dropdown list, and then try varying any parameters provided in the interface to explore and come up with many variations and charts based on those two choices. Notice that with this group of Ad Hoc reports, there are more parameter choices than with other reports:

You can select many reporting options, such as multiple variables, type of chart, class variables, break variables, subgroupings, and statistics, as well as the more standard subsetting and output format options. You can generate literally hundreds of different reports using varying parameter selections with these Ad Hoc Reports.

As I mentioned in a previous blog in this series, if you want to create a custom report for production that you will run many times, you will want to make a permanent new copy of the stored process, with the correct parameters specified for the reporting macro that it calls. I outlined the steps to do this in the previous blog: Creating Custom Reports in the Report Center.

The way to do this is to model your report on an existing report, then make modifications to it. You will want to know the following about the report you are using as a model:

  • What stored process is being used?
  • What reporting macro(s) are being called?
  • What parameters are being specified to the reporting macro?
  • What data set from the Data Mart is being used?

The best way to answer these questions is by using the debugging facility that is built into the report center. All reports provide the following debugging options:

  • Debug Level
  • Session Information: Log
  • Macro Invocation Parameters

[Image: the debugging options available on a Report Center report]

By browsing the SAS log and checking the called reporting macro and its parameters, you can quickly answer the four questions listed above.

If you know what data set you want to use but need to explore that data set further, there are two approaches:

  1. Try using the Ad-Hoc Reporting section as described above to explore the data sets.
  2. Use another data browsing tool; I’ve found that SAS Studio is perfect for this task.

When the Service Architecture Framework is initialized, the following data libraries are added to metadata automatically:

  • SASEV – ACM
  • SASEV – ARTIFACT
  • SASEV – EVDM
  • SASEV – KITS

For these libraries you can just browse the data directly using SAS Studio. For the other Data Mart libraries (EVDB and EVDMRAW), you will first have to define those libnames and register the tables in metadata. Once this is done, you can explore the data sets in exactly the same way.
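As a hedged sketch (the path is hypothetical, and whether these are directory-based libraries depends on your Data Mart deployment), defining one of those libnames before registering its tables in metadata might look like this:

/* Hypothetical path: point this at the EVDB directory of your EV Data Mart,  */
/* then register the library and its tables in metadata (for example, with    */
/* SAS Management Console) so that SAS Studio can browse them                 */
libname evdb base "/path/to/EVDataMart/EVDB";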

tags: SAS Administrators, SAS Environment Manager, SAS Professional Services

Auditing using the SAS Environment Manager Report Center– tips and tricks was published on SAS Users.

June 10, 2016
 

"Shall we play a game?"

If you’re a child of the ’80s like me, you might recognize this famous line from the movie WarGames. This innocent-sounding question comes not from one of the movie’s human stars, but from a military super-computer named Joshua, after a bored high school student, played by Matthew Broderick, accesses the computer’s hard drive.

Thinking he’s hacked into a video game company, Broderick’s character accepts Joshua’s challenge and chooses the most intriguing game he can find: global thermonuclear war. To Joshua, though, it’s not just a game. Joshua is an intelligent computer programmed to learn through simulations like the one Broderick’s character initiates. And because the computer actually does control the arsenal of U.S. nuclear weapons, it’s a “game” that puts the planet on the brink of World War III.

If you want to see how the movie ends, I’d encourage you to check it out.

The reason I mention the movie is because, in addition to scaring the heck out of me, WarGames was my first exposure to machine learning, the idea that computers can learn and adapt based on the data they collect. Of course, machine learning has changed a lot since my 1983 Hollywood introduction.

Today, progressive organizations in all industries and disciplines are using the technology – which draws from various fields of study like artificial intelligence, data mining, statistics and optimization – to transform data into business value.

Machine learning defined

The concept is pretty straightforward. Computers “learn” from patterns in data to build a model that, when exposed to new data, can independently adapt without human intervention. The process repeats itself over and over again, with each iteration building on previous information and computations to create more accurate results or better predictions of future outcomes.

Machine learning is revolutionary for many analytically-mature organizations. Big data means even bigger possibilities. Consider what organizations like Amazon and Netflix are doing with recommendation systems based largely on machine learning principles and millions of data points.

Facebook is another great example. I’m sure you’ve noticed that when you upload a photo with people in it, Facebook will suggest that you tag certain individuals it recognizes as Facebook users, even pre-populating names to make it easier for you. That’s all done by a computer using facial recognition and an algorithm that analyzes an individual’s facial features to match faces with names.

My favorite machine learning resources

I won’t pretend to be an expert, but fascinating applications of machine learning like these push me to learn more about the science. It’s why I’m writing this post. If you’re interested in the concepts around machine learning, SAS has some excellent resources that have helped guide me in my discovery process. I believe you would find them helpful as well.

  1. For starters, SAS has an awesome web page on machine learning that includes a discussion on what it is, why it matters, who’s using it and how it works. You’ll also find a 60-second overview video and another video demonstrating how SAS Enterprise Miner uses pattern recognition to accurately recognize pictures and deliver that information to a user without human intervention. The best thing about the site, though, is that it serves as an aggregator for a number of other articles and resources to help you learn more. It probably is the best place to begin.
  2. Another excellent resource is a new O’Reilly white paper, The Evolution of Analytics: Opportunities and Challenges for Machine Learning in Business. SAS Data Scientist Patrick Hall, Solution Architect Wen Phan and Senior Marketing Specialist Katie Whitson give a really good overview of what machine learning is and what it is not. They also outline opportunities for businesses to effectively use machine learning and how to overcome some of the challenges they might encounter trying to weave it into their existing analytics strategy. This includes excellent discussions around talent scarcity, data and infrastructure challenges and the change management required to execute an analytics strategy that includes machine learning.
    Perhaps the best part of the report is the section highlighting some of the modern applications for machine learning – recommendation systems, streaming analytics, deep learning and cognitive computing. The authors highlight two companies – Geneia in the healthcare industry and Equifax in financial services – that are effectively using machine learning, walking readers through the journey these companies have taken to get to where they are today. Really interesting stuff.
  3. To discuss machine learning with your peers, I recommend the SAS Data Mining Community. Conversations around machine learning are happening in this space all the time. With nearly 100,000 members from across all fields and disciplines, and thousands logged in at any given time, SAS Communities are a great place to ask and answer questions, or just follow along with the conversation. One discussion thread you should definitely check out is titled Recently Published Machine Learning Resources. Hall, one of the authors mentioned above, highlights a couple of useful articles, the white paper and other resources. He also references a GitHub repo that includes a ton of good resources, including several machine learning quick reference tables outlining best practices.

So there you have it, a few good places to start if you’re looking to build your knowledge of machine learning. The possibilities machine learning presents to the world are very exciting.

Just please be careful if that system you build asks you to play a game.

 

tags: machine learning, SAS

3 good resources for humans who want to learn more about machine learning was published on SAS Users.