performance

October 12, 2021
 

This article was co-written by Jane Howell, IoT Product Marketing Leader at SAS. Check out her blog profile for more information.

As artificial intelligence comes of age and data continues to disrupt traditional industry boundaries, the need for real-time analytics is escalating as organizations fight to keep their competitive edge. The benefits of real-time analytics are significant. Manufacturers must inspect thousands of products per minute for defects. Utilities need to eliminate unplanned downtime to keep the lights on and protect workers. And governments need to warn citizens of natural disasters, like flooding events, providing real-time updates to save lives and protect property.

Each of these use cases requires a complex network of IoT sensors, edge computing, and machine learning models that can adapt and improve by ingesting and analyzing a diverse set of high-volume, high-velocity data.

SAS and Microsoft are partnering to inspire greater trust and confidence in every decision by innovating with proven AI and streaming analytics in the cloud and on the edge. Together, we make it easier for companies to harness hidden insights in their diverse, high-volume, high-velocity IoT data, and capitalize on those insights in Microsoft Azure for secure, fast, and reliable decision making.

To take advantage of all the benefits that real-time streaming analytics has to offer, it’s important to tailor your streaming environment to your organization’s specific needs. Below, we’ll dive into how to understand the value of IoT in parallel with your organization’s business objectives, and then strategize, plan, and manage your streaming analytics environment with SAS Viya on Azure.

Step 1: Understand the value of IoT

While you may already know that IoT and streaming analytics are the right technologies to enable your business’s real-time analytics strategy, it is important to understand how they work and how you can benefit. You can think of streaming analytics for IoT in three distinct parts: sense, understand, and act.

Sense: Sensors by design are distributed, numerous, and collect data at high fidelity in various formats. The majority of data collected by sensors has a short useful life and requires immediate action. Streaming analytics is well-suited to this distributed sensor environment to collect data for analysis.
Understand: A significant number of IoT use cases require quick decision-making in real time or near-real time. To achieve this, we need to apply analytics to data in motion. This can be done by deploying AI models that detect anomalies and patterns as events occur.
Act: As with any analytics-based decision support, it is critical to act on the insight generated. Once a pattern is detected, it must trigger an action to reach the desired outcome. This could be to alert key individuals or change the state of a device, possibly eliminating the need for any human intervention.

The value in IoT is driven by the reduced latency to trigger the desired outcome. Maybe that’s improving production quality in the manufacturing process, recommending a new product to a customer as they shop online, or eliminating equipment failures in a utility plant. Whatever it is, time is of the essence and IoT can help get you there.

Step 2: Strategize

Keeping the “sense, understand, act” framework in mind, the next step is to outline what you hope to achieve. To get the most out of your streaming analytics with SAS and Microsoft, keep your objectives in mind so you can stay focused on the business outcome instead of trying to act on every possible data point.

Some important questions to ask yourself are:

1. What are the primary and secondary outcomes you are hoping to achieve? Increased productivity? Improved safety? Higher customer satisfaction?
2. What patterns or events of interest do you want to observe?
3. If your machines and sensors show anomalous behavior, what actions need to be taken? Is there an existing business process that reflects this?
4. What data is important to store as historical data and what data can expire?
5. What kind of infrastructure exists from the point where data is generated (the edge) to the cloud? Is edge processing an option for time-critical use cases, or does processing need to be centralized in the cloud?
6. What are your analytics and application development platforms? Do you have access to high performance streaming analytics and cloud infrastructure to support this strategy?

Once you’ve identified your outcomes, define which metrics and KPIs you can measure to show impact. Make sure to have some baseline metrics to start from that you can improve upon.

Step 3: Plan and adopt

Now it’s time to take your strategy and plan the adoption of streaming analytics across your business.

Adoption will look different if you already have an IoT platform in place or if you are working to create a net-new solution. If you are going to be updating or iterating upon an existing solution, you will want to make sure you have access to key historical data to measure improvement and use institutional knowledge to maximize performance. If you are working with a net-new solution, you will want to give yourself some additional time to start small and then scale your operations up over time so you can tackle any unforeseen challenges.

In both cases it is important to have key processes aligned to the following considerations:

Data variety, volume and accuracy: Focus here on the “sense” part of the “sense, understand, act” framework. Accessing good data is the foundation to the success of your streaming projects. Make sure you have the right data needed to achieve your desired business outcome. Streaming analytics helps you understand the signals in IoT data, so you can make better decisions. But if you can’t access the right data, or your data is not clean, your project will not be successful. Know how much data you will be processing and where. Data can be noisy, so it is important to understand which data will give you the most insight.
Reliability: Ensure events are processed only once so you’re not observing the same events multiple times. When equipment fails or defects occur on the production line, ensure there are processes in place to restart automatically to maximize uptime for operations.
Scalability: Data science resources are scarce, so choose a low-code, no-code solution that can address your need to scale. When volume increases, how are you going to scale up and out? Azure simplifies scale with its PaaS offerings, including the ability to auto-scale SAS Viya on Azure.
Operations: Understand how you plan to deploy your streaming analytics models, govern them and decide which processes can be automated to save time.
Choose the right partners and tools: This is critical to the success of any initiative. SAS and Microsoft provide a best-in-class solution, bringing streaming analytics to the most advanced platform for integrated cloud and edge analytics.

Now that you have created your plan, it is time to adopt. Remember to start small and add layers of capability over time.

Step 4: Manage

To get the most value from IoT and streaming analytics, organizations must implement processes for continuous iteration, development, and improvement. That means having the flexibility to choose the most powerful models for your needs – using SAS, Azure cloud services, or open source. It also means simplifying DevOps processes for deploying and monitoring your streaming analytics to maximize uptime for your business systems.

With SAS Viya on Azure, it is easy to do this and more. Seamlessly move between your SAS and Microsoft environments with single sign-on authentication. Develop models with a host of no-code, low-code tools, and monitor the performance of your SAS and open-source models from a single model management library.

Maximizing value from your IoT and streaming analytics systems is a continuous, agile process. That is why it is critical to choose the most performant platform for your infrastructure and analytics needs. Together, SAS and Microsoft make it easier for organizations of all sizes and maturity to rapidly build, deploy, and scale IoT and streaming analytics, maximizing uptime to better serve customers, employees, and citizens.

If you want to learn more about SAS streaming analytics and IoT capabilities, as well as our partnership with Microsoft, check out the resources below:

• Learn about SAS Viya’s IoT and streaming analytics capabilities
• Discover all the exciting things SAS and Microsoft are working to achieve together at SAS.com/Microsoft
• See how SAS and Microsoft work together to help the town of Cary, North Carolina warn citizens of flood events: Smart city uses analytics and IoT to predict and manage flood events

Your guide for analyzing real time data with streaming analytics from SAS® Viya® on Azure was published on SAS Users.

October 1, 2021
 

I'll admit it - I'm often impatient while waiting for results, so my code needs to run as fast as possible! In this video, I show you how you can get faster results, too.

I'll demonstrate several different techniques that produce identical results, and compare processing speeds. For a more robust assessment, I'll test the techniques while reading from both SAS data sets and database tables. Analysis of the results clearly shows that when the techniques produce identical results, these choices usually produce faster run times:

  1. Use a WHERE statement instead of a subsetting IF statement (see the sketch after this list).
  2. Use the KEEP= data set option on input data sets instead of a KEEP statement.
  3. Choosing between SQL and the DATA step:
    • If producing a single result, use SQL.
    • If producing multiple results, use the DATA step.
  4. If your process is CPU bound:
    • If you have access to CAS, run it in CAS.
    • Otherwise, refactor in DS2.
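
As a quick illustration of the first two techniques, here is a minimal sketch against the SASHELP.CARS table that ships with SAS (the variable choices are just for demonstration):

/* Faster: WHERE filters rows before they enter the step, and the
   KEEP= data set option limits which columns are even read */
data fast;
   set sashelp.cars(keep=make model msrp);
   where msrp > 40000;
run;

/* Same result, usually slower: every row and column is read first,
   then discarded by the subsetting IF and the KEEP statement */
data slow;
   set sashelp.cars;
   if msrp > 40000;
   keep make model msrp;
run;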

The programs I used to create the video are available for download from GitHub if you want to experiment.

Until next time, may the SAS be with you!
Mark

Jedi SAS Tricks: 5 Ways to Make Your SAS Code Run Faster was published on SAS Users.

June 15, 2021
 

In this fast-paced data age, when the sheer volume of data (generated, collected, and waiting to be processed and analyzed) grows at a breathtaking rate, the speed of data processing becomes critically important. In many cases, if data is not processed within an allotted time frame, we lose all its value as it becomes obsolete and ultimately irrelevant. That is why computing power becomes of the essence.

However, computing power itself does not guarantee timely processing. How we use that power makes all the difference. Way too often good old sequential processing just does not cut it anymore and different computing methods are required. One  such method is parallel processing.

In my previous post, Using shell scripts for massively parallel processing, I demonstrated a script-centered technique for running multiple independent SAS processes in parallel in SAS environments lacking SAS/CONNECT.

In this post, we will take a shot at a slightly different task and solution. Instead of having several totally independent processes, now we have some common “pre-processing” part, then we run several independent processes in parallel, and then we combine the results of parallel processing in the “post-processing” portion of our program.

Problem: monthly data ingestion use case

For simplification, we are going to use a scenario similar to one in the previous blog post:

Each month, shortly after the end of the previous month, we need to ingest a number of CSV files pertinent to transactions during that month and produce daily SAS data tables for each of its days. Only now we will go a step further: combining all those daily tables into a single monthly table.

Solution: combining sequential and parallel processing

The solution comprises three major components:

  • Shell script running the main SAS program.
  • Main SAS program, consisting of three parts: pre-parallel processing, parallel processing, and post-parallel processing.
  • Single-thread SAS program responsible for a single day’s data ingestion.

1. Shell script running main SAS program

The shell script below, mainprog.sh, runs the main SAS program mainprog.sas:

#!/bin/sh
 
# HOW TO CALL:
# nohup sh /path/mainprog.sh YYYYMM &
 
now=$(date +%Y.%m.%d_%H.%M.%S)
 
# getting YYYYMM as a parameter in script call
ym=$1
 
pgmname=/path/mainprog.sas
logname=/path/saslogs/mainprog_$now.log
sas $pgmname -log $logname -set inDate $ym -set logname $logname

The script is run in background mode, as indicated by the ampersand at the end of its invocation command:

nohup sh /path/mainprog.sh YYYYMM &

We pass a parameter YYYYMM (e.g. 202106) indicating the year and month of our request.

When we call the SAS program mainprog.sas within the script, we specify the name of the SAS log file to be created (-log $logname) and also pass on the inDate parameter (-set inDate $ym, which has the same YYYYMM value as the parameter in the script call) and the logname parameter (-set logname $logname). As you will see further on, we are going to use these two parameters within the mainprog.sas program.

2. Main SAS program

Here is an abridged version of the mainprog.sas program:

/* ======= pre-processing ======= */
 
/* parameters passed from shell script */
%let inDate = %sysget(inDate);
%let logname = %sysget(logname);
 
/* year and month */
%let yyyy = %substr(&inDate,1,4);
%let mm = %substr(&inDate,5,2);
 
/* output data library */
libname SASDL '/data/target';
 
/* number of days in month mm of year yyyy */
/* (first day of the next month minus 1; INTNX handles the December rollover) */
%let days = %sysfunc(day(%sysfunc(intnx(month,%sysfunc(mdy(&mm,1,&yyyy)),1))-1));
 
/* ======= parallel processing ======= */
%macro loop;
   %local threadprog looplogdir logdt workpath tasklist i z threadlog cmd;
   %let threadprog = /path/thread.sas;
   %let looplogdir = %substr(&logname,1,%length(&logname)-4)_logs;
   x "mkdir &looplogdir"; *<- directory for loop logs;
   %let logdt = %substr(&logname,%length(&logname)-22,19);
   %let workpath = %sysfunc(pathname(WORK));
   %let tasklist=;
   %do i=1 %to &days;
      %let z = %sysfunc(putn(&i,z2.));
      %let threadlog = &looplogdir/thread_&z._&logdt..log;
      %let tasklist = &tasklist DAY&i;
      %let cmd = sas &threadprog -log &threadlog -set i &i -set workpath &workpath -set inDate &inDate;
      systask command "&cmd" taskname=DAY&i;
   %end;
 
   waitfor _all_ &tasklist;
 
%mend loop;
%loop
 
/* ======= post-processing ======= */
 
/* combine daily tables into one monthly table */
data SASDL.TARGET_&inDate;
   set WORK.TARGET_&inDate._1 - WORK.TARGET_&inDate._&days;
run;

The key highlights of this program are:

  • We capture values of the parameters passed to the program (inDate and logname).
  • Based on these parameters, assign source directory and target data library SASDL.
  • Calculate number of days in a specific month defined by year and month.
  • Create a directory to hold SAS logs of all parallel threads; the directory name is matching the log name of the mainprog.sas.
  • Capture the WORK library location of the main SAS session running mainprog.sas as:

    %let workpath = %sysfunc(pathname(WORK));

    We use that location in the thread sessions to pass data produced by the thread sessions back to the main session.

  • Macro %do-loop generates a series of SYSTASK statements to spawn additional SAS sessions in the background mode, each ingesting data for a single day of a month:

    systask command "&cmd" taskname=DAY&i;

    The SYSTASK statement enables you to execute host-specific commands from within your SAS session or application. Unlike the X statement, the SYSTASK statement runs these commands as asynchronous tasks, which means that these tasks execute independently of all other tasks that are currently running. Asynchronous tasks run in the background, so you can perform additional tasks (including launching other asynchronous tasks) while the asynchronous task is still running.

    Restriction: SYSTASK statement is not supported on the CAS server.

  • Also, we generate a cumulative list of all tasknames assigned to each thread session:

    %let tasklist = &tasklist DAY&i;

  • Outside the macro %do-loop we use the WAITFOR statement, which suspends execution of the main SAS session until the specified tasks finish executing. Since we created a list of all daily thread sessions (&tasklist), this synchronizes all our parallel threads and resumes the mainprog.sas session only when all threads have finished executing.
  • At the end of the main SAS session we concatenate all our daily data tables that have been created by parallel threads in the location of the WORK library of the main SAS session.

Using a SAS macro loop to generate a series of SYSTASK statements for parallel processing is not the only method available. Alternatively, you can achieve this within a data step using CALL EXECUTE, as sketched below. In this case, each data step iteration generates a single global SYSTASK statement and pushes it outside the data step boundary, where the generated statements are executed sequentially (just as in the macro implementation). Since the NOWAIT option is the default for SYSTASK statements, even though they are launched sequentially, their corresponding OS commands will still run in parallel.
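
Here is a minimal sketch of that CALL EXECUTE alternative, reusing &days and &inDate from above (the -log and -set workpath options are omitted for brevity):

data _null_;
   do i = 1 to &days;
      /* push one global SYSTASK statement per day out of the data step */
      call execute(catx(' ',
         'systask command',
         quote(strip(catx(' ', 'sas /path/thread.sas -set i', i,
                               '-set inDate', "&inDate"))),
         cats('taskname=DAY', i), ';'));
   end;
run;

As in the macro version, a WAITFOR statement listing all the DAYn tasks must still follow before the post-processing step.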

3. Single thread SAS program

Here is an abridged version of the thread.sas program:

/* inDate parameter */
%let inDate = %sysget(inDate);
 
/* parent program's WORK library */
%let workpath = %sysget(workpath);
libname MAINWORK "&workpath";
 
/* thread number */
%let i = %sysget(i);
 
/* year and month */
%let yyyy = %substr(&inDate,1,4);
%let mm = %substr(&inDate,5,2);
 
/* source data directory */
%let srcdir = /datapath/&yyyy/&mm;
 
/* create varlist macro variable to list all input variable names */
proc sql noprint;
   select name into :varlist separated by ' ' from SASHELP.VCOLUMN
   where libname='PARMSDL' and memname='DATA_TEMPLATE';
quit;
 
/* create fileref inf for the source file */
filename inf "&srcdir/source_data_&inDate._day&i..csv";
 
/* create daily output data set */
data MAINWORK.TARGET_&inDate._&i; 
   if 0 then set PARMSDL.DATA_TEMPLATE;
   infile inf missover dsd encoding='UTF-8' firstobs=2 obs=max;
   input &varlist;
run;

This program ingests a single .csv file corresponding to the &i-th day of &inDate (year and month) and creates a SAS data table MAINWORK.TARGET_&inDate._&i. To make it available to the main SAS session, the MAINWORK library is defined here at the same physical location as the WORK library of the main (parent) SAS session.

We also use a pre-created SAS data template PARMSDL.DATA_TEMPLATE - a zero-observations data set that contains descriptions of all the variables and their attributes.
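
For reference, such a template can be created once with a zero-observation DATA step; the library path and variable names below are purely hypothetical:

libname PARMSDL '/data/templates';   /* hypothetical template library */

data PARMSDL.DATA_TEMPLATE;
   /* hypothetical variables and their attributes */
   length TRANS_ID 8 AMOUNT 8 CURRENCY $ 3 TRANS_DT 8;
   format TRANS_DT datetime20.;
   stop;   /* define the structure without writing any observations */
run;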

Thoughts? Comments?

Do you find this post useful? Do you have processes that may benefit from parallelization? Please share with us below.

Using SYSTASK and SAS macro loops for massively parallel processing was published on SAS Users.

May 30, 2019
 

Human behavior is fascinating. We come in so many shapes, sizes and backgrounds. Doesn’t it make sense that any tests we write also accommodate our wonderful differences?

This picture is of Miko, a northern rescue and a recent addition to my family. He’s learning to live in an urban household and doing great with some training. He’s going through so many new tests as he adapts to life in the city, which is quite different from being free in the northern territories. Watch for a later post on his training successes.

I’m so happy to share how SAS has been helping candidates by offering a variety of certification credentials geared towards testing for differences and preferences in thought. If you are wondering – I’ve been addicted to psychometrics for a while now, anything human behavior-related interests me. I thought I would begin with sharing some different types of testing roles that I have held in the past.

1. Psychometric testing

Before I joined SAS, I worked at CSI. To answer that unspoken thought, dear reader: CSI has been providing financial training and accreditation since 1964 – way before CSI the TV show became popular.

My role as Test Manager was super exciting for someone with a curiosity for analytics and a desire to help people succeed. In a team of four, we scored over 200 exams to provide credentials. Psychometrics was the most exciting part of my job: analyzing the performance of test takers to constantly improve our tests. Psychometric tests are used to identify a candidate's skills, knowledge and personality.

2. Multiple-choice testing

While setting multiple-choice exam questions, I learned that it was ideal for the four answer choices to be similar in length and complexity (e.g. if candidates typically chose option A for a question whose right response was B, we would dig deeper to compare the lengths and language of the options, and then change an option if that was what the review committee agreed upon).

3. Adaptive testing

Prior to CSI, I worked at the test center of DeVry Institute of Technology. In adaptive testing, the test’s difficulty adapts to candidate performance. A correct response leads to a more complex question. On the flip side, an incorrect response leads to an easier next question. Eventually, this helped candidates decide which engineering program would be the right skill fit.

This is where I met the student who asked, “can my boyfriend write my exam?”

4. Performance testing

With SAS at the forefront of analytics, it should come as no surprise that certification exams have evolved to the next level. As a certification candidate you can now try out performance-based testing.

A performance test requires a candidate to actually perform a task, rather than simply answering questions. An example is writing SAS code. Instead of answering a knowledge-level multiple choice exam about SAS code, the candidate is asked to actually write code to arrive at answers.

Certification at SAS

SAS Certified Specialist: Base Programming Using SAS 9.4 is great for those who can demonstrate ease in putting into practice the knowledge learned in the Foundation Programming classes 1 and 2. During this performance-based exam, candidates will access a SAS environment. Coding challenges will be presented, and you will need to write and execute SAS code to determine the correct answers to a series of questions.

The SAS® Certified Base Programmer for SAS®9 credential remains, but the exam will be retired in June 2019.

While writing this post I came across this on Wikipedia: it shows how the study of adaptive behavior goes back to Darwin’s time. It’s a good read for anyone intrigued by the science and art of testing.

“Charles Darwin was the inspiration behind Sir Francis Galton who led to the creation of psychometrics. In 1859, Darwin published his book The Origin of Species, which pertained to individual differences in animals. This book discussed how individual members in a species differ and how they possess characteristics that are more adaptive and successful or less adaptive and less successful. Those who are adaptive and successful are the ones that survive and give way to the next generation, who would be just as or more adaptive and successful. This idea, studied previously in animals, led to Galton's interest and study of human beings and how they differ one from another, and more importantly, how to measure those differences.”

Are you fascinated by the science and art of human behavior as it relates to testing? Are you as excited as I am about the possibilities of performance-based testing? I would love to hear your comments below.

New at SAS: Psychometric testing was published on SAS Users.

October 10, 2018
 

Deep learning (DL) is a subset of neural networks, which have been around since the 1960s. Computing resources and the need for a lot of data during training were the crippling factors for neural networks. But with the growing availability of computing resources such as multi-core machines, graphics processing unit (GPU) accelerators and specialized hardware, DL is becoming much more practical for business problems.

Financial institutions perform a large number of computations to evaluate portfolios and to price securities and financial derivatives. For example, every cell in a spreadsheet potentially implements a different formula. Time is also usually of the essence, so having the fastest possible technology to perform financial calculations with acceptable accuracy is paramount.

In this blog, we talk to Henry Bequet, Director of High-Performance Computing and Machine Learning in the Finance Risk division of SAS, about how he uses DL as a technology to maximize performance.

Henry discusses how the performance of numerical applications can be greatly improved by using DL. Once a DL network is trained to compute analytics, using that DL network becomes drastically faster than more classic methodologies like Monte Carlo simulations.

We asked him to explain deep learning for numerical analysis (DL4NA) and the most common questions he gets asked.

Can you describe the deep learning methodology proposed in DL4NA?

Yes, it starts with writing your analytics in a transparent and scalable way. All content that is released as a solution by the SAS financial risk division uses the "many task computing" (MTC) paradigm. Simply put, when writing your analytics using the many task computing paradigm, you organize code in SAS programs that define task inputs and outputs. A job flow is a set of tasks that will run in parallel, and the job flow will also handle synchronization.

Fig 1.1 A Sequential Job Flow

The job flow in Figure 1.1 visually gives you a hint that the two tasks can be executed in parallel. The addition of the task into the job flow is what defines the potential parallelism, not the task itself. The task designer or implementer doesn’t need to know that the task is being executed at the same time as other tasks. It is not uncommon to have hundreds of tasks in a job flow.

Fig 1.2 A Complex Job Flow

Using that information, the SAS platform, and the Infrastructure for Risk Management (IRM) is able to automatically infer the parallelization in your analytics. This allows your analytics to run on tens or hundreds of cores. (Most SAS customers run out of cores before they run out of tasks to run in parallel.) By running SAS code in parallel, on a single machine or on a grid, you gain orders of magnitude of performance improvements.

This methodology also has the benefit of expressing your analytics in the form of Y= f(x), which is precisely what you feed a deep neural network (DNN) to learn. That organization of your analytics allows you to train a DNN to reproduce the results of your analytics originally written in SAS. Once you have the trained DNN, you can use it to score tremendously faster than the original SAS code. You can also use your DNN to push your analytics to the edge. I believe that this is a powerful methodology that offers a wide spectrum of applicability. It is also a good example of deep learning helping data scientists build better and faster models.

Fig 1.3 Example of a DNN with four layers: two visible layers and two hidden layers.

The number of neurons of the input layer is driven by the number of features. The number of neurons of the output layer is driven by the number of classes that we want to recognize, in this case, three. The number of neurons in the hidden layers as well as the number of hidden layers is up to us: those two parameters are model hyper-parameters.

How do I run my SAS program faster using deep learning?

In the financial risk division, I work with banks and insurance companies all over the world that are faced with increasing regulatory requirements like CCAR and IFRS17. Those problems are particularly challenging because they involve big data and big compute.

The good news is that new hardware architectures are emerging with the rise of hybrid computing. Computers are increasingly built as a combination of traditional CPUs and innovative devices like GPUs, TPUs, FPGAs and ASICs. Those hybrid machines can run significantly faster than legacy computers.

The bad news is that hybrid computers are hard to program, and each of them is specific: if you write code for a GPU, it won’t run on an FPGA; it won’t even run on a different generation of the same device. Consequently, software developers and software vendors are reluctant to jump into the fray, and data scientists and statisticians are left out of the performance gains. So there is a gap, a big gap in fact.

To fill that gap is the raison d’être of my new book, Deep Learning for Numerical Applications with SAS. Check it out and visit the SAS Risk Management Community to share your thoughts and concerns on this cross-industry topic.

Deep learning for numerical analysis explained was published on SAS Users.

August 4, 2016
 

In my current role I have the privilege of managing the Performance Lab in SAS R&D. Helping users work through performance challenges is a critical part of the Lab’s mission. This spring, my team has been actively testing new and enhanced storage arrays from EMC along with the Veritas clustered file system.  We have documented our findings on the SAS Usage note 42197 “List of Useful Papers.”

The two flash-based storage arrays we tested from EMC are the new DSSD D5 appliance and the XtremIO array.  The bottom line: both performed very nicely with a mixed analytics workload.  For more details on the results of the testing, along with the tuning guidelines for using SAS with this storage, please review the papers referenced in the usage note above.

As with all storage, please validate that the storage can deliver all the “bells and whistles” you will need to support the failover and high-availability needs of your SAS applications.

In addition to the storage testing, we tested the latest version of the Veritas InfoScale clustered file system.  We had great results in a distributed SAS environment with several SAS compute nodes all accessing data in this clustered file system.  A lot of information was learned in this testing and captured in a separate paper, also referenced in the usage note above.

My team plans to continue testing new storage and file system technologies throughout the remainder of 2016.  If there is a storage array or technology you would like to have tested, please let us know by sharing it in the comments section below or contacting me directly.

 

tags: performance, storage

Testing EMC Storage and Veritas shared file systems was published on SAS Users.

April 22, 2016
 

Our society lives under a collective delusion that burning out is a requirement to success. We’ve been conditioned to believe that sacrificing family, relationships and what’s personally important opens the door to achievement. But how can you be an effective leader, run a successful company or properly manage employees when […]

Sleep: Your legal performance enhancer was published on SAS Voices.

November 18, 2015
 

From time to time we’ll hear from customers who are encountering performance issues. SAS has a sound methodology for resolving these issues and we are always here to keep your SAS system humming. However, many problems can be resolved with some simple suggestions. This blog will discuss different types of performance issues you might encounter, with some suggestions on how to effectively resolve them.

Situation: You are a new SAS customer or are simply running a new SAS application on new hardware.
Suggestion: Be sure you’ve read and applied all the guidelines in the various tuning papers that have been written.

Making sure you understand the performance issues will help us determine what the next steps are. It’s worth noting that 90% of performance issues arise because the hardware, operating system and/or storage has not been configured according to the tuning guidelines mentioned above.  In a recent case, we were able to get a 20% performance gain in a long-running ETL process by adjusting two RHEL kernel parameters that have been documented for many years in our tuning paper.

Situation: Your SAS application has been running and over time gets slower
Suggestion: Determine if the number of concurrent SAS sessions/users has increased and/or the volume of data (both input and lookup tables) has increased.  This is the top reason for a gradual slowdown.

Situation: Your SAS application took a significant performance hit overnight or in a short time frame.
Suggestion: The first thing you want to do is see if any maintenance (tweaking of your system, a hotfix, a patch, …) has been applied to your operating system, VMware, and/or storage arrays.  Many customers have applied maintenance (not to SAS) only to find SAS suddenly running 2-5 times longer. You’ll want to check that all the operating system settings, mount options, and VMware settings are the same after the maintenance as they were before.

In conclusion, if you are having performance issues, check the suggested tuning guidelines. Also, be sure to keep track of all the settings for the hardware and storage infrastructure when applying maintenance to make sure these settings are the same afterwards as they were before.

Of course, if you have followed the guidelines and maintenance is not the reason for your performance issues, please contact us. We are here to help.

tags: performance, SAS Administrators, tuning

Tips to keep your SAS system humming was published on SAS Users.

March 11, 2015
 

SAS FULLSTIMER is a SAS system option that takes operating system information collected for a running SAS process and writes that information to the SAS log. Using it can add up to 10 additional lines to your SAS log for each SAS step. So why would I recommend turning it on?

This additional information includes memory utilization, a date/time stamp for when each step finished, and context-switch information, along with other operating system-specific information about the SAS step that just finished.  Why would you need this much information?

This data is very useful in helping your SAS administrator and SAS support personnel determine why a SAS process may be running slower than expected. Having this information collected every time a SAS job runs means the data can be used to determine which SAS step ran slowly, at what time, and under what circumstances.

Since the IT staff at most organizations collect hardware monitoring data on a daily basis, they can then use the information from the SAS log to pinpoint what time of day the performance issue occurred, on which system, and using which file systems.

Again, this is just one way SAS users can be proactive in trying to solve any future performance issues. And all you need to do is add -FULLSTIMER to your SAS configuration file or to the SAS command line that you use to invoke SAS.
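
For example, to turn the option on within a session or program (a minimal sketch):

options fullstimer;

Or, for a single batch run (myprog.sas is a hypothetical program name):

sas -fullstimer myprog.sas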

If you have any questions on the above, please let us know.  Here are additional resources if you want to learn more about SAS FULLSTIMER and its use:

SAS timer - the key to writing efficient SAS code

Improving performance: Determine the cause

Tune your SAS system for max performance

Troubleshoot Your Performance Issues: SAS® Technical Support Shows You How

Increasing IT’s awareness of SAS: A few good practices

tags: performance, SAS Administrators
January 21, 2015
 

New Year to me is always a stark reminder of the inexorability of Time. In day-to-day life, time is measured in small denominations - minutes, hours, days… But come New Year, this inescapable creature – Time – makes its decisive leap – and in a single instant, we become officially older and wiser by the entire year’s worth.

What better time to re-assess ourselves, personally and professionally! And what better time to Resolve to improve your SAS programming skills, as skillfully crafted by Michael A. Raithel in his latest blog post.

I thought I could write a post showing how to be efficient and kill two birds with one stone.  The two birds here are two of Raithel’s proposed New Year’s resolutions:

#2 Volunteer to help junior SAS programmers.

#12 Reduce processing time by writing more efficient programs.

To combine the two, I could have titled this post “Helping junior SAS programmers reduce processing time by writing more efficient programs”. However, I am not going to “teach” you efficient coding techniques, which are a subject deserving of a multi-volume treatise. I will just give you a simple tool that is a must-have for any SAS programmer (not just junior ones) who considers writing efficient SAS code important. This simple tool has been the ultimate judge of any code’s efficiency, and it is called a timer.

What is efficient?

Setting aside hardware constraints and limitations (which are increasingly diminishing nowadays), efficient means fast or at least fast enough not to exceed ever-shrinking user tolerance of wait time.

Of course, if you are developing a one-time run code to generate some ad-hoc report or produce results for uniquely custom computations, your efficiency criteria might be different, such as “as long as it ends before the deadline” or at least “does not run forever”.

However, in most cases, SAS code is developed for some applications, in many cases interactive applications, where many users run the code over and over again. It may run behind the scenes of a web application with a user waiting (or rather not wanting to wait) for results. In these cases, SAS code must be really fast, and any improvement in its efficiency is multiplied by the number of times it is run.

What is out there?

SAS provides the following SAS system options to measure the efficiency of SAS code:

STIMER. You may not realize it, but you use this option every time you run a SAS program. This option is turned on by default (NOSTIMER turns it off) and controls the information written to the SAS log by each SAS step. Each step of a SAS program by default generates a NOTE like the following in the SAS log:

NOTE: DATA statement used (Total process time):
      real time           1.31 seconds
      cpu time            1.10 seconds

FULLSTIMER. This option (NOFULLSTIMER turns it off) provides much more information on the resources used by each step. Sample log output of the FULLSTIMER option for a SAS data step is listed below:

NOTE: DATA statement used:
real time                   0.06 seconds
user cpu time               0.02 seconds
system cpu time             0.00 seconds
Memory                      88k
Page Faults                  10
Page Reclaims                 0
Page Swaps                    0
Voluntary Context Switches   22
Involuntary Context Switches  0
Block Input Operations       10
Block Output Operations      12

While the FULLSTIMER option provides plenty of information for SAS code optimization, in many cases it is more than you really need. On the other hand, STIMER may provide quite valuable information about each step, thus identifying the most critical steps of your SAS program.
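
Either option is set like any other SAS system option; a minimal sketch:

options fullstimer;    /* maximum per-step resource detail */
/* ... steps you want to investigate ... */
options nofullstimer;  /* back to the default STIMER output */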

Get your own SAS timer

If your efficiency criterion is how fast your SAS program runs as a whole, then you need an old-fashioned timer, with start and stop events and the time elapsed between them. To achieve this in SAS programs, I use the following technique.

  1. At the very beginning of your SAS program, place the following line of code that starts the timer and remembers the start time:

    /* Start timer */
    %let _timer_start = %sysfunc(datetime());

  2. At the end of your SAS program, place the following code snippet that captures the end time, calculates the duration, and writes it to the SAS log:

    /* Stop timer */
    data _null_;
      dur = datetime() - &_timer_start;
      put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
    run;

    The resulting output in the SAS log will look like this:

    ------------------------------
     TOTAL DURATION:   0:01:31.02
    ------------------------------
    

    Despite its utter simplicity, this little timer is a very convenient tool for improving your SAS code efficiency. You can use it to compare or benchmark your SAS programs in their entirety.

    Warning. In the above timer, I used the datetime() function, and I insist on using it instead of the time() function, which I have seen used in many online resources. Keep in mind that the time() function resets to 0 at midnight. While time() will work just as well when the start and stop times fall within the same date, it will produce completely meaningless results when the start time falls on one date and the stop time on another. You can easily fall into this trap if you submit your SAS program right before midnight and it ends after midnight, which will result in an incorrect, even negative, duration.
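
    As a quick illustration of that failure mode, here is a minimal sketch with hard-coded times:

    /* start at 23:59:50, stop at 00:00:10 the next day */
    data _null_;
      start = '23:59:50't;
      stop  = '00:00:10't;
      dur = stop - start;   /* -86380 seconds: negative and meaningless */
      put dur= 'seconds';
    run;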

    I hope using this SAS timer will help you write more efficient SAS programs.

tags: efficient SAS code, performance, SAS Professional Services, SAS Programmers, SAS system options