November 25, 2020
 

Helping customers is my passion. Having a good understanding of the hot-fix process can help you keep your SAS environment running smoothly. To help you better manage your environment, I am writing a series of blog posts with best practices and tips to successfully administer SAS hot fixes for SAS®9 environments. This first entry in the series focuses on the SAS Hot Fix Analysis, Download and Deployment Tool.

Importance of installing hot fixes

During my time helping SAS customers, I have seen many sites that did not apply hot fixes correctly or at all, which in turn caused them unnecessary downtime on their production environment(s). These issues could have been avoided if they had kept up-to-date with the hot fixes that were available. I've also seen many cases where the dependencies for hot fixes were not considered or the post-installation steps were not performed, again causing unnecessary downtime. Hopefully, some of the tips in this blog series will help SAS administrators realize just how important hot fixes are.

Benefits of the SASHFADD Tool

The focus of this post is the SAS Hot Fix Analysis, Download and Deployment Tool, commonly known as SASHFADD. This is the tool that SAS recommends you always use when applying hot fixes in SAS® 9.3 and SAS® 9.4 environments. Why? There are many reasons!

  • It creates a report showing which products installed in your environment have outstanding hot fixes. The report goes into detail about dependencies that you need to address before installing hot fixes. My next post provides a detailed analysis of an example report.
  • After you run the SASHFADD.exe file, it creates a DOWNLOAD<name> directory with some additional scripts that allow you to automatically download all your hot fixes into the DEPLOY<name> directory, so that you do not have to manually download each file. If you want to download only the alert hot fixes, there is a script available in the DOWNLOAD<name> directory to do that as well.
  • After you install SASHFADD, you need to run it periodically to check for hot fixes. One of the posts in this series provides more information about scheduling SASHFADD runs.
  • Overall, it provides a great picture of what is needed to keep your environment running in top shape.

Note: There are certain items that SASHFADD does not support. For details, see SAS Note 527, "SAS® hot fixes and patches that are not supported by the SAS Hot Fix Analysis, Download and Deployment Tool."

The SAS Hot Fix Analysis, Download and Deployment (SASHFADD) Tool Usage Guide clearly describes the steps to install and use SASHFADD. If you haven't already installed this tool, you should do it right away! Always remember that if you run into an issue, this guide has a wonderful troubleshooting section.

Workaround if you are not the server administrator

This section primarily applies to consultants who are working onsite with the SAS administrator. If you are in a situation in which you do not have administrator access to the server or there are proxy issues, you can run SASHFADD on a different system. Doing this allows you to create the report on another system that you can use to plan your hot-fix implementation.

Here are the steps that you should follow on a Microsoft Windows system.

On the system where you plan to run the SASHFADD tool:

1. Download and execute the SASHFADD tool following the instructions in the Usage Guide.

2. Go to page 16 of the Usage Guide and click the appropriate HFADD_data.xml file to get the most recent list of hot fixes. Once the file opens, save it in the SASHFADD folder.

On the server that you want to apply the hot fixes to:

3. Go to the deploymntreg folder in SASHome.

4. Double-click the sas.tools.viewregistry.jar file. This action creates the Deployment Registry and an accompanying TXT file in the deploymntreg folder (see the command-line sketch after step 5 for an alternative).

5. Copy the DeploymentRegistry.txt file.
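If double-clicking the JAR file is not practical, a minimal alternative sketch is to launch it from a SAS session with the X statement. This assumes the XCMD system option is enabled and Java is on the system path; the SASHome path below is a hypothetical example:

/* Run the Deployment Registry report from SAS rather than by double-clicking. */
/* Assumes XCMD is enabled and Java is on the PATH; adjust the path to your SASHome. */
x 'cd "C:\Program Files\SASHome\deploymntreg" && java -jar sas.tools.viewregistry.jar';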

On the system where you plan to run the SASHFADD tool:

6. Paste the DeploymentRegistry.txt file into the SASHFADD folder.

7. In the SASHFADD folder, run the SASHFADD.exe file and provide a name for the tier that you are running it on. In the display below, the file is being run on the metadata tier:

Note that you must run the file on each tier. This action creates the report in a folder whose name corresponds to the name of the tier that you ran the EXE file on. You can then share this report (or multiple reports) with the SAS administrator who will be installing the hot fixes.

Helpful resources

See the SAS Hot Fix Analysis, Download and Deployment (SASHFADD) Tool Usage Guide for detailed and thorough documentation.

Coming soon

The second post in the series will help you understand the dependencies covered in the SASHFADD report.

Thank you and have an awesome day!

 

The SAS Hot Fix Analysis, Download and Deployment (SASHFADD) Tool was published on SAS Users.

November 20, 2020
 

The following is an excerpt from Cautionary Tales in Designed Experiments by David Salsburg. This book is available to download for free from SAS Press. The book aims to explain statistical design of experiments (DOE) to readers with minimal mathematical knowledge and skills. In this excerpt, you will learn about the origin of Thomas Bayes’ Theorem, which is the basis for Bayesian analysis.

A black and white portrait of Thomas Bayes in a black robe with a white collar.

Source: Wikipedia

The Reverend Thomas Bayes (1702–1761) was a dissenting minister of the Anglican Church, which means he did not subscribe to the full body of doctrine espoused by the Church. We know of Bayes in the 21st century, not because of his doctrinal beliefs, but because of a mathematical discovery, which he thought made no sense whatsoever. To understand Bayes’ Theorem, we need to refer to this question of the meaning of probability.

In the 1930s, the Russian mathematician Andrey Kolmogorov (1904–1987) proved that probability was a measure on a space of “events.” It is a measure, just like area, that can be computed and compared. To prove a theorem about probability, one only needed to draw a rectangle to represent all possible events associated with the problem at hand. Regions of that rectangle represent classes of sub-events.

For instance, in Figure 1, the region labeled “C” covers all the ways in which some event, C, can occur. The probability of C is the area of the region C, divided by the area of the entire rectangle. Anticipating Kolmogorov’s proof, John Venn (1834–1923) had produced such diagrams (now called “Venn diagrams”).

Two overlapping circular shapes. One is labeled C, the other labeled D. The area where the shapes overlap is labeled C+D

Figure 1: Venn Diagram for Events C and D

Figure 1 shows a Venn diagram for the following situation: We have a quiet wooded area. The event C is that someone will walk through those woods sometime in the next 48 hours. There are many ways in which this can happen. The person might walk in from different entrances and be any of a large number of people living nearby. For this reason, the event C is not a single point, but a region of the set of all possibilities. The event D is that the Toreador Song from the opera Carmen will resound through the woods. Just as with event C, there are a number of ways in which this could happen. It could be whistled or sung aloud by someone walking through the woods, or it could have originated from outside the woods, perhaps from a car radio on a nearby street. Some of these possible events are associated with someone walking through the woods, and those possible events are in the overlap between the regions C and D. Events associated with the sound of the Toreador Song that originate outside the woods are in the part of region D that does not overlap region C.

The area of region C (which we can write P(C) and read as “P of C”) is the probability that someone will walk through the woods. The area of region D (which we can write P(D)) is the probability that the Toreador Song will be heard in the woods. The area of the overlap between C and D (which we can write P(C and D)) is the probability that someone will walk through the woods and that the Toreador Song will be heard.

If we take the area P(C and D) and divide it by the area P(C), we have the probability that the Toreador Song will be heard when someone walks through the woods. This is called the conditional probability of D, given C. In symbols:

P(D|C) = P(C and D) ÷ P(C)

Some people claim that if the conditional probability, P(C|D), is high, then we can state “D causes C.” But this would get us into the entangled philosophical problem of the meaning of “cause and effect.”

To Thomas Bayes, conditional probability meant just that—cause and effect. The conditioning event, C (someone will walk through the woods in the next 48 hours), comes before the second event, D (the Toreador Song is heard). This made sense to Bayes. It created a measure of the probability for D when C came before.

However, Bayes’ mathematical intuition saw the symmetry that lay in the formula for conditional probability:

P(D|C) = P(D and C) ÷ P(C) means that

P(D|C)P(C) = P(D and C) (multiply both sides of the equation by P(C)).

But just manipulating the symbols shows that, in addition,

P(D and C) = P(C|D) P(D), or

P(C|D) = P(C and D) ÷ P(D).

This made no sense to Bayes. The event C (someone walks through the woods) occurred first. It had already happened or not before event D (the Toreador Song is heard). If D is a consequence of C, you cannot have a probability of C, given D. The event that occurred second cannot “cause” the event that came before it. He put these calculations aside and never sent them to the Royal Society. After his death, friends of Bayes discovered these notes and only then were they sent to be read before the Royal Society of London. Thus did Thomas Bayes, the dissenting minister, become famous—not for his finely reasoned dissents from church doctrine, not for his meticulous calculations of minor problems in astronomy, but for his discovery of a formula that he felt was pure nonsense.

P(C|D) P(D) = P(C and D) = P(D|C) P(C)
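To see this symmetry in numbers, here is a minimal sketch in a SAS DATA step, using invented probabilities for the woods example (the values are made up for illustration):

/* Toy check of P(C|D)*P(D) = P(C and D) = P(D|C)*P(C). */
data _null_;
   p_c  = 0.30;                /* P(C): someone walks through the woods */
   p_d  = 0.10;                /* P(D): the Toreador Song is heard      */
   p_cd = 0.04;                /* P(C and D): both events occur         */
   p_d_given_c = p_cd / p_c;   /* P(D|C) = 0.1333...                    */
   p_c_given_d = p_cd / p_d;   /* P(C|D) = 0.40                         */
   lhs = p_c_given_d * p_d;    /* P(C|D)*P(D)                           */
   rhs = p_d_given_c * p_c;    /* P(D|C)*P(C)                           */
   put lhs= rhs= p_cd=;        /* all three print as 0.04               */
run;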

For the rest of the 18th century and for much of the 19th century, Bayes’ Theorem was treated with disdain by mathematicians and scientists. They called it “inverse probability.” If it was used at all, it was as a mathematical trick to get around some difficult problem. But since the 1930s, Bayes’ Theorem has proved to be an important element in the statistician’s bag of “tricks.”

Bayes saw his theorem as implying that an event that comes first “causes” an event that comes after with a certain probability, and an event that comes after “causes” an event that came “before” (foolish idea) with another probability. If you think of Bayes’ Theorem as providing a means of improving on prior knowledge using the data available, then it does make sense.

In experimental design, Bayes’ Theorem has proven very useful when the experimenter has some prior knowledge and wants to incorporate that into his or her design. In general, Bayes’ Theorem allows the experimenter to go beyond the experiment with the concept that experiments are a means of continuing to develop scientific knowledge.

To learn more about how probability is used in experimental design, download Cautionary Tales in Designed Experiments now!

Thomas Bayes’ theorem and “inverse probability” was published on SAS Users.

November 20, 2020
 

If you’re like me and the rest of the conference team, you’ve probably attended more virtual events this year than you ever thought possible. You can see the general evolution of virtual events by watching the early ones from April or May and comparing them to recent ones. We at SAS Global Forum are studying the virtual event world, and we’re learning what works and what needs to be tweaked. We’re using that knowledge to plan the best possible virtual SAS Global Forum 2021.

Everything is virtual these days, so what do we mean by virtual?

Planning a good virtual event takes time, and we’re working through the process now. One thing is certain -- we know the importance of providing quality content and an engaging experience for our attendees. We want to provide attendees with the same opportunities as always, but virtually: to continue to learn from other SAS users, hear about new and exciting developments from SAS, and connect and network with experts, peers, partners and SAS. Yes, I said network. We realize it won’t be the same as a live event, but we are hopeful we can provide attendees with an incredible experience where you connect, learn and share with others.

Call for content is open

One of the differences between SAS Global Forum and other conferences is that SAS users are front and center, and the soul of the conference. We can’t have an event without user content. And that’s where you come in! The call for content opened November 17 and runs through December 21, 2020. Selected presenters will be notified in January 2021. Presentations will be different in 2021; they will be 30 minutes in length, including time for Q&A where possible. And since everything is virtual, video is a key component of your content submission. We ask for a 3-minute video along with your title and abstract.

The Student Symposium is back

Calling all postsecondary students -- there’s still time to build a team for the Student Symposium. If you are interested in data science and want to showcase your skills, grab a teammate or two and a faculty advisor and put your thinking caps on. Applications are due by December 21, 2020.

Learn more

I encourage you to visit the SAS Global Forum website for up-to-date information, follow #SASGF on social channels and join the SAS communities group to engage with the conference team and other attendees.

Connect, learn and share during virtual SAS Global Forum 2021 was published on SAS Users.

November 19, 2020
 
SAS loves data. It's our raison d'être. We were dealing with Big Data long before the term was first used in 2005. A brief history of Big Data*:

  • In 1887, Herman Hollerith invented punch cards and a reader to organize census data.
  • In 1937, the US government had a punch-card reading machine created to keep track of 26 M Americans and 3 M employers as a result of the Social Security Act.
  • In 1943, Colossus was created to decipher Nazi codes during World War II.
  • In 1952, the National Security Agency was created to confront decrypting intelligence signals during the Cold War.
  • In 1965, the US Government built the first data center to store 742 M tax returns and 175 M sets of fingerprints.
  • In 1989, British computer scientist Tim Berners-Lee coined the phrase "World Wide Web" combining hypertext with the Internet.
  • In 1995, the first supercomputer was built.
  • In 2005, Roger Mougalas from O'Reilly Media coined the term Big Data.
  • In 2006, Hadoop was created.

The story goes on: by some estimates, 90 percent of the data available today was created in the last two years!

As SAS (and the computing world) moves to the cloud, the question, "How do I deal with my data (Big and otherwise), which used to be on-prem, in the cloud?" is at the forefront for many organizations. I ran across a series of relevant articles by my colleague, Nicolas Robert, on the SAS Support Communities covering SAS data access and storage on Google Cloud Storage (GCS). This post organizes the articles so you can quickly get an overview of the various options for SAS to access data in GCS.

Accessing Google Cloud Storage (GCS) with SAS Viya 3.5 – An overview

As the title suggests, this is an overview of the series. Some basic SAS terminology and capabilities are discussed, followed by an overview of GCS data options for SAS. Options include:

  • gsutil - the "indirect" way
  • REST API - the "web" way
  • gcsfuse - the "dark" way
  • BigQuery - the "smart" way.

In the overview Nicolas provides the pros and cons of each offering to help you decide which option works best for your situation. Below is a list of subsequent articles providing technical details, specific steps for usage, and sample code for each option.

Accessing files on Google Cloud Storage (GCS) using REST

The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage. In this article, Nicolas provides step-by-step instructions on using this API to access GCS files from SAS.
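To give a flavor of that approach (this is a minimal sketch, not Nicolas' exact code), the GCS JSON API can be called from SAS with PROC HTTP. The bucket name, object name, and OAuth access token below are hypothetical placeholders:

/* GET one object; ?alt=media returns the file contents.          */
/* Replace my-bucket, myfile.csv and the bearer token with yours. */
filename gcsobj temp;

proc http
   url="https://storage.googleapis.com/storage/v1/b/my-bucket/o/myfile.csv?alt=media"
   method="GET"
   oauth_bearer="<access-token>"
   out=gcsobj;
run;

proc import datafile=gcsobj out=work.gcs_data dbms=csv replace;
run;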

Accessing files on Google Cloud Storage (GCS) using SAS Viya 3.5 and Cloud Storage FUSE (gcsfuse)

Cloud Storage FUSE provides a command-line utility, named “gcsfuse”, which helps you mount a GCS bucket to a local directory so the bucket’s contents are visible and accessible locally like any other file. In this article, Nicolas presents rules for CLI usage, options for mounting a GCS bucket to a local directory, and SAS code for accessing the data.
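Once a bucket is mounted on the SAS server (for example, gcsfuse my-bucket /mnt/gcs), the data can be reached with ordinary LIBNAME and FILENAME statements. A minimal sketch, with hypothetical bucket and paths:

/* The mount point behaves like any local directory.           */
libname gcslib '/mnt/gcs/sasdata';             /* SAS data sets */
filename gcscsv '/mnt/gcs/landing/myfile.csv'; /* a flat file   */

proc import datafile=gcscsv out=work.from_gcs dbms=csv replace;
run;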

SAS Viya 3.5 and Google Cloud Storage (GCS) Performance Feedback

In this article, Nicolas provides the results of a performance test of GCS integrated with SAS when accessed from cloud instances. New releases of SAS will only help facilitate integration and improve performance.

Accessing files on Google Cloud Storage (GCS) through Google BigQuery

Google BigQuery naturally interacts with Google Cloud Storage using popular big data file formats (Avro, Parquet, ORC) as well as commodity file formats like CSV and JSON. And since SAS can access Google BigQuery, SAS can access those GCS resources under the covers. In the final article, Nicolas debunks the myth that using Google BigQuery as middleware between SAS and GCS is cumbersome, indirect, and requires data duplication.
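For illustration, here is a minimal LIBNAME sketch assuming SAS/ACCESS Interface to Google BigQuery is licensed; the project, dataset, table, and service-account key file are hypothetical:

libname gbq bigquery
   project="my-gcp-project"        /* GCP project                 */
   schema="my_dataset"             /* BigQuery dataset            */
   cred_path="/creds/sa-key.json"; /* service-account key file    */

/* sales_external could be a BigQuery external table over GCS files. */
proc sql;
   select count(*) from gbq.sales_external;
quit;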

Finally

Being able to access a wide variety of data on the major cloud providers' object storage technologies has become essential, if not mandatory. I encourage you to browse through the various articles, find your specific area of interest, and try out some of the detailed concepts.

* Big Data history compiled from A Short History Of Big Data, by Dr Mark van Rijmenam.

Accessing Google Cloud Storage (GCS) with SAS Viya was published on SAS Users.

November 13, 2020
 

Through one lens, creating software is a lonely process: People coding away in silent, siloed isolation, exacerbated by the coronavirus pandemic. The SAS view: That’s no way to work if you want quality software. As my colleagues and I mark World Quality Day 2020, one thing is clear: Carefully orchestrated communication creates the kind of culture that fosters quality software.

Want to prioritize quality in your organization? This white paper gives our take on it:

READ NOW | OUR COMMITMENT TO QUALITY

Silo-busting, employee connections key for quality culture

Quality Week is a big deal here. Yearly since 2014, we have set aside one week for employees to share knowledge, practice skills and meet new people. The focus is not just on the technical side of quality, but the human side as well. Quality Week events are designed to break down silos and foster new connections that can lead to innovation.

But how is that done? How can silos of information be broken down?

SAS does it by bringing together disparate groups of people whose paths wouldn't cross on a typical workday. They connect through venues and formats that encourage conversations focused on defining:

  • A quality organization
  • Quality software
  • What quality means to you

How we're baking quality into our culture

Planning and delivering this year’s Quality Week events had special challenges, as all activities had to be virtual. During the planning phase, we made an extra effort to foster conversations and help people forge new network connections within the company. Some examples:

Lightning Talks: 5- to 10-minute talks, followed by participant questions. With Q&A, the total time investment for participants is less than 25 minutes. Benefits: highly interactive, minimal time commitment for presenters and attendees, and easy to log on and listen.

Innovation Storms: Practical workshops that allow participants to not just learn but do. Leaders go beyond demonstrations to teach their peers how to install and use a tool or solution. Benefit: Small sessions (five to 15 participants) allow for hands-on coaching and extensive interaction.

Virtual Conference: A session following a more traditional conference format. In a one-day, two-track event, expert presenters provide in-depth expositions on their topics, followed by Q&A.

Executive AMA: An online forum with executives from different divisions answering attendees’ questions. Quality starts at the top of an organization. The transparency and willingness to engage in tough conversations reinforces the importance of quality.

Quality culture taps the power of diversity, inclusion

A variety of perspectives is critical. The following events helped us cast the widest possible net:

#SheHacks: A hackathon for girls ages 8-18. To encourage and develop the next generation of quality coders, SAS teamed up with Girl Geek Academy.

Global events: SAS offices in Pune and Beijing hold Quality Week events in conjunction with the US event. Presentations from all venues are recorded and shared so that cross-pollination occurs not only across different areas of expertise, but also cultures.

Our theme this year, The Many Voices of Quality, celebrates the diversity that makes quality possible. Quality benefits from all forms of diversity. To fulfill that mission, Quality Week topics this year cover not only software development, but also the broader landscape of sales, marketing, communication, compliance, and more.

This year’s Quality Week highlights those voices in sessions on our commitment to supplier diversity, environmental sustainability, and customer satisfaction. Collaboration between Quality Week and SAS’ Black Initiatives Group brings additional perspectives to the table.

The journey towards higher quality belongs to many voices. Quality Week events allow those voices to be highlighted and heard.

Quality software starts with communication was published on SAS Users.

November 11, 2020
 

Data visualization has never been more widespread, or consumed by a wider global audience, than this year during the coronavirus pandemic. If you are interested in the statistics behind many of the numbers you see displayed in data visualizations, then please reference my colleague’s blog series:

One visualization that is commonly used to display metrics of Coronavirus is a bar line chart where the bars display the actual values and the line is a moving average metric.

The screenshot below shows the number of web visits per week and a 5 week moving average value. I’ve named it Visits (5 Week Moving Avg Bracket) since I am including both past and future data points in this calculation.

I will demonstrate how to easily generate this moving average metric to use in your SAS Visual Analytics reports.

Bar Line With Moving Average

Moving Average

Moving average is essentially a block of data points averaged together, creating a series of averages for the data. This is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles.

Traditionally, the moving average block takes the current data point and then moves forward in the series, which is often the case in financial applications. However, it is more common in science and engineering to take equal data points before and after the current position, creating what I am calling a moving average bracket.

SAS’ Visual Analytics calculation gives you the flexibility to define how to average the data points for your moving average: how many positions prior and/or after the current data point.
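For comparison, the same kind of centered (bracketed) moving average can be sketched in SAS code. This is a minimal example, not what Visual Analytics runs internally; it assumes SAS/ETS is licensed and a hypothetical WORK.WEB table with WEEK and VISITS columns:

/* CMOVAVE 5 averages 2 points before, the current point,      */
/* and 2 points after - the same bracket described above.      */
proc expand data=work.web out=work.web_ma method=none;
   id week;
   convert visits = visits_ma5 / transformout=(cmovave 5);
run;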

Steps to generate the Moving Average

From the Data pane, right-click on the measure for which you want to generate the moving average, and select New calculation.

New Calculation

Next, in the Create Calculation window:

  • Name: Visits (5 Week Moving Avg Bracket) (a meaningful name of your choice)
  • Type: Moving average
  • Number of cells to average: 5 (or the number of your choice)

Click OK.

New Moving Average

Notice that you did not have the option to specify how many data points to include before and after the current position. To do that, we will right-click on the generated calculation and select Edit.

Edit Moving Average Scope

From the Edit Calculated Item window, make your starting and ending point position selections. The current position is denoted as zero. In my example, I want a 5 point moving average bracket, therefore I want to average the two positions before current, (-2), current, and then the two positions after current (2).

Aggregate Cells Average

Success! We have now generated our 5 point moving average bracket. You may be wondering: where do you specify the time frame? Granted, I named this metric Visits (5 Week Moving Avg Bracket), but I did not specify the week time metric anywhere. This is because, for this aggregated measure, the time duration is directly dependent on the data items you have assigned to the object.

In my example, for my web visit data, I want to look at the aggregation at the week level, therefore this Aggregate Cell measure will be grouped by week. I wanted to name my measure appropriately so that any user reading the visual knows the time frame that is being averaged.

If you are unsure of how you want to aggregate your time series, or you are allowing your report viewers to change the role assignments through self-service reporting (see blog or YouTube for more information), then it is best to name your measure 5 Point Moving Average Bracket and leave the time aggregation out of the name.

Bar Line Role Assignments

Let’s take a look at what the numbers are behind the scene. I’ve expanded the object to its Maximize mode so that I can see the summary table of the metrics that make up this object.

The calculation averages the available data points until the full bracket size is met, then slides the bracket down the time series. Keep in mind that our bracket size is five points: two previous data points (-2), the current data point (0), and two future data points (2).

Let’s look at week 01. Week 01 is our current, or zero, position. We do not have any prior data points, so the only points that can be used from the bracket are the current position and positions 1 and 2. See the first highlighted yellow Visits (5 Week Moving Avg Bracket) value and how it corresponds to the Visits.

Next, week 02 is able to include the -1 position in the bracket. The full moving average bracket isn’t met until week 03, where all of the data points (-2 through 2) are available. Then the moving average bracket slides along the rest of the time series.

Moving Average Bracket Details

Seeing the object’s summary data and the breakdown of how the moving average bracket is calculated for the data should drive home the fact that the time period aggregation is solely driven by the object’s role assignments.

In my second example, I am looking at retail data and this is aggregated to the day level.

By Day Example

I chose to demonstrate a 14 day moving average bracket.

14 Day Moving Average

Bar Line Chart

In both of my examples, I used the Dual axis bar-line chart object. The default Y axis behavior for bar charts is to start at zero, but line charts, since they are intended to show trends, can start at a value other than zero. Use the Options pane to select your desired line chart baseline. I chose to have both my bar and line charts fixed at zero. Explore the other Options to further customize your visual.

Bar Line Chart

Summary

Calculating the moving average is offered as a SAS Visual Analytics out-of-the-box Derived Item. See the SAS Documentation for more information about Derived Items.

Once you’ve specified the window of data points to average for the moving average, you can go back and edit the starting and ending points around the current position. Remember that you do not specify a time frame for the aggregation; it is determined dynamically based on the role assignments of the visual.

Learn more

If you are interested in other uses for the moving average operator Aggregate Cells take a look at this article: VA Report Example: Moving 30 Day Rolling Sum.

Also, check out these resources:

SAS Visual Analytics example: moving average was published on SAS Users.

November 10, 2020
 

As a SAS consultant I have been an avid user of SAS Enterprise Guide for as long as I can remember. It has been not just my go-to tool, but that of many of the SAS customers I have worked with over the years.

It is easy to use, the interface intuitive, a Swiss Army knife when it comes to data analysis. Whether you’re looking to access SAS data or import good old Excel locally, join data together or perform data analysis, a few clicks and ta-dah, you’re there! Alternatively, if you insist on coding or like me, use a bit of both, the ta-dah point still holds.

SAS Enterprise Guide, or EG as it is commonly known, is a mature SAS product with many years of R&D and an established user base; a reliable and trusted product. So why move to SAS Studio? Why should I leave the comfort of what works?

For the last nine months I have been working with one of the UK’s largest supermarkets answering that exact question as they make the journey from SAS Enterprise Guide to SAS Studio. EG is used widely across several supermarket operations, including:

  • supply chain (to look at wastage and stock availability)
  • marketing analytics (to look at customer behaviour and build successful campaigns)
  • fraud detection (to detect misuse of vouchers).

What is SAS Studio?

Firstly, let's answer the "what is SAS Studio" question. It is the browser-based interface for SAS programmers to run code or use predefined tasks to automatically generate SAS code. Since there is nothing to install on your desktop, you can access it from almost any machine: Windows or Mac. And SAS Studio is brought to you by the same SAS R&D developers who maintain SAS Enterprise Guide.

SAS Studio with Ignite (dark) theme

1. Still does the regular stuff

It allows you to access your data, libraries and existing programs and import a range of data sources including Excel and CSV. You can code or use the tasks to perform analysis. You can build queries to join data, create simple and complex expressions, filter and sort data.

But it does much more than that... So what cool things can you do with SAS Studio?

2. Use the processing power of SAS Viya

SAS Studio (v5.2 onwards) works on SAS Viya. Previously, SAS 9 had the compute server, aka the workspace server, as the processing engine. SAS Viya has CAS, the next-generation SAS runtime environment, which makes use of both memory and disk. It is distributed, fault tolerant and elastic, and can work on problems larger than the available RAM. It is all centrally managed, secure, auditable and governed.

3. Cool new functionality

SAS Studio comes with many enhancements and cool new functionality:

  • Custom tasks. You can easily build your own custom tasks (software developer skills not required) so others without SAS coding skills can utilise them. Learn more in this Ask the Expert session.
  • Code snippets. It comes with pre-defined code snippets, commonly used bits of code that you can save and reuse. Additionally, you can create your own which you can share with colleagues. Coders love that these code snippets can be used with keystroke abbreviations.
  • Background submit. This allows you to run code in the background whilst you continue to work.
  • DATA step debugger. First added to SAS Enterprise Guide, an interactive DATA step debugger is now offered in SAS Studio as well.
  • Flexible workspace layout. You can have multiple tabs open for each program, and open multiple datasets and items at once.
  • FedSQL. The query window can generate FedSQL, SAS’ ANSI-compliant SQL dialect (a short sketch follows below).

    DATA step debugger in SAS Studio
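To give a flavor of the FedSQL point above, here is a minimal sketch (the table and column names are hypothetical) that runs FedSQL directly in a SAS session:

/* FedSQL aggregation over a hypothetical WORK.TRANSACTIONS table. */
proc fedsql;
   create table work.top_customers as
   select customer_id, sum(spend) as total_spend
   from work.transactions
   group by customer_id;
quit;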

4. Seamlessly access the full suite of SAS Viya capabilities

A key benefit of SAS Studio is the ease with which you can move from writing code to doing data discovery, visualisation and model building. Previously, in the SAS 9 world, you may have used EG to access and join your data and then moved to SAS Enterprise Miner, a different, separately installed interface, to build a model. Those days are long gone.

To illustrate the point, if I wanted to build a campaign to see who would respond to a supermarket voucher, I could access my customer data and join it to my transaction and product data in SAS Studio. I could then move into SAS Visual Analytics to identify the key variables I would need to build an analytical model, and even the best model to build. From there I would move to SAS Visual Data Mining and Machine Learning to build the model. I could very easily use the intuitive point-and-click pipeline interface to build several models, incorporating R or Python models to find the best one. This would all be done within one browser-based interface, with the data loaded only once.

This tutorial from Christa Cody illustrates this coding workflow in action.

The Road to SAS Studio

SAS Studio clearly has a huge number of benefits. It does the regular stuff you would expect, but additionally brings a host of cool new functionality and the processing power of SAS Viya, not to mention allowing you to move seamlessly to the next steps of the analytical and decisioning journey, including model building and creating visualisations.

Change management + technical enablement = success

Though adoption of modern technology can bring significant benefits to enterprise organisations, as this supermarket is seeing, it is not without its challenges. Change is never easy, and the transition from EG to Studio will take time, especially with a mature, well-liked and versatile product like EG.

The cultural challenge that new technology presents should not be underestimated and can be a barrier to adoption. Newer technology requires new approaches: a different way of working across diverse user communities, many of whom have well-established working practices and may, in some cases, resist change. The key is to invest time with these communities, explain how newer technology can support their activities more efficiently, and provide them with broader capability.

Learn more

Visit the Learn and Support center for SAS Studio.

Moving from SAS Enterprise Guide to SAS Studio was published on SAS Users.

November 4, 2020
 

In this blog post we are going to tackle a data cleansing task: removing unwanted repeated characters in SAS character variables.

Character repetition can stem from various stages of the data life cycle: from data collection, to data transmission, to data transformation. It can be accidental or intentional by design. It can be sporadic or consistent. In any case, it needs to be addressed by robust data cleansing processes to ensure the adequate data quality that is imperative for data usability.

Character repetition examples

Example 1. Data entry, especially manual data entry, can be a high-risk factor for accidental character duplication. Have you ever pressed a key on your computer keyboard for a bit longer than intended, so it started automatically typing multiple characters???????????????

Keyboard properties adjustment

 Tip: You can adjust your Keyboard Properties to control “Repeat delay” and “Repeat rate” settings (on Windows computer, search for “Keyboard” and click on Keyboard in Control Panel).

Example 2. Recently, I had to deal with data that contained multiple consecutive double quotation marks all over the character string values. Even though we don’t know the exact cause, for each occurrence of these duplicated quotation marks we needed to replace them with a single quotation mark.

Removing repeated blanks

For repeated blanks, SAS supplies the ready-made COMPBL function, which replaces each sequence of multiple blanks with a single blank, but it works only for blanks. Drawing on the very useful paper Removing unwanted characters from text strings by Amadeus Software, we developed a prototype for un-duplicating any character using the FIND and TRANWRD functions:

data D;
   c = ','; *<- character to un-duplicate;
   cc = c||c; *<- double character;
   string = 'Many,,,,,, commas,,,,, in,,, this,, sentence.,'; *<- source string;
   put 'BEFORE:' string=; *<- output initial string;
   do while (find(string,cc)); *<- loop through while there are doubles;
      string = tranwrd(string,cc,c); *<- replace double with a single character;
   end;
   put 'AFTER: ' string=; *<- output unduplicated string;
run;

This code will produce the following in the SAS log:

BEFORE:string=Many,,,,,, commas,,,,, in,,, this,, sentence.,
AFTER: string=Many, commas, in, this, sentence.,

which shows that this approach correctly un-duplicates the source string removing and replacing all repeated characters (commas in our example) with a single one.

User-defined SAS function for removing any repeated characters

Let’s use PROC FCMP to turn this prototype into a user-defined function, UNDUPC, that removes repetitions of any characters supplied in its second argument:

libname funclib 'c:\projects\functions';
proc fcmp outlib=funclib.userfuncs.package1;
   function undupc(str $, clist $) $;
      length x $32767 c $1 cc $2;
      x = str; 
      do i=1 to length(clist);
         c = char(clist,i);
         cc = c||c;
         do while (find(trim(x),cc));
            x = tranwrd(x,cc,c);
         end;
      end;
      return (x); 
   endfunc; 
run;

Code highlights

  • We introduce an interim variable x, to which we iteratively apply the replacement of double characters with single ones.
  • We assign the length attribute of this variable the maximum allowable character length of 32767 bytes to accommodate any character length used in the calling program.
  • The outer do-loop loops through clist, which contains the characters we want to unduplicate.
  • Variable c is assigned a single character from clist; variable cc is assigned the doubled value of c.
  • The inner do-loop iterates through x while doubles are found; using trim(x) is essential, as it not only speeds up processing by searching a shorter string (without trailing blanks), it also prevents falling into an infinite loop if clist contains a blank character to unduplicate (cc would equal two blanks, which are always found among trailing blanks).

Let’s test our newly minted UNDUPC function on the following data:

data SOURCE;
   infile datalines truncover;
   input str $50.;
   datalines;
"""Repeated "double quotes""""
Repeated,,,,,commas,,,,,,,,,,,
[[[""Mixed""]]   characters,,,
;

Since our user-defined function is permanently stored in the location specified in the OUTLIB= option of PROC FCMP, we can use it in any SAS session after pointing to that location with the CMPLIB= system option:

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length new_str $50;
   new_str = undupc(str, ' ,"][');
run;

This code will remove and replace all repeated sequences of the characters ' ', ',', '"', ']', and '['. The order in which these characters are listed in the second argument doesn’t matter. Here is what we get:

Duplicate characters removal results
As you can see, we get what we wanted including the functionality of the COMPBL function.

User-defined CALL routine for removing any repeated characters

As much as I love user-defined functions, I have an issue with the above UNDUPC implementation. It has to do with how PROC FCMP handles the length attribute assignment of interim character variables. PROC FCMP does not implicitly inherit the length attribute from another variable as the SAS data step does. For example, if you run the following data step:

data a;
   length x $99;
   y = x;
run;

variable y will have the length attribute $99 implicitly inherited from the x variable.

In a PROC FCMP function, you can either assign the length attribute of a character variable explicitly with a LENGTH or ATTRIB statement (as we did by using length x $32767), or it will be set to $33 if you use any other way of implicit assignment. (I leave it up to you to guess why 33 and not any other number.) Since we wanted to accommodate SAS character strings of any length, we had to explicitly assign our interim variable x the maximum valid length attribute of $32767. This will inevitably take a toll on the function’s performance, as we will have to process longer strings.

However, we can avoid this issue by using CALL routine instead:

libname funclib 'c:\projects\functions';
proc fcmp outlib=funclib.usercalls.package1;
   subroutine undupc(str $, clist $, x $);
      outargs x;
      length c $1 cc $2;
      x = str;
      do i=1 to length(clist);
         c = char(clist,i);
         cc = c||c;
         do while (find(trim(x),cc));
            x = tranwrd(x,cc,c);
         end;
      end;
   endsub; 
run;

This code is very similar to the user-defined function above, with a slight difference. Here, x is listed as an argument in the subroutine definition and refers to a SAS data step variable whose length attribute is assigned in the calling data step. Unlike a SAS function, a SAS subroutine does not return a value; instead, it uses the OUTARGS statement to pass the result back through its output argument. We invoke it in a data step with a CALL statement:

options cmplib=funclib.usercalls;
data TARGET;
   set SOURCE;
   length new_str $50;
   call undupc(str, ' ,"][', new_str);
run;

And we will get the same results as with the UNDUPC function above.

Store user-defined functions and subroutines separately

You can create both a user-defined function and a CALL routine with the same name. However, to avoid confusion (and errors), do not store their definitions in the same data table (the OUTLIB= option of PROC FCMP). If they are stored in the same data table, then when used in a DATA step, SAS will pull the latest definition by name alone, and that may not be the entity you want.

Performance benchmarking

To compare the performance of the UNDUPC function vs. the UNDUPC subroutine, we created a rather large data table (1 million observations) with randomly generated strings (1,000 characters long):

libname SASDL 'C:\PROJECTS\TESTDATA';
 
data SASDL.TESTDATA (keep=str);
   length str $1000;
   do i=1 to 1000000;
      str = '';
      do j=1 to 1000;
         str = cats(str,byte(int(rank(' ')+38*rand('UNIFORM'))));
      end;
      output;
   end;
run;

Then we ran the following two data steps: one using the undupc() function, the other using the undupc() call routine:

options cmplib=funclib.userfuncs;
 
data SASDL.TESTDATA_UNDUPC_FUNC;
   set SASDL.TESTDATA;
   length new_str $1000;
   new_str = undupc(str, '#+');
run;
 
options cmplib=subrlib.usercalls;
 
data SASDL.TESTDATA_UNDUPC_CALL;
   set SASDL.TESTDATA;
   length new_str $1000;
   call undupc(str, '#+', new_str);
run;

A quick SAS log inspection reveals that CALL UNDUPC works as much as 3 times faster than the UNDUPC function (10 seconds vs. 30 seconds). The time savings may vary depending on your data composition and computing environment, but in any case, if you process high volumes of data, you may consider using the CALL routine over the function. This is not a blanket statement; it pertains only to this particular algorithm of eliminating character repetitions, where we had to explicitly assign the highest possible length attribute value to the interim variable in the function, but not in the CALL routine.

When we reduced the declared length of x from $32767 to $1000 within the user-defined function definition, its performance became on par with that of the CALL routine.

Additional Resources for SAS character strings processing

Your thoughts?

Have you found this blog post useful? Would you vote for implementing UNDUPC as a native built-in SAS function? Please share your thoughts and feedback in the comments section below.

Removing repeated characters in SAS strings was published on SAS Users.