SAS programmers

12月 042019
 

If you’re like me, you struggle to buy gifts. Most folks in my inner circle already have everything they need and most of what they want. Most folks, that is, except the tech-lovers. That’s because there’s always something new on the horizon. There’s always a new gadget or program. Or a new way to learn those things. If you know folks who fall into this category, and who love SAS, boy do I have some gift ideas for you:

Try SAS for free

If you know someone who is interested in SAS but doesn’t work with it on a daily basis, or someone looking to experiment in a different area of SAS, let them know about our free software trials. Who doesn’t love free? And you’ll look like a rock star when you point out that they can explore areas like SAS Data Preparation, SAS Visual Forecasting, SAS Visual Statistics and lots more, for free.

SAS University Edition

Let’s keep things in the vein of free, shall we? SAS University Edition includes SAS Studio, Base SAS, SAS/STAT, SAS/IML, SAS/ACCESS, and several time series forecasting procedures from SAS/ETS. It’s the same software used by sites around the world; that means it’s the most up-to-date statistical and quantitative methods. And it’s free for academic, noncommercial use.

Registration to SAS Global Forum 2020

Oh my gosh, you would be your loved one’s favorite person! I’ve been to many SAS Global Forums and I love seeing the excitement on someone’s face as they explore the booths, sit in on sessions, and network to their heart’s content. SAS Global Forum is the place to be for SAS professionals, thought leaders, decision makers, partners, students, and academics. There truly is something for everyone at this event for analytics enthusiasts. And who wouldn’t want to be in Washington, D.C. in the Spring? It’s cherry blossom time! Be sure to register before Jan. 29, 2020 to get early bird pricing and save $600 off the on-site registration fee.

SAS books

I worked for many years in SAS Press and saw, firsthand, how excited folks get over a new book. I mean, who doesn’t want the 6th edition of the Little SAS Book or the latest from Ron Cody? Browse more than 100 titles and find the perfect gift. Don’t forget, SAS Press is offering 25% off everything in the bookstore for the month of December! Use promo code HOLIDAYS25 at checkout when placing your order. Offer expires at midnight Tuesday, December 31, 2019 (US).

A SAS Learning subscription

Give the gift of unlimited learning with access to our selection of e-learning and video tutorials. You can give a monthly or yearly subscription. Help the special people in your life get a head start on a new career. The gift of education is one of the best gifts you can give.

SAS OnDemand for Academics

For those who don't want to install anything, but run SAS in the cloud, we offer SAS OnDemand for Academics. Users get free access to powerful SAS software for statistical analysis, data mining and forecasting. Point-and-click functionality means there’s no need to program. But if you like to program, you do can do that, too! It’s really the best of both worlds.

SAS blogs

I began this post with something free, and I’ll end it with something free: learning from our SAS blogs. Pick from areas such as customer intelligence, operations research, data science and more. You can also read content from specific regions such as Korea, Latin America, Japan and others. To get you started, here are a few posts from three of our most popular blog authors:

Chris Hemedinger, author of The SAS Dummy:

Rick Wicklin, author of The DO Loop:

Robert Allison, major contributor to Graphically Speaking:

Happy holidays and happy learning. Now go help someone geek out. And I won’t tell if you purchase a few of these things for yourself!

Gifts to give the SAS fan in your life was published on SAS Users.

12月 042019
 

If you’re like me, you struggle to buy gifts. Most folks in my inner circle already have everything they need and most of what they want. Most folks, that is, except the tech-lovers. That’s because there’s always something new on the horizon. There’s always a new gadget or program. Or a new way to learn those things. If you know folks who fall into this category, and who love SAS, boy do I have some gift ideas for you:

Try SAS for free

If you know someone who is interested in SAS but doesn’t work with it on a daily basis, or someone looking to experiment in a different area of SAS, let them know about our free software trials. Who doesn’t love free? And you’ll look like a rock star when you point out that they can explore areas like SAS Data Preparation, SAS Visual Forecasting, SAS Visual Statistics and lots more, for free.

SAS University Edition

Let’s keep things in the vein of free, shall we? SAS University Edition includes SAS Studio, Base SAS, SAS/STAT, SAS/IML, SAS/ACCESS, and several time series forecasting procedures from SAS/ETS. It’s the same software used by sites around the world; that means it’s the most up-to-date statistical and quantitative methods. And it’s free for academic, noncommercial use.

Registration to SAS Global Forum 2020

Oh my gosh, you would be your loved one’s favorite person! I’ve been to many SAS Global Forums and I love seeing the excitement on someone’s face as they explore the booths, sit in on sessions, and network to their heart’s content. SAS Global Forum is the place to be for SAS professionals, thought leaders, decision makers, partners, students, and academics. There truly is something for everyone at this event for analytics enthusiasts. And who wouldn’t want to be in Washington, D.C. in the Spring? It’s cherry blossom time! Be sure to register before Jan. 29, 2020 to get early bird pricing and save $600 off the on-site registration fee.

SAS books

I worked for many years in SAS Press and saw, firsthand, how excited folks get over a new book. I mean, who doesn’t want the 6th edition of the Little SAS Book or the latest from Ron Cody? Browse more than 100 titles and find the perfect gift. Don’t forget, SAS Press is offering 25% off everything in the bookstore for the month of December! Use promo code HOLIDAYS25 at checkout when placing your order. Offer expires at midnight Tuesday, December 31, 2019 (US).

A SAS Learning subscription

Give the gift of unlimited learning with access to our selection of e-learning and video tutorials. You can give a monthly or yearly subscription. Help the special people in your life get a head start on a new career. The gift of education is one of the best gifts you can give.

SAS OnDemand for Academics

For those who don't want to install anything, but run SAS in the cloud, we offer SAS OnDemand for Academics. Users get free access to powerful SAS software for statistical analysis, data mining and forecasting. Point-and-click functionality means there’s no need to program. But if you like to program, you do can do that, too! It’s really the best of both worlds.

SAS blogs

I began this post with something free, and I’ll end it with something free: learning from our SAS blogs. Pick from areas such as customer intelligence, operations research, data science and more. You can also read content from specific regions such as Korea, Latin America, Japan and others. To get you started, here are a few posts from three of our most popular blog authors:

Chris Hemedinger, author of The SAS Dummy:

Rick Wicklin, author of The DO Loop:

Robert Allison, major contributor to Graphically Speaking:

Happy holidays and happy learning. Now go help someone geek out. And I won’t tell if you purchase a few of these things for yourself!

Gifts to give the SAS fan in your life was published on SAS Users.

11月 252019
 

The SAS Global Certification Program started in 1999 and has issued over 150,000 credentials to SAS users. Today, the program offers 23 different credentials across seven categories. According to Medium.com, “SAS Certification gives recognition of competency, shows commitment to the profession, and helps with job advancement.”

The SAS Global Certification Program and SAS Documentation have partnered together to work on SAS® Certified Specialist Prep Guide: Base Programming Using SAS® 9.4 and SAS® Certified Professional Prep Guide: Advanced Programming Using SAS® 9.4. This partnership has improved the prep guides and offers additional assets to support their use.

Changes to the Prep Guides

We have implemented changes to the certification guide based on the changes to the exam. We have taken the opportunity to make numerous general improvements to the certification guides. Both prep guides have been streamlined, shortened (significantly), and include more easy-to-use examples. For example, through reorganization and removing redundant topics, we have shortened SAS® Certified Professional Prep Guide: Advanced Programming Using SAS® 9.4 by more than 400 pages (428 pages to be exact).

Other changes to the prep guides include:

  • streamlining the examples and making them easier to read
  • aligning the quiz questions so that they are more like what the user will see on the exam
  • developing a workbook to provide hands-on practice for the programming portion of the exam

Tip Sheets


In 2018, I attended my first SAS Global Forum and I spoke with professors who use the certification guides in their class. They expressed interest in having a downloadable document with syntax that students can use for quick reference. Based on this feedback, I worked with SAS Training to develop tip sheets that summarize key concepts and syntax covered in the certification guides.

You can download the tip sheets from each guide’s book page. The tip sheets are expected to be incorporated into the extended learning pages for the related classes offered through Training.

We also made changes to how you can download sample data.

Sample Data Changes

In the past, we received feedback that the sample data that users were trying to download and import into SAS was generating errors. This was happening because of the copy and paste functionality that we had before. Due to some changes internal to SAS, we decided to streamline the process of downloading sample data.

We wanted to make the sample data easy to access.

The sample data now is “environment agnostic” and can run on any SAS environment. These changes are particularly helpful to customers who use SAS University Edition, SAS Studio, and SAS On-Demand. Incorporating these changes required modification to virtually all the previous examples in both prep guides.

The sample data now lives on the sas-cert-prep-data repository, which is on the SAS Software page in the GitHub repository. The repository is organized by the name of each prep guide. This enables our users to provide feedback by simply creating an issue on GitHub. The GitHub repository itself contains a lot of additional resources such as other SAS Press books, links to SAS documentation, Base SAS Glossary, and the SAS User YouTube channel with How-To videos.

With all of these changes and more, we hope that you find the new prep guides easier to use on the path to becoming SAS Certified.

Are you a SAS certified professional? was published on SAS Users.

11月 122019
 

The t-test is a very useful test that compares one variable (perhaps blood pressure) between two groups. T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated from sample data during a hypothesis test. It is used to determine whether there is a significant difference between the means of two groups. With all inferential statistics, we assume the dependent variable fits a normal distribution. When we assume a normal distribution exists, we can identify the probability of a particular outcome. The procedure that calculates the test statistic compares your data to what is expected under the null hypothesis. There are several SAS Studio tasks that include options to test this assumption. Let's use the t-test task as an example.

You start by selecting:

Tasks and Utilities □ Tasks □ Statistics □ t Tests

On the DATA tab, select the Cars data set in the SASHELP library. Next request a Two-sample test, with Horsepower as the Analysis variable and Cylinders as the Groups variable. Use a filter to include only 4- or 6-cylinder cars. It should look like this:

On the OPTIONS tab, check the box for Tests for normality as shown below.

All the tests for normality for both 4-cylinder and 6-cylinder cars reject the null hypothesis that the data values come from a population that is normally distributed. (See the figure below.)

Should you abandon the t-test results and run a nonparametric test analysis such as a Wilcoxon Rank Sum test that does not require normal distributions?

This is the point where many people make a mistake. You cannot simply look at the results of the tests for normality to decide if a parametric test is valid or not. Here is the reason: When you have large sample sizes (in this data set, there were 136 4-cylinder cars and 190 6-cylinder cars), the tests for normality have more power to reject the null hypothesis and often result in p-values less than .05. When you have small sample sizes, the tests for normality will not be significant unless there are drastic departures from normality. It is with small sample sizes where departures from normality are important.

The bottom line is that the tests for normality often lead you to make the wrong decision. You need to look at the distributions and decide if they are somewhat symmetrical. The central limit theory states that the sampling distribution of means will be normally distributed if the sample size is sufficiently large. "Sufficiently large" is a judgment call. If the distribution is symmetrical, you may perform a t-test with sample sizes as small as 10 or 20.

The figure below shows you the distribution of horsepower for 4- and 6-cylinder cars.

With the large sample sizes in this data set, you should feel comfortable in using a t-test. The results, shown below, are highly significant.

If you are in doubt of your decision to use a parametric test, feel free to check the box for a nonparametric test on the OPTIONS tab. Running a Wilcoxon Rank Sum test (a nonparametric alternative to a t-test), you also find a highly significant difference in horsepower between 4- and 6-cylinder cars. (See the figure below.)

You can read more about assumptions for parametric tests in my new book, A Gentle Introduction to Statistics Using SAS Studio.

For a sneak preview check out the free book excerpt. You can also learn more about SAS Press, check out the up-and-coming titles, and receive exclusive discounts. To do so, make sure to subscribe to the newsletter.

Testing the Assumption of Normality for Parametric Tests was published on SAS Users.

11月 062019
 

I have been programming SAS for a LONG time and have never seen much in the way of programming standards. For example, most SAS programmers indent DATA and PROC statements (I like three spaces). Most programmers do not like to see more than one statement on a line and most agree that there should be blank lines between program boundaries (DATA and PROC steps).

I thought I would share some of my thoughts on programming standards, with the hope that others will chime in with their ideas.

    • I like to indent all the statements in a DO group or DO loop. If there are nested groups, each one gets indented as well.
    • I prefer variable names in proper case.
    • I am not a fan of camel-case. For example, I prefer Weight_Kg to WeightKg. The reason that some programmers like camel-case is that SAS will automatically split a variable name at a capital letter in some headings.
    • I like my TITLE statements in open code, not inside a PROC. To me, that makes sense because TITLE statements are global.
    • There should be no conversion messages (character to numeric or numeric to character) in the SAS log. For example use Num = INPUT(Char_Num,12.); instead of Num = 1*Char_Num;. The latter statement forces an automatic character to numeric conversion and places a message in the log.
    • I always use the statement ODS NOPROCTITLE;. This eliminates the default SAS procedure name at the top of the output.
    • Although fewer and fewer people are reading raw text data, I like my @ signs to all line up in my INPUT statement.
    • I like to use the /* and */ comments to define all macro variables. For example:

Notice that I prefer named parameters in my macros, instead of positional parameters.

If this seems like too much work - SAS Studio has an automatic formatting tool that can help standardize your programs. For example, look at the code below:

Really ugly, right? Here is how you can use the automatic formatting tool in SAS Studio.

When you click this icon, the program now looks like this:

That’s pretty much the way I would write it. By the way, if you don't like how Studio formatted your code, enter a control-z to undo it.

For more tips on writing code and how to get started in SAS Studio – check out my book, Learning SAS by Example: A Programmer’s Guide, Second Edition. You can also download a free book excerpt. To also learn more about SAS Press, check out the up-and-coming titles, and receive exclusive discounts make sure to subscribe to the SAS Books newsletter.

Making your SAS code more readable was published on SAS Users.

10月 292019
 

Thank you to Lora Delwiche and Susan Slaughter for providing the following information:

Six editions is a lot! If you had told us back when we wrote the first edition of The Little SAS Book that someday we would write a sixth, we would have wondered how we could possibly find that much to say. After all, it is supposed to be The Little SAS Book, isn’t it? But the developers at SAS are constantly hard at work inventing new and better ways of analyzing and visualizing data. And some of those ways turn out to be so fundamental that they belong even in a little book about SAS.

Interface independence

One of the biggest changes to SAS software in recent years is the proliferation of interfaces. SAS programmers have more choices than ever before. Previous editions contained some sections specific to the SAS windowing environment (also called Display Manager). We wrote this edition for all SAS programmers whether you use SAS Studio, SAS Enterprise Guide, the SAS windowing environment, or run in batch. That sounds easy, but it wasn’t. There are differences in how SAS behaves with different interfaces, and these differences can be very fundamental. In particular, the system option that sets the rules for names of variables varies depending on how you run SAS. So old sections had to be rewritten, and we added a whole new section showing how to use variable names containing blanks and special characters.

New ways to read and write Microsoft Excel files

Previous editions already covered how to read and write Microsoft Excel files, but SAS developers have created new ways that are even better. This edition contains new sections about the XLSX LIBNAME engine and the ODS EXCEL destination.

More PROC SQL

From the very first edition, The Little SAS Book always covered PROC SQL. But it was in an appendix, and over time we noticed that most people ignore appendices. So for this edition, we removed the appendix and added new sections on using PROC SQL to:

• Subset your data
• Join data sets
• Add summary statistics to a data set
• Create macro variables with the INTO clause

For people who are new to SQL, these sections provide a good introduction; for people who already know SQL, they provide a model of how to leverage SQL in your SAS programs.

Updates and additions throughout the book

Almost every section in this edition has been changed in some way. We added new options, made sure everything is up-to-date, and ran every example in every SAS interface noting any differences. For example, PROC SGPLOT has some new options, the default ODS style for PDF has changed, and the LISTING destination behaves differently in different interfaces. Here’s a short list, in no particular order, of new or expanded topics in the sixth edition:

• More examples with permanent SAS data sets, CSV files, or tab-delimited files
• More log notes throughout the book showing what to look for
• LIKE or sounds-like (=*) operators in WHERE statements
• CROSSLIST, NOCUM, and NOPRINT options in PROC FREQ
• Grouping data with a user-defined format and the PUT function
• Iterative DO groups
• DO WHILE and DO UNTIL statements
• %DO statements

Even though we have added a lot to this edition, it is still a little book. In fact, this edition is shorter than the last—by 12 pages! We think this is the best edition yet. For a sneak preview check out the free book excerpt. You can also learn more about SAS Press, check out the up-and-coming titles, and to exclusive discounts -- make sure to subscribe to the newsletter.

The Little SAS Book 6.0: The best-selling SAS book gets even better was published on SAS Users.

9月 282019
 

This article continues a series that began with Machine learning with SASPy: Exploring and preparing your data (part 1). Part 1 showed you how to explore data using SASPy with Python. Here, in part 2, you will learn how to begin to prepare your data to use it within a machine-learning model.

Review part 1 if needed and ensure you still have the ADULT data set ready to use. (The data set is available from the UCI Machine Learning Repository.) If not, take some time to download and explore the data again, as described in part 1.

Preparing your data

Preparing data is a necessary step to perform before applying the data toward a model. There are string values, skewed data, and missing data points to consider. In the data set, be sure to clear missing values, so you can jump into other methods.

For this exercise, you will explore how to transform skewed features using SASPy and Pandas.

First, you must separate the income data from the data set, because the income feature will later become your target variable to model.

Drop the income data and turn the pandas data frame back into a SAS data object, with the following code:

Now, let's take a second look at the numerical features. You will use SASPy to create a histogram of all numerical features. Typically, the Matplotlib library is used, but SASPy provides great opportunities to visualize the data.

The following graphs represent the expected output.

Taking a look at the numerical features, two values stick out. CAPITAL_GAIN and CAPITAL_LOSS are highly skewed. Highly skewed features can affect your model, as most models try to maintain a normally distributed curve. To fix this, you will apply a logarithmic transformation using pandas and then visualize the change using SASPy.

Transforming skewed features

First, you need to change the SAS data object back into a pandas data frame and assign the skewed features to a list variable:

Then, use pandas to apply the logarithmic transformation and convert the pandas data frame back into a SAS data object:

Display transformed data

Now, you are ready to visualize these changes using SASPy. In the previous section, you used histograms to display the data. To display this transformation, you will use the SASPy SASUTIL class. Specifically, you will use a procedure typically used in SAS, the UNIVARIATE procedure.

To use the SASUTIL class with SASPy, you first need to create a Python object that uses the SASUTIL class:

 

Now, use the univariate function from SASPy:

 

Using the UNIVARIATE procedure, you can set axis limits to the output histograms so that you can see the data in a clearer format. After running the selected code, you can use the dir() function to verify successful submission:

 

Here is the output:

 

 

 

The function calculates various descriptive statistics and plots. However, for this example, the focus is on the histogram.

 

Here are the results:

Wrapping up

You have now transformed the skewed data. Pandas applied the logarithmic transformation and SASPy displayed the histograms.

Up next

In the next and final article of this series, you will continue preparing your data by normalizing numerical features and one-hot encoding categorical features.

Machine learning with SASPy: Exploring and preparing your data (part 2) was published on SAS Users.

9月 262019
 

In part 1 of this post, we looked at setting up Spark jobs from Cloud Analytics Services (CAS) to load and save data to and from Hadoop. Now we are moving on to the next step in the analytic cycle, scoring data in Hadoop and executing SAS code as a Spark job. The Spark scoring jobs execute using SAS In-Database Technologies for Hadoop.

The integration of the SAS Embedded Process and Hadoop allows scoring code to run directly on Hadoop. As a result, publishing and scoring of both DS2 and Data step models occur inside Hadoop. Furthermore, access to Spark data exists through the SAS Workspace Server or the SAS Compute Server using SAS/ACCESS to Hadoop, or from the CAS server using SAS Data Connectors.

Scoring Data from CAS using Spark

SAS PROC SCOREACCEL provides an interface to the CAS server for DS2 and DATA step model publishing and scoring. Model code is published from CAS to Spark and then executed via the SAS Embedded Process.

PROC SCOREACCEL supports a file interface for passing the model components (model program, format XML, and analytic stores). The procedure reads the specified files and passes their contents on to the model-publishing CAS action. In this case, the files must be visible from the SAS client.

CAS publishModel and runModel actions publish and execute score data in Spark:

 
%let CLUSTER="/opt/sas/viya/config/data/hadoop/lib:
  /opt/sas/viya/config/data/hadoop/lib/spark:/opt/sas/viya/config/data/hadoop/conf";
proc scoreaccel sessref=mysess1;
publishmodel
   target=hadoop
   modelname="simple01"
   modeltype=DS2
/* 
   filelocation=local */
programfile="/demo/code/simple.ds2"
username="cas"
modeldir="/user/cas"
classpath=&CLUSTER.
; runmodel 
    target=hadoop
    modelname="simple01"
    username="cas"
    modeldir="/user/cas"
    server=hadoop.server.com'
    intable="simple01_scoredata"
    outtable="simple01_outdata"
    forceoverwrite=yes
    classpath=&CLUSTER.
    platform=SPARK
; 
quit;

In the PROC SCOREACCEL example above, a DS2 model is published to Hadoop and executed with the Spark processing engine. The CLASSPATH statement specifies a link to the Hadoop cluster. The input and output tables, simple01_scoredata and simple01_outdata, already exist on the Hadoop cluster.

Score data in Spark from CAS using SAS Scoring Accelerator

SAS Scoring Accelerator execution status in YARN as a Spark job from CAS

As you can see in the image above, the model is scored in Spark using the SAS Scoring Accelerator. The Spark job name reflects the input and output tables.

Scoring Data from MVA SAS using Spark

Steps to run a scoring model in Hadoop:

  1. Create a traditional scoring model using SAS Enterprise Miner or an analytic store scoring model, generated using SAS Factory Miner HPFOREST or HPSVM components.
  2. Specify the Hadoop connection attributes: %let indconn= user=myuserid;
  3. Use the INDCONN macro variable to provide credentials to connect to the Hadoop HDFS and Spark. Assign the INDCONN macro before running the %INDHD_PUBLISH_MODEL and the %INDHD_RUN_MODEL macros.
  4. Run the %INDHD_PUBLISH_MODEL macro.
  5. With traditional model scoring, the %INDHD_PUBLISH_MODEL performs multiple tasks using some of the files created by the SAS Enterprise Miner Score Code Export node. Using the scoring model program (score.sas file), the properties file (score.xml file), and (if the training data includes SAS user-defined formats) a format catalog, this model performs the following tasks:

    • translates the scoring model into the sasscore_modelname.ds2 file, runs scoring inside the SAS Embedded Process
    • takes the format catalog, if available, and produces the sasscore_modelname.xml file. This file contains user-defined formats for the published scoring model.
    • uses SAS/ACCESS Interface to Hadoop to copy the sasscore_modelname.ds2 and sasscore_modelname.xml scoring files to HDFS
  6. Run the %INDHD_RUN_MODEL macro.

The %INDHD_RUN_MODEL macro initiates a Spark job that uses the files generated by the %INDHD_PUBLISH_MODEL to execute the DS2 program. The Spark job stores the DS2 program output in the HDFS location specified by either the OUTPUTDATADIR= argument or by the element in the HDMD file.

Here is an example:

 
option set=SAS_HADOOP_CONFIG_PATH="/opt/sas9.4/Config/Lev1/HadoopServer/conf";
option set=SAS_HADOOP_JAR_PATH="/opt/sas9.4/Config/Lev1/HadoopServer/lib:/opt/sas9.4/Config/Lev1/HadoopServer/lib/spark";
 
%let scorename=m6sccode;
%let scoredir=/opt/code/score;
option sastrace=',,,d' sastraceloc=saslog;
option set=HADOOPPLATFORM=SPARK;
 
%let indconn = %str(USER=hive HIVE_SERVER=’hadoop.server.com');
%put &indconn;
%INDHD_PUBLISH_MODEL( dir=&scoredir., 
        datastep=&scorename..sas,
        xml=&scorename..xml,
        modeldir=/sasmodels,
        modelname=m6score,
        action=replace);
 
%INDHD_RUN_MODEL(inputtable=sampledata, 
outputtable=sampledata9score, 
scorepgm=/sasmodels/m6score/m6score.ds2, 
trace=yes, 
    platform=spark);
Score data in Spark from MVA SAS using Spark

SAS Scoring Accelerator execution status in YARN as a Spark job from MVA SAS

To execute the job in Spark, either set the HADOOPPLATFORM= option to SPARK or set PLATFORM= to SPARK inside the INDHD_RUN_MODEL macro. The SAS Scoring Accelerator uses SAS Embedded Process to execute the Spark job with the job name containing the input table and output table.

Executing user-written DS2 code using Spark

User-written DS2 programs can be complex. When running inside a database, a code accelerator execution plan might require multiple phases. Because of Scala program generation that integrates with the SAS Embedded Process program interface to Spark, the many phases of a Code Accelerator job reduces to one single Spark job.

In-Database Code Accelerator

The SAS In-Database Code Accelerator on Spark is a combination of generated Scala programs, Spark SQL statements, HDFS files access, and DS2 programs. The SAS In-Database Code Accelerator for Hadoop enables the publishing of user-written DS2 thread or data programs to Spark, executes in parallel, and exploits Spark’s massively parallel processing. Examples of DS2 thread programs include large transpositions, computationally complex programs, scoring models, and BY-group processing.

Below is a table of required DS2 options to execute as a Spark job.

DS2ACCEL Set to YES
HADOOPPLATFORM Set to SPARK

 

There are six different ways to run the code accelerator inside Spark, called CASES. The generation of the Scala program by the SAS Embedded Process Client Interface depends on how the DS2 program is written. In the following example, we are looking at Case 2, which is a thread and a data program, neither of them with a BY statement:

 
proc ds2 ds2accel=yes;
thread work.workthread / overwrite=yes; 
        method run();
          set hdplib.cars;
          output;
end; endthread; run; 
  data hdplib.carsout (overwrite=yes); dcl thread work.workthread m;
  dcl double count;
  keep count make model;
method run(); set from m; count+1; output; 
end; enddata; run; quit;

The entire DS2 program runs in two phases. The DS2 thread program runs during Phase One, and its tasks execute in parallel. The DS2 data program runs during Phase Two using a single task.

Finally

With SAS Scoring Accelerator and Spark integration, users have the power and flexibility to process and score modeled data in Spark. SAS Code Accelerator and Spark integration takes the flexibility further to process any Spark data using DS2 code. Furthermore it is now possible for business to respond to use cases immediately and with higher reliability in the big data space.

Data and Analytics Innovation using SAS & Spark - part 2 was published on SAS Users.

7月 102019
 

Posted on behalf of SAS Press author Derek Morgan.


I was sitting in a model railroad club meeting when one of our more enthusiastic young members said, "Wouldn't it be cool if we could make a computer simulation, with trains going between stations and all. We could have cars and engines assigned to each train and timetables and…"

So, I thought to myself, “Timetables… I bet SAS can do that easily… sounds like something fun for Mr. Dates and Times."

As it turns out, the only easy part of creating a timetable is calculating the time. SAS handles the concept of elapsed time smoothly. It’s still addition and subtraction, which is the basis of how dates and times work in SAS. If a train starts at 6:00 PM (64,800 seconds of the day) and arrives at its destination 12 hours (43,200 seconds) later, it arrives at 6:00 AM the next day. The math is start time+duration=end time (108,000 seconds,) which is 6:00 AM the next day. It doesn’t matter which day, that train is always scheduled to arrive at 6:00 AM, 12 hours from when it left.

It got a lot more complicated when the group grabbed onto the idea. One of the things they wanted to do was to add or delete trains and adjust the timing so multiple trains don’t run over the same track at the same time. This wouldn’t be that difficult in SAS; just create an interactive application, but… I’m the only one who has SAS. So how do I communicate my SAS database with the “outside world”? The answer was Microsoft Excel, and this is where it gets thorny.

It’s easy enough to send SAS data to Excel using ODS EXCEL and PROC REPORT, but how could I get Excel to allow the club to manipulate the data I sent?
I used the COMPUTE block in PROC REPORT to display a formula for every visible train column. I duplicated the original columns (with corresponding names to keep it all straight) and hid them in the same spreadsheet. The EXCEL formula code is in line 8.

Compute Block Code:

I also added three rows to the dataset at the top. The first contains the original start time for each train, the second contains an offset, which is always zero in the beginning, while the third row was blank (and formatted with a black background) to separate it from the actual schedule.


Figure 1: Schedule Adjustment File

The users can change the offset to change the starting time of a train (Column C, Figure 2.) The formula in the visible columns adds the offset to the value in each cell of the corresponding hidden column (as long as it isn’t blank.) You can’t simply add the offset to the value of the visible cell, because that would be a circular reference.

The next problem was moving a train to an earlier starting time, because Excel has no concept of negative time (or date—a date prior to the Excel reference date of January 1, 1900 will be a character value in Excel and cause your entire column to be imported into SAS as character data.) Similarly, you can’t enter -1:00 as an offset to move the starting time of our 5:35 AM train to 4:35 AM. Excel will translate “-1:00” as a character value and that will cause a calculation error in Excel. In order to move that train to 4:35 AM, you have to add 23 hours to the original starting time (Column D, Figure 2.)


Figure 2: Adjusting Train Schedules

After the users adjust the schedules, it’s time to return our Excel data to SAS, which creates more challenges. In the screenshot above, T534_LOC is the identifier of a specific train, and the timetable is kept in SAS time values. Unfortunately, PROC IMPORT using DBMS=XLSX brings the train columns into SAS as character data. T534_LOC also imports as the actual Excel value, time as a fraction of a day.


Figure 3: How the Schedule Adjustment File Imports to SAS

While I can fix that by converting the character data to numeric and multiplying by 86,400, I still need the original column name of T534_LOC for the simulation, so I would have to rename each character column and output the converted data to the original column name. There are currently 146 trains spread across 12 files, and that is a lot of work for something that was supposed to be easy! Needless to say, this “little” side project, like most model railroads, is still in progress. However, this exercise in moving time data between Microsoft Excel and SAS gave me even more appreciation for the way SAS handles date and time data.

Figure 4 is a partial sample of the finished timetable file, generated as an RTF file using SAS. The data for trains 534 and 536 are from the spreadsheet in Figure 1.


Figure 4: Partial Sample Timetable

Want to learn more about how to use and manipulate dates, times, and datetimes in SAS? You'll find the answers to these questions and much more in my book The Essential Guide to SAS Dates and Times, Second Edition. Updated for SAS 9.4, with additional functions, formats, and capabilities, the Second Edition has a new chapter dedicated to the ISO 8601 standard and the formats and functions that are new to SAS, including how SAS works with Universal Coordinated Time (UTC). Chapter 1 is available as a free preview here.

For updates on new SAS Press books and great discounts subscribe to the SAS Press New Book Newsletter.

SAS Press Author Derek Morgan on Timetables and Model Trains was published on SAS Users.

7月 032019
 

One of my favorite parts of summer is a relaxing weekend by the pool. Summer is the time I get to finally catch up on my reading list, which has been building over the year. So, if expanding your knowledge is a goal of yours this summer, SAS Press has a shelf full of new titles for you to explore. To help navigate your selection we asked some of our authors what SAS books were on their reading lists for this summer?

Teresa Jade


Teresa Jade, co-author of SAS® Text Analytics for Business Applications: Concept Rules for Information Extraction Models, has already started The DS2 Procedure: SAS Programming Methods at Work by Peter Eberhardt. Teresa reports that the book “is a concise, well-written book with good examples. If you know a little bit about the SAS DATA step, then you can leverage what you know to more quickly get up to speed with DS2 and understand the differences and benefits.”
 
 
 

Derek Morgan

Derek Morgan, author of The Essential Guide to SAS® Dates and Times, Second Edition, tells us his go-to books this summer are Art Carpenter’s Complete Guide to the SAS® REPORT Procedure and Kirk Lafler's PROC SQL: Beyond the Basics Using SAS®, Third Edition. He also notes that he “learned how to use hash objects from Don Henderson’s Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study.”
 

Chris Holland

Chris Holland co-author of Implementing CDISC Using SAS®: An End-to-End Guide, Revised Second Edition, recommends Richard Zink’s JMP and SAS book, Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP® and SAS®, which describes how to improve efficiency while reducing costs in trials with centralized monitoring techniques.
 
 
 
 
 

And our recommendations this summer?

Download our two new free e-books which illustrate the features and capabilities of SAS® Viya®, and SAS® Visual Analytics on SAS® Viya®.

Want to be notified when new books become available? Sign up to receive information about new books delivered right to your inbox.

Summer reading – Book recommendations from SAS Press authors was published on SAS Users.