free e-book

3月 192020
 

At SAS Press, we agree with the saying “The best things in life are free.” And one of the best things in life is knowledge. That’s why we offer free e-books to help you learn SAS or improve your skills. In this blog post, we will introduce you to one of our amazing titles that is absolutely free.

SAS Programming for R Users

Many data scientists today need to know multiple programming languages including SAS, R, and Python. If you already know basic statistical concepts and how to program in R but want to learn SAS, then SAS Programming for R Users by Jordan Bakerman was designed specifically for you! This free e-book explains how to write programs in SAS that replicate familiar functions and capabilities in R. This book covers a wide range of topics including the basics of the SAS programming language, how to import data, how to create new variables, random number generation, linear modeling, Interactive Matrix Language (IML), and many other SAS procedures. This book also explains how to write R code directly in the SAS code editor for seamless integration between the two tools.

The book is based on the free, 14-hour course of the same name offered by SAS Education available here. Keep reading to learn more about the differences between SAS and R.

SAS versus R

R is an object-oriented programming language. Results of a function are stored in an object and desired results are pulled from the object as needed. SAS revolves around the data table and uses procedures to create and print output. Results can be saved to a new data table.

Let’s briefly compare SAS and R in a general way. Look at the following table, which outlines some of the major differences between SAS and R.

Here are a few other things about SAS to note:

  • SAS has the flexibility to interact with objects. (However, the book focuses on procedural methods.)
  • SAS does not have a command line. Code must be run in order to return results.

SAS Programs

A SAS program is a sequence of one or more steps. A step is a sequence of SAS statements. There are only two types of steps in SAS: DATA and PROC steps.

  • DATA steps read from an input source and create a SAS data set.
  • PROC steps read and process a SAS data set, often generating an output report. Procedures can be called an umbrella term. They are what carry out the global analysis. Think of a PROC step as a function in R.

Every step has a beginning and ending boundary. SAS steps begin with either of the following statements:

  • a DATA statement
  • a PROC statement

After a DATA or PROC statement, there can be additional SAS statements that contain keywords that request SAS perform an operation or they can give information to the system. Think of them as additional arguments to a procedure. Statements always end with a semicolon!

SAS options are additional arguments and they are specific to SAS statements. Unfortunately, there is no rule to say what is a statement versus what is an option. Understanding the difference comes with a little bit of experience. Options can be used to do the following:

  • generate additional output like results and plots
  • save output to a SAS data table
  • alter the analytical method

SAS detects the end of a step when it encounters one of the following statements:

  • a RUN statement (for most steps)
  • a QUIT statement (for some procedures)

Most SAS steps end with a RUN statement. Think of the RUN statement as the right parentheses of an R function. The following table shows an example of a SAS program that has a DATA step and a PROC step. You can see that both SAS statements end with RUN statements, while the R functions begin and end with parentheses.

If you want to learn more about this book or any other free e-books from SAS Press, visit https://support.sas.com/en/books/free-books.html. Subscribe to our newsletter to get the latest information on new books.

Free e-book: SAS Programming for R Users was published on SAS Users.

3月 052020
 

Have you heard that SAS offers a collection of new, high-performance CAS procedures that are compatible with a multi-threaded approach? The free e-book Exploring SAS® Viya®: Data Mining and Machine Learning is a great resource to learn more about these procedures and the features of SAS® Visual Data Mining and Machine Learning. Download it today and keep reading for an excerpt from this free e-book!

In SAS Studio, you can access tasks that help automate your programming so that you do not have to manually write your code. However, there are three options for manually writing your programs in SAS® Viya®:

  1. SAS Studio provides a SAS programming environment for developing and submitting programs to the server.
  2. Batch submission is also still an option.
  3. Open-source languages such as Python, Lua, and Java can submit code to the CAS server.

In this blog post, you will learn the syntax for two of the new, advanced data mining and machine learning procedures: PROC TEXTMINE and PROCTMSCORE.

Overview

The TEXTMINE and TMSCORE procedures integrate the functionalities from both natural language processing and statistical analysis to provide essential functionalities for text mining. The procedures support essential natural language processing (NLP) features such as tokenizing, stemming, part-of-speech tagging, entity recognition, customized stop list, and so on. They also support dimensionality reduction and topic discovery through Singular Value Decomposition.

In this example, you will learn about some of the essential functionalities of PROC TEXTMINE and PROC TMSCORE by using a text data set containing 1,830 Amazon reviews of electronic gaming systems. The data set is named Amazon. You can find similar data sets of Amazon reviews at http://jmcauley.ucsd.edu/data/amazon/.

PROC TEXTMINE

The Amazon data set has already been loaded into CAS. The review content is stored in the variable ReviewBody, and we generate a unique review ID for each review. In the proc call shown in Program 1 we ask PROC TEXTMINE to do three tasks:

  1. parse the documents in table reviews and generate the term by document matrix
  2. perform dimensionality reduction via Singular Value Decomposition
  3. perform topic discovery based on Singular Value Decomposition results

Program 1: PROC TEXTMINE

data mycaslib.amazon;
    set mylib.amazon;
run;

data mycaslib.engstop;
    set mylib.engstop;
run;

proc textmine data=mycaslib.amazon;
    doc_id id;
    var reviewbody;

 /*(1)*/  parse reducef=2 entities=std stoplist=mycaslib.engstop 
          outterms=mycaslib.terms outparent=mycaslib.parent
          outconfig=mycaslib.config;

 /*(2)*/  svd k=10 svdu=mycaslib.svdu outdocpro=mycaslib.docpro
          outtopics=mycaslib.topics;

run;

(1) The first task (parsing) is specified in the PARSE statement. Parameter “reducef” specifies the minimum number of times a term needs to appear in the text to be included in the analysis. Parameter “stop” specifies a list of terms to be excluded from the analysis, such as “the”, “this”, and “that”. Outparent is the output table that stores the term by document matrix, and Outterms is the output table that stores the information of terms that are included in the term by document matrix. Outconfig is the output table that stores configuration information for future scoring.

(2) Tasks 2 and 3 (dimensionality reduction and topic discovery) are specified in the SVD statement. Parameter K specifies the desired number of dimensions and number of topics. Parameter SVDU is the output table that stores the U matrix from SVD calculations, which is needed in future scoring. Parameter OutDocPro is the output table that stores the new matrix with reduced dimensions. Parameter OutTopics specifies the output table that stores the topics discovered.

Click the Run shortcut button or press F3 to run Program 1. The terms table shown in Output 1 stores the tagging, stemming, and entity recognition results. It also stores the number of times each term appears in the text data.

Output 1: Results from Program 1

PROC TMSCORE

PROC TEXTMINE is used with large training data sets. When you have new documents coming in, you do not need to re-run all the parsing and SVD computations with PROC TEXTMINE. Instead, you can use PROC TMSCORE to score new text data. The scoring procedure parses the new document(s) and projects the text data into the same dimensions using the SVD weights derived from the original training data.

In order to use PROC TMSCORE to generate results consistent with PROC TEXTMINE, you need to provide the following tables generated by PROC TEXTMINE:

  • SVDU table – provides the required information for projection into the same dimensions.
  • Config table – provides parameter values for parsing.
  • Terms table – provides the terms that should be included in the analysis.

Program 2 shows an example of TMSCORE. It uses the same input data layout used for PROC TEXTMINE code, so it will generate the same docpro and parent output tables, as shown in Output 2.

Program 2: PROC TMSCORE

Proc tmscore data=mycaslib.amazon svdu=mycaslib.svdu
        config=mycaslib.config terms=mycaslib.terms
        svddocpro=mycaslib.score_docpro outparent=mycaslib.score_parent;
    var reviewbody;
    doc_id id;
run;

 

Output 2: Results from Program 2

To learn more about advanced data mining and machine learning procedures available in SAS Viya, including PROC FACTMAC, PROC TEXTMINE, and PROC NETWORK, you can download the free e-book, Exploring SAS® Viya®: Data Mining and Machine Learning. Exploring SAS® Viya® is a series of e-books that are based on content from SAS® Viya® Enablement, a free course available from SAS Education. You can follow along with examples in real time by watching the videos.

 

Learn about new data mining and machine learning procedures in SAS Viya was published on SAS Users.